
Convert Docker Compose to Kubernetes Orchestration

If you already have a Docker Compose based application, you may want to orchestrate its containers with Kubernetes. If you are new to Kubernetes, you can explore the other articles on this blog or the Kubernetes website.

Here's a step-by-step plan to migrate your Docker Compose application to Kubernetes:


Step 1: Create Kubernetes Configuration Files

Create a directory for your Kubernetes configuration files (e.g., k8s-config).

Create separate YAML files for each service (e.g., api.yaml, pgsql.yaml, mongodb.yaml, rabbitmq.yaml).

Define Kubernetes resources (Deployments, Services, Persistent Volumes) for each service.


Step 2: Define Kubernetes Resources

Deployment YAML Example (api.yaml)

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: <your-docker-image-name>
        ports:
        - containerPort: 8000

Service YAML Example (appended to api.yaml after a "---" separator)

YAML
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
  - name: http
    port: 8000
    targetPort: 8000
  type: ClusterIP

Repeat this process for other services (pgsql, mongodb, rabbitmq).


Step 3: Configure Persistent Storage

Create Persistent Volume Claims (PVCs) for databases.

Define StorageClass for dynamic provisioning.

Persistent Volume Claim YAML Example (pgsql-pvc.yaml)

YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgsql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

StorageClass YAML Example (storage-class.yaml)

YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer


Step 4: Deploy to Kubernetes

Create namespace: kubectl create namespace <namespace-name>

Apply configuration files: kubectl apply -f <config-file>.yaml

Verify deployments: kubectl get deployments -n <namespace-name>

Verify pods: kubectl get pods -n <namespace-name>

Verify services: kubectl get svc -n <namespace-name>

Step 5: Expose Services

Create Ingress resources for external access.

Configure Ingress Controller (e.g., NGINX).

Ingress YAML Example (ingress.yaml)

YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
  - host: <your-domain>
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8000
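
The Ingress only takes effect once an ingress controller is actually running in the cluster. For NGINX, a common install path is the ingress-nginx Helm chart:

Bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx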


Step 6: Monitor and Maintain

Set up monitoring tools (e.g., Prometheus, Grafana).

Implement logging solutions (e.g., Fluentd, Elasticsearch).

Regularly update images and configurations (example commands below).
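
For example, a routine image update can be rolled out and watched with standard kubectl commands (the deployment and container names below match the api example above):

Bash
# Point the api container at a new image tag
kubectl set image deployment/api-deployment api=<your-docker-image-name>:<new-tag> -n <namespace-name>

# Watch the rollout complete
kubectl rollout status deployment/api-deployment -n <namespace-name>

# Roll back if the new version misbehaves
kubectl rollout undo deployment/api-deployment -n <namespace-name>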

Implementation Tools

kubectl: Kubernetes command-line tool.

kustomize: Configuration management tool (see the sketch below).

Helm: Package manager for Kubernetes.
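
As a quick illustration of kustomize, a minimal kustomization.yaml that bundles the manifests from Step 1 might look like this (the file names are the ones suggested above; adjust to what you actually created):

YAML
# k8s-config/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: <namespace-name>
resources:
  - api.yaml
  - pgsql.yaml
  - mongodb.yaml
  - rabbitmq.yaml
  - storage-class.yaml

With this in place, kubectl apply -k k8s-config applies everything in one command.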

Additional Tips

Test locally with Minikube or Kind before deploying to production.

Use environment variables for configuration.

Implement rolling updates for zero-downtime deployments (see the snippet below).
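
Rolling updates are configured on the Deployment itself. A typical zero-downtime setting looks like this (the surge/unavailable values are illustrative defaults, not requirements):

YAML
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

With maxUnavailable: 0, Kubernetes only takes an old pod down after its replacement passes readiness checks.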

This migration plan provides a solid foundation. Adjust configurations according to your specific requirements.


Say you have a Docker Compose YAML file, e.g.:

YAML
# version: '3.8'

services:
  pgsql:
    container_name: pgsql
    image: postgres:15-alpine
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data/
      - ./app/init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - 5433:5432
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password123
      - POSTGRES_DB=lcnc_db_dev
    networks:
      - app-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5

  # adminer:
  #   image: adminer
  #   container_name: pgsql-adminer
  #   ports:
  #     - 8080:8080
  #   depends_on:
  #     - pgsql
  #   networks:
  #     - app-network

  app:
    build:
      context: ./app  # Change context to the app folder
      dockerfile: Dockerfile  # Dockerfile name remains the same
    container_name: app
    env_file:
      - ./.env.dev
    environment:
      - PYTHONPATH=/app
    volumes:
      - ./app:/app
      - ./app/generated_files:/app/generated_files  # Mount for generated backend files
      - ./app/dart_files:/app/dart_files
    ports:
      - "8001:8001"
    depends_on:
      pgsql:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - app-network
    deploy:
      resources:
        limits:
          cpus: '2.00'
          memory: 4G

  redis:
    image: redis:alpine
    container_name: redis
    ports:
      - "6379:6379"
    networks:
      - app-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '1.00'
          memory: 2G
    restart: always

  redis_insight:
    image: redis/redisinsight:latest
    container_name: redis_insight
    ports:
      - "5540:5540"
    networks:
      - app-network
    depends_on:
      redis:
        condition: service_healthy

  celery_worker:
    build:
      context: ./app
      dockerfile: Dockerfile.celery-worker
    container_name: celery_worker
    environment:
      - PYTHONPATH=/app
      - BROKER_URL=redis://redis:6379/0
      - RESULT_BACKEND=redis://redis:6379/0
      - PREFETCH_MULTIPLIER=1
      - MAX_MEMORY_PER_CHILD=512
    depends_on:
      - app
      - redis
      - pgsql
    volumes:
      - ./app:/app
      - ./app/generated_files:/app/generated_files
      - ./app/dart_files:/app/dart_files
    networks:
      - app-network
    restart: unless-stopped

volumes:
  postgres-data-volume:  # Named volume for PostgreSQL data

networks:
  app-network:
    driver: bridge


Here's the equivalent Kubernetes configuration for the provided Docker Compose file:

Step 1: Create Namespace

Bash
kubectl create namespace dev-namespace

Step 2: PostgreSQL Deployment and Service

YAML
# pgsql-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgsql-deployment
  namespace: dev-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pgsql
  template:
    metadata:
      labels:
        app: pgsql
    spec:
      containers:
      - name: pgsql
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: postgres-data-volume
          mountPath: /var/lib/postgresql/data/
        - name: init-sql-volume
          mountPath: /docker-entrypoint-initdb.d/
        env:
        - name: POSTGRES_USER
          value: "postgres"
        - name: POSTGRES_PASSWORD
          value: "password123"
        - name: POSTGRES_DB
          value: "lcnc_db_dev"
      volumes:
      - name: postgres-data-volume
        persistentVolumeClaim:
          claimName: postgres-pvc
      - name: init-sql-volume
        configMap:
          name: init-sql-cm
  strategy:
    type: Recreate

# pgsql-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: pgsql-service
  namespace: dev-namespace
spec:
  selector:
    app: pgsql
  ports:
  - name: pgsql
    port: 5432
    targetPort: 5432
  type: ClusterIP

Create a ConfigMap for init.sql (run from the app directory, where init.sql lives):

Bash
kubectl create configmap init-sql-cm --from-file=init.sql -n dev-namespace

Then create the Persistent Volume Claim:

YAML
# postgres-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: dev-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
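
Note that the Deployment above hardcodes the database password, mirroring the Compose file. In a real cluster you would move it into a Secret; a minimal sketch (the name pgsql-secret is my own, not something from the Compose file):

YAML
# pgsql-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: pgsql-secret
  namespace: dev-namespace
type: Opaque
stringData:
  POSTGRES_PASSWORD: password123

The container would then reference it instead of a literal value:

YAML
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pgsql-secret
              key: POSTGRES_PASSWORD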

Step 3: Redis Deployment and Service

YAML
# redis-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
  namespace: dev-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:alpine
        ports:
        - containerPort: 6379
  strategy:
    type: Recreate

# redis-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: dev-namespace
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  type: ClusterIP
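
The Compose healthcheck has no direct field in a Deployment; the closest equivalent is a probe on the container. A sketch that mirrors the redis-cli ping check (add under the redis container spec):

YAML
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3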

Step 4: App Deployment and Service

YAML
# app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: dev-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: <your-docker-image-name>
        ports:
        - containerPort: 8001
        volumeMounts:
        - name: app-volume
          mountPath: /app
        - name: generated-files-volume
          mountPath: /app/generated_files
        - name: dart-files-volume
          mountPath: /app/dart_files
        env:
        - name: PYTHONPATH
          value: "/app"
      volumes:
      - name: app-volume
        persistentVolumeClaim:
          claimName: app-pvc
      - name: generated-files-volume
        emptyDir: {}
      - name: dart-files-volume
        emptyDir: {}
  strategy:
    type: Recreate

# app-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: dev-namespace
spec:
  selector:
    app: app
  ports:
  - name: app
    port: 8001
    targetPort: 8001
  type: ClusterIP

# Create Persistent Volume Claim
# app-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-pvc
  namespace: dev-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
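
The Compose file also caps the app at 2 CPUs and 4G of memory, which the Deployment above drops. The Kubernetes equivalent is a resources block on the container; the requests here are a sensible guess, since Compose has no direct counterpart for them:

YAML
        resources:
          requests:
            cpu: "500m"
            memory: 1Gi
          limits:
            cpu: "2"
            memory: 4Gi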

Step 5: Celery Worker Deployment

YAML
# celery-worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker-deployment
  namespace: dev-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
      - name: celery-worker
        image: <your-celery-worker-image-name>  # built from Dockerfile.celery-worker
        volumeMounts:
        - name: app-volume
          mountPath: /app
        - name: generated-files-volume
          mountPath: /app/generated_files
        - name: dart-files-volume
          mountPath: /app/dart_files
        env:
        - name: PYTHONPATH
          value: "/app"
        - name: BROKER_URL
          value: "redis://redis-service:6379/0"
        - name: RESULT_BACKEND
          value: "redis://redis-service:6379/0"
        - name: PREFETCH_MULTIPLIER
          value: "1"
        - name: MAX_MEMORY_PER_CHILD
          value: "512"
      volumes:
      - name: app-volume
        persistentVolumeClaim:
          claimName: app-pvc
      - name: generated-files-volume
        emptyDir: {}
      - name: dart-files-volume
        emptyDir: {}
  strategy:
    type: Recreate
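
One caveat: app-pvc is ReadWriteOnce, and both the app and celery-worker Deployments claim it, so their pods must land on the same node for both to mount it. For anything beyond a single-node test cluster, use a ReadWriteMany storage class or, better, bake the application code into the images instead of mounting it.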

Step 6: Apply Kubernetes Configuration

Bash
kubectl apply -f pgsql-deployment.yaml
kubectl apply -f pgsql-service.yaml
kubectl apply -f redis-deployment.yaml
kubectl apply -f redis-service.yaml
kubectl apply -f app-deployment.yaml
kubectl apply -f app-service.yaml
kubectl apply -f celery-worker-deployment.yaml

Step 7: Verify Deployments

Bash
kubectl get deployments -n dev-namespace
kubectl get pods -n dev-namespace
kubectl get svc -n dev-namespace

This Kubernetes configuration mirrors the Docker Compose file. Adjust image names, resource requests, and other parameters as necessary.

Additional Tips

Use Secret resources for sensitive data (e.g., passwords), as sketched in Step 2.
Implement Horizontal Pod Autoscaling (HPA) for dynamic scaling (see the example below).
Monitor cluster performance with Prometheus and Grafana.
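
A minimal HPA for the app Deployment might look like this (the replica range and the 70% CPU target are illustrative values; it also requires the metrics-server and CPU requests on the target containers):

YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: dev-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70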

Here are examples of applying Kubernetes in Google Cloud and Azure:

Google Cloud (GKE)

Step 1: Create a GKE Cluster

Create a new project: gcloud projects create <project-name>
Enable Kubernetes Engine API: gcloud services enable container.googleapis.com
Create a cluster: gcloud container clusters create <cluster-name> --zone <zone> --num-nodes 3

Step 2: Deploy Application

Get cluster credentials for kubectl: gcloud container clusters get-credentials <cluster-name> --zone <zone>
Create a Deployment YAML file (e.g., deployment.yaml)
Apply the Deployment: kubectl apply -f deployment.yaml
Expose the Service: kubectl expose deployment <deployment-name> --type LoadBalancer --port 80

Step 3: Verify Deployment

Verify pods: kubectl get pods
Verify services: kubectl get svc

GKE Example Commands
Bash
# Create project and enable API
gcloud projects create my-project
gcloud services enable container.googleapis.com

# Create GKE cluster
gcloud container clusters create my-cluster --zone us-central1-a --num-nodes 3

# Fetch cluster credentials so kubectl targets the new cluster
gcloud container clusters get-credentials my-cluster --zone us-central1-a

# Deploy application
kubectl apply -f deployment.yaml

# Expose service
kubectl expose deployment my-app --type LoadBalancer --port 80

# Verify deployment
kubectl get pods
kubectl get svc


Azure (AKS)

Step 1: Create AKS Cluster

Create resource group: az group create --name <resource-group> --location <location>
Create AKS cluster: az aks create --resource-group <resource-group> --name <cluster-name> --node-count 3

Step 2: Deploy Application

Get cluster credentials for kubectl: az aks get-credentials --resource-group <resource-group> --name <cluster-name>
Create a Deployment YAML file (e.g., deployment.yaml)
Apply the Deployment: kubectl apply -f deployment.yaml
Expose the Service: kubectl expose deployment <deployment-name> --type LoadBalancer --port 80

Step 3: Verify Deployment

Verify pods: kubectl get pods
Verify services: kubectl get svc

AKS Example Commands
Bash
# Create resource group and AKS cluster
az group create --name my-resource-group --location eastus
az aks create --resource-group my-resource-group --name my-aks-cluster --node-count 3

# Fetch cluster credentials so kubectl targets the new cluster
az aks get-credentials --resource-group my-resource-group --name my-aks-cluster

# Deploy application
kubectl apply -f deployment.yaml

# Expose service
kubectl expose deployment my-app --type LoadBalancer --port 80

# Verify deployment
kubectl get pods
kubectl get svc

Additional Tips
Use managed identities for authentication.
Implement network policies for security.
Monitor cluster performance with Azure Monitor or Google Cloud Monitoring.

Kubernetes Deployment YAML Example
YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: <your-docker-image-name>
        ports:
        - containerPort: 80


Airflow and Kubeflow Differences


Here's a breakdown of the key differences between Kubeflow and Airflow, specifically in the context of machine learning pipelines, with a focus on Large Language Models (LLMs):

Kubeflow vs. Airflow for ML Pipelines (LLMs):

Core Focus:

  • Kubeflow: Kubeflow is a dedicated platform for machine learning workflows. It provides a comprehensive toolkit for building, deploying, and managing end-to-end ML pipelines, including functionalities for experiment tracking, model training, and deployment.
  • Airflow: Airflow is a general-purpose workflow orchestration platform. While not specifically designed for ML, it can be used to automate various tasks within an ML pipeline.

Strengths for LLMs:

  • Kubeflow:
    • ML-centric features: Kubeflow offers built-in features specifically beneficial for LLMs, such as Kubeflow Pipelines for defining and managing complex training workflows, Kubeflow Notebooks for interactive development, and KFServing (now KServe) for deploying trained models.
    • Scalability: Kubeflow is designed to handle large-scale deployments on Kubernetes, making it suitable for training and running computationally expensive LLM models.
    • Integration with TensorFlow/PyTorch: Kubeflow integrates seamlessly with popular deep learning frameworks like TensorFlow and PyTorch, commonly used for building LLMs.
  • Airflow:
    • Flexibility: Airflow's flexibility allows for integrating various tools and libraries needed for LLM pipelines, such as version control systems (e.g., Git) for code management and custom Python scripts for specific LLM training tasks.
    • Scheduling and Monitoring: Airflow excels at scheduling tasks within the pipeline and monitoring their execution, ensuring timely execution and providing visibility into the training process.

Considerations:

  • Complexity: Kubeflow has a steeper learning curve due to its ML-specific features and reliance on Kubernetes. Airflow, however, might require additional customization for LLM workflows.
  • Community and Resources: Kubeflow has a growing community focused on machine learning, but Airflow has a broader and more established user base. This can impact the availability of resources and support.

Overall:

  • Kubeflow is a strong choice if you prioritize a comprehensive, scalable, and ML-focused platform for building and managing LLM pipelines.
  • Airflow is a viable option if you need a flexible and customizable workflow orchestration tool, especially if you already have an Airflow setup for other tasks and want to integrate LLM training within it.

Additional Notes:

  • Both Kubeflow and Airflow can be used with managed cloud services offered by major cloud providers (e.g., Google Cloud AI Platform, Amazon SageMaker) that simplify deployment and management of these platforms.
  • There are also platforms designed specifically for large language models, such as the Hugging Face Hub, which offers functionality for training, deploying, and sharing LLM models.

The best choice between Kubeflow and Airflow depends on your specific needs, project complexity, and existing infrastructure. Consider the factors mentioned above to make an informed decision for your LLM pipeline.

To know more about Airflow click here. To know more about Kubeflow click here.

Hope this helps. You can also find some examples in my GitHub repo.