Deploying a model, especially an LLM-based application, to Azure by hand can be a daunting task. Kubeflow lets us automate the deployment pipeline.
This post walks through one example of an end-to-end machine learning deployment pipeline using Kubeflow on Azure. The example covers setting up a Kubeflow pipeline, training a model, and deploying the model.
Prerequisites:
1. Azure Account: You need an Azure account.
2. Azure Kubernetes Service (AKS): You need a Kubernetes cluster. You can create an AKS cluster via the Azure portal or CLI.
3. Kubeflow: You need Kubeflow installed on your AKS cluster. Follow the [Kubeflow on Azure documentation](https://www.kubeflow.org/docs/azure/aks/) to set this up.
Step 1: Setting Up the Environment
First, ensure you have the Azure CLI and kubectl installed and configured.
```sh
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Install kubectl
az aks install-cli
# Log in to Azure
az login
# Set the subscription (if you have multiple subscriptions)
az account set --subscription "<your-subscription-id>"
# Get credentials for your AKS cluster
az aks get-credentials --resource-group <resource-group-name> --name <aks-cluster-name>
```
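Before moving on, it can help to confirm that both command-line tools are actually on your `PATH`. The snippet below is a minimal sketch of such a check using only the Python standard library; the helper names `find_tools` and `missing_tools` are made up for this illustration and are not part of any Azure or Kubernetes tooling.

```python
import shutil

# Tools this walkthrough relies on (from the setup commands above).
REQUIRED_TOOLS = ("az", "kubectl")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Install these before continuing: " + ", ".join(missing))
```

This only checks that the binaries exist; it does not verify that `az login` succeeded or that `kubectl` points at the right cluster.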
Step 2: Deploying Kubeflow on AKS
Follow the official Kubeflow deployment guide for Azure AKS:
[Deploy Kubeflow on Azure AKS](https://www.kubeflow.org/docs/azure/aks/)
Step 3: Creating a Kubeflow Pipeline
We'll create a simple pipeline that trains and deploys a machine learning model.
Pipeline Definition
Create a file `pipeline.py`:
```python
# Written against the KFP v1 SDK (pip install "kfp<2");
# create_component_from_func is not available in the v2 SDK.
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def train_model() -> str:
    # Imports live inside the function because KFP runs it in its own container.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import joblib

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    accuracy = clf.score(X_test, y_test)
    print(f"Model accuracy: {accuracy}")

    # This path is local to the training container; copy the file to shared
    # storage if a later step needs to read it.
    model_path = "/model.pkl"
    joblib.dump(clf, model_path)
    return model_path


# python:3.8-slim ships without scikit-learn, so install it when the step starts.
train_model_op = create_component_from_func(
    train_model,
    base_image='python:3.8-slim',
    packages_to_install=['scikit-learn', 'joblib'],
)


@dsl.pipeline(
    name='Iris Training Pipeline',
    description='A pipeline to train and deploy an Iris classification model.'
)
def iris_pipeline():
    train_task = train_model_op()


if __name__ == '__main__':
    compiler.Compiler().compile(iris_pipeline, 'iris_pipeline.yaml')
```
Step 4: Deploying the Pipeline
Upload the pipeline to your Kubeflow instance.
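```python
# Install the SDK first, in a shell: pip install "kfp<2"
import kfp

# With no arguments the client uses its default connection settings;
# pass host="<your-kubeflow-endpoint>" if your setup needs it.
kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='iris_pipeline.yaml', pipeline_name='Iris Training Pipeline')
```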
Step 5: Running the Pipeline
Once the pipeline is uploaded, you can run it via the Kubeflow dashboard or programmatically.
```python
# Run the pipeline
experiment = kfp_client.create_experiment('Iris Experiment')
run = kfp_client.run_pipeline(experiment.id, 'iris_pipeline_run', 'iris_pipeline.yaml')
```
Step 6: Deploying the Model
Assuming the trained model is saved in a storage bucket, you can create a deployment pipeline to deploy the model to Azure Kubernetes Service (AKS).
Model Deployment Component
Create a file `deploy.py`:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_model(model_path: str):
    # Imports live inside the function because KFP runs it in its own container.
    # model_path is accepted as a pipeline input; the image below is expected
    # to contain (or fetch) the model itself.
    from kubernetes import client, config

    # The step runs inside a pod, so authenticate with the pod's service
    # account instead of a local kubeconfig. That account needs permission
    # to create Deployments in the target namespace.
    config.load_incluster_config()

    # Define deployment specs
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="iris-model-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'iris-model'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'iris-model'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="iris-model",
                    image="mydockerhub/iris-model:latest",
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


deploy_model_op = create_component_from_func(
    deploy_model,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='Iris Deployment Pipeline',
    description='A pipeline to deploy an Iris classification model.'
)
def iris_deploy_pipeline(model_path: str):
    deploy_task = deploy_model_op(model_path=model_path)


if __name__ == '__main__':
    compiler.Compiler().compile(iris_deploy_pipeline, 'iris_deploy_pipeline.yaml')
```
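The client objects in `deploy.py` are serialized to an ordinary Kubernetes Deployment manifest. As a rough sketch, the same manifest can be built as a plain dictionary and printed as JSON so you can review it before anything is created in the cluster. The function name `deployment_manifest` is hypothetical; the field values mirror the ones used above.

```python
import json


def deployment_manifest(name, app_label, image, port=80, replicas=1):
    """Build an apps/v1 Deployment manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels exactly.
            "selector": {"matchLabels": {"app": app_label}},
            "template": {
                "metadata": {"labels": {"app": app_label}},
                "spec": {
                    "containers": [
                        {
                            "name": app_label,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment_manifest(
        "iris-model-deployment", "iris-model", "mydockerhub/iris-model:latest"
    )
    print(json.dumps(manifest, indent=2))
```

Reviewing the printed JSON makes it easier to spot a selector that does not match the pod labels, which is a common reason a Deployment is rejected.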
Step 7: Running the Deployment Pipeline
Upload and run the deployment pipeline.
```python
# Reuses the kfp_client created in Step 4.
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='iris_deploy_pipeline.yaml', pipeline_name='Iris Deployment Pipeline')
# Run the deployment pipeline
experiment = kfp_client.create_experiment('Iris Deployment Experiment')
run = kfp_client.run_pipeline(experiment.id, 'iris_deploy_pipeline_run', 'iris_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})
```
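One thing to watch for when you run the deployment pipeline a second time: Kubernetes refuses to create an object whose name already exists and answers with HTTP 409 (Conflict). A possible way to make the step repeatable is a small create-or-replace wrapper. The sketch below is deliberately generic and assumes only that the raised exception carries a `status` attribute, as the Kubernetes Python client's `ApiException` does; the name `create_or_replace` is invented for this example.

```python
def create_or_replace(create, replace):
    """Call create(); if it fails with a 409 Conflict, call replace() instead.

    Both arguments are zero-argument callables, so this helper does not
    depend on any particular client library.
    """
    try:
        return create()
    except Exception as exc:
        # Only swallow "already exists" errors; re-raise everything else.
        if getattr(exc, "status", None) != 409:
            raise
        return replace()


# Hypothetical usage inside deploy_model, with the objects defined there:
#
# create_or_replace(
#     lambda: apps_v1.create_namespaced_deployment(namespace="default", body=deployment),
#     lambda: apps_v1.replace_namespaced_deployment(
#         name="iris-model-deployment", namespace="default", body=deployment
#     ),
# )
```

Replacing the whole object is the simplest option; whether it is the right one for your cluster (as opposed to patching) depends on how the Deployment is managed elsewhere.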
Conclusion
This end-to-end example demonstrates setting up a Kubeflow pipeline on Azure, training a model, and deploying it to AKS. Customize the `model_path`, Docker image, and other specifics as needed for your actual use case.
Deploying a Large Language Model (LLM) involves a few additional steps compared to a general machine learning model. Here’s how you can set up an end-to-end deployment pipeline for an LLM using Kubeflow on Azure, similar to the previous example.
Prerequisites
Ensure you have the necessary tools and environment set up as mentioned in the previous steps, including an Azure account, AKS cluster, and Kubeflow.
Step 1: Setting Up the Environment
Use the same steps as before to install Azure CLI, kubectl, and configure your environment.
Step 2: Deploying Kubeflow on AKS
Follow the official Kubeflow deployment guide for Azure AKS:
[Deploy Kubeflow on Azure AKS](https://www.kubeflow.org/docs/azure/aks/)
Step 3: Creating a Kubeflow Pipeline for LLM
Let's create a pipeline that fine-tunes a Hugging Face LLM and deploys it.
Pipeline Definition
Create a file `llm_pipeline.py`:
```python
# Written against the KFP v1 SDK (pip install "kfp<2").
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def train_llm() -> str:
    # Imports live inside the function because KFP runs it in its own container.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    # Load dataset
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

    # Load model and tokenizer
    model_name = "gpt2"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPT-2 has no padding token by default, so reuse the end-of-sequence token.
    tokenizer.pad_token = tokenizer.eos_token

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    tokenized_datasets = tokenized_datasets.remove_columns(["text"])
    tokenized_datasets.set_format("torch")

    # Define training arguments
    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        weight_decay=0.01,
    )

    # Create Trainer; the collator builds the labels needed for causal LM training.
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )

    # Train model
    trainer.train()

    # Save model (the path is local to the training container)
    model_path = "/model"
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)
    return model_path


# python:3.8-slim ships without the ML libraries, so install them when the step starts.
train_llm_op = create_component_from_func(
    train_llm,
    base_image='python:3.8-slim',
    packages_to_install=['transformers', 'datasets', 'torch'],
)


@dsl.pipeline(
    name='LLM Training Pipeline',
    description='A pipeline to train and deploy a Large Language Model.'
)
def llm_pipeline():
    train_task = train_llm_op()


if __name__ == '__main__':
    compiler.Compiler().compile(llm_pipeline, 'llm_pipeline.yaml')
```
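To get a feel for how long the fine-tuning step will run, it can be useful to estimate the number of optimizer steps implied by the training arguments (batch size 8, 3 epochs). The helper below is a back-of-the-envelope sketch: `optimizer_steps` is a made-up name, it assumes no gradient accumulation, and the example count in the usage lines is purely illustrative rather than the real size of the dataset.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_epochs, num_devices=1):
    """Estimate total optimizer steps, assuming no gradient accumulation."""
    examples_per_step = per_device_batch_size * num_devices
    # The last batch of each epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    # 10_000 is an illustrative example count, not a measured dataset size.
    print(optimizer_steps(num_examples=10_000, per_device_batch_size=8, num_epochs=3))
```

Multiplying the result by a measured seconds-per-step figure from a short trial run gives a rough wall-clock estimate for the pipeline step.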
Step 4: Deploying the Pipeline
Upload the pipeline to your Kubeflow instance.
```python
# Install the SDK first, in a shell: pip install "kfp<2"
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_pipeline.yaml', pipeline_name='LLM Training Pipeline')
```
Step 5: Running the Pipeline
Once the pipeline is uploaded, run it via the Kubeflow dashboard or programmatically.
```python
# Run the pipeline
experiment = kfp_client.create_experiment('LLM Experiment')
run = kfp_client.run_pipeline(experiment.id, 'llm_pipeline_run', 'llm_pipeline.yaml')
```
Step 6: Deploying the Model
Create a deployment pipeline to deploy the LLM to Azure Kubernetes Service (AKS).
Model Deployment Component
Create a file `deploy_llm.py`:
```python
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_llm(model_path: str):
    # Imports live inside the function because KFP runs it in its own container.
    # model_path is accepted as a pipeline input; the serving container reads
    # the model from the volume mounted at /model.
    from kubernetes import client, config

    # The step runs inside a pod, so authenticate with the pod's service
    # account instead of a local kubeconfig.
    config.load_incluster_config()

    # Define deployment specs
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(
                        name="llm",
                        image="mydockerhub/llm:latest",
                        ports=[client.V1ContainerPort(container_port=80)],
                        volume_mounts=[client.V1VolumeMount(mount_path="/model", name="model-volume")]
                    )],
                    # The claim "model-pvc" must already exist in the namespace.
                    volumes=[client.V1Volume(
                        name="model-volume",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(claim_name="model-pvc")
                    )]
                )
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a Large Language Model.'
)
def llm_deploy_pipeline(model_path: str):
    deploy_task = deploy_llm_op(model_path=model_path)


if __name__ == '__main__':
    compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
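The Deployment above mounts a PersistentVolumeClaim named `model-pvc`, which is never created in this walkthrough, so the pod would stay pending until such a claim exists. The following is a minimal sketch of what that claim's manifest might look like, built as a plain dictionary. The access mode and the `10Gi` size are assumptions chosen for illustration, and `pvc_manifest` is a hypothetical helper name.

```python
import json


def pvc_manifest(name="model-pvc", namespace="default", storage="10Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: a single node mounts the volume read-write.
            "accessModes": ["ReadWriteOnce"],
            # Assumption: 10Gi is enough for the saved model files.
            "resources": {"requests": {"storage": storage}},
        },
    }


if __name__ == "__main__":
    # Save the output to a file and apply it with kubectl, or adapt it
    # to your cluster's storage classes first.
    print(json.dumps(pvc_manifest(), indent=2))
```

For the training step's output to be visible to the serving pod, the training step would also need to write to the same volume; that wiring is outside the scope of this sketch.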
Step 7: Running the Deployment Pipeline
Upload and run the deployment pipeline.
```python
# Reuses the kfp_client created in Step 4.
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')
# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(experiment.id, 'llm_deploy_pipeline_run', 'llm_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})
```
Conclusion
This example demonstrates how to create a Kubeflow pipeline for training and deploying a Large Language Model (LLM) on Azure Kubernetes Service (AKS). Adjust the `model_path`, Docker image, and other specifics as needed for your actual use case. The steps involve setting up the pipeline, running the training, and deploying the trained model, all within the Kubeflow framework.
To deploy containerized LLMs with Kubeflow on Azure, you'll need to follow these steps:
1. Containerize Your LLM: Create a Docker image of your LLM application.
2. Push the Docker Image to a Container Registry: Push the Docker image to Azure Container Registry (ACR) or Docker Hub.
3. Create a Kubeflow Pipeline for Deployment: Define a Kubeflow pipeline to deploy your LLM application using the Docker image.
4. Run the Deployment Pipeline: Execute the pipeline to deploy your LLM application on AKS.
Step 1: Containerize Your LLM
Create a Dockerfile for your LLM application.
Example Dockerfile
```Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME=World
# Run app.py when the container launches
CMD ["python", "app.py"]
```
Example `app.py`
```python
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

# Load the model once at startup so every request reuses it.
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    inputs = tokenizer.encode(data['text'], return_tensors='pt')
    outputs = model.generate(inputs)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return jsonify({'response': response})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```
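Once the container is running, the `/predict` endpoint expects a JSON body with a `text` field and returns a JSON object with a `response` field. Here is a small client sketch using only the standard library. The function names and the `http://localhost:80` base URL are assumptions for illustration; substitute whatever address your container is reachable at.

```python
import json
import urllib.request


def build_predict_request(base_url, text):
    """Build the POST request for the /predict endpoint without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, text, timeout=60):
    """Send the prompt to the service and return the generated text."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Hypothetical usage, assuming the container is published on localhost:80:
#
# print(predict("http://localhost:80", "Kubeflow is"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call happens.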
Build and Push Docker Image
```sh
# Build the Docker image
docker build -t mydockerhub/llm:latest .
# Push the Docker image to Docker Hub or ACR
docker push mydockerhub/llm:latest
```
Step 2: Push Docker Image to Azure Container Registry
If you prefer to use ACR:
```sh
# Log in to Azure
az login
# Create an ACR if you don't have one
az acr create --resource-group <your-resource-group> --name <your-registry-name> --sku Basic
# Log in to the ACR
az acr login --name <your-registry-name>
# Tag the Docker image with the ACR login server name
docker tag mydockerhub/llm:latest <your-registry-name>.azurecr.io/llm:latest
# Push the Docker image to ACR
docker push <your-registry-name>.azurecr.io/llm:latest
```
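The image reference you pass to the deployment pipeline later has to match the tag pushed here exactly. As a small sketch, the helper below composes that reference from its parts and rejects registry names outside ACR's naming rule (5 to 50 alphanumeric characters). The function name `acr_image_ref` is invented for this example.

```python
import re

# ACR registry names are 5-50 alphanumeric characters.
_REGISTRY_NAME = re.compile(r"^[A-Za-z0-9]{5,50}$")


def acr_image_ref(registry_name, repository, tag="latest"):
    """Compose '<registry>.azurecr.io/<repository>:<tag>' for an ACR image."""
    if not _REGISTRY_NAME.match(registry_name):
        raise ValueError(
            "registry name must be 5-50 alphanumeric characters: %r" % registry_name
        )
    # Image references use a lowercase registry host.
    return f"{registry_name.lower()}.azurecr.io/{repository}:{tag}"


if __name__ == "__main__":
    # "examplereg" is a placeholder registry name.
    print(acr_image_ref("examplereg", "llm"))
```

Building the reference in one place avoids the easy mistake of tagging one name and deploying another.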
Step 3: Create a Kubeflow Pipeline for Deployment
Create a deployment pipeline to deploy the containerized LLM.
Deployment Component
Create a file `deploy_llm.py`:
```python
# Written against the KFP v1 SDK (pip install "kfp<2").
from kfp import compiler, dsl
from kfp.components import create_component_from_func


def deploy_llm(image: str):
    # Imports live inside the function because KFP runs it in its own container.
    from kubernetes import client, config

    # The step runs inside a pod, so authenticate with the pod's service
    # account instead of a local kubeconfig.
    config.load_incluster_config()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="llm",
                    image=image,
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # The Service routes traffic on port 80 to the pods labelled app=llm.
    service = client.V1Service(
        metadata=client.V1ObjectMeta(name="llm-service"),
        spec=client.V1ServiceSpec(
            selector={'app': 'llm'},
            ports=[client.V1ServicePort(protocol="TCP", port=80, target_port=80)]
        )
    )

    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
    core_v1.create_namespaced_service(namespace="default", body=service)


deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a containerized LLM.'
)
def llm_deploy_pipeline(image: str):
    deploy_task = deploy_llm_op(image=image)


if __name__ == '__main__':
    compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
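Because the Service above does not set a type, Kubernetes creates it as a ClusterIP service, which is reachable only from inside the cluster. Other pods can address it through its DNS name, which follows the pattern `<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds that URL; `service_url` is a hypothetical helper, and `cluster.local` is the usual default cluster domain, which your cluster may override.

```python
def service_url(name, namespace="default", port=80, cluster_domain="cluster.local"):
    """Build the in-cluster HTTP URL for a Kubernetes Service."""
    return f"http://{name}.{namespace}.svc.{cluster_domain}:{port}"


if __name__ == "__main__":
    # Matches the Service created by deploy_llm above.
    print(service_url("llm-service") + "/predict")
```

To reach the model from outside the cluster you would need something extra, such as a LoadBalancer service or an ingress, which this example does not configure.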
Step 4: Run the Deployment Pipeline
Upload and run the deployment pipeline.
```python
import kfp

# Upload the deployment pipeline
kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')

# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(
    experiment.id,
    'llm_deploy_pipeline_run',
    'llm_deploy_pipeline.yaml',
    params={'image': '<your-registry-name>.azurecr.io/llm:latest'}
)
```
Conclusion
By following these steps, you can deploy a containerized LLM using Kubeflow on Azure. This process involves containerizing your LLM application, pushing the Docker image to a container registry, creating a deployment pipeline in Kubeflow, and running the pipeline to deploy your LLM application on Azure Kubernetes Service (AKS). Adjust the specifics as needed for your actual use case.