
Saturday

Reproducibility of Python

Ensuring the reproducibility of Python statistical analysis is crucial in research and scientific computing. Here are some ways to achieve reproducibility:

1. Version Control

Use version control systems like Git to track changes in your code and data.

2. Documentation

Document your code, methods, and results thoroughly.

3. Virtual Environments

Use virtual environments like conda or virtualenv to manage dependencies and ensure consistent package versions.

4. Seed Values

Set seed values for random number generators to ensure reproducibility of simulations and modeling results.

5. Data Management

Use data management tools like Pandas and NumPy to ensure data consistency and integrity.

6. Testing

Write unit tests and integration tests to ensure code correctness and reproducibility.

7. Containerization

Use containerization tools like Docker to package your code, data, and dependencies into a reproducible environment.

8. Reproducibility Tools

Utilize tools like Jupyter Notebook and JupyterLab, along with other reproducible-research tooling, to facilitate reproducibility.


Here are these steps in more detail:


1. Use a Fixed Random Seed:

    ```python
    import numpy as np
    import random

    np.random.seed(42)
    random.seed(42)
    ```

2. Document the Environment:

    - List all packages and their versions.

    ```python
    import sys
    print(sys.version)

    # In a Jupyter notebook, the leading "!" runs a shell command;
    # from a terminal, run "pip freeze > requirements.txt" directly.
    !pip freeze > requirements.txt
    ```
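
    - For reference, a small sketch of capturing the interpreter and key package versions programmatically (the package list and output file name are just illustrative):

    ```python
    # Record the Python version and the versions of key packages alongside
    # the analysis outputs (package names and output path are examples).
    import sys
    from importlib.metadata import PackageNotFoundError, version

    packages = ["numpy", "pandas", "scipy"]

    with open("environment-info.txt", "w") as f:
        f.write(f"python {sys.version}\n")
        for name in packages:
            try:
                f.write(f"{name} {version(name)}\n")
            except PackageNotFoundError:
                f.write(f"{name} (not installed)\n")
    ```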

3. Organize Code in Scripts or Notebooks:

    - Keep the analysis in well-documented scripts or Jupyter Notebooks.

4. Version Control:

    - Use version control systems like Git to track changes.

    ```bash
    git init
    git add .
    git commit -m "Initial commit"
    ```

5. Data Management:

    - Ensure data used in analysis is stored and accessed consistently.

    - Use data versioning tools like DVC (Data Version Control).
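
    - As a lightweight complement to DVC, you can also record a checksum of each input file and verify it before the analysis runs; a minimal sketch (the file path and expected hash are placeholders):

    ```python
    # Fail fast if the input data is not byte-for-byte the version the
    # analysis was developed against (path and hash below are placeholders).
    import hashlib
    from pathlib import Path

    EXPECTED_SHA256 = "put-the-known-checksum-here"

    def sha256_of(path: str) -> str:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    actual = sha256_of("data/input.csv")
    assert actual == EXPECTED_SHA256, f"data/input.csv has changed: {actual}"
    ```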

6. Environment Management:

    - Use virtual environments or containerization (e.g., `virtualenv`, `conda`, Docker).

    ```bash
    python -m venv env
    source env/bin/activate
    ```

7. Automated Tests:

    - Write tests to check the integrity of your analysis.

    ```python
    import numpy as np

    def test_mean():
        # Trivial sanity check on a summary statistic; run with pytest.
        assert np.mean([1, 2, 3]) == 2
    ```
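
    - Tests can also assert that a seeded computation gives a stable result; a minimal sketch (`run_analysis` is a hypothetical stand-in for your real analysis):

    ```python
    import numpy as np

    def run_analysis(seed: int = 42) -> float:
        # Stand-in for the real analysis: a seeded simulation summarized by its mean.
        rng = np.random.default_rng(seed)
        sample = rng.normal(loc=0.0, scale=1.0, size=1_000)
        return float(sample.mean())

    def test_analysis_is_reproducible():
        # Same seed, same result: the property we want to guarantee.
        assert run_analysis(seed=42) == run_analysis(seed=42)
    ```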

8. Detailed Documentation:

    - Provide clear and detailed documentation of your workflow.


By following these steps, you can ensure that your Python statistical analysis is reproducible.

Sunday

GitHub Actions

 

Photo by Aleksandr Neplokhov on Pexels

Let’s first clarify the difference between workflow and CI/CD and discuss what GitHub Actions do.

  1. Workflow:

    • A workflow is a series of automated steps that define how code changes are built, tested, and deployed.
    • Workflows can include various tasks such as compiling code, running tests, and deploying applications.
    • Workflows are defined in YAML files stored in your repository's .github/workflows/ directory (e.g., .github/workflows/ci.yml).
    • They are triggered by specific events (e.g., push to a branch, pull request, etc.).
    • Workflows are not limited to CI/CD; they can automate any process in your development workflow.
  2. CI/CD (Continuous Integration/Continuous Deployment):

    • CI/CD refers to the practice of automating the process of integrating code changes, testing them, and deploying them to production.
    • Continuous Integration (CI) focuses on automatically building and testing code changes whenever they are pushed to a repository.
    • Continuous Deployment (CD) extends CI by automatically deploying code changes to production environments.
    • CI/CD pipelines ensure that code is consistently tested and deployed, reducing manual effort and minimizing errors.
  3. GitHub Actions:

    • GitHub Actions is a feature within GitHub that enables you to automate workflows directly from your GitHub repository.
    • Key advantages of using GitHub Actions for CI/CD pipelines include:
      • Simplicity: GitHub Actions simplifies CI/CD pipeline setup. You define workflows in a YAML file within your repo, and it handles the rest.
      • Event Triggers: You can respond to any webhook on GitHub, including pull requests, issues, and custom webhooks from integrated apps.
      • Community-Powered: Share your workflows publicly or access pre-built workflows from the GitHub Marketplace.
      • Platform Agnostic: GitHub Actions works with any platform, language, and cloud provider.

In summary, GitHub Actions provides a flexible and integrated way to define workflows, including CI/CD pipelines, directly within your GitHub repository. It’s a powerful tool for automating tasks and improving your development process! 😊


GitHub Actions, Jenkins, and GitLab CI/CD are all popular tools for automating software development workflows, but they serve different purposes and have distinct features. Let’s briefly compare them:

  1. GitHub Actions:

    • Event-driven CI/CD tool integrated with GitHub repositories.
    • Workflow files are written in YAML.
    • Provides free runners hosted on Microsoft Azure for building, testing, and deploying applications.
    • Has a marketplace with pre-made actions for various tasks.
    • Beginner-friendly and easy to set up.
    • Well-suited for startups and small companies.
  2. Jenkins:

    • An open-source automation server that you host and maintain yourself.
    • Pipelines are defined in a Jenkinsfile (Groovy) or configured through the UI.
    • Highly extensible through a large plugin ecosystem.
    • Platform-agnostic, but requires more setup and ongoing infrastructure maintenance.
    • Well-suited for teams that need full control over their CI/CD infrastructure.

  3. GitLab CI/CD:

    • Integrated with GitLab repositories.
    • Uses .gitlab-ci.yml files for defining pipelines.
    • Provides shared runners or allows self-hosted runners.
    • Strong integration with GitLab features.
    • Well-suited for teams using GitLab for source control and project management.

Will GitHub Actions “kill” Jenkins and GitLab? Not necessarily. Each tool has its strengths and weaknesses, and the choice depends on your specific needs, existing workflows, and team preferences. Some organizations even use a combination of these tools to cover different use cases. Ultimately, it’s about finding the right fit for your development process. 😊


You can see one GitHub Action implemented for the demo here; I hope this helps you get started.


Monday

OTA Architecture

 



                                    Photo by Pixabay

Developing an end-to-end Over-the-Air (OTA) update architecture for IoT devices in equipment like escalators and elevators involves several components. This architecture ensures that firmware updates can be delivered seamlessly and securely to the devices in the field. Here's an outline of the architecture with explanations and examples:

1. Device Firmware:
   - The IoT devices (escalators, elevators) have embedded firmware that needs to be updated over the air.
   - Example: The firmware manages the operation of the device, and we want to update it to fix bugs or add new features.
2. Update Server:
   - A central server responsible for managing firmware updates and distributing them to the devices.
   - Example: A cloud-based server that hosts the latest firmware versions.
3. Update Package:
   - The firmware update packaged as a binary file.
   - Example: A compressed file containing the updated firmware for the escalator controller.
4. Device Management System:
   - A system to track and manage IoT devices, including their current firmware versions.
   - Example: A cloud-based device management platform that keeps track of each escalator's firmware version.
5. Communication Protocol:
   - A secure and efficient protocol for communication between the devices and the update server.
   - Example: MQTT (Message Queuing Telemetry Transport) for lightweight and reliable communication.
6. Authentication and Authorization:
   - Security mechanisms to ensure that only authorized devices can receive and install firmware updates.
   - Example: Token-based authentication, where devices need valid tokens to request updates.
7. Rollback Mechanism:
   - A mechanism to roll back updates in case of failures or issues.
   - Example: Keeping a backup of the previous firmware version on the device.
8. Deployment Strategy:
   - A strategy to deploy updates gradually to minimize the impact on operations.
   - Example: Rolling deployment where updates are deployed to a subset of devices first and, if successful, expanded to others.
9. Update Trigger:
   - A mechanism to initiate the update process on devices.
   - Example: A scheduled time for updates or an event-triggered update based on certain conditions.
10. Logging and Monitoring:
    - Comprehensive logging and monitoring to track the update process and identify any issues.
    - Example: Logging each update attempt and monitoring device status during updates.
11. Edge Computing (Optional):
    - For large-scale deployments, edge computing can be used to distribute updates more efficiently.
    - Example: Edge devices in the facility can act as local update servers, reducing the load on the central server.
12. Network Considerations:
    - Ensuring that the devices have reliable and secure connectivity for downloading updates.
    - Example: Using secure protocols like HTTPS for update downloads.

Explanation: The architecture ensures that firmware updates can be securely and efficiently delivered to IoT devices. The update process is orchestrated, logged, and monitored to maintain the reliability and security of the devices in the field. The deployment strategy and rollback mechanism add resilience to the update process.

Example Scenario: Let's consider an example where an escalator management company wants to update the firmware of all escalators to improve energy efficiency. The central server hosts the updated firmware, and the device management system tracks the current firmware version on each escalator. Using a secure communication protocol, the escalators request updates, and the deployment strategy ensures a smooth transition. If any issues arise during the update, the rollback mechanism reverts the escalator to the previous firmware version.
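
To make items 5, 6, and 7 above more concrete, here is a minimal, hedged sketch of a device-side update agent in Python. It assumes an MQTT broker reachable by the device (via the paho-mqtt client library), a bearer token issued by the update server, and an HTTPS download link plus checksum delivered in the MQTT message; the topic name, host names, file paths, and payload format are all illustrative, and a real agent would add signature verification, retries, and a proper install/reboot step.

```python
# Hypothetical device-side OTA agent: listens for update notices over MQTT,
# downloads the firmware over HTTPS with a bearer token, verifies its
# checksum, and keeps a backup of the current image so it can roll back.
import hashlib
import json
import shutil

import paho.mqtt.client as mqtt   # pip install paho-mqtt
import requests                   # pip install requests

DEVICE_TOKEN = "example-device-token"          # issued by the update server (assumption)
UPDATE_TOPIC = "devices/escalator-042/update"  # illustrative topic name
FIRMWARE_PATH = "/opt/firmware/current.bin"    # illustrative file locations
BACKUP_PATH = "/opt/firmware/previous.bin"


def apply_update(url: str, expected_sha256: str) -> None:
    # Download the update package with token-based authorization (item 6).
    resp = requests.get(url, headers={"Authorization": f"Bearer {DEVICE_TOKEN}"}, timeout=60)
    resp.raise_for_status()

    # Integrity check before touching the installed firmware.
    if hashlib.sha256(resp.content).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch, refusing to install")

    # Keep the previous image so a failed update can be rolled back (item 7).
    shutil.copy2(FIRMWARE_PATH, BACKUP_PATH)
    with open(FIRMWARE_PATH, "wb") as f:
        f.write(resp.content)
    print("update installed; reboot into the new firmware")


def on_message(client, userdata, msg):
    # Expected payload (assumption): {"url": "...", "sha256": "..."}
    notice = json.loads(msg.payload)
    try:
        apply_update(notice["url"], notice["sha256"])
    except Exception as exc:
        print(f"update failed, keeping current firmware: {exc}")


client = mqtt.Client()  # with paho-mqtt 2.x, pass mqtt.CallbackAPIVersion.VERSION1
client.on_message = on_message
client.connect("updates.example.com", 1883)  # broker host/port are illustrative
client.subscribe(UPDATE_TOPIC)
client.loop_forever()
```

On the server side, the update service would publish such a notice to the device's topic and serve the binary over HTTPS; logging the same checksum on both ends also feeds the logging and monitoring step (item 10).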

Today, industrial companies seek to ingest, store, and analyze IoT data closer to the point of generation. This enhances predictive maintenance, improves quality control, ensures worker safety, and more. Industrial edge computing, focusing on stationary edge gateways in industrial environments, plays a crucial role in connecting Operational Technology (OT) systems with the cloud. Key design considerations for industrial IoT architectures using the industrial edge include low latency, bandwidth utilization, offline operation, and regulatory compliance. The edge gateway serves as an intermediary processing node, integrating industrial assets with the AWS Cloud and addressing security challenges for less-capable OT systems that lack authentication, authorization, and encryption support.

Overall, this architecture provides a structured approach to managing OTA updates for IoT devices, ensuring they stay up to date, secure, and efficient.


Below are a few nice articles about Azure and AWS for IoT and OTA:

Azure IoT

AWS IoT

DevOps Steps in Cloud


Step 1: Container Image Build

1. In your source code repository (e.g., Git), include a Dockerfile that specifies how to build your application into a container image.

2. Configure your CI/CD tool (e.g., AWS CodeBuild, Jenkins) to build the Docker image using the Dockerfile. This can be done by executing a `docker build` command within your CI/CD script.

3. Ensure that the Docker image is built with the necessary dependencies and configurations.

Step 2: Container Registry

4. Choose a container registry service to store your Docker images. Common choices include:

   - AWS Elastic Container Registry (ECR) if you're using AWS.

   - Docker Hub for public images.

   - Other cloud providers' container registries (e.g., Google Container Registry, Azure Container Registry).

Step 3: Pushing Images

5. After building the Docker image, tag it with a version or unique identifier.

6. Use the `docker push` command to push the image to the selected container registry.
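
Steps 1 through 3 above usually boil down to `docker build`, `docker tag`, and `docker push` commands in the CI script. If your pipeline logic happens to be written in Python, the same flow can be scripted with the Docker SDK for Python (the `docker` package); a minimal sketch, with the registry and image names as placeholders and registry credentials assumed to be configured earlier in the pipeline:

```python
# Build and push an image using the Docker SDK for Python (pip install docker).
# Registry and repository names below are placeholders.
import docker

REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"  # e.g. an ECR registry (example)
REPOSITORY = f"{REGISTRY}/my-app"
VERSION = "1.0.0"  # tag with a version or unique identifier

client = docker.from_env()

# Step 1: build the image from the Dockerfile in the current directory.
image, build_logs = client.images.build(path=".", tag=f"{REPOSITORY}:{VERSION}")

# Step 3: push the tagged image to the registry; credentials are assumed to
# come from an earlier `docker login` / `aws ecr get-login-password` step.
for line in client.images.push(REPOSITORY, tag=VERSION, stream=True, decode=True):
    print(line)
```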

Step 4: Deployment

7. In your CD pipeline, integrate the deployment of the Docker container. This will depend on your application's architecture:

   - For Kubernetes-based applications, you can use Kubernetes manifests (YAML files) to define your application deployment. Update these manifests with the new Docker image version and apply them using `kubectl`.

   - For AWS-based applications, you can use services like Amazon ECS, Amazon EKS, or AWS Fargate for container orchestration. Update your task or service definition to use the new Docker image version.

Step 5: Automation and Rollback

8. Ensure that your CI/CD pipeline includes automation for container image tagging and deployment.

9. Implement rollback strategies, such as keeping the previous version of the container image in the registry to easily roll back in case of issues.

Step 6: Security

10. Pay attention to container image security. Scan container images for vulnerabilities using tools like Clair, Trivy, or AWS ECR image scanning.

11. Use container image signing and security policies to ensure that only trusted images are deployed.

Step 7: Monitoring

12. Implement monitoring and logging for your containerized application and infrastructure. Tools like Prometheus, Grafana, and cloud provider monitoring services can help.

Step 8: Integration with DevOps Pipeline

13. Integrate these container-related steps into your overall DevOps pipeline. For example, you can trigger the pipeline whenever changes are pushed to your Git repository.

Step 9: Documentation and Training

14. Document the pipeline and container workflow, and ensure that your team is trained in containerization best practices and the use of your CI/CD pipeline.

With these steps, you can fully automate the build, registration (push), and deployment of Docker container images as part of your DevOps pipeline. This allows developers to focus on writing code, while the pipeline takes care of packaging and deploying applications consistently.

Tuesday

Delete Large Files From Your Git History Without Using Git LFS

 


If you want to delete large files from your Git history without using Git LFS, you can use the `git filter-branch` command with the `--index-filter` option to remove the files from your Git history. This process rewrites the repository's history and removes the specified files.

Here's how you can do it:

1. Backup Your Repository:

   Before proceeding, make sure to create a backup of your repository to avoid data loss in case something goes wrong.

2. Identify Large Files:

   Identify the large files that you want to remove from the Git history, such as `data/hail-2015.csv`.

3. Run the `git filter-branch` Command:

   Use the `git filter-branch` command with the `--index-filter` option to remove the large files from your Git history. Replace `data/hail-2015.csv` with the actual file path you want to remove.


   ```bash
   git filter-branch --force --index-filter \
   "git rm --cached --ignore-unmatch data/hail-2015.csv" \
   --prune-empty --tag-name-filter cat -- --all
   ```

This command will rewrite the Git history to exclude the specified file. Please note that this command will take some time to complete, especially for large repositories.

4. Clean Up Unreachable Objects:

   After running the `git filter-branch` command, there might be unreachable objects left in your repository. To remove them, run the following command:


   ```bash

   git reflog expire --expire=now --all && git gc --prune=now --aggressive

   ```

5. Force Push to Update Remote Repository:

   Since you've rewritten the Git history, you'll need to force push the changes to the remote repository:


   ```bash

   git push --force origin main

   ```


Replace `main` with the name of your branch if it's different.

Please use caution when performing these actions, especially if your repository is shared with others. Rewriting history can affect collaborators, so it's important to communicate with your team and coordinate this process.

Photo by Christina Morillo

Saturday

ML Ops in Azure


Setting up MLOps (Machine Learning Operations) in Azure involves creating a continuous integration and continuous deployment (CI/CD) pipeline to manage machine learning models efficiently. Below, I'll provide a step-by-step guide to creating an MLOps pipeline in Azure using Azure Machine Learning Services, Azure DevOps, and Azure Kubernetes Service (AKS) as an example. This example assumes you already have an Azure subscription and some knowledge of Azure services. You can check out FREE learning resources at https://learn.microsoft.com/en-us/training/azure/


Step 1: Prepare Your Environment

Before you start, make sure you have the following:

- An Azure subscription.

- An Azure DevOps organization.

- Azure Machine Learning Workspace set up.


Step 2: Create an Azure DevOps Project

1. Go to Azure DevOps (https://dev.azure.com/) and sign in.

2. Create a new project that will host your MLOps pipeline.


Step 3: Set Up Your Azure DevOps Repository

1. In your Azure DevOps project, create a Git repository to store your machine learning project code.


Step 4: Create an Azure Machine Learning Experiment

1. Go to Azure Machine Learning Studio (https://ml.azure.com/) and sign in.

2. Create a new experiment or use an existing one to develop and train your machine learning model. This experiment will be the core of your MLOps pipeline.


Step 5: Create an Azure DevOps Pipeline

1. In your Azure DevOps project, go to Pipelines > New Pipeline.

2. Select the Azure Repos Git as your source repository.

3. Configure your pipeline to build and package your machine learning code. You may use a YAML pipeline script to define build and packaging steps.


Example YAML pipeline script (`azure-pipelines.yml`):

```yaml
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

steps:
- script: 'echo Your build and package commands here'
```

4. Commit this YAML file to your Azure DevOps repository.


Step 6: Create an Azure Kubernetes Service (AKS) Cluster

1. In the Azure portal, create an AKS cluster where you'll deploy your machine learning model. Note down the AKS cluster's connection details.


Step 7: Configure Azure DevOps for CD

1. In your Azure DevOps project, go to Pipelines > Releases.

2. Create a new release pipeline to define your CD process.


Step 8: Deploy to AKS

1. In your release pipeline, add a stage to deploy your machine learning model to AKS.

2. Use Azure CLI or kubectl commands in your release pipeline to deploy the model to your AKS cluster.


Example PowerShell Script to Deploy Model (`deploy-model.ps1`):

```powershell
# Set Azure context and AKS credentials
az login --service-principal -u <your-service-principal-id> -p <your-service-principal-secret> --tenant <your-azure-tenant-id>
az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster-name>

# Deploy the model using kubectl
kubectl apply -f deployment.yaml
```


3. Add this PowerShell script to your Azure DevOps release pipeline stage.


Step 9: Trigger CI/CD

1. Whenever you make changes to your machine learning code, commit and push the changes to your Azure DevOps Git repository.

2. The CI/CD pipeline will automatically trigger a build and deployment process.


Step 10: Monitor and Manage Your MLOps Pipeline

1. Monitor the CI/CD pipeline in Azure DevOps to track build and deployment status.

2. Use Azure Machine Learning Studio to manage your models, experiment versions, and performance.


This is a simplified example of setting up MLOps in Azure. In a real-world scenario, you may need to integrate additional tools and services, such as Azure DevTest Labs for testing, Azure Databricks for data processing, and Azure Monitor for tracking model performance. The exact steps and configurations can vary depending on your specific requirements and organization's needs.


However, if you are using, say, a Python Flask REST API server application for users to interact with, then you can make the following changes.

To integrate your Flask application, which serves the machine learning models, into the same CI/CD pipeline as your machine learning models, you can follow these steps. Combining them into the same CI/CD pipeline can help ensure that your entire application, including the Flask API and ML models, stays consistent and updated together.


Step 1: Organize Your Repository

In your Git repository, organize your project structure so that your machine learning code and Flask application code are in separate directories, like this:


```

- my-ml-project/

  - ml-model/

    - model.py

    - requirements.txt

  - ml-api/

    - app.py

    - requirements.txt

  - azure-pipelines.yml

```
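
For reference, here is a minimal sketch of what `ml-api/app.py` could look like, assuming the trained model is serialized with joblib to a file named `model.joblib` and expects a JSON body with a `features` list; both the file name and the request format are assumptions for illustration:

```python
# Minimal Flask API that loads a serialized model and serves predictions.
# The model path and request format are illustrative assumptions.
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")  # produced by the ml-model training code


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)        # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

You can then exercise the endpoint locally with a POST request, for example: `curl -X POST -H "Content-Type: application/json" -d '{"features": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:5000/predict`.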


Step 2: Configure Your CI/CD Pipeline

Modify your `azure-pipelines.yml` file to include build and deploy steps for both your machine learning code and Flask application.

```yaml
trigger:
- main

pr:
- '*'

pool:
  vmImage: 'ubuntu-latest'

stages:
- stage: Build
  jobs:
  - job: Build_ML_Model
    steps:
    - script: |
        cd my-ml-project/ml-model
        pip install -r requirements.txt
        # Add any build steps for your ML model code here
      displayName: 'Build ML Model'
  - job: Build_Flask_App
    steps:
    - script: |
        cd my-ml-project/ml-api
        pip install -r requirements.txt
        # Add any build steps for your Flask app here
      displayName: 'Build Flask App'

- stage: Deploy
  jobs:
  - job: Deploy_ML_Model
    steps:
    - script: |
        # Add deployment steps for your ML model here
      displayName: 'Deploy ML Model'
  - job: Deploy_Flask_App
    steps:
    - script: |
        # Add deployment steps for your Flask app here
      displayName: 'Deploy Flask App'
```


Step 3: Update Your Flask Application

Whenever you need to update your Flask application or machine learning models, make changes to the respective code in your Git repository.


Step 4: Commit and Push Changes

Commit and push your changes to the Git repository. This will trigger the CI/CD pipeline.


Step 5: Monitor and Manage Your CI/CD Pipeline

Monitor the CI/CD pipeline in Azure DevOps to track the build and deployment status of both your machine learning code and Flask application.


By integrating your Flask application into the same CI/CD pipeline, you ensure that both components are updated and deployed together. This approach simplifies management and maintains consistency between your ML models and the API serving them.


Photo by ThisIsEngineering