
Friday

Chatbot and Local CoPilot with Local LLM, RAG, LangChain, and Guardrail

 




Chatbot Application with Local LLM, RAG, LangChain, and Guardrail
I've developed a chatbot application designed for informative and engaging conversations. As you may already know, Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from large language models (LLMs). By incorporating data from sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly improve the value of generative AI systems.

Developers must consider a variety of factors when building a RAG pipeline, from LLM response benchmarking to selecting the right chunk size.

In this application demo post, I demonstrate how to build a RAG pipeline using a local LLM, which can later be switched to NVIDIA AI Endpoints for LangChain. First, I create a vector store by connecting to one of the Hugging Face datasets (though you could just as easily download web pages or use PDFs), generate embeddings using SentenceTransformer (or the NVIDIA NeMo Retriever embedding microservice), and search for similarity using FAISS. I then showcase two different chat chains for querying the vector store: a local LangChain chain, and a Python FastAPI-based REST API service running in a separate thread within the Jupyter Notebook environment itself. Finally, I prepared a small but polished front end with HTML, Bootstrap, and Ajax as a chatbot interface for users. You can also refer to the NVIDIA Triton Inference Server documentation; the code can easily be modified to use any other serving back end.
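
Below is a minimal sketch of the retrieval side of such a pipeline, assuming the sentence-transformers and faiss-cpu packages are installed; the embedding model name and the toy documents are placeholders you would replace with your own dataset.

```python
# Minimal RAG retrieval sketch: embed documents with SentenceTransformer,
# index them with FAISS, and fetch the most relevant chunks for a query.
# Assumes: pip install sentence-transformers faiss-cpu
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "LangChain helps compose LLM calls, prompts, and retrievers into chains.",
    "Retrieval-augmented generation grounds LLM answers in external data.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly model
doc_vectors = embedder.encode(documents, convert_to_numpy=True)

index = faiss.IndexFlatL2(doc_vectors.shape[1])     # exact L2 similarity search
index.add(doc_vectors.astype(np.float32))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents closest to the query embedding."""
    query_vec = embedder.encode([query], convert_to_numpy=True).astype(np.float32)
    _, ids = index.search(query_vec, k)
    return [documents[i] for i in ids[0]]

print(retrieve("What does RAG do?"))
```

In the full application, the retrieved chunks are passed to the local LLM through the LangChain chain or the FastAPI endpoint to generate the final answer.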

Introducing ChoiatBot Local CoPilot: Your Customizable Local Copilot Agent

ChoiatBot offers a revolutionary approach to personalized chatbot solutions, developed to operate entirely on CPU-based systems without the need for an internet connection. This ensures not only enhanced privacy but also unrestricted accessibility, making it ideal for environments where data security is paramount.

Key Features and Capabilities

ChoiatBot stands out with its ability to be seamlessly integrated with diverse datasets, allowing users to upload and train the bot with their own data and documents. This customization empowers businesses and individuals alike to tailor the bot's responses to specific needs, ensuring a truly personalized user experience.

Powered by the google/flan-t5-small model, ChoiatBot builds on the Flan instruction-tuning recipe, which is known for robust performance across a range of benchmarks; the larger models in the same Flan family report results such as 75.2% on the five-shot MMLU benchmark. This instruction-tuned foundation helps ChoiatBot deliver accurate and contextually relevant responses even with minimal training data.

The foundation of ChoiatBot's intelligence lies in its training on the "Wizard-of-Wikipedia" dataset, renowned for its groundbreaking approach to knowledge-grounded conversation generation. This dataset not only enriches the bot's understanding but also enhances its ability to provide nuanced and informative responses based on a broad spectrum of topics.

Performance and Security

One of ChoiatBot's standout features is its ability to function offline, offering unparalleled data security and privacy. This capability is particularly advantageous for sectors dealing with sensitive information or operating in environments with limited internet connectivity. By eliminating reliance on external servers, ChoiatBot ensures that sensitive data remains within the user's control, adhering to the strictest security protocols.

Moreover, ChoiatBot's implementation on CPU-based systems underscores its efficiency and accessibility. This approach not only reduces operational costs associated with cloud-based solutions but also enhances reliability by mitigating risks related to internet disruptions or server downtimes.

Applications and Use Cases

ChoiatBot caters to a wide array of applications, from customer support automation to educational tools and personalized assistants. Businesses can integrate ChoiatBot into their customer service frameworks to provide instant responses and streamline communication channels. Educational institutions can leverage ChoiatBot to create interactive learning environments where students can receive tailored explanations and guidance.

For developers and data scientists, ChoiatBot offers a versatile platform for experimenting with different datasets and fine-tuning models. The provided code, along with detailed documentation on usage, encourages innovation and facilitates the adaptation of advanced AI capabilities to specific project requirements.

Conclusion

In conclusion, ChoiatBot represents a leap forward in AI-driven conversational agents, combining cutting-edge technology with a commitment to user privacy and customization. Whether you are looking to enhance customer interactions, optimize educational experiences, or explore the frontiers of AI research, ChoiatBot stands ready as your reliable local copilot agent, empowering you to harness the full potential of AI in your endeavors. Discover ChoiatBot today and unlock a new era of intelligent, personalized interactions tailored to your unique needs and aspirations:

Development Environment:
Operating System: Windows 10 (widely used and compatible)
Hardware: CPU (no NVIDIA GPU required, making it accessible to a broader audience)
Language Model:
Local LLM (Large Language Model): Provides the core conversational capability. I used the Google Flan-T5 Small model, which is light enough to run on a CPU (a minimal loading sketch follows this list).
Hugging Face Dataset: A small dataset from Hugging Face, a valuable resource for pre-trained models and datasets, is used to adapt the bot to its target domain.
Data Processing and Training:
LangChain: Facilitates the data processing and prompting pipeline for the LLM, streamlining the development process.
Guardrails (Optional):
NVIDIA NeMo Guardrails Library: Although guardrails are most often demonstrated on NVIDIA GPUs, a CPU-compatible configuration or an alternative safety library can be used for safety and bias mitigation.
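
For reference, here is a minimal sketch of loading this model on a CPU with the Hugging Face transformers library; the prompt is only an illustration.

```python
# Load google/flan-t5-small on CPU with Hugging Face transformers and
# generate a short answer. Assumes: pip install transformers sentencepiece
from transformers import pipeline

generator = pipeline(
    "text2text-generation",          # Flan-T5 is an encoder-decoder model
    model="google/flan-t5-small",
    device=-1,                       # -1 = run on CPU, no GPU required
)

prompt = "Answer briefly: what is retrieval-augmented generation?"
result = generator(prompt, max_new_tokens=64)
print(result[0]["generated_text"])
```
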
Key Features:

Dataset Agnostic: This chatbot can be trained on various datasets, allowing you to customize its responses based on your specific domain or requirements.
General Knowledge Base: The initial training with a small Wikipedia dataset provides a solid foundation for general knowledge and information retrieval.
High Accuracy: The bot achieves impressive accuracy in its responses, reflecting effective training and data selection.
Good Quality Responses: The chatbot delivers informative and well-structured answers, enhancing user experience and satisfaction.
Additional Considerations:

Fine-Tuning Dataset: Consider exploring domain-specific datasets from Hugging Face or other sources to further enhance the chatbot's expertise in your chosen area.
Active Learning: If you're looking for continuous learning and improvement, investigate active learning techniques where the chatbot can identify informative data points to refine its responses.
User Interface: While this outline focuses on the back end, a well-designed user interface (text-based, graphical, or voice) can significantly improve usability and round out the chatbot application's capabilities.


You can use my code, customize it with your own dataset, and build a local copilot and chatbot agent yourself, even without a GPU :).


Sunday

GitHub Actions

 

Photo by Aleksandr Neplokhov on Pexels

Let’s first clarify the difference between workflow and CI/CD and discuss what GitHub Actions do.

  1. Workflow:

    • A workflow is a series of automated steps that define how code changes are built, tested, and deployed.
    • Workflows can include various tasks such as compiling code, running tests, and deploying applications.
    • Workflows are defined in YAML files stored under .github/workflows/ in your repository (for example, .github/workflows/workflow.yml).
    • They are triggered by specific events (e.g., push to a branch, pull request, etc.).
    • Workflows are not limited to CI/CD; they can automate any process in your development workflow.
  2. CI/CD (Continuous Integration/Continuous Deployment):

    • CI/CD refers to the practice of automating the process of integrating code changes, testing them, and deploying them to production.
    • Continuous Integration (CI) focuses on automatically building and testing code changes whenever they are pushed to a repository.
    • Continuous Deployment (CD) extends CI by automatically deploying code changes to production environments.
    • CI/CD pipelines ensure that code is consistently tested and deployed, reducing manual effort and minimizing errors.
  3. GitHub Actions:

    • GitHub Actions is a feature within GitHub that enables you to automate workflows directly from your GitHub repository.
    • Key advantages of using GitHub Actions for CI/CD pipelines include:
      • Simplicity: GitHub Actions simplifies CI/CD pipeline setup. You define workflows in a YAML file within your repo, and it handles the rest.
      • Event Triggers: You can respond to any webhook on GitHub, including pull requests, issues, and custom webhooks from integrated apps.
      • Community-Powered: Share your workflows publicly or access pre-built workflows from the GitHub Marketplace.
      • Platform Agnostic: GitHub Actions works with any platform, language, and cloud provider.

In summary, GitHub Actions provides a flexible and integrated way to define workflows, including CI/CD pipelines, directly within your GitHub repository. It’s a powerful tool for automating tasks and improving your development process! 😊
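
As a minimal sketch (not a drop-in pipeline), a simple CI workflow file for a Python project might look like the following; the branch name, dependency file, and test command are placeholders you would adapt to your project.

```yaml
# .github/workflows/ci.yml -- a minimal example workflow (names are placeholders)
name: CI

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest          # GitHub-hosted runner
    steps:
      - uses: actions/checkout@v4   # check out the repository
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt   # placeholder dependency file
      - run: pytest                            # placeholder test command
```

Every push to main and every pull request then builds and tests the code automatically on a GitHub-hosted runner.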


GitHub Actions, Jenkins, and GitLab CI/CD are all popular tools for automating software development workflows, but they serve different purposes and have distinct features. Let’s briefly compare them:

  1. GitHub Actions:

    • Event-driven CI/CD tool integrated with GitHub repositories.
    • Workflow files are written in YAML.
    • Provides GitHub-hosted runners (free within usage limits) for building, testing, and deploying applications.
    • Has a marketplace with pre-made actions for various tasks.
    • Beginner-friendly and easy to set up.
    • Well-suited for startups and small companies.
  2. Jenkins:

    • Self-hosted, open-source automation server with a large plugin ecosystem.
    • Pipelines are typically defined in a Jenkinsfile (declarative or scripted Groovy syntax).
    • Requires you to provision and maintain your own controller and build agents.
    • Highly flexible and extensible, but with more setup and maintenance overhead.
    • Well-suited for large organizations with complex, self-managed infrastructure.
  3. GitLab CI/CD:

    • Integrated with GitLab repositories.
    • Uses .gitlab-ci.yml files for defining pipelines.
    • Provides shared runners or allows self-hosted runners.
    • Strong integration with GitLab features.
    • Well-suited for teams using GitLab for source control and project management.

Will GitHub Actions “kill” Jenkins and GitLab? Not necessarily. Each tool has its strengths and weaknesses, and the choice depends on your specific needs, existing workflows, and team preferences. Some organizations even use a combination of these tools to cover different use cases. Ultimately, it’s about finding the right fit for your development process. 😊


You can see one GitHub Action implemented for the demo here; I hope this will help you get started. 


Monday

DevOps Steps in Cloud


Step 1: Container Image Build

1. In your source code repository (e.g., Git), include a Dockerfile that specifies how to build your application into a container image.

2. Configure your CI/CD tool (e.g., AWS CodeBuild, Jenkins) to build the Docker image using the Dockerfile. This can be done by executing a `docker build` command within your CI/CD script.

3. Ensure that the Docker image is built with the necessary dependencies and configurations.

Step 2: Container Registry

4. Choose a container registry service to store your Docker images. Common choices include:

   - AWS Elastic Container Registry (ECR) if you're using AWS.

   - Docker Hub for public images.

   - Other cloud providers' container registries (e.g., Google Container Registry, Azure Container Registry).

Step 3: Pushing Images

5. After building the Docker image, tag it with a version or unique identifier.

6. Use the `docker push` command to push the image to the selected container registry.
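
As a minimal command-line sketch of Steps 1 through 3, assuming an AWS ECR registry (the account ID, region, and image name are placeholders):

```bash
# Build the image from the Dockerfile in the current directory (placeholder names).
docker build -t myapp:1.0.0 .

# Authenticate Docker against the ECR registry for this account/region.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag the local image with the registry path and push it.
docker tag myapp:1.0.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0.0
```
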

Step 4: Deployment

7. In your CD pipeline, integrate the deployment of the Docker container. This will depend on your application's architecture:

   - For Kubernetes-based applications, you can use Kubernetes manifests (YAML files) to define your application deployment. Update these manifests with the new Docker image version and apply them using `kubectl`.

   - For AWS-based applications, you can use services like Amazon ECS, Amazon EKS, or AWS Fargate for container orchestration. Update your task or service definition to use the new Docker image version.
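
For a Kubernetes-based deployment (self-managed or on Amazon EKS), a minimal sketch of rolling the workload to the new image might look like this; the deployment and container names are placeholders.

```bash
# Point the deployment's container at the newly pushed image version.
kubectl set image deployment/myapp \
  myapp=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0.0

# Watch the rollout until the new pods are ready.
kubectl rollout status deployment/myapp
```
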

Step 5: Automation and Rollback

8. Ensure that your CI/CD pipeline includes automation for container image tagging and deployment.

9. Implement rollback strategies, such as keeping the previous version of the container image in the registry to easily roll back in case of issues.
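
On Kubernetes, a simple rollback can be sketched like this (again with a placeholder deployment name):

```bash
# Revert the deployment to the previous recorded revision if the new image misbehaves.
kubectl rollout undo deployment/myapp
```
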

Step 6: Security

10. Pay attention to container image security. Scan container images for vulnerabilities using tools like Clair, Trivy, or AWS ECR image scanning.

11. Use container image signing and security policies to ensure that only trusted images are deployed.
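
As a minimal sketch of the scanning step with Trivy (the image reference is a placeholder; the non-zero exit code makes the pipeline fail on serious findings):

```bash
# Scan the pushed image for known vulnerabilities before allowing deployment.
trivy image --severity HIGH,CRITICAL --exit-code 1 \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:1.0.0
```
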

Step 7: Monitoring

12. Implement monitoring and logging for your containerized application and infrastructure. Tools like Prometheus, Grafana, and cloud provider monitoring services can help.

Step 8: Integration with DevOps Pipeline

13. Integrate these container-related steps into your overall DevOps pipeline. For example, you can trigger the pipeline whenever changes are pushed to your Git repository.

Step 9: Documentation and Training

14. Ensure that your team is trained in containerization best practices and the use of your CI/CD pipeline.

With these steps, you can fully automate the build, registration (push), and deployment of Docker container images as part of your DevOps pipeline. This allows developers to focus on writing code, while the pipeline takes care of packaging and deploying applications consistently.

Tuesday

Delete Large Files From Your Git History Without Using Git LFS

 


If you want to delete large files from your Git history without using Git LFS, you can use the `git filter-branch` command along with the `--index-filter` option to remove the files from your Git history. This process will rewrite the repository's history and remove the specified files.

Here's how you can do it:

1. Backup Your Repository:

   Before proceeding, make sure to create a backup of your repository to avoid data loss in case something goes wrong.

2. Identify Large Files:

   Identify the large files that you want to remove from the Git history, such as `data/hail-2015.csv`.

3. Run the `git filter-branch` Command:

   Use the `git filter-branch` command with the `--index-filter` option to remove the large files from your Git history. Replace `data/hail-2015.csv` with the actual file path you want to remove.


```bash
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch data/hail-2015.csv" \
  --prune-empty --tag-name-filter cat -- --all
```

This command will rewrite the Git history to exclude the specified file. Please note that this command will take some time to complete, especially for large repositories.

4. Clean Up Unreachable Objects:

   After running the `git filter-branch` command, there might be unreachable objects left in your repository. To remove them, run the following command:


```bash
git reflog expire --expire=now --all && git gc --prune=now --aggressive
```

5. Force Push to Update Remote Repository:

   Since you've rewritten the Git history, you'll need to force push the changes to the remote repository:


```bash
git push --force origin main
```


Replace `main` with the name of your branch if it's different.

Please use caution when performing these actions, especially if your repository is shared with others. Rewriting history can affect collaborators, so it's important to communicate with your team and coordinate this process.

Photo by Christina Morillo

Friday

Review Pull Request

 

I have been leading diversified teams for years, besides working as a Software Architect for more than 13 years on several different types of projects. I review several pull requests each day.

Sometimes it is tremendously time consuming, especially when the commit messages are unclear, or the SCRUM bug reports and work logs do not clearly define the issues or features.

Here is an example of an ML code review with a pull request from GitHub:

Title: Add a new model to the library

Author: John Doe

Reviewer: Jane Doe

Description:

This pull request adds a new model to the library. The model is a simple linear regression model that can be used to predict house prices.

Changes:

* Added the new model to the library.

* Added unit tests for the new model.

* Updated the documentation to include the new model.

Review:

I have reviewed the pull request and I have found no major issues. The code is well-written and the unit tests are comprehensive. I recommend that the pull request be approved.

Approve/Reject:

Approved

Here are some additional things to keep in mind when reviewing ML code:

  • Make sure that the code is well-documented. The documentation should be clear and concise, and it should explain how the code works.
  • Test the code thoroughly. The unit tests should be comprehensive and they should cover all possible scenarios.
  • Consider the impact of the changes. Think about how the changes will affect other parts of the code.
  • Be constructive in your feedback. The goal of a code review is to help the author improve the code, not to criticize them.

Here are some tips on how to review pull requests:

  1. Start by reading the commit messages. The commit messages should provide a clear and concise overview of the changes that have been made.
  2. Run the code. This is the most important step in the review process. You should run the code to make sure that it compiles and runs without errors.
  3. Check for style violations. The code should follow the project’s style guide. You can use a linter to help you check for style violations.
  4. Test the code. You should write unit tests to test the code. This will help you to make sure that the code works as expected.
  5. Review the documentation. If the code includes documentation, you should review the documentation to make sure that it is accurate and up-to-date.
  6. Leave comments. If you find any problems with the code, you should leave comments on the pull request. This will help the author to fix the problems.
  7. Approve or reject the pull request. Once you have reviewed the pull request, you should approve or reject it. If you approve the pull request, the code will be merged into the main branch. If you reject the pull request, the author will need to make changes to the code before it can be approved.
  8. Be constructive in your feedback. The goal of a pull request review is to help the author improve the code, not to criticize them.
  9. Be specific. Don’t just say “the code is bad.” Explain what is wrong with the code and how it can be improved.
  10. Be helpful. If you can, offer suggestions on how the code can be improved.
  11. Be timely. Don’t leave pull requests hanging for days or weeks.

There are a few ways to automate as much of the review testing as possible, for example in Python.

  • Use a linter. A linter is a tool that can be used to check for style violations in Python code. There are many different linters available, such as flake8 and pylint.
  • Write unit tests. Unit tests are small, isolated tests that can be used to verify the functionality of a piece of code. Unit tests can be run automatically, which can help to make the review process more efficient (see the sketch after this list).
  • Use a continuous integration (CI) server. A CI server is a tool that can be used to run unit tests and other automated checks on code changes. CI servers can be configured to run automatically whenever a new pull request is submitted.
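
As a minimal sketch of such a unit test, here is what an automated check for the linear regression model from the example review might look like; `HousePriceModel` and its `fit`/`predict` methods are hypothetical names used only for illustration, not a real library API.

```python
# Hypothetical unit test for the house-price model from the example pull request.
# HousePriceModel is an illustrative placeholder, not an actual library class.
import unittest


class HousePriceModel:
    """Toy stand-in for the reviewed model: price = 100 * area."""

    def fit(self, areas, prices):
        # A real model would estimate coefficients from the data here.
        self.coef_ = 100.0
        return self

    def predict(self, areas):
        return [self.coef_ * a for a in areas]


class TestHousePriceModel(unittest.TestCase):
    def test_prediction_scales_with_area(self):
        model = HousePriceModel().fit([1, 2, 3], [100, 200, 300])
        self.assertAlmostEqual(model.predict([4])[0], 400.0)

    def test_prediction_length_matches_input(self):
        model = HousePriceModel().fit([1, 2], [100, 200])
        self.assertEqual(len(model.predict([1, 2, 3])), 3)


if __name__ == "__main__":
    unittest.main()
```

A CI server such as Travis CI or CircleCI (both mentioned below) can then run this test suite automatically on every pull request.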

Here are some specific tools and libraries that you can use to automate the review test for Python:

  • Flake8: A linter that can be used to check for style violations in Python code.
  • Pylint: A linter that can be used to check for style violations and potential bugs in Python code.
  • Unittest: A library that can be used to write unit tests for Python code.
  • Travis CI: A CI server that can be used to run unit tests and other automated checks on code changes.
  • CircleCI: Another CI server that can be used to run unit tests and other automated checks on code changes.

Thank you.