Skip to main content

Posts

Showing posts with the label cpu

Leveraging CUDA for General Parallel Processing Application

  Photo by SevenStorm JUHASZIMRUS by pexel Differences Between CPU-based Multi-threading and Multi-processing CPU-based Multi-threading : - Concept: Uses multiple threads within a single process. - Shared Memory: Threads share the same memory space. - I/O Bound Tasks: Effective for tasks that spend a lot of time waiting for I/O operations. - Global Interpreter Lock (GIL): In Python, the GIL can be a limiting factor for CPU-bound tasks since it allows only one thread to execute Python bytecode at a time. CPU-based Multi-processing : - Concept: Uses multiple processes, each with its own memory space. - Separate Memory: Processes do not share memory, leading to more isolation. - CPU Bound Tasks: Effective for tasks that require significant CPU computation since each process can run on a different CPU core. - No GIL: Each process has its own Python interpreter and memory space, so the GIL is not an issue. CUDA with PyTorch : - Concept: Utilizes the GPU for parallel computation. - Massi...

Chatbot and Local CoPilot with Local LLM, RAG, LangChain, and Guardrail

  Chatbot Application with Local LLM, RAG, LangChain, and Guardrail I've developed a chatbot application designed for informative and engaging conversationAs you already aware that Retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from large language models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly improve the value of generative AI systems. Developers must consider a variety of factors when building a RAG pipeline: from LLM response benchmarking to selecting the right chunk size. In tapplication demopost, I demonstrate how to build a RAG pipeline uslocal LLM which can be converted to ing NVIDIA AI Endpoints for LangChain. FirI have you crdeate a vector storeconnecting with one of the ...

How to Run LLaMA in Your Laptop

  The LLaMA open model is a large language model that requires significant computational resources and memory to run. While it's technically possible to practice with the LLaMA open model on your laptop, there are some limitations and considerations to keep in mind: You can find details about this LLM model here Hardware requirements: The LLaMA open model requires a laptop with a strong GPU (Graphics Processing Unit) and a significant amount of RAM (at least 16 GB) to run efficiently. If your laptop doesn't meet these requirements, you may experience slow performance or errors. Model size: The LLaMA open model is a large model, with over 1 billion parameters. This means that it requires a significant amount of storage space and memory to load and run. If your laptop has limited storage or memory, you may not be able to load the model or may experience performance issues. Software requirements: To run the LLaMA open model, you'll need to install specific software and librari...

Dig Into CPU and GPU

  Photo by Nana Dua Let first recap what is CPU and GPU.                              Image courtesy: researchgate Central Processing Unit (CPU) The Central Processing Unit (CPU) is the brain of a computer, responsible for carrying out most of the computational tasks. It's like the conductor of an orchestra, coordinating and executing instructions from various programs and applications. CPUs are designed to handle general-purpose tasks, such as running web browsers, editing documents, and playing games. They excel at both sequential and parallel processing. Graphics Processing Unit (GPU) The Graphics Processing Unit (GPU) is a specialized processor designed to handle the computationally intensive tasks involved in graphics rendering and image processing. Unlike CPUs, GPUs are designed for parallel processing, capable of handling multiple instructions simultaneously. This makes them ...