Posts

Showing posts with the label cpu

AI Laptops And Data Centers Fiasco

generated by Meta AI

Yes, low-power AI laptops (AI PCs with efficient NPUs/SoCs) are strongly aligned with where the market is going and will be a major part of the future of personal computing.

Why low-power AI laptops matter

AI PCs integrate dedicated neural processing units (NPUs) so they can run AI workloads on-device instead of sending everything to the cloud, which reduces latency, preserves privacy, and cuts energy use. Vendors like Microsoft, Qualcomm, Intel, AMD and ARM OEMs are standardizing on this model, with "Copilot+" or similar labels now tied to minimum on-device AI performance at relatively low power budgets.

At the silicon level, ARM-based and NPU-heavy designs deliver high TOPS at far lower watts than traditional x86-only CPUs, enabling 15–2...

Ollama and Gemma3 Tiny Test on CPU

Have you ever tested the tiny LLM Gemma3:1B with Ollama on a laptop or system that lacks a GPU? You can build a fairly powerful GenAI application; however, it can be a little slow due to CPU processing.

Steps:
1. Download and install Ollama if it is not already on your system: go to https://ollama.com/download and get the installation command. Verify the install with `ollama --version`.
2. Pull the Gemma LLM: go to https://ollama.com/library/gemma3, then run `ollama pull gemma3:1b`.
3. Check the downloaded models with `ollama list` and, if the server is not already running, start it with `ollama serve`.
4. Install the pip libraries: `pip install ollama` and `pip install "jupyter-ai[ollama]"`.
5. To stop the Ollama server later, find its PID with `ps aux | grep ollama` and run `kill <PID>`, or run `sudo systemctl stop ollama`.

That's all. Now go to your Jupyter notebook; if it is not running, start it with `jupyter lab` or `jupyter notebook`. You can test by running my example notebook here: https://github.co...
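Once the pip library is installed, the model can be queried from a notebook cell or script. A minimal sketch, assuming the `ollama` package is installed, `ollama serve` is running, and `gemma3:1b` has been pulled (the prompt is just an example, and the helper falls back to a message instead of crashing when the server is unavailable):

```python
def ask_gemma(prompt: str, model: str = "gemma3:1b") -> str:
    """Send one chat message to a local Ollama server and return the reply."""
    try:
        import ollama  # pip install ollama
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]
    except Exception as exc:  # package missing or server not running
        return f"[ollama unavailable: {exc}]"

print(ask_gemma("Summarize Moore's Law in one sentence."))
```

On a CPU-only machine expect the first response to take several seconds while the model loads; later calls are faster because Ollama keeps the model resident.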

Is Moore's Law Dead?

image just for representation only, generated by Gemini

1. Moore's Law: This is an observation made by Intel co-founder Gordon Moore in 1965, stating that the number of transistors on a microchip doubles approximately every two years (his original 1965 projection was a doubling every year; he revised it to every two years in 1975). This observation has largely held true for decades and has been a driving force behind the exponential growth in computing power.

Is it ending? The consensus in the industry is that Moore's Law, in its traditional sense of simply shrinking transistors and doubling their density at minimal cost, is indeed slowing down and approaching its physical and economic limits. Here's why:

Physical Limits: Transistors are already at an atomic scale (some are just a few nanometers wide), and it's becoming increasingly difficult to make them smal...
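The doubling claim is easy to put into numbers. A quick sketch (the Intel 4004's roughly 2,300 transistors are just an illustrative starting point):

```python
def projected_transistors(initial: int, years: int, doubling_period: int = 2) -> int:
    """Project transistor count after `years`, doubling every `doubling_period` years."""
    return initial * 2 ** (years // doubling_period)

# The Intel 4004 (1971) had ~2,300 transistors; projected 20 years out:
print(projected_transistors(2_300, 20))  # 2,300 * 2**10 = 2,355,200
```

Ten doublings in twenty years means a thousandfold increase, which is exactly the kind of exponential growth that physical and economic limits are now slowing down.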

Leveraging CUDA for General Parallel Processing Application

Photo by SevenStorm JUHASZIMRUS on Pexels

Differences Between CPU-based Multi-threading and Multi-processing

CPU-based Multi-threading:
- Concept: Uses multiple threads within a single process.
- Shared Memory: Threads share the same memory space.
- I/O-Bound Tasks: Effective for tasks that spend a lot of time waiting for I/O operations.
- Global Interpreter Lock (GIL): In Python, the GIL can be a limiting factor for CPU-bound tasks since it allows only one thread to execute Python bytecode at a time.

CPU-based Multi-processing:
- Concept: Uses multiple processes, each with its own memory space.
- Separate Memory: Processes do not share memory, leading to more isolation.
- CPU-Bound Tasks: Effective for tasks that require significant CPU computation since each process can run on a different CPU core.
- No GIL: Each process has its own Python interpreter and memory space, so the GIL is not an issue.

CUDA with PyTorch:
- Concept: Utilizes the GPU for parallel computation.
- Massi...
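The I/O-bound point above can be seen in a few lines of pure Python: a sleeping thread releases the GIL, so simulated I/O waits overlap instead of running back to back (the sleep stands in for a real blocking call, and timings are approximate):

```python
import threading
import time

results = []

def fake_io(task_id):
    time.sleep(0.1)           # stands in for a blocking I/O call
    results.append(task_id)   # list.append is thread-safe in CPython

threads = [threading.Thread(target=fake_io, args=(i,)) for i in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"4 tasks finished in {elapsed:.2f}s")  # close to 0.1s, not 0.4s
```

A CPU-bound `fake_io` (say, a tight arithmetic loop) would see no such speedup under threads because of the GIL; that is the case where multiprocessing, or offloading to a GPU, pays off.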

Chatbot and Local CoPilot with Local LLM, RAG, LangChain, and Guardrail

Chatbot Application with Local LLM, RAG, LangChain, and Guardrail

I've developed a chatbot application designed for informative and engaging conversation. As you may already be aware, retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from large language models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly improve the value of generative AI systems.

Developers must consider a variety of factors when building a RAG pipeline: from LLM response benchmarking to selecting the right chunk size. In this application demo post, I demonstrate how to build a RAG pipeline using a local LLM, which can be converted to use NVIDIA AI Endpoints for LangChain. First, I create a vector store connecting with one of the ...
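The retrieval step at the heart of RAG can be sketched in pure Python, with word-overlap scoring standing in for real vector embeddings (the corpus and question below are made-up examples, not the application's actual data):

```python
# Toy RAG retrieval: score documents by word overlap with the question,
# then stuff the best match into the prompt as context.
corpus = [
    "LangChain chains LLM calls together with retrievers and prompts.",
    "A vector store indexes document embeddings for similarity search.",
    "Guardrails validate and constrain an LLM's inputs and outputs.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs sharing the most words with the question."""
    q = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context plus the question."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What does a vector store do?", corpus))
```

A real pipeline swaps the overlap score for embedding similarity from a vector store and sends the assembled prompt to the local LLM, with a guardrail layer checking inputs and outputs on the way through.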

How to Run LLaMA on Your Laptop

The LLaMA open model is a large language model that requires significant computational resources and memory to run. While it's technically possible to experiment with the LLaMA open model on your laptop, there are some limitations and considerations to keep in mind. You can find details about this LLM model here.

Hardware requirements: The LLaMA open model requires a laptop with a strong GPU (Graphics Processing Unit) and a significant amount of RAM (at least 16 GB) to run efficiently. If your laptop doesn't meet these requirements, you may experience slow performance or errors.

Model size: The LLaMA open model is a large model, with over 1 billion parameters. This means that it requires a significant amount of storage space and memory to load and run. If your laptop has limited storage or memory, you may not be able to load the model or may experience performance issues.

Software requirements: To run the LLaMA open model, you'll need to install specific software and librari...
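A back-of-the-envelope calculation shows why the memory requirement scales with parameter count. A rough sketch counting weights only (activations, KV cache, and framework overhead add more on top):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GB needed just to hold the weights (2 bytes/param = fp16)."""
    return params_billion * bytes_per_param

print(weight_memory_gb(7))     # 7B model in fp16  -> 14.0 GB
print(weight_memory_gb(7, 1))  # 8-bit quantized   -> 7.0 GB
```

This is why quantized model files are the practical route on a 16 GB laptop: halving the bytes per parameter halves the memory needed before inference even starts.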

Dig Into CPU and GPU

Photo by Nana Dua

Let's first recap what a CPU and a GPU are.

Image courtesy: ResearchGate

Central Processing Unit (CPU)

The Central Processing Unit (CPU) is the brain of a computer, responsible for carrying out most of the computational tasks. It's like the conductor of an orchestra, coordinating and executing instructions from various programs and applications. CPUs are designed to handle general-purpose tasks, such as running web browsers, editing documents, and playing games. They excel at sequential processing and can also handle modest parallelism across a handful of cores.

Graphics Processing Unit (GPU)

The Graphics Processing Unit (GPU) is a specialized processor designed to handle the computationally intensive tasks involved in graphics rendering and image processing. Unlike CPUs, GPUs are designed for massively parallel processing, capable of applying the same instruction across thousands of data elements simultaneously. This makes them ...
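The contrast can be sketched in plain Python: a GPU "kernel" is one function applied independently at every index. The loop below runs sequentially on a CPU, but on a GPU each iteration would map onto its own hardware thread (SAXPY is a standard textbook example; the data here is made up):

```python
def saxpy_kernel(i, a, x, y, out):
    """One 'thread' of work: out[i] = a*x[i] + y[i] for a single index i."""
    out[i] = a * x[i] + y[i]

n = 8
x = list(range(n))      # [0, 1, ..., 7]
y = [1.0] * n
out = [0.0] * n

# CPU: iterate index by index.  GPU: launch all n indices at once.
for i in range(n):
    saxpy_kernel(i, 2.0, x, y, out)

print(out)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```

Because each index is independent, there is no ordering constraint between iterations; that independence is exactly what lets a GPU execute them all in parallel.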