Posts

TensorRT

🧠 When to Use TensorRT

Use TensorRT only for inference, not for training or fine-tuning. It provides:

- Lower latency
- Higher throughput
- Reduced memory footprint

⚙️ Requirements

To use TensorRT you need:

- A GPU with Tensor Cores (Volta, Turing, Ampere, etc.)
- Your model in ONNX format (export it to ONNX first)

Install:

```
pip install nvidia-pyindex
pip install tensorrt
```

🔥 Hugging Face + TensorRT

You can export Hugging Face models using `transformers.onnx`:

```
transformers-cli env   # check the installation
python -m transformers.onnx --model=codellama/CodeLlama-7B-Instruct-hf --feature=causal-lm ./onnx/
```

Then optimize the exported model with TensorRT via `onnxruntime` or `trtexec`.

⚠️ Kaggle Note

Kaggle does not support TensorRT, as it lacks:

- root access for driver-level TensorRT installations
- the required NVIDIA runtime permissions

✅ Use it locally or in the cloud (AWS/GCP/Colab Pro+ with CUDA support).

Let me know if you want a step-by-step ONNX → TensorRT pipeline. To run inference with...
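For the optimization step, a minimal sketch with `trtexec` looks like the following. It assumes the model was exported to the `./onnx/` directory as above; the exact ONNX filename and the engine output name (`model.plan`) are assumptions, and the command requires an NVIDIA GPU with the TensorRT runtime installed:

```shell
# Build a serialized TensorRT engine from the exported ONNX model.
# --fp16 enables half-precision kernels on GPUs with Tensor Cores.
trtexec --onnx=./onnx/model.onnx \
        --saveEngine=model.plan \
        --fp16
```

The resulting `.plan` engine is specific to the GPU and TensorRT version it was built on, so it is typically rebuilt per deployment target rather than shipped as an artifact.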

OPEA (Open Platform for Enterprise AI)

opea.dev

Recently, I tried to deploy the multi-agent application I had developed on my laptop in a production-grade environment for my office's R&D POC project. Let me break down why I chose OPEA. OPEA (Open Platform for Enterprise AI) is an open-source framework designed to help you build and deploy production-grade AI applications, including multi-agent systems. While Docker Compose is excellent for local development and smaller-scale deployments, OPEA aims to provide the robust infrastructure and capabilities needed for enterprise-level production environments. Here's how OPEA can help you transition your Docker Compose multi-age...