
June 14, 2025

TensorRT

🧠 When to Use TensorRT

Use TensorRT only for inference, not for training or fine-tuning. It provides:

- Lower latency
- Higher throughput
- A reduced memory footprint

⚙️ Requirements

To use TensorRT you need:

- A GPU with Tensor Cores (Volta, Turing, Ampere, or newer)
- Your model exported to the ONNX format first

Install:

    pip install nvidia-pyindex
    pip install tensorrt

🔥 Hugging Face + TensorRT

You can export Hugging Face models to ONNX with the transformers.onnx module:

    transformers-cli env  # check the installation
    python -m transformers.onnx --model=codellama/CodeLlama-7B-Instruct-hf --feature=causal-lm ./onnx/

Then optimize the exported model with TensorRT via onnxruntime or trtexec.

⚠️ Kaggle Note

Kaggle does not support TensorRT, because it lacks:

- Root access for driver-level TensorRT installation
- The required NVIDIA runtime permissions

✅ Use TensorRT locally, or on a cloud platform with CUDA support (AWS, GCP, or Colab Pro+).

Let me know if you want a step-by-step ONNX → TensorRT pipeline. To run inference with...
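As a concrete sketch of the trtexec route: once the ONNX export exists, trtexec (a command-line tool shipped with TensorRT) can build a serialized engine from it. The file paths below are assumptions for illustration; the flags shown are standard trtexec options.

```shell
# Sketch, assuming TensorRT is installed, trtexec is on PATH,
# and the export step produced ./onnx/model.onnx.
trtexec --onnx=./onnx/model.onnx \
        --saveEngine=./model_fp16.engine \
        --fp16  # optional: enable FP16 kernels on Tensor Cores
```

Note that a built engine is specific to the GPU model and TensorRT version it was built with, so engines generally need to be rebuilt per target machine rather than shipped as artifacts.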
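For the onnxruntime route, a minimal inference sketch looks like the following. It assumes onnxruntime-gpu is installed with TensorRT support and that the exported model sits at the hypothetical path ./onnx/model.onnx; the input name and shape are placeholders you would read from your own export.

```python
# Sketch: run an exported ONNX model through onnxruntime, preferring the
# TensorRT execution provider and falling back to CUDA, then CPU.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "./onnx/model.onnx",  # assumed path from the export step above
    providers=[
        "TensorrtExecutionProvider",  # TensorRT-accelerated kernels
        "CUDAExecutionProvider",      # fallback: plain CUDA
        "CPUExecutionProvider",       # last resort
    ],
)

# Read the model's declared input name rather than hard-coding it.
input_name = session.get_inputs()[0].name

# Placeholder input; real token IDs would come from the tokenizer.
dummy = np.zeros((1, 8), dtype=np.int64)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```

onnxruntime silently falls back to the next provider in the list if TensorRT is unavailable, so this same script also runs (slowly) on CPU-only machines.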