Automatic Speech Recognition with Gemma

I've created a complete ASR (Automatic Speech Recognition) demo using Docker Compose with the following architecture:

🏗️ Architecture Overview

3 Microservices:
Ollama Service - Runs the Gemma 2:2B model for text enhancement
ASR Service - FastAPI backend with Whisper for transcription
Web UI - Nginx-served interactive frontend

🚀 Key Features

Audio Input:
✅ Browser-based recording with microphone
✅ File upload with drag & drop (MP3, WAV, M4A, OGG)

Processing Pipeline:
✅ Whisper (tiny model) for fast speech-to-text
✅ Ollama Gemma 2:2B for text enhancement and correction
✅ Processing time tracking

User Experience:
✅ Real-time recording with timer
✅ Health status monitoring
✅ Side-by-side comparison of raw vs enhanced text
✅ Responsive modern UI

📁 Quick Setup

Create project structure: mkdir asr-demo && cd asr-demo

Save all files to their respective directories:
docker-compose.yml in root
ASR service files in asr-service/
Web UI fil...
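
To make the processing pipeline described above more concrete, here is a minimal Python sketch of the two-stage flow: transcribe first, then send the raw text to Ollama for cleanup, and time the whole thing. It is an illustration under assumptions, not the post's actual service code. The hostname `ollama`, the prompt wording, and the helper names `build_enhance_request`, `enhance_with_ollama`, and `run_pipeline` are all hypothetical; the only external interface relied on is Ollama's `/api/generate` endpoint with `stream` set to `false`.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumptions: "ollama" is the Compose service name and 11434 is Ollama's default port.
OLLAMA_URL = "http://ollama:11434/api/generate"
MODEL = "gemma2:2b"  # Ollama tag for Gemma 2 2B


def build_enhance_request(raw_text: str, model: str = MODEL) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    # Hypothetical prompt; the original post's prompt is not shown.
    prompt = (
        "Correct the punctuation, casing and obvious transcription errors "
        "in the text below. Return only the corrected text.\n\n" + raw_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def enhance_with_ollama(raw_text: str, url: str = OLLAMA_URL) -> str:
    """Send the raw transcript to Ollama and return the generated text."""
    payload = json.dumps(build_enhance_request(raw_text)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # With stream=False, Ollama returns one JSON object with a "response" field.
        return json.loads(resp.read())["response"].strip()


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    enhance: Callable[[str], str] = enhance_with_ollama,
) -> dict:
    """Run transcription, then enhancement, and record the elapsed time."""
    start = time.perf_counter()
    raw = transcribe(audio_path)
    enhanced = enhance(raw)
    return {
        "raw_text": raw,
        "enhanced_text": enhanced,
        "processing_time": round(time.perf_counter() - start, 3),
    }


# With the openai-whisper package installed, a transcriber could look like:
#   import whisper
#   model = whisper.load_model("tiny")
#   transcribe = lambda path: model.transcribe(path)["text"]
```

Passing the transcriber and enhancer in as callables keeps the sketch runnable without Whisper or a live Ollama instance, and returning both `raw_text` and `enhanced_text` is what a side-by-side comparison in the UI would need.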