Meeting & Podcast Transcription (FunASR + MiMo)
Transcribe multi-speaker audio into structured Markdown with automatic speaker diarization, hotword biasing, and optional LLM cleanup. Two ASR engine families are available: FunASR (Paraformer / SenseVoice / Whisper; fast, cheap, runs on GPU or CPU, 99 languages) and MiMo-V2.5-ASR (Xiaomi's 8B model; local GPU only, stronger on proper nouns and code-switching). Both share the same voice activity detection (VAD) and speaker-clustering stack.
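To make the "structured Markdown" output concrete, here is a minimal sketch of how diarized segments might be rendered. It is not taken from this project's scripts: the `Segment` type, the speaker labels, and the `**Speaker** [hh:mm:ss]: text` layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical shape, not the project's schema)."""
    speaker: str   # label assigned by speaker clustering, e.g. "Speaker 1"
    start: float   # start time in seconds
    text: str      # recognized text for this utterance


def fmt_ts(seconds: float) -> str:
    """Format a time offset in seconds as hh:mm:ss."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def to_markdown(segments: list[Segment]) -> str:
    """Merge consecutive segments from the same speaker into one paragraph."""
    lines: list[str] = []
    prev_speaker = None
    for seg in segments:
        if prev_speaker is not None and seg.speaker == prev_speaker:
            # Same speaker keeps talking: extend the current paragraph.
            lines[-1] += " " + seg.text
        else:
            # Speaker change: start a new paragraph with a label and timestamp.
            lines.append(f"**{seg.speaker}** [{fmt_ts(seg.start)}]: {seg.text}")
        prev_speaker = seg.speaker
    return "\n\n".join(lines)
```

Merging adjacent same-speaker segments keeps the transcript readable when the VAD splits one turn into several short chunks; the timestamp shown is the start of the first chunk in each turn.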
All scripts run directly from t
[Description truncated. See the full README on GitHub.]