tool, STT, model

Mistral releases Voxtral Transcribe 2, its latest speech-to-text model, with real-time latency below 200 ms and support for speaker separation

Mistral has launched Voxtral Transcribe 2, a speech-to-text model with real-time latency under 200 ms and support for speaker separation (diarization). The model is available in two versions. Voxtral Realtime is aimed at real-time applications: its delay is configurable down to under 200 ms, and its word error rate is close to that of the offline version. Voxtral Mini Transcribe 2 is aimed at batch processing and supports 13 languages and word-level timestamps.
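For readers unfamiliar with the metric mentioned above, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a generic illustration of that definition, not Mistral's evaluation code; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Comparing a real-time transcript and an offline transcript against the same reference with a function like this is one way to check how close the two error rates are in practice.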

51 views, 0 stars, 3/5/2026
