NVIDIA's latest open-source ASR model, Nemotron Speech ASR, targets low-latency real-time voice agents and lets multiple people talk at the same time without added delay
NVIDIA's Nemotron Speech ASR focuses on low-latency real-time speech processing and supports multi-person conversations. Transcribing a single sentence takes only 24 milliseconds, and overall latency stays below 500 milliseconds. Built on the FastConformer architecture with a cache-aware mechanism, the model computes speech features incrementally and lets the latency mode be configured dynamically to suit different applications.
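To make the idea of cache-aware, incremental computation concrete, here is a minimal sketch in plain Python. It is a toy illustration, not NVIDIA's implementation: the "encoder" is just a causal moving average, and the names `encode_offline`, `CachedStreamingEncoder`, `left_context`, and `stream` are hypothetical. What it shows is the general principle that caching a fixed amount of past context lets each audio frame be processed once, and that the chunk size (a stand-in for a latency mode) can change without altering the output.

```python
from collections import deque


def encode_offline(frames, left_context):
    """Reference: compute every output from the full, already-recorded sequence."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - left_context): t + 1]
        out.append(sum(window) / len(window))
    return out


class CachedStreamingEncoder:
    """Toy streaming encoder that keeps only `left_context` past frames.

    Assumption: a causal moving average stands in for a real encoder layer.
    """

    def __init__(self, left_context):
        # The cache holds the most recent past frames and drops older ones.
        self.cache = deque(maxlen=left_context)

    def process_chunk(self, chunk):
        """Encode a new chunk using cached context; earlier frames are not recomputed."""
        out = []
        for x in chunk:
            window = list(self.cache) + [x]
            out.append(sum(window) / len(window))
            self.cache.append(x)
        return out


def stream(frames, chunk_size, left_context):
    """Feed frames chunk by chunk; a smaller chunk_size means output arrives sooner."""
    enc = CachedStreamingEncoder(left_context)
    out = []
    for start in range(0, len(frames), chunk_size):
        out.extend(enc.process_chunk(frames[start:start + chunk_size]))
    return out


if __name__ == "__main__":
    frames = [float(i) for i in range(12)]
    reference = encode_offline(frames, left_context=3)
    # Different chunk sizes change how soon results arrive, not what they are.
    for chunk_size in (1, 4, 6):
        assert stream(frames, chunk_size, left_context=3) == reference
    print("streaming output matches offline output for all chunk sizes")
```

In this sketch the cost per chunk depends only on the chunk length and the cache size, not on how long the conversation has been running, which is the property that keeps latency bounded in a streaming setting.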