Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated May 30, 2026 - Python
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Automagically synchronize subtitles with video.
faster_whisper GUI with PySide6
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
Whisper.net. Speech to text made simple using Whisper Models
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
An audio/acoustic activity detection and audio segmentation tool
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.
Voice Activity Detection based on Deep Learning & TensorFlow
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
On-device voice activity detection (VAD) powered by deep learning