Stars
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".
A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation
[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
Open-source text-to-speech for European languages with voice cloning
🗣️ ComfyUI nodes for KugelAudio - Open-source text-to-speech with voice cloning for 24 European languages
Precision Alignment, Infinite Possibilities
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
great-wind / MicroSoft_VibeVoice
Forked from microsoft/VibeVoice
Frontier Open-Source Text-to-Speech
Wan: Open and Advanced Large-Scale Video Generative Models
ONNX Inference of Pyannote Segmentation
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment
A song aesthetic evaluation toolkit trained on SongEval.

