Stars
Official inference framework for 1-bit LLMs
Wan: Open and Advanced Large-Scale Video Generative Models
YuE: Open Full-song Music Generation Foundation Model, an open alternative similar to Suno.ai
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
OSUM & OSUM-EChat: an open speech understanding model and an empathetic spoken chatbot built on it, open-sourced by ASLP@NPU.
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Technical report of Kimina-Prover Preview.
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
A song aesthetic evaluation toolkit trained on SongEval.
Open-source text-to-speech for European languages with voice cloning
A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
Precision Alignment, Infinite Possibilities

