Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,821 158 Updated Feb 25, 2026

tencent-ailab / SongGeneration

The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment

Python 1,535 183 Updated Mar 12, 2026

wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 1,251 187 Updated Mar 31, 2026

tencent-ailab / SongBloom

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python 764 86 Updated Dec 4, 2025

Tencent-Hunyuan / HunyuanImage-2.1

HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation

Python 672 55 Updated Oct 14, 2025

Soul-AILab / SoulX-Singer

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python 513 55 Updated Mar 26, 2026

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 508 67 Updated Dec 22, 2025

ASLP-lab / OSUM

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 483 31 Updated Nov 23, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 481 31 Updated Mar 22, 2026

FireRedTeam / FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 445 26 Updated Mar 24, 2026

MoonshotAI / Kimina-Prover-Preview

Technical report of Kimina-Prover Preview.

Python 366 20 Updated Jul 10, 2025

jzq2000 / MoonCast

Python 344 44 Updated Apr 11, 2025

tencent-ailab / MuQ

Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

Python 320 14 Updated Aug 4, 2025

ASLP-lab / SongEval

A song aesthetic evaluation toolkit trained on SongEval.

Python 292 25 Updated Jun 15, 2025

Kugelaudio / kugelaudio-open

Open-source text-to-speech for European languages with voice cloning

Python 237 38 Updated Feb 6, 2026

ShandaAI / Hive

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

Python 233 26 Updated Mar 9, 2026

ddlBoJack / MMAR

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 202 4 Updated Feb 25, 2026

xcc-zach / xtalk

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…

Python 189 19 Updated Apr 1, 2026

thuhcsi / SECap

Python 178 12 Updated Jul 9, 2024

ByteDance-Seed / AHN

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Python 175 5 Updated Oct 17, 2025

MoonshotAI / Kimi-Audio-Evalkit

Python 162 11 Updated Nov 20, 2025

lattifai / lattifai-python

Precision Alignment, Infinite Possibilities

Python 122 8 Updated Apr 1, 2026

ASLP-lab / SongFormer

Python 120 19 Updated Oct 16, 2025

Jianwei Yu MSLDCherryPick

Stars

deepseek-ai / DeepSeek-V3

microsoft / BitNet

microsoft / VibeVoice

Wan-Video / Wan2.2

multimodal-art-projection / YuE

MoonshotAI / Kimi-Audio

fixie-ai / ultravox

FireRedTeam / FireRedASR