MSLDCherryPick
  • Microsoft


36 stars written in Python

Official inference framework for 1-bit LLMs

Python · 36,896 stars · 3,215 forks · Updated Mar 10, 2026

Open-Source Frontier Voice AI

Python · 33,822 stars · 3,836 forks · Updated Apr 1, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 14,980 stars · 1,815 forks · Updated Mar 17, 2026

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python · 6,113 stars · 720 forks · Updated Jun 4, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python · 4,551 stars · 343 forks · Updated Jun 21, 2025

A fast multimodal LLM for real-time voice

Python · 4,386 stars · 369 forks · Updated Dec 12, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python · 1,821 stars · 158 forks · Updated Feb 25, 2026

The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment

Python · 1,535 stars · 183 forks · Updated Mar 12, 2026

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python · 1,251 stars · 187 forks · Updated Mar 31, 2026

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python · 764 stars · 86 forks · Updated Dec 4, 2025

HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation

Python · 672 stars · 55 forks · Updated Oct 14, 2025

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python · 513 stars · 55 forks · Updated Mar 26, 2026

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python · 508 stars · 67 forks · Updated Dec 22, 2025

OSUM and OSUM-EChat: an open speech understanding model and an empathetic spoken chatbot built on it, open-sourced by ASLP@NPU.

Python · 483 stars · 31 forks · Updated Nov 23, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 481 stars · 31 forks · Updated Mar 22, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python · 445 stars · 26 forks · Updated Mar 24, 2026

Technical report of Kimina-Prover Preview.

Python · 366 stars · 20 forks · Updated Jul 10, 2025

Python · 344 stars · 44 forks · Updated Apr 11, 2025

Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

Python · 320 stars · 14 forks · Updated Aug 4, 2025

A song aesthetic evaluation toolkit trained on SongEval.

Python · 292 stars · 25 forks · Updated Jun 15, 2025

Open-source text-to-speech for European languages with voice cloning

Python · 237 stars · 38 forks · Updated Feb 6, 2026

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

Python · 233 stars · 26 forks · Updated Mar 9, 2026

[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python · 202 stars · 4 forks · Updated Feb 25, 2026

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…

Python · 189 stars · 19 forks · Updated Apr 1, 2026

Python · 178 stars · 12 forks · Updated Jul 9, 2024

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Python · 175 stars · 5 forks · Updated Oct 17, 2025

Precision Alignment, Infinite Possibilities

Python · 122 stars · 8 forks · Updated Apr 1, 2026

Python · 120 stars · 19 forks · Updated Oct 16, 2025