MSLDCherryPick
  • Microsoft

Highlights

  • Pro


LattifAI benchmark

HTML · 3 stars · Updated Mar 31, 2026
HTML · 149 stars · 8 forks · Updated Mar 31, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python · 445 stars · 26 forks · Updated Mar 24, 2026

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python · 513 stars · 55 forks · Updated Mar 26, 2026

Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".

Python · 67 stars · 3 forks · Updated Feb 7, 2026

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

Python · 233 stars · 26 forks · Updated Mar 9, 2026

[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

Jupyter Notebook · 849 stars · 77 forks · Updated Jan 28, 2026

Open-source text-to-speech for European languages with voice cloning

Python · 237 stars · 38 forks · Updated Feb 6, 2026

🗣️ ComfyUI nodes for KugelAudio: open-source text-to-speech with voice cloning for 24 European languages

Python · 30 stars · 7 forks · Updated Feb 10, 2026

Precision Alignment, Infinite Possibilities

Python · 122 stars · 8 forks · Updated Apr 1, 2026

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…

Python · 189 stars · 19 forks · Updated Apr 1, 2026

14 stars · Updated Feb 2, 2026

Python · 61 stars · 4 forks · Updated Mar 8, 2026

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Python · 175 stars · 5 forks · Updated Oct 17, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 481 stars · 31 forks · Updated Mar 22, 2026

Python · 120 stars · 19 forks · Updated Oct 16, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 3,612 stars · 244 forks · Updated Jan 8, 2026

Open-Source Frontier Voice AI

Python · 33,744 stars · 3,824 forks · Updated Apr 1, 2026

HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation

Python · 672 stars · 55 forks · Updated Oct 14, 2025

Frontier Open-Source Text-to-Speech

Python · 9 stars · 4 forks · Updated Sep 2, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 14,978 stars · 1,815 forks · Updated Mar 17, 2026

ONNX Inference of Pyannote Segmentation

Python · 97 stars · 18 forks · Updated Dec 23, 2024

OSUM & OSUM-EChat: an open speech understanding model and an empathetic spoken chatbot built on it, open-sourced by ASLP@NPU.

Python · 483 stars · 31 forks · Updated Nov 23, 2025

Streaming Text to Speech Web UI

HTML · 22 stars · 3 forks · Updated May 6, 2024

The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.

Dockerfile · 286 stars · 8 forks · Updated Sep 26, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python · 6,113 stars · 720 forks · Updated Jun 4, 2025

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python · 764 stars · 86 forks · Updated Dec 4, 2025

The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment

Python · 1,535 stars · 183 forks · Updated Mar 12, 2026

A song aesthetic evaluation toolkit trained on SongEval.

Python · 292 stars · 25 forks · Updated Jun 15, 2025