Jianwei Yu MSLDCherryPick

Senior Researcher at Microsoft Research

87 followers · 1 following

Microsoft

Stars

lattifai / benchmark

LattifAI benchmark

HTML 3 Updated Mar 31, 2026

VibingJustSpeakIt / Vibing

HTML 149 8 Updated Mar 31, 2026

FireRedTeam / FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 445 26 Updated Mar 24, 2026

Soul-AILab / SoulX-Singer

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python 513 55 Updated Mar 26, 2026

juhayna-zh / AudioControlNet

Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".

Python 67 3 Updated Feb 7, 2026

ShandaAI / Hive

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

Python 233 26 Updated Mar 9, 2026

declare-lab / TangoFlux

[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

Jupyter Notebook 849 77 Updated Jan 28, 2026

Kugelaudio / kugelaudio-open

Open-source text-to-speech for European languages with voice cloning

Python 237 38 Updated Feb 6, 2026

Saganaki22 / ComfyUI-KugelAudio

🗣️ ComfyUI nodes for KugelAudio - Open-source text-to-speech with voice cloning for 24 European languages

Python 30 7 Updated Feb 10, 2026

lattifai / lattifai-python

Precision Alignment, Infinite Possibilities

Python 122 8 Updated Apr 1, 2026

xcc-zach / xtalk

X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…

Python 189 19 Updated Apr 1, 2026

boduan1 / HAGeo

14 Updated Feb 2, 2026

zhangxy-2019 / critique-GRPO

Python 61 4 Updated Mar 8, 2026

ByteDance-Seed / AHN

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Python 175 5 Updated Oct 17, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 481 31 Updated Mar 22, 2026

ASLP-lab / SongFormer

Python 120 19 Updated Oct 16, 2025

QwenLM / Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,612 244 Updated Jan 8, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 33,744 3,824 Updated Apr 1, 2026

Tencent-Hunyuan / HunyuanImage-2.1

HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation

Python 672 55 Updated Oct 14, 2025

great-wind / MicroSoft_VibeVoice

Forked from microsoft/VibeVoice

Frontier Open-Source Text-to-Speech

Python 9 4 Updated Sep 2, 2025

Wan-Video / Wan2.2

pengzhendong / pyannote-onnx

ASLP-lab / OSUM

pengzhendong / streaming-tts-webui

rednote-hilab / dots.vlm1

multimodal-art-projection / YuE

tencent-ailab / SongBloom

deepseek-ai / DeepSeek-V3

tencent-ailab / SongGeneration

ASLP-lab / SongEval