Implement VibeVoice by pengzhiliang · Pull Request #40546 · huggingface/transformers

pengzhiliang · 2025-08-29T12:47:34Z

What does this PR do?

Merge the model from https://github.com/microsoft/VibeVoice/tree/main

HF:
https://huggingface.co/microsoft/VibeVoice-1.5B

update

ebezzam

@pengzhiliang thanks for the PR! This is an exciting model to add 🔥

My first comments are mainly on rearranging content to be consistent with the other models in Transformers, and creating a modular file to better optimize copying components from other models in Transformers.

There are also some other files to modify:

in src/transformers/models/auto
in docs
and eventually some tests (for which a lot of code can be copied from other models)

As an example of typical files to create/modify, you can check out the Qwen2.5-Omni PR, which is also multimodal.

src/transformers/models/vibevoice/__init__.py

src/transformers/models/vibevoice/audio_streamer.py

src/transformers/models/vibevoice/configuration_vibevoice.py

src/transformers/models/vibevoice/vibevoice_processor.py

src/transformers/models/vibevoice/vibevoice_audio_processor.py

fakerybakery · 2025-09-06T22:37:22Z

Hopefully this PR can get merged. Since the deletion of the VibeVoice repo, I'm not sure the original authors can continue contributing to this PR.

fakerybakery · 2025-10-15T01:33:05Z

Glad to see that someone has picked this PR up, thanks 🤗

ebezzam · 2026-03-04T12:15:12Z

run-slow: vibevoice

github-actions · 2026-03-04T12:16:20Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

github-actions · 2026-03-04T12:31:26Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	70e4b7f8	workflow commit (merge commit)
PR	6c2526d9	branch commit (from PR)
main	7235d442	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

ebezzam · 2026-03-04T15:00:26Z

run-slow: vibevoice, vibevoice_acoustic_tokenizer

github-actions · 2026-03-04T15:01:46Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice", "models/vibevoice_acoustic_tokenizer"]
quantizations: []

github-actions · 2026-03-04T15:21:39Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	4b9d46a1	workflow commit (merge commit)
PR	6f1fbf61	branch commit (from PR)
main	30c48016	base commit (on `main`)

Model CI Report

❌ 2 new failed tests from this PR 😭

vibevoice:
tests/models/vibevoice/test_modeling_vibevoice.py::VibeVoiceForConditionalGenerationIntegrationTest::test_1b5_inference (✅ ⟹ ❌)
tests/models/vibevoice/test_modeling_vibevoice.py::VibeVoiceForConditionalGenerationIntegrationTest::test_1b5_inference_no_voice (✅ ⟹ ❌)

ebezzam · 2026-03-04T15:59:51Z

run-slow: vibevoice

github-actions · 2026-03-04T16:00:03Z

Workflow Run ⚙️💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!

github-actions · 2026-03-04T16:01:21Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

github-actions · 2026-03-04T16:28:56Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	45934fcb	workflow commit (merge commit)
PR	7ded0321	branch commit (from PR)
main	fd6bc380	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

ebezzam

Self-review for current version after merging changes from VibeVoice ASR.

Draft model page: https://huggingface.co/bezzam/VibeVoice-1.5B-hf
(7B would be very similar)

docs/source/en/model_doc/vibevoice.md

ebezzam · 2026-03-05T14:01:45Z

src/transformers/models/vibevoice/generation_vibevoice.py

+        if noise_scheduler is None:
+            raise ValueError(
+                "VibeVoice generation requires a `noise_scheduler` to be provided, e.g., "
+                "`diffusers.DPMSolverMultistepScheduler(beta_schedule='squaredcos_cap_v2', prediction_type='v_prediction')`."
+            )
+        if not (
+            hasattr(noise_scheduler, "set_timesteps")
+            and hasattr(noise_scheduler, "step")
+            and hasattr(noise_scheduler, "timesteps")
+        ):
+            raise ValueError(
+                f"The provided noise_scheduler ({type(noise_scheduler).__name__}) is not compatible with VibeVoice "
+                "generation. It must implement `set_timesteps` and `step` methods, and have a `timesteps` attribute."
+            )


The current version doesn't create a noise scheduler by default, to avoid importing from diffusers. It does encourage using one from diffusers, since those schedulers have the expected methods and attributes.
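To illustrate the duck-typing check in the diff above, here is a minimal standalone sketch. The names `validate_noise_scheduler` and `DummyScheduler` are hypothetical and are not part of the PR; the sketch only assumes that a compatible scheduler exposes `set_timesteps`, `step`, and `timesteps`, as the error message in the diff states.

```python
class DummyScheduler:
    """Hypothetical stand-in exposing the three members the check looks for."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Descending timesteps, as a denoising loop would iterate over them.
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))

    def step(self, model_output, timestep, sample):
        # A real scheduler would update the sample here; this one is a no-op.
        return sample


def validate_noise_scheduler(noise_scheduler):
    """Raise ValueError unless the object looks like a usable scheduler."""
    if noise_scheduler is None:
        raise ValueError("A `noise_scheduler` must be provided.")
    required = ("set_timesteps", "step", "timesteps")
    missing = [name for name in required if not hasattr(noise_scheduler, name)]
    if missing:
        raise ValueError(
            f"{type(noise_scheduler).__name__} is missing: {', '.join(missing)}"
        )
    return noise_scheduler
```

Because the check only looks at attribute names, any object with these three members passes, which is what lets the model code avoid a hard dependency on diffusers.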

ebezzam · 2026-03-05T15:26:31Z

setup.py

        "parameterized",
        "psutil",
        "dill",
+        "diffusers",  # Needed for VibeVoice TTS integration tests


Checking with @ydshieh about installing diffusers on the CI for the slow tests.

Done with #44480

ebezzam · 2026-03-05T15:27:33Z

src/transformers/utils/import_utils.py

+def is_diffusers_available() -> bool:
+    return _is_package_available("diffusers")


needed for require_diffusers in testing utils (for VibeVoice integration tests)
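As a rough illustration of how an availability helper and a `require_*` test decorator can fit together, here is a standard-library sketch. It is not the Transformers implementation: `is_package_available` and `require_package` are hypothetical names, and the real `_is_package_available` and `require_diffusers` may behave differently.

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name: str):
    """Decorator that skips a test when the named package is not installed."""
    return unittest.skipUnless(is_package_available(name), f"test requires {name}")


class ExampleTest(unittest.TestCase):
    @require_package("diffusers")
    def test_needs_diffusers(self):
        # Runs only when diffusers is installed; otherwise reported as skipped.
        import diffusers  # noqa: F401
```

With this pattern, a test suite can be collected on machines without the optional dependency, and the dependent tests are reported as skipped rather than failing at import time.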

ebezzam · 2026-03-05T17:05:06Z

docs/source/en/model_doc/vibevoice.md

+- [bezzam/VibeVoice-1.5B-hf](https://huggingface.co/bezzam/VibeVoice-1.5B-hf)
+- [bezzam/VibeVoice-7B-hf](https://huggingface.co/bezzam/VibeVoice-7B-hf)


TODO: change with final checkpoint in docs, testing, and the model card

ebezzam · 2026-03-06T08:50:26Z

run-slow: vibevoice

ebezzam · 2026-03-06T10:31:40Z

run-slow: vibevoice

github-actions · 2026-03-06T10:32:56Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

github-actions · 2026-03-06T10:33:34Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, csm, vibevoice, vibevoice_acoustic_tokenizer

github-actions · 2026-03-06T10:44:15Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	3670f8f5	workflow commit (merge commit)
PR	9501e9f6	branch commit (from PR)
main	4f91111b	base commit (on `main`)

✅ No failing test specific to this PR 🎉 👏 !

pengzhiliang added 4 commits May 23, 2025 19:24

Merge pull request #1 from huggingface/main

db16649

update

Merge branch 'huggingface:main' into main

bc681cd

Merge branch 'huggingface:main' into main

30f09dd

Merge VibeVoice model

6e6e60f

ebezzam added New model Audio labels Aug 29, 2025

ebezzam requested changes Aug 29, 2025


twobob mentioned this pull request Sep 5, 2025

[Feature] [New Model]: VibeVoice-1.5B : A Frontier Long Conversational Text-to-Speech Model sgl-project/sglang#9697

Closed

2 tasks

ebezzam added 8 commits September 23, 2025 17:40

Restructuring.

5f461b6

Formatting.

c91497f

Clean up processor and feature extractor.

0bef8e3

Move audio processing to __call__.

f6893e2

Clean up tokenizer, use modular.

f508927

Move token sequence prep to __call__.

67fcd3e

Move batch prep to __call__.

90490d1

More efficient tokenization.

8698bb5

Galigator added a commit to Galigator/transformers that referenced this pull request Oct 8, 2025

Apply PR huggingface#40546

dca2cc5

ebezzam added 10 commits October 10, 2025 18:59

Start separate acoustic tokenizer model.

92d7a7d

Add init for acoustic tokenizer.

add4e35

Remove audio tokenizer from main model.

396bb2c

Separate semantic model.

fa6100c

Switch to modular and remove debug path.

24e9f13

Start conversion script for main model, clean up semantic modeling.

84c27ad

Update semantic modular.

d7d409e

Single conversion script.

06313d0

Clean up semantic tokenizer modeling.

4af74e9

Update modular.

47fdd93

More semantic cleanup.

8de03e7

ebezzam and others added 5 commits March 2, 2026 14:59

Merge branch 'main' into main

dac44a1

Revert back to tokenizer expected outputs from main.

27bbf57

Move tokenizer expected outputs.

9c69f59

Sync with latest acoustic tokenizer.

5d15982

Nits

6c2526d

Address tests

6f1fbf6

import diffusers differently

7ded032

Nits and cleanup.

8fe0b37

ebezzam reviewed Mar 5, 2026


More clean up after going through examples.

cec69c3

ebezzam reviewed Mar 5, 2026


Processor docstring.

750cca4

ydshieh mentioned this pull request Mar 5, 2026

Add diffusers to CI docker file #44480

Merged

ebezzam and others added 2 commits March 6, 2026 10:44

Merge branch 'main' into main

b2d06e3

Update generation usage with latest merge.

9501e9f



7 participants
