Implement VibeVoice #40546

Open
pengzhiliang wants to merge 220 commits into huggingface:main from pengzhiliang:main

@pengzhiliang

Contributor

@ebezzam ebezzam left a comment

@pengzhiliang thanks for the PR! This is an exciting model to add 🔥

My first comments are mainly about rearranging content to be consistent with the other models in Transformers, and about creating a modular file so that components can be reused from other models in Transformers rather than copied by hand.

There are also some other files to modify:

  • in src/transformers/models/auto
  • in docs
  • and eventually some tests (for which a lot of code can be copied from other models)

As an example of typical files to create/modify, you can check out the Qwen2.5-Omni PR, which is also multimodal.

@fakerybakery

Hopefully this PR can get merged. Since the deletion of the VibeVoice repo, I'm not sure the original authors can continue contributing to it.

Galigator added a commit to Galigator/transformers that referenced this pull request Oct 8, 2025
@fakerybakery

Glad to see that someone has picked this PR up, thanks 🤗

@ebezzam
Contributor

ebezzam commented Mar 4, 2026

run-slow: vibevoice

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 70e4b7f8 | workflow commit (merge commit) |
| PR | 6c2526d9 | branch commit (from PR) |
| main | 7235d442 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@ebezzam
Contributor

ebezzam commented Mar 4, 2026

run-slow: vibevoice, vibevoice_acoustic_tokenizer

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice", "models/vibevoice_acoustic_tokenizer"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 4b9d46a1 | workflow commit (merge commit) |
| PR | 6f1fbf61 | branch commit (from PR) |
| main | 30c48016 | base commit (on main) |

Model CI Report

2 new failed tests from this PR 😭

  • vibevoice:
    tests/models/vibevoice/test_modeling_vibevoice.py::VibeVoiceForConditionalGenerationIntegrationTest::test_1b5_inference (✅ ⟹ ❌)
    tests/models/vibevoice/test_modeling_vibevoice.py::VibeVoiceForConditionalGenerationIntegrationTest::test_1b5_inference_no_voice (✅ ⟹ ❌)
@ebezzam
Contributor

ebezzam commented Mar 4, 2026

run-slow: vibevoice

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Workflow Run ⚙️💔 This comment contains run-slow, but unknown error occurred and the workflow run aborted!

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Mar 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 45934fcb | workflow commit (merge commit) |
| PR | 7ded0321 | branch commit (from PR) |
| main | fd6bc380 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

Contributor

@ebezzam ebezzam left a comment

Self-review for current version after merging changes from VibeVoice ASR.

Draft model page: https://huggingface.co/bezzam/VibeVoice-1.5B-hf
(7B would be very similar)

Comment on lines +87 to +100
if noise_scheduler is None:
    raise ValueError(
        "VibeVoice generation requires a `noise_scheduler` to be provided, e.g., "
        "`diffusers.DPMSolverMultistepScheduler(beta_schedule='squaredcos_cap_v2', prediction_type='v_prediction')`."
    )
if not (
    hasattr(noise_scheduler, "set_timesteps")
    and hasattr(noise_scheduler, "step")
    and hasattr(noise_scheduler, "timesteps")
):
    raise ValueError(
        f"The provided noise_scheduler ({type(noise_scheduler).__name__}) is not compatible with VibeVoice "
        "generation. It must implement `set_timesteps` and `step` methods, and have a `timesteps` attribute."
    )
Contributor

The current version doesn't create a noise scheduler by default, to avoid importing from diffusers, but it does encourage using one from diffusers, since those schedulers have the expected methods and attributes.
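
For readers who want to see the duck-typing contract in isolation, here is a minimal sketch of the check quoted above, run against a stand-in scheduler so that nothing is imported from diffusers. The names `DummyScheduler` and `check_noise_scheduler` are hypothetical and exist only for this illustration; they are not part of the PR.

```python
class DummyScheduler:
    """Hypothetical stand-in exposing the three members the check looks for."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps):
        # Count down from the last step to 0, as a toy schedule.
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))

    def step(self, model_output, timestep, sample):
        # A real scheduler would denoise here; this one returns the sample unchanged.
        return sample


def check_noise_scheduler(noise_scheduler):
    """Hypothetical helper mirroring the validation in the diff above."""
    if noise_scheduler is None:
        raise ValueError("A `noise_scheduler` must be provided.")
    required = ("set_timesteps", "step", "timesteps")
    missing = [name for name in required if not hasattr(noise_scheduler, name)]
    if missing:
        raise ValueError(
            f"{type(noise_scheduler).__name__} is missing: {', '.join(missing)}"
        )
    return noise_scheduler


scheduler = check_noise_scheduler(DummyScheduler())
scheduler.set_timesteps(4)
```

With diffusers installed, the scheduler suggested in the error message, `diffusers.DPMSolverMultistepScheduler(beta_schedule='squaredcos_cap_v2', prediction_type='v_prediction')`, is expected to satisfy the same three-member contract.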

"parameterized",
"psutil",
"dill",
"diffusers", # Needed for VibeVoice TTS integration tests
Contributor

Checking with @ydshieh about installing diffusers on the CI for the slow tests.

Contributor

Done with #44480

Comment on lines +820 to +821
def is_diffusers_available() -> bool:
    return _is_package_available("diffusers")
Contributor

Needed for `require_diffusers` in the testing utils (for the VibeVoice integration tests).
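
As a rough illustration of how an availability check can feed a test-skipping decorator, here is a self-contained sketch built only on the standard library. It is an assumption about the general pattern, not the Transformers implementation: `is_package_available` is a hypothetical stand-in for the library's private `_is_package_available`, and this `require_diffusers` is a simplified version written for the example.

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    """Hypothetical stand-in: True if the top-level package `name` can be found."""
    return importlib.util.find_spec(name) is not None


def is_diffusers_available() -> bool:
    return is_package_available("diffusers")


def require_diffusers(test_case):
    """Skip the decorated test unless diffusers is installed (sketch only)."""
    return unittest.skipUnless(
        is_diffusers_available(), "test requires diffusers"
    )(test_case)


class ExampleIntegrationTest(unittest.TestCase):
    @require_diffusers
    def test_needs_diffusers(self):
        # Only runs when diffusers is importable; otherwise reported as skipped.
        import diffusers  # noqa: F401
```

Keeping the check in one small function means the decorator and any runtime code path can share the same answer about whether the optional dependency is present.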

Comment on lines +29 to +30
- [bezzam/VibeVoice-1.5B-hf](https://huggingface.co/bezzam/VibeVoice-1.5B-hf)
- [bezzam/VibeVoice-7B-hf](https://huggingface.co/bezzam/VibeVoice-7B-hf)
Contributor

@ebezzam ebezzam Mar 5, 2026

TODO: replace with the final checkpoints in the docs, the tests, and the model card

@ebezzam
Contributor

ebezzam commented Mar 6, 2026

run-slow: vibevoice

@ebezzam
Contributor

ebezzam commented Mar 6, 2026

run-slow: vibevoice

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/vibevoice"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, csm, vibevoice, vibevoice_acoustic_tokenizer

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 3670f8f5 | workflow commit (merge commit) |
| PR | 9501e9f6 | branch commit (from PR) |
| main | 4f91111b | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !
