Conversation
ebezzam
left a comment
There was a problem hiding this comment.
@pengzhiliang thanks for the PR! This is an exciting model to add 🔥
My first comments are mainly on rearranging content to be consistent with the other models in Transformers, and creating a modular file to better optimize copying components from other models in Transformers.
There are also some other files to modify:
- in
src/transformers/models/auto - in
docs - and eventually some tests (for which a lot of code can be copied from other models)
As an example of typical files to create/modify, you can check out the Qwen2.5-Omni PR, which is also multimodal.
|
Hopefully this PR can get merged, since the deletion of the VibeVoice repo not sure the original authors can continue contributing to this PR |
|
Glad to see that someone has picked this PR up, thanks 🤗 |
|
run-slow: vibevoice |
|
This comment contains models: ["models/vibevoice"] |
|
run-slow: vibevoice, vibevoice_acoustic_tokenizer |
|
This comment contains models: ["models/vibevoice", "models/vibevoice_acoustic_tokenizer"] |
CI ResultsCommit Info
Model CI Report❌ 2 new failed tests from this PR 😭
|
|
run-slow: vibevoice |
|
Workflow Run ⚙️💔 This comment contains |
|
This comment contains models: ["models/vibevoice"] |
There was a problem hiding this comment.
Self-review for current version after merging changes from VibeVoice ASR.
Draft model page: https://huggingface.co/bezzam/VibeVoice-1.5B-hf
(7B would be very similar)
| if noise_scheduler is None: | ||
| raise ValueError( | ||
| "VibeVoice generation requires a `noise_scheduler` to be provided, e.g., " | ||
| "`diffusers.DPMSolverMultistepScheduler(beta_schedule='squaredcos_cap_v2', prediction_type='v_prediction')`." | ||
| ) | ||
| if not ( | ||
| hasattr(noise_scheduler, "set_timesteps") | ||
| and hasattr(noise_scheduler, "step") | ||
| and hasattr(noise_scheduler, "timesteps") | ||
| ): | ||
| raise ValueError( | ||
| f"The provided noise_scheduler ({type(noise_scheduler).__name__}) is not compatible with VibeVoice " | ||
| "generation. It must implement `set_timesteps` and `step` methods, and have a `timesteps` attribute." | ||
| ) |
There was a problem hiding this comment.
Current version doesn't default to creating a noise scheduler to avoid imports from diffusers, but it does encourage to use one from diffusers, as it has the expected methods and attributes.
| "parameterized", | ||
| "psutil", | ||
| "dill", | ||
| "diffusers", # Needed for VibeVoice TTS integration tests |
There was a problem hiding this comment.
Seeing with @ydshieh about installing diffusers on the CI for the slow tests
| def is_diffusers_available() -> bool: | ||
| return _is_package_available("diffusers") |
There was a problem hiding this comment.
needed for require_diffusers in testing utils (for VibeVoice integration tests)
| - [bezzam/VibeVoice-1.5B-hf](https://huggingface.co/bezzam/VibeVoice-1.5B-hf) | ||
| - [bezzam/VibeVoice-7B-hf](https://huggingface.co/bezzam/VibeVoice-7B-hf) |
There was a problem hiding this comment.
TODO: change with final checkpoint in docs, testing, and the model card
|
run-slow: vibevoice |
|
run-slow: vibevoice |
|
This comment contains models: ["models/vibevoice"] |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, csm, vibevoice, vibevoice_acoustic_tokenizer |
What does this PR do?
Merge the model from https://github.com/microsoft/VibeVoice/tree/main
HF:
https://huggingface.co/microsoft/VibeVoice-1.5B