
fix: add vLLM 0.16+ compatibility for multimodal processor APIs #291

Open
yubingjiaocn wants to merge 1 commit into microsoft:main from yubingjiaocn:fix/vllm-016-compat


@yubingjiaocn

Problem

The VibeVoice vLLM plugin fails on vLLM 0.16 and later with two errors:

  1. vLLM >= 0.16:

    ValueError: BaseMultiModalProcessor._get_data_parser has been moved to BaseProcessingInfo.build_data_parser in v0.16
    
  2. vLLM >= 0.18:

    TypeError: ProcessorInputs.__init__() got an unexpected keyword argument 'mm_data'
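The second error is an ordinary keyword-argument mismatch. The sketch below reproduces it, assuming a hypothetical stand-in for ProcessorInputs that models only the field described in this PR (the real vLLM class has more fields, and its mm_data_items is not a plain dict); build_old_style() and build_new_style() are illustrative names, not plugin code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ProcessorInputs:
    """Hypothetical stand-in: only the keyword relevant to this PR is modeled."""
    prompt: str
    # Simplified: a plain dict stands in for vLLM's multimodal data items.
    mm_data_items: Dict[str, Any] = field(default_factory=dict)


def build_old_style(prompt: str, mm_data: Dict[str, Any]) -> ProcessorInputs:
    # What the removed override effectively did: pass the pre-0.18 keyword.
    return ProcessorInputs(prompt=prompt, mm_data=mm_data)


def build_new_style(prompt: str, mm_data: Dict[str, Any]) -> ProcessorInputs:
    # What a 0.18-compatible caller passes instead.
    return ProcessorInputs(prompt=prompt, mm_data_items=mm_data)


try:
    build_old_style("dummy prompt", {"audio": []})
except TypeError as exc:
    # The generated __init__ rejects the unknown keyword 'mm_data'.
    print(f"old keyword rejected: {exc}")
```

Dropping the override sidesteps this entirely: the base class constructs ProcessorInputs itself, so the plugin no longer hard-codes either keyword.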
    

Changes

Three modifications to vllm_plugin/model.py:

  1. Move _get_data_parser → get_data_parser: The canonical data parser (which sets target_sr to 24 kHz) now lives in VibeVoiceProcessingInfo.get_data_parser() instead of VibeVoiceMultiModalProcessor._get_data_parser(), matching the vLLM 0.16 API.

  2. Remove the get_dummy_processor_inputs override: The custom override in VibeVoiceDummyInputsBuilder passed mm_data= to ProcessorInputs, which fails on vLLM 0.18+ because the constructor expects mm_data_items instead. The base class already handles both API versions correctly, so we rely on it and keep only our get_dummy_text() and get_dummy_mm_data() methods.

  3. Add a backward-compatibility shim for vLLM < 0.16: At import time, we detect the installed vLLM version. For versions below 0.16, we monkey-patch _get_data_parser onto VibeVoiceMultiModalProcessor so the same codebase also works on 0.14.x.
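The version gate and shim from items 1 and 3 can be sketched as follows. This is a self-contained illustration, not the plugin's actual code: DataParser, the two stand-in classes, parse_release(), installed_vllm_version() and apply_legacy_shim() are hypothetical names, and both the shim delegating to self.info.get_data_parser() and target_sr=24_000 are assumptions based on the description above.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the numeric release part, e.g. '0.16.0rc1' -> (0, 16, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_vllm_version() -> Optional[str]:
    """Read the installed vLLM version without importing vllm itself."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


# --- Hypothetical stand-ins for the plugin classes (not the real vLLM bases) ---
class DataParser:
    def __init__(self, target_sr: int) -> None:
        self.target_sr = target_sr


class VibeVoiceProcessingInfo:
    def get_data_parser(self) -> DataParser:
        # Canonical parser location (vLLM >= 0.16 style); assumed 24 kHz audio.
        return DataParser(target_sr=24_000)


class VibeVoiceMultiModalProcessor:
    def __init__(self, info: VibeVoiceProcessingInfo) -> None:
        self.info = info


def apply_legacy_shim(version: str) -> bool:
    """Attach _get_data_parser for vLLM < 0.16; return True if patched."""
    if parse_release(version) >= (0, 16):
        return False

    def _get_data_parser(self) -> DataParser:
        # Assumed design: delegate to the single source of truth on the info object.
        return self.info.get_data_parser()

    VibeVoiceMultiModalProcessor._get_data_parser = _get_data_parser
    return True
```

In a plugin, the gate would run once at module import: read `installed_vllm_version()` and call `apply_legacy_shim()` when it is not None. Delegating keeps the 24 kHz setting in one place. Note that this tuple comparison treats a pre-release such as 0.16.0rc1 as 0.16 (new path), whereas PEP 440 ordering (e.g. packaging.version) would place it below 0.16.0.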

Additionally, a new docs/vllm-version-compatibility.md documents the supported vLLM versions and the relevant API changes.

Backward Compatibility

  • vLLM 0.14.x: Fully supported via the version-gated monkey-patch that restores _get_data_parser on the processor class.
  • vLLM 0.16.x: Uses the new VibeVoiceProcessingInfo.get_data_parser() natively.
  • vLLM 0.18.x: Base class handles ProcessorInputs construction correctly.

Testing

  • vLLM version: 0.16.0
  • GPU: NVIDIA L40S (single card)
  • Workload: 20-minute audio file
  • Result: Completed in 197 seconds; output quality matches the 0.14.x baseline
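For context, the two test numbers above can be combined into a real-time factor (RTF = processing time / audio duration). The PR does not report this metric; the sketch below only derives it from the single L40S run, and real_time_factor() is an illustrative helper.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1 means faster than real time."""
    return processing_s / audio_s


audio_seconds = 20 * 60    # 20-minute input file
processing_seconds = 197   # wall-clock time reported in the test run
rtf = real_time_factor(processing_seconds, audio_seconds)
print(f"RTF = {rtf:.3f}, speed-up = {1 / rtf:.1f}x real time")
# prints "RTF = 0.164, speed-up = 6.1x real time"
```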

Fixes #259
Related: #231

- Move _get_data_parser from VibeVoiceMultiModalProcessor to
  VibeVoiceProcessingInfo.get_data_parser (required by vLLM >= 0.16)
- Remove get_dummy_processor_inputs override to avoid ProcessorInputs
  signature mismatch (fixes vLLM >= 0.18 compatibility)
- Add backward compat shim for vLLM < 0.16 via version detection
- Add docs/vllm-version-compatibility.md documenting supported versions

Fixes microsoft#259
Related: microsoft#231
@yubingjiaocn
Author

@microsoft-github-policy-service agree


Labels

None yet

1 participant