
add multimodal preprocessing support #344

Open
shx2005 wants to merge 2 commits into vllm-project:main from shx2005:multimodal-support-v2

Conversation

@shx2005

@shx2005 shx2005 commented Mar 13, 2026

Purpose

Enable multimodal-aware preprocessing (currently a single image-text pair) for the EAGLE offline data-generation pipeline while preserving backward compatibility for existing text-only workflows.

Description

This PR adds multimodal preprocessing support and refactors related preprocessing internals:

  • Added a --multimodal flag to the offline generation entrypoint and passed it through to preprocessing.
  • Added helpers for:
    • multimodal content normalization,
    • image reference extraction,
    • singleton batch flattening with clearer error context.
  • Generalized assistant-mask/pattern logic to work with either tokenizer or processor callers.
  • Kept compatibility for existing callers:
    • is_multimodal=None defaults to text mode.
  • Updated tests to new helper/function names and added a regression test for default text fallback behavior.
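To make the helper bullets above concrete, here is a minimal sketch of how such helpers might behave. All function names, message shapes, and error wording are illustrative assumptions, not the actual speculators API:

```python
# Hypothetical sketch of the preprocessing helpers described in this PR.
# Names and data shapes are illustrative only.
from typing import Any


def normalize_multimodal_content(content: Any) -> list[dict]:
    """Normalize message content into a list of typed parts.

    A plain string becomes a single text part, so existing text-only
    callers keep working unchanged.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def extract_image_references(messages: list[dict]) -> list[str]:
    """Collect image references (paths or URLs) from normalized messages."""
    refs = []
    for message in messages:
        for part in normalize_multimodal_content(message["content"]):
            if part.get("type") == "image":
                refs.append(part["image"])
    return refs


def flatten_singleton_batch(batch: list, field: str) -> Any:
    """Unwrap a length-1 batch, raising with clearer error context."""
    if len(batch) != 1:
        raise ValueError(
            f"expected a singleton batch for {field!r}, got {len(batch)} items"
        )
    return batch[0]
```

In this sketch, text-only input flows through `normalize_multimodal_content` untouched in meaning, which mirrors the stated goal that `is_multimodal=None` keeps the text-mode default.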

Related Issue

#290

Tests

  • Ran an existing pure-text Qwen example end-to-end to ensure this change does not affect the original text-only path.
  • Ran training in my adapted VL code path to verify multimodal preprocessing outputs are consumable and behavior is correct.

Notes:

  • The data generation and training workflow in speculators will be updated soon; this PR intentionally focuses on organizing and stabilizing the preprocessing step only.

I have filled in:

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan/results, such as providing test command and pasting the results.
  • (Optional) The necessary documentation update.
  • I (a human) have written or reviewed the code in this PR to the best of my ability.
Copilot AI review requested due to automatic review settings March 13, 2026 07:15
Contributor

Copilot AI left a comment


Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.



Collaborator

@shanjiaz shanjiaz left a comment


Hi @shx2005, this looks good to me! Since this PR refactors preprocessing, can you make sure to rebase off of main once the other PR lands? Thank you so much for your patience and contribution!

@mergify

mergify bot commented Mar 26, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages:
https://github.com/vllm-project/speculators/blob/main/CONTRIBUTING.md

@mergify

mergify bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @shx2005.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 26, 2026
@shx2005
Author

shx2005 commented Mar 27, 2026

@shanjiaz Thanks so much for your review! I saw that the other PR has already landed, so I’ll update this one based on the latest main.

Signed-off-by: Haoxiang Sun <shx2005@126.com>
@shx2005
Author

shx2005 commented Mar 27, 2026

I’ve rebased onto the latest main and also fixed a missing dtype argument in scripts/train.py, which could cause dtype mismatches during matrix multiplication.

I ran tox and verified against the updated train.py in the new PR that this change does not affect the current logic.

I haven’t re-tested the image-text pair path yet because the training logic has changed. I’m planning to finish the remaining image-text support over the next few weeks and then test everything together.

Thank you again for your patience and for all the guidance throughout this process!
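For context, the dtype mismatch mentioned above is the usual PyTorch failure mode where the two matmul operands carry different dtypes. The fix pattern looks roughly like this (a hypothetical sketch, not the actual scripts/train.py code):

```python
import torch


def project(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # torch.matmul requires both operands to share a dtype; casting the
    # activations to the weight's dtype (e.g. bfloat16 when the model is
    # loaded in bf16) avoids a RuntimeError at the matmul.
    return hidden.to(weight.dtype) @ weight


# Illustrative shapes/dtypes only: fp32 activations against bf16 weights.
hidden = torch.randn(2, 4, dtype=torch.float32)
weight = torch.randn(4, 3, dtype=torch.bfloat16)
out = project(hidden, weight)
```

Without the explicit cast (or a dtype argument when constructing the tensors), `hidden @ weight` would fail with a dtype-mismatch RuntimeError.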

Signed-off-by: Haoxiang Sun <shx2005@126.com>
@shx2005 shx2005 force-pushed the multimodal-support-v2 branch from 0a5d18c to 56c47b5 Compare March 30, 2026 00:20
