
add multimodal preprocessing support #344

Open
shx2005 wants to merge 2 commits into vllm-project:main from shx2005:multimodal-support-v2

Conversation

@shx2005

@shx2005 shx2005 commented Mar 13, 2026

Purpose

Enable multimodal-aware preprocessing (currently a single image-text pair) for the EAGLE offline data-generation pipeline while preserving backward compatibility for existing text-only workflows.

Description

This PR adds multimodal preprocessing support and refactors related preprocessing internals:

  • Added a --multimodal flag to the offline generation entrypoint and passed it through to preprocessing.
  • Added helpers for:
    • multimodal content normalization,
    • image reference extraction,
    • singleton batch flattening with clearer error context.
  • Generalized assistant-mask/pattern logic to work with either tokenizer or processor callers.
  • Kept compatibility for existing callers:
    • is_multimodal=None defaults to text mode.
  • Updated tests to new helper/function names and added a regression test for default text fallback behavior.
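To make the helper bullets above concrete, here is a minimal sketch of how such helpers might behave. All function names, message shapes, and error wording are illustrative assumptions, not the actual speculators API:

```python
# Hypothetical sketch of the preprocessing helpers described in this PR.
# Names and data shapes are illustrative only.
from typing import Any


def normalize_multimodal_content(content: Any) -> list[dict]:
    """Normalize message content into a list of typed parts.

    A plain string becomes a single text part, so existing text-only
    callers keep working unchanged.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def extract_image_references(messages: list[dict]) -> list[str]:
    """Collect image references (paths or URLs) from normalized messages."""
    refs = []
    for message in messages:
        for part in normalize_multimodal_content(message["content"]):
            if part.get("type") == "image":
                refs.append(part["image"])
    return refs


def flatten_singleton_batch(batch: list, field: str) -> Any:
    """Unwrap a length-1 batch, raising with clearer error context."""
    if len(batch) != 1:
        raise ValueError(
            f"expected a singleton batch for {field!r}, got {len(batch)} items"
        )
    return batch[0]
```

In this sketch, text-only input flows through `normalize_multimodal_content` untouched in meaning, which mirrors the stated goal that `is_multimodal=None` keeps the text-mode default.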

Related Issue

#290

Tests

  • Ran an existing pure-text Qwen example end-to-end to ensure this change does not affect the original text-only path.
  • Ran training in my adapted VL code path to verify multimodal preprocessing outputs are consumable and behavior is correct.

Notes:

  • The data generation and training workflow in speculators will be updated soon; this PR intentionally focuses on organizing and stabilizing the preprocessing step only.

I have filled in:

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan/results, such as providing test command and pasting the results.
  • (Optional) The necessary documentation update.
  • I (a human) have written or reviewed the code in this PR to the best of my ability.
Copilot AI review requested due to automatic review settings March 13, 2026 07:15
Contributor

Copilot AI left a comment


Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.



Collaborator

@shanjiaz shanjiaz left a comment


Hi @shx2005, this looks good to me! Since this PR refactors preprocessing, can you make sure to rebase off of main once the other PR lands? Thank you so much for your patience and contribution!

@mergify

mergify bot commented Mar 26, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional dependencies to get the required linting packages:
https://github.com/vllm-project/speculators/blob/main/CONTRIBUTING.md

@mergify

mergify bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @shx2005.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 26, 2026
@shx2005
Author

shx2005 commented Mar 27, 2026

@shanjiaz Thanks so much for your review! I saw that the other PR has already landed, so I’ll update this one based on the latest main.

Signed-off-by: Haoxiang Sun <shx2005@126.com>
@shx2005
Author

shx2005 commented Mar 27, 2026

I’ve rebased onto the latest main and also fixed a missing dtype argument in scripts/train.py, which could cause dtype mismatches during matrix multiplication.

I ran tox and verified against the updated train.py in the new PR that this change does not affect the current logic.

I haven’t re-tested the image-text pair path yet because the training logic has changed. I’m planning to finish the remaining image-text support over the next few weeks and then test everything together.

Thank you again for your patience and for all the guidance throughout this process!
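For context, the dtype mismatch mentioned above is the usual PyTorch failure mode where the two matmul operands carry different dtypes. The fix pattern looks roughly like this (a hypothetical sketch, not the actual scripts/train.py code):

```python
import torch


def project(hidden: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # torch.matmul requires both operands to share a dtype; casting the
    # activations to the weight's dtype (e.g. bfloat16 when the model is
    # loaded in bf16) avoids a RuntimeError at the matmul.
    return hidden.to(weight.dtype) @ weight


# Illustrative shapes/dtypes only: fp32 activations against bf16 weights.
hidden = torch.randn(2, 4, dtype=torch.float32)
weight = torch.randn(4, 3, dtype=torch.bfloat16)
out = project(hidden, weight)
```

Without the explicit cast (or a dtype argument when constructing the tensors), `hidden @ weight` would fail with a dtype-mismatch RuntimeError.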

Signed-off-by: Haoxiang Sun <shx2005@126.com>
@shx2005 shx2005 force-pushed the multimodal-support-v2 branch from 0a5d18c to 56c47b5 Compare March 30, 2026 00:20
