Update speculator config & converter to support hidden states indexing#142

Merged
shanjiaz merged 27 commits into main from hz-update-config
Oct 9, 2025

Conversation

Collaborator

@shanjiaz shanjiaz commented Sep 29, 2025

Changes:

  • Added support for the optional arguments eagle_aux_hidden_state_layer_ids and inference_type.
  • Made the target_vocab_size logic more robust. We default to the length of "t2d"; if it is not available, we load the verifier model's config file and recursively search the dict for vocab_size. (The recursive search is needed for nested dicts, e.g. target_config_dict["text_config"]["vocab_size"].)
  • Removed the tests for adding verifier embeddings, since that is now handled on the vLLM side.
  • Removed the forward pass tests, since the forward function is now defined on the vLLM side.
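
The target_vocab_size fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names find_vocab_size and resolve_target_vocab_size are hypothetical, and the sketch assumes the "t2d" length is passed in as an optional integer and the verifier config is already loaded as a plain dict.

```python
from typing import Any, Optional


def find_vocab_size(config: Any) -> Optional[int]:
    """Depth-first search for the first integer "vocab_size" in a nested config.

    Hypothetical helper for illustration; not taken from the PR diff.
    """
    if isinstance(config, dict):
        value = config.get("vocab_size")
        if isinstance(value, int):
            return value
        # Not found at this level: descend into nested values such as "text_config".
        for child in config.values():
            found = find_vocab_size(child)
            if found is not None:
                return found
    elif isinstance(config, list):
        for child in config:
            found = find_vocab_size(child)
            if found is not None:
                return found
    return None


def resolve_target_vocab_size(
    t2d_length: Optional[int], verifier_config: dict
) -> int:
    """Prefer the "t2d" length; otherwise fall back to the verifier config."""
    if t2d_length is not None:
        return t2d_length
    found = find_vocab_size(verifier_config)
    if found is None:
        raise ValueError("vocab_size not found in verifier config")
    return found
```

With a nested config such as {"text_config": {"vocab_size": N}}, the search returns N even though the top-level dict has no vocab_size key, which is the case the description calls out.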

Command used:

speculators convert nvidia/Llama-4-Maverick-17B-128E-Eagle3 \
  --algorithm eagle3 \
  --verifier RedHatAI/Llama-4-Maverick-17B-128E-Instruct-quantized.w4a16 \
  --output-path Llama4-Maverick-Eagle3-Speculators \
  --validate-device cuda:0 \
  --algorithm-kwargs '{"eagle_aux_hidden_state_layer_ids": [1,23,44], "inference_type": "text"}'
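
The --algorithm-kwargs value in the command above is a JSON object passed as a single string. As a rough sketch of how such a string decodes into the two optional arguments, assuming only the standard json module (parse_algorithm_kwargs is a hypothetical helper and not the parser the speculators CLI actually uses):

```python
import json
from typing import Any


def parse_algorithm_kwargs(raw: str) -> dict[str, Any]:
    """Decode a JSON object string and sanity-check the two optional keys.

    Hypothetical helper for illustration; not the speculators CLI's own parser.
    """
    kwargs = json.loads(raw)
    if not isinstance(kwargs, dict):
        raise ValueError("algorithm kwargs must be a JSON object")

    layer_ids = kwargs.get("eagle_aux_hidden_state_layer_ids")
    if layer_ids is not None:
        is_int_list = isinstance(layer_ids, list) and all(
            isinstance(i, int) and not isinstance(i, bool) for i in layer_ids
        )
        if not is_int_list:
            raise ValueError("eagle_aux_hidden_state_layer_ids must be a list of ints")

    inference_type = kwargs.get("inference_type")
    if inference_type is not None and not isinstance(inference_type, str):
        raise ValueError("inference_type must be a string")

    return kwargs


parsed = parse_algorithm_kwargs(
    '{"eagle_aux_hidden_state_layer_ids": [1,23,44], "inference_type": "text"}'
)
print(parsed["eagle_aux_hidden_state_layer_ids"])  # [1, 23, 44]
print(parsed["inference_type"])  # text
```

Both keys are treated as optional here, matching the PR description; omitting either one leaves it out of the returned dict.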

Converted checkpoint:

shanjiaz/Llama4-Maverick-Eagle3-Speculators-converted

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

github-actions bot commented Sep 29, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/18381300407/artifacts/4227990755.
They will be retained for up to 30 days.
Commit: 1f84913

shanjiaz and others added 4 commits September 30, 2025 11:34
@shanjiaz shanjiaz marked this pull request as ready for review October 1, 2025 01:28
@shanjiaz shanjiaz requested a review from rahul-tuli October 3, 2025 17:01
Collaborator

@rahul-tuli rahul-tuli left a comment

The PR looks good, but now that we are removing the forward pass through the model, does it still make sense to keep the --validate / --validate-device arguments?

@shanjiaz shanjiaz requested a review from rahul-tuli October 6, 2025 15:12
fynnsu previously approved these changes Oct 6, 2025
Collaborator

@fynnsu fynnsu left a comment

Added a couple of comments.

rahul-tuli previously approved these changes Oct 7, 2025
Collaborator

@rahul-tuli rahul-tuli left a comment

A few questions/nits that can be addressed in a follow-up. Good work on this; LGTM once we raise NotImplementedError for forward passes.
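
For readers following the thread, raising NotImplementedError for forward passes could look roughly like the sketch below. The class name is hypothetical and is not taken from the PR diff; the sketch only assumes that the forward function lives on the vLLM side, as the PR description states.

```python
class Eagle3DraftStub:
    """Hypothetical stand-in for a converted draft model class."""

    def forward(self, *args, **kwargs):
        # The forward function is defined on the vLLM side, so fail loudly here
        # instead of silently returning nothing.
        raise NotImplementedError(
            "Forward pass is not implemented here; "
            "run the converted checkpoint through vLLM."
        )
```

Failing loudly makes it clear to callers that the converted checkpoint is meant to be run through vLLM and not called directly.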

rahul-tuli previously approved these changes Oct 7, 2025
Collaborator

@rahul-tuli rahul-tuli left a comment

LGTM!

dsikka previously requested changes Oct 7, 2025
Collaborator

@dsikka dsikka left a comment

Do we have test cases for multiple decoder layers?

@shanjiaz shanjiaz requested review from dsikka and rahul-tuli October 8, 2025 16:03
fynnsu previously approved these changes Oct 8, 2025
Collaborator

@fynnsu fynnsu left a comment

One question below which might require a fix.

@rahul-tuli
Collaborator

LGTM pending quality!

@shanjiaz shanjiaz requested a review from fynnsu October 9, 2025 16:26
Collaborator

@rahul-tuli rahul-tuli left a comment

LGTM!

@shanjiaz shanjiaz dismissed dsikka’s stale review October 9, 2025 19:44

Added tests and review has been addressed.

@shanjiaz shanjiaz merged commit 8af566f into main Oct 9, 2025
12 checks passed
@shanjiaz shanjiaz deleted the hz-update-config branch October 9, 2025 19:44

4 participants