
[Tests][Eagle3] Extend vLLM test cases with conversion step #93

Merged
dsikka merged 6 commits into main from extend_vllm_test
Aug 27, 2025

Conversation

Collaborator

@dsikka dsikka commented Aug 25, 2025

Summary

  • Updates the vLLM token assertion
  • Updates how vLLM is detected, if installed
  • Extends the vLLM test flow to add an additional test with conversion
  • Converts a speculators model and a model from the Eagle3 repo
  • Runs the converted models in vLLM and asserts that valid tokens are returned
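The "asserts that valid tokens are returned" step could be sketched as a small helper. This is a sketch only; `assert_valid_token_count` and the bound of 20 are illustrative assumptions, not the repo's actual code:

```python
def assert_valid_token_count(token_ids, max_tokens: int = 20) -> None:
    """Accept any generation length in (1, max_tokens].

    vLLM may stop early at an EOS/stop token, so asserting that
    exactly max_tokens were returned is brittle.
    """
    n = len(token_ids)
    assert 1 < n <= max_tokens, f"expected 1 < tokens <= {max_tokens}, got {n}"
```

In a real test this would be called on the generated token IDs of each request output (e.g. the `token_ids` of a vLLM completion) after running the converted model.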

github-actions bot commented Aug 25, 2025

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/17244541644/artifacts/3855588296.
They will be retained for up to 30 days.
Commit: d99db49

@dsikka dsikka marked this pull request as ready for review August 25, 2025 18:34
@dsikka dsikka requested review from markurtz and rahul-tuli August 25, 2025 18:34
@dsikka dsikka changed the title from [Tests] Extend vLLM test cases with conversion step to [Tests][Eagle3] Extend vLLM test cases with conversion step Aug 25, 2025
Collaborator

@shanjiaz shanjiaz left a comment


🎉

Collaborator

@markurtz markurtz left a comment


One major issue that I left as a comment on the code; otherwise, there are several minor cleanups that we could get in for this or for a later PR:

  • The sampling params use temperature and top_p, which introduces extra, unneeded variability into the tests
  • The test asserts that the number of returned tokens is exactly 20. It should be updated to >1 and <=20, or we should null out the stop tokens so generation is forced to reach 20
  • There's no cleanup of the vLLM resources (del llm, gc.collect, empty_cache, etc.). A parameterized fixture with a yield would ensure both setup and teardown always run
  • Currently we only check that vLLM can be imported; we likely want an extra check for torch.cuda / GPU availability so we don't waste resources
  • We could require hf_transfer in the dev environments and set HF_HUB_ENABLE_HF_TRANSFER for faster downloads
  • _run_vllm_engine runs as a helper, so failures inside it won't surface properly in pytest
  • Enabling a test config/settings file longer term would be good, so we can pull things like gpu_memory_utilization into a central place and update them for the workers running tests
  • Add baseline/interface comparisons for what the conversion output should look like before running it in vLLM
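The yield-based fixture suggested above could look like the following sketch. `FakeLLM` is a stand-in for `vllm.LLM` so the setup/teardown pattern can be shown without a GPU; the model name and all helper names are illustrative assumptions:

```python
import gc


class FakeLLM:
    """Stand-in for vllm.LLM so the pattern can be shown in isolation.

    In a real suite this would be the actual engine; the point here is
    only the guaranteed-teardown shape of the fixture.
    """

    def __init__(self, model: str):
        self.model = model
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


def llm_fixture(model: str = "example/model"):  # hypothetical model name
    """Yield-based setup/teardown.

    In a real suite, decorate with @pytest.fixture(params=[...]) and
    skip when no GPU is present, e.g. via
    pytest.mark.skipif(not torch.cuda.is_available(), reason="needs GPU").
    """
    llm = FakeLLM(model)
    try:
        yield llm  # the test body runs here
    finally:
        # teardown always runs, even if the test body raised
        llm.shutdown()
        del llm
        gc.collect()
        # in a real suite: torch.cuda.empty_cache()
```

Because the teardown sits in a `finally` block after the `yield`, the engine is released whether the test passes, fails, or errors, which addresses the missing-cleanup point above.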
Collaborator

@markurtz markurtz left a comment


I missed one of the pytest skips in the original review; approving with a note that the potential future improvements should go onto the backlog.

@dsikka dsikka merged commit b5030c5 into main Aug 27, 2025
12 checks passed
@dsikka dsikka deleted the extend_vllm_test branch August 27, 2025 17:05
