Puzzletron tutorial fixes for runtime optimization#1803
Puzzletron tutorial fixes for runtime optimization#1803grzegorz-k-karch wants to merge 4 commits into
Conversation
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
📝 WalkthroughWalkthroughAdds ChangesvLLM GPU Memory Utilization Support
Llama-3.1-8B pruneffn Runtime YAML Configs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error)
✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1803 +/- ##
==========================================
- Coverage 62.88% 61.55% -1.34%
==========================================
Files 511 514 +3
Lines 56634 60573 +3939
==========================================
+ Hits 35615 37284 +1669
- Misses 21019 23289 +2270
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modelopt/torch/puzzletron/subblock_stats/runtime_utils.py (1)
91-114: 📐 Maintainability & Code Quality | 🔵 TrivialAdd return type annotation and document (or parameterize) hardcoded Llama architecture assumption.
Return type hint: Add
-> Noneto the function signature (line 93). The function has no explicit return statement.Hardcoded
base_architecture: Line 107 unconditionally setsbase_architecture = "LlamaForCausalLM". This module is Llama-specific (importsLlamaForCausalLMandLlamaModelDescriptor), so the hardcoding appears intentional. Either add a docstring note clarifying this function is Llama-specific, or acceptbase_architectureas a parameter if broader model support is planned.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py` around lines 91 - 114, The function convert_config_to_vllm_anymodel is missing a return type annotation and has a hardcoded assumption about the model architecture. First, add the return type hint -> None to the function signature since the function does not explicitly return any value. Second, address the hardcoded base_architecture assignment that unconditionally sets it to "LlamaForCausalLM". Either add documentation in the function's docstring to clarify that this function is Llama-specific and explain why the architecture is hardcoded, or alternatively, parameterize the base_architecture by accepting it as an optional function parameter with a default value to allow for broader model support in the future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`:
- Around line 91-114: The function convert_config_to_vllm_anymodel is missing a
return type annotation and has a hardcoded assumption about the model
architecture. First, add the return type hint -> None to the function signature
since the function does not explicitly return any value. Second, address the
hardcoded base_architecture assignment that unconditionally sets it to
"LlamaForCausalLM". Either add documentation in the function's docstring to
clarify that this function is Llama-specific and explain why the architecture is
hardcoded, or alternatively, parameterize the base_architecture by accepting it
as an optional function parameter with a default value to allow for broader
model support in the future.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c26ffda4-6f74-455d-a721-7a5ed0be45e2
📒 Files selected for processing (10)
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/ffn_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/hidden_dim_pruning.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yamlexamples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yamlmodelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.pymodelopt/torch/puzzletron/subblock_stats/runtime_utils.pymodelopt/torch/puzzletron/subblock_stats/runtime_vllm.py
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
|
|
||
| config = SimpleNamespace(**config_data) | ||
| config.architectures = ["AnyModel"] | ||
| config.base_architecture = "LlamaForCausalLM" |
There was a problem hiding this comment.
Why hard-coding to llama and no other model?
There was a problem hiding this comment.
that will be the next step to extent support to other models - working on now;
for this iteration only llama-based models are supported
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
What does this PR do?
Type of change: Bug fix
Fixes some issues related to runtime optimization
validate_model_defaults not founderror - runtime optimization has now its own separate config files instead of reusing memory optimization filesUsage
(does not apply)
Testing
Tested by running the whole pipeline as described in the tutorial
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
Summary by CodeRabbit
Release Notes
New Features
Configuration