Puzzletron tutorial fixes for runtime optimization by grzegorz-k-karch · Pull Request #1803 · NVIDIA/Model-Optimizer

grzegorz-k-karch · 2026-06-23T09:31:49Z

What does this PR do?

Type of change: Bug fix

Fixes some issues related to runtime optimization

Solved OOM - fix: reduced GPU memory utilization
Correctly export AnyModel config for vLLM - use namespace instead of dict to correctly read config
Fixed validate_model_defaults not found error - runtime optimization has now its own separate config files instead of reusing memory optimization files

Usage

(does not apply)

Testing

Tested by running the whole pipeline as described in the tutorial

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A
Did you update Changelog?: ❌
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Release Notes

New Features
- Added comprehensive model pruning configuration suite supporting attention, FFN, and hidden dimension optimization strategies
- Introduced GPU memory utilization controls for efficient runtime benchmarking
- Added vLLM model configuration conversion utilities for model validation
Configuration
- Updated validation defaults for improved evaluation consistency

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

copy-pr-bot · 2026-06-23T09:31:53Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-23T09:31:58Z

📝 Walkthrough

Walkthrough

Adds gpu_memory_utilization support to RuntimeConfig and the vLLM benchmark command, introduces a convert_config_to_vllm_anymodel helper in runtime_utils.py, and introduces a full set of runtime pruning and validation YAML configs for the Llama-3.1-8B pruneffn experiment, including attention, FFN, and hidden-dimension pruning strategies.

Changes

vLLM GPU Memory Utilization Support

Layer / File(s)	Summary
RuntimeConfig field and calc_runtime_stats wiring `modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`, `modelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.py`	Adds `gpu_memory_utilization: float = 0.5` to `RuntimeConfig` with co-resident benchmarking documentation, adds imports for `SimpleNamespace`, `mprint`, and `convert_block_configs_to_per_layer_config`, and wires the field from `runtime_stats_config` into the `RuntimeConfig` constructor call.
convert_config_to_vllm_anymodel helper and runtime_vllm.py update `modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`, `modelopt/torch/puzzletron/subblock_stats/runtime_vllm.py`	Introduces `convert_config_to_vllm_anymodel` which loads `config.json`, wraps it in `SimpleNamespace`, sets `architectures`/`base_architecture`, runs `convert_block_configs_to_per_layer_config` with `mprint` logging, and writes the result. Updates `runtime_vllm.py` to use `SimpleNamespace(**config)` + `vars()` serialization and appends `--gpu-memory-utilization` to the vLLM subprocess command.

Llama-3.1-8B pruneffn Runtime YAML Configs

Layer / File(s)	Summary
Validation model and solution defaults `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yaml`, `examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml`	Adds `validate_model_defaults.yaml` (bf16, block sizing, dataset column/name, disabled optional behaviors, `load_dataset_fn` reference) and `validate_solutions_defaults.yaml` (Hydra defaults composition, `skip_validation`, `save_models`, `bigger_is_better`, ablation sort flag). Fixes `val_dataset_name` from `validation` to `valid` in the memory config.
Pruning defaults base config `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yaml`	Adds `pruning_defaults.yaml` with `validate_model_defaults` include, model/experiment placeholders, eval/dataset settings, pruned checkpoint output path, and three parameterized pruning mode sections (FFN, KV-heads, hidden-dimension) with explicit initialization mode and activation-log wiring.
Attention, FFN, and hidden-dim pruning strategy configs `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yaml`, `.../ffn_pruning.yaml`, `.../hidden_dim_pruning.yaml`	Adds `attn_pruning.yaml` (`IndependentKvHeadContributionHook`, `KVHeadsPruningMixIn`, `PruneKVHeads` gqa init, `n_heads_in_group_list`), `ffn_pruning.yaml` (`IterativeChannelContributionHook`, `FFNIntermediatePruningMixIn`, `intermediate_size_list`, `PruneByActivationsLog`), and `hidden_dim_pruning.yaml` (`layer_norm_contribution` hook, `hidden_size_list`, multi-mode init settings).
Top-level Llama-3_1-8B.yaml defaults update `examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yaml`	Rewires Hydra `defaults` for `pruning`, `scoring`, and `realize_model` to reference `../validate_solutions_defaults` instead of `../llama-3_1-8B_pruneffn_memory` entries, and reduces `scoring.eval_samples` from `128` to `16`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

meenchen
kevalmorabia97
ChenhanYu

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

Ignore

❌ Failed checks (1 error)

Check name	Status	Explanation	Resolution
Security Anti-Patterns	❌ Error	The PR adds two `# nosec` comments in runtime_vllm.py (lines 30 and 105) to bypass Bandit security checks (B404 and B603). Per SECURITY.md line 164, "# nosec comments are not allowed" as a bypass f...	Remove the `# nosec B404` and `# nosec B603` comments from runtime_vllm.py. If subprocess usage is genuinely required, add inline comments explaining why it is safe and request review from `@NVIDIA/modelopt-setup-codeowners` per SECURITY.m...

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: fixing Puzzletron tutorial issues related to runtime optimization by adding new config files and addressing OOM/config export issues.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch gkarch/puzzletron-tutorial-fixes

_{Comment @coderabbitai help to get the list of available commands.}

github-actions · 2026-06-23T09:36:16Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1803/
Built to branch `gh-pages` at 2026-06-25 20:13 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-23T09:41:34Z

Codecov Report

❌ Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.55%. Comparing base (c3b913b) to head (b86f13c).
��️ Report is 17 commits behind head on main.

Files with missing lines	Patch %	Lines
...t/torch/puzzletron/subblock_stats/runtime_utils.py	0.00%	20 Missing ⚠️
...pt/torch/puzzletron/subblock_stats/runtime_vllm.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1803      +/-   ##
==========================================
- Coverage   62.88%   61.55%   -1.34%     
==========================================
  Files         511      514       +3     
  Lines       56634    60573    +3939     
==========================================
+ Hits        35615    37284    +1669     
- Misses      21019    23289    +2270

Flag	Coverage Δ
unit	`54.60% <0.00%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

🧹 Nitpick comments (1)

modelopt/torch/puzzletron/subblock_stats/runtime_utils.py (1)
91-114: 📐 Maintainability & Code Quality | 🔵 Trivial

Add return type annotation and document (or parameterize) hardcoded Llama architecture assumption.

Return type hint: Add -> None to the function signature (line 93). The function has no explicit return statement.

Hardcoded base_architecture: Line 107 unconditionally sets base_architecture = "LlamaForCausalLM". This module is Llama-specific (imports LlamaForCausalLM and LlamaModelDescriptor), so the hardcoding appears intentional. Either add a docstring note clarifying this function is Llama-specific, or accept base_architecture as a parameter if broader model support is planned.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py` around lines 91 -
114, The function convert_config_to_vllm_anymodel is missing a return type
annotation and has a hardcoded assumption about the model architecture. First,
add the return type hint -> None to the function signature since the function
does not explicitly return any value. Second, address the hardcoded
base_architecture assignment that unconditionally sets it to "LlamaForCausalLM".
Either add documentation in the function's docstring to clarify that this
function is Llama-specific and explain why the architecture is hardcoded, or
alternatively, parameterize the base_architecture by accepting it as an optional
function parameter with a default value to allow for broader model support in
the future.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@modelopt/torch/puzzletron/subblock_stats/runtime_utils.py`:
- Around line 91-114: The function convert_config_to_vllm_anymodel is missing a
return type annotation and has a hardcoded assumption about the model
architecture. First, add the return type hint -> None to the function signature
since the function does not explicitly return any value. Second, address the
hardcoded base_architecture assignment that unconditionally sets it to
"LlamaForCausalLM". Either add documentation in the function's docstring to
clarify that this function is Llama-specific and explain why the architecture is
hardcoded, or alternatively, parameterize the base_architecture by accepting it
as an optional function parameter with a default value to allow for broader
model support in the future.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c26ffda4-6f74-455d-a721-7a5ed0be45e2

📥 Commits

Reviewing files that changed from the base of the PR and between c3b913b and 8b02d7a.

📒 Files selected for processing (10)

examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/Llama-3_1-8B.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/attn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/ffn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/pruning/pruning_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_model_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_runtime/validate_solutions_defaults.yaml
modelopt/torch/puzzletron/subblock_stats/calc_runtime_stats.py
modelopt/torch/puzzletron/subblock_stats/runtime_utils.py
modelopt/torch/puzzletron/subblock_stats/runtime_vllm.py

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

kevalmorabia97 · 2026-06-23T11:39:18Z

+
+    config = SimpleNamespace(**config_data)
+    config.architectures = ["AnyModel"]
+    config.base_architecture = "LlamaForCausalLM"


Why hard-coding to llama and no other model?

that will be the next step to extent support to other models - working on now;
for this iteration only llama-based models are supported

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

grzegorz-k-karch added 2 commits June 15, 2026 12:09

adding some fixes for puzzletron/runtime tutorial

a724afd

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

Merge branch 'main' into gkarch/puzzletron-tutorial-fixes

8b02d7a

grzegorz-k-karch self-assigned this Jun 23, 2026

grzegorz-k-karch marked this pull request as ready for review June 23, 2026 09:33

grzegorz-k-karch requested a review from a team as a code owner June 23, 2026 09:33

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 23, 2026

View reviewed changes

validation->valid in llama 3.1 config

ff55eee

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

kevalmorabia97 reviewed Jun 23, 2026

View reviewed changes

Comment thread examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml

kevalmorabia97 reviewed Jun 23, 2026

View reviewed changes

reverting unnecessary change

b86f13c

Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Puzzletron tutorial fixes for runtime optimization#1803

Puzzletron tutorial fixes for runtime optimization#1803
grzegorz-k-karch wants to merge 4 commits into
mainfrom
gkarch/puzzletron-tutorial-fixes

grzegorz-k-karch commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

copy-pr-bot Bot commented Jun 23, 2026

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks failed

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-25 20:13 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot left a comment

Uh oh!

kevalmorabia97 Jun 23, 2026

grzegorz-k-karch Jun 25, 2026

Labels

2 participants

Uh oh!

Conversation

grzegorz-k-karch commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Release Notes

copy-pr-bot Bot commented Jun 23, 2026

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks failed

❌ Failed checks (1 error)

github-actions Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-25 20:13 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 Jun 23, 2026

Choose a reason for hiding this comment

grzegorz-k-karch Jun 25, 2026

Choose a reason for hiding this comment

Labels

2 participants

grzegorz-k-karch commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-25 20:13 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 23, 2026 •

edited

Loading