Tags · NVIDIA/Model-Optimizer

0.45.0rc1

chore: stop tracking .claude/scheduled_tasks.lock (#1758)

## What
Remove `.claude/scheduled_tasks.lock` from version control and add a
`.gitignore` rule so it is never committed again.

## Why
This file is an **ephemeral Claude Code scheduler lock** — its contents
are runtime process state (`sessionId`, `pid`, `procStart`,
`acquiredAt`), not source. It was accidentally committed in #1623 and is
currently tracked on `main`.

Reported by @sychen52 in a review of #1623.

## Changes
- `git rm --cached .claude/scheduled_tasks.lock`
- Add `.claude/scheduled_tasks.lock` to `.gitignore`

🤖 Generated with [Claude Code](https://claude.com/claude-code)


## Summary by CodeRabbit

* **Chores**
  * Updated repository configuration to exclude internal runtime lock files from version control.


Signed-off-by: Ye Yu <yeyu@nvidia.com>

Jun 23, 2026
56c1416

0.46.0dev

Adds AutoQuant support for VLM / Qwen3.5-Qwen3.6 style models (#1381)

### What does this PR do?

Type of change: new feature, bug fix, new tests

### Details

- Enables AutoQuant search over fused MoE expert containers by
snapshotting/restoring their per-expert quantizers.
- Adds Qwen3.5/3.6 linear-attention grouping rules so fused deployment
layers keep compatible quant formats.
- Supports `w4a16_nvfp4` as an AutoQuant search format.
- Preserves disabled AutoQuant layer patterns in generated configs while
allowing selected modules like `lm_head` to override default disables.
- Keeps recipe-mode and AutoQuantize VLM paths on the outer CausalLM so
Qwen3.5/3.6-MoE `lm_head` remains visible.
- Skips `parent_class`-scoped quant config entries during AutoQuant bare
quantizer matching, preventing class-scoped global entries from
last-match overriding every selected module.
- Adds temporary hardcoded Qwen/VLM AutoQuant disabled-layer patterns in
`hf_ptq.py` with a TODO to refactor into the config system.
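The `parent_class` fix above hinges on last-match-wins lookup over wildcard patterns. A minimal sketch, assuming a simplified config where each entry is a dict with a wildcard `quantizer_name`, an optional `parent_class`, and a `cfg`. These field names and the `match_quantizer_cfg` helper are hypothetical and do not reflect ModelOpt's actual schema or API.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical, simplified config entries (field names are illustrative only).
# Later entries override earlier ones: last match wins.
QUANT_CFG = [
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_name": "*lm_head*", "cfg": {"enable": False}},
    # A class-scoped "global" entry: meant to apply only inside modules of the
    # given parent class, not to every quantizer whose name matches "*".
    {"quantizer_name": "*", "parent_class": "SomeVisionBlock", "cfg": {"enable": False}},
]


def match_quantizer_cfg(name: str, entries: list, skip_parent_scoped: bool = True) -> Optional[dict]:
    """Return the cfg of the last entry whose pattern matches `name`.

    With skip_parent_scoped=True, entries carrying a `parent_class` are ignored,
    because a bare quantizer name says nothing about its parent module.
    """
    matched = None
    for entry in entries:
        if skip_parent_scoped and "parent_class" in entry:
            continue
        if fnmatch(name, entry["quantizer_name"]):
            matched = entry["cfg"]
    return matched


print(match_quantizer_cfg("layers.0.mlp.weight_quantizer", QUANT_CFG))
# {'num_bits': 8}
print(match_quantizer_cfg("layers.0.mlp.weight_quantizer", QUANT_CFG, skip_parent_scoped=False))
# {'enable': False}  <- the class-scoped entry wins by last match
```

Skipping class-scoped entries is the conservative choice for a bare-name lookup: without the parent module's class there is no way to tell whether such an entry should apply.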

### Usage

```bash
python examples/llm_ptq/hf_ptq.py \
  --pyt_ckpt_path <model_path> \
  --qformat fp8,w4a16_nvfp4 \
  --auto_quantize_bits 5.0 \
  --auto_quantize_cost_model active_moe \
  --auto_quantize_checkpoint <autoquant_state.pt> \
  --export_path <output_dir>
```
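`--auto_quantize_bits 5.0` can be read as an average-bits budget over the formats listed in `--qformat`. A minimal sketch of checking a per-layer format assignment against such a budget, assuming the budget is a parameter-count-weighted average of weight bits. The bit widths (8 for `fp8`; 4.5 for `w4a16_nvfp4`, i.e. 4-bit values plus one 8-bit scale per 16-element block), the layer sizes, and the helper names are illustrative assumptions, not ModelOpt's cost model, and the sketch ignores `--auto_quantize_cost_model` entirely.

```python
# Assumed effective weight bits per format (illustrative, not ModelOpt's tables).
BITS_PER_FORMAT = {
    "fp8": 8.0,
    # 4-bit values plus one 8-bit scale per 16-element block: 4 + 8/16 = 4.5
    "w4a16_nvfp4": 4.5,
}


def effective_bits(assignment: dict, num_params: dict) -> float:
    """Parameter-weighted average bits for a {layer: format} assignment."""
    total_bits = sum(BITS_PER_FORMAT[fmt] * num_params[layer] for layer, fmt in assignment.items())
    total_params = sum(num_params[layer] for layer in assignment)
    return total_bits / total_params


def fits_budget(assignment: dict, num_params: dict, budget_bits: float) -> bool:
    """True if the assignment's average bits do not exceed the budget."""
    return effective_bits(assignment, num_params) <= budget_bits


# Hypothetical layer sizes (number of weights).
num_params = {"attn": 1_000, "mlp": 6_000}
print(effective_bits({"attn": "fp8", "mlp": "w4a16_nvfp4"}, num_params))    # 5.0
print(fits_budget({"attn": "fp8", "mlp": "w4a16_nvfp4"}, num_params, 5.0))  # True
print(fits_budget({"attn": "fp8", "mlp": "fp8"}, num_params, 5.0))          # False
```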

### Testing

- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest
tests/unit/torch/quantization/test_autoquant.py::test_get_auto_quantize_config_keeps_selected_lm_head_enabled
tests/unit/torch/quantization/test_config_validation.py::TestMatchQuantizerCfg::test_parent_class_scoped_entries_are_ignored_for_bare_autoquant_lookup`
- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_config_validation.py -k "not
data_parallel"` (`120 passed, 1 deselected`)
- `/Users/weimingc/miniconda3/envs/modelopt/bin/python -m py_compile
examples/llm_ptq/hf_ptq.py modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/_auto_quantize_cost.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_config_validation.py`
- Running the full affected-file pytest locally without
`-k "not data_parallel"` failed only `test_data_parallel_auto_quantize`,
because the local sandbox cannot bind a free socket
(`PermissionError: Operation not permitted`).
- Ran Qwen3.6 35B AutoQuant e2e with `fp8,w4a16_nvfp4` and exported a
checkpoint.
- Verified exported checkpoint loads in vLLM nightly without local
patches.

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A

### Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added w4a16_nvfp4 quantization format and optional cost-exclusion patterns for AutoQuantize.

* **Improvements**
  * Safer multimodal/VLM handling; AutoQuantize now runs on the full outer model when applicable.
  * Better fused-MoE support, more accurate weight accounting, and refined attention grouping for improved quantization choices.
  * Dynamic layer-disabling support for targeted disables.

* **Tests**
  * New unit tests covering cost-model exclusions, fused-MoE accounting, and config selection.

* **Documentation**
  * Updated cost-constraint example to show exclusion-pattern usage.

---------

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

Jun 8, 2026
1f4a489

0.45.0rc0

[OMNIML-4788] specdec_bench/Qwen3.5-4B: throughput_32k benchmark + S3 upload step (#1564)

### What does this PR do?

Type of change: enhancement (follow-up to #1531).

Extends the merged Qwen3.5-4B SPEED-Bench launcher YAMLs from a
single-task, qualitative-only smoke test into a **3-task pipeline** that
also covers long-context throughput and verifies the S3 upload path end
to end. The change consists of two commits cherry-picked cleanly from
#1531's late branch state; they were authored after that PR's merge
commit was resolved against an earlier rebased head, so they were not
included in the merge.

### Pipeline shape (both YAMLs)

| Task | Split | Save dir |
|---|---|---|
| `task_0` | qualitative (existing quality / acceptance-rate signal) | `/scratchspace/specdec_bench{,_mtp}/qualitative` |
| `task_1` | **throughput_32k** (new: long-context throughput) | `/scratchspace/specdec_bench{,_mtp}/throughput_32k` |
| `task_2` | **upload to S3 in sweep layout** | `s3://team-specdec-workgroup/results/specdec_bench{,_mtp}/<split>/` |

### New artifacts

* `tools/launcher/common/specdec_bench/upload_to_s3.sh`: thin wrapper
around `examples/specdec_bench/upload_to_s3.py` so it can be invoked as
a launcher task. Installs `boto3` from `requirements.txt` on cold
containers; warm pipelines pick it up from the prior `run.sh`.
* `tools/launcher/common/specdec_bench/runtime_params_throughput_32k.yaml`:
pins `engine_args.max_model_len = 40960` (32K input + 4K output + 4K
headroom) so vLLM doesn't silently auto-cap `max_model_len` below the
36K minimum needed for `throughput_32k` prompts on single-GPU runs.

### Why max_model_len matters

Without an explicit `max_model_len`, vLLM auto-derives it from the model
config (Qwen3.5-4B = 128K) **and from the GPU-memory budget**. On a
single GPU the second factor can cap effective `max_model_len` well
below 36K, silently truncating 32K-token prompts and producing wrong
throughput numbers. The qualitative split is not affected (its prompts
top out around 8K, well under any auto-derivation floor), so only
`task_1` carries the override.
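The failure mode above can be modeled as taking the smaller of two limits unless an explicit value is given. A minimal sketch, assuming a hypothetical `effective_max_model_len` helper and a made-up 24K memory-derived cap; it is a simplified model of the behavior the PR describes, not vLLM's actual derivation logic or API.

```python
from typing import Optional

K = 1024
REQUIRED = 32 * K + 4 * K  # 32K prompt + 4K output = 36,864 tokens


def effective_max_model_len(config_max: int, memory_cap: int, override: Optional[int] = None) -> int:
    """Simplified model: an explicit override wins; otherwise use the smaller
    of the model-config limit and the GPU-memory-derived limit."""
    if override is not None:
        return override
    return min(config_max, memory_cap)


def fits(required_tokens: int, max_len: int) -> bool:
    """True if a request of `required_tokens` fits without truncation."""
    return required_tokens <= max_len


config_max = 128 * K  # Qwen3.5-4B config limit, per the PR text
memory_cap = 24 * K   # hypothetical single-GPU memory-derived cap
auto = effective_max_model_len(config_max, memory_cap)
pinned = effective_max_model_len(config_max, memory_cap, override=40_960)  # 32K + 4K + 4K
print(auto, fits(REQUIRED, auto))      # 24576 False
print(pinned, fits(REQUIRED, pinned))  # 40960 True
```

Pinning the value makes the limit explicit, so a mismatch between the prompt length and the model length can no longer hide behind an automatically reduced cap.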

### S3 credentials

`upload_to_s3.sh` reads `S3_ENDPOINT` / `S3_KEY_ID` / `S3_SECRET` from
the runtime environment (they are not hardcoded). `--skip-existing` and
`--allow-incomplete-provenance` are passed by default, so re-runs land
alongside the prior upload and runs lacking `CONTAINER_IMAGE` still
upload (Phase-2 harness work in OMNIML-4788 will populate it).
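A minimal sketch of the credential handling described above: read the three variables from the environment, fail fast if any is missing, and build the sweep-layout destination prefix. The helper names are hypothetical and the real `upload_to_s3.sh` / `upload_to_s3.py` may behave differently; `{,_mtp}` in the bucket path is read as shell brace expansion, i.e. `specdec_bench` or `specdec_bench_mtp`.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("S3_ENDPOINT", "S3_KEY_ID", "S3_SECRET")
BUCKET_ROOT = "s3://team-specdec-workgroup/results"


def load_s3_credentials(env: Mapping = os.environ) -> dict:
    """Read S3 settings from the environment instead of hardcoding them."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing S3 environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


def destination_prefix(split: str, mtp: bool = False) -> str:
    """Sweep layout: results/specdec_bench{,_mtp}/<split>/"""
    bench = "specdec_bench_mtp" if mtp else "specdec_bench"
    return f"{BUCKET_ROOT}/{bench}/{split}/"


print(destination_prefix("throughput_32k", mtp=True))
# s3://team-specdec-workgroup/results/specdec_bench_mtp/throughput_32k/
```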

### Testing

Cluster smoke test on cw_dfw, launched via:

```bash
uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3.5-4B/specdec_bench.yaml --yes
```

is currently in flight (jobs `12257378/79/80`, pending). Timing/AR numbers
and S3 upload confirmation will be added to this PR once it lands.

### Before your PR is "Ready for review"

- Backward compatible: ✅ (additive — task_0 keeps the prior qualitative
behavior, just with `/qualitative` suffix in `save_dir`)
- New PIP dep: none (boto3 is already in
`examples/specdec_bench/requirements.txt` from #1531)
- New tests: N/A (launcher YAML + shell wrapper; covered by cluster
smoke)
- Changelog: N/A (internal-facing tooling)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **New Features**
  * Added a 32K-context runtime configuration (higher max model length) to enable long-context throughput benchmarking and avoid silent prompt truncation.
  * Added a launcher helper to upload benchmark results to S3 with incremental/retry-friendly options and pass/fail reporting.

* **Chores**
  * Split Qwen3.5-4B benchmark into separate qualitative and 32K throughput tasks and added coordinated S3 upload.
  * Applied the same multi-task pipeline layout and clearer output organization to the MTP speculative-decoding benchmark.


---------

Signed-off-by: chenhany <chenhany@nvidia.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>

Jun 7, 2026
2c52e7b

0.44.0

fix(te-plugin): handle TE 2.15+ tuple return from `_Linear` / `_GroupedLinear`

TE 2.15+ changed `_Linear.forward` and `_GroupedLinear.forward` to return
`(out, new_workspace)` tuples instead of a single tensor. ModelOpt's
patched `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` still
passed the whole tuple into `self.output_quantizer`, crashing inside
`TensorQuantizer.forward` on `tuple.numel()`:

  AttributeError: 'tuple' object has no attribute 'numel'

Mirror the existing pattern from `_QuantTELayerNormLinear.forward`:
quantize only `output[0]` (activation) and pass auxiliary workspace
metadata through verbatim. TE <= 2.14 returns a single tensor and falls
through the isinstance branch unchanged.

This unblocks Megatron-Bridge's TE 2.15 path; the local
`patch_modelopt_te_linear_tuple_output` shim can be removed once this
ships in a tagged release.
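The fix pattern can be shown in isolation: quantize only `output[0]` and pass any trailing workspace entries through untouched. A minimal sketch in plain Python; `quantize_output` and `toy_output_quantizer` are hypothetical stand-ins for ModelOpt's patched `te_quantized_linear_fn` and `TensorQuantizer`, and lists stand in for tensors.

```python
def toy_output_quantizer(values: list) -> list:
    """Stand-in for TensorQuantizer: it only accepts the activation itself."""
    if isinstance(values, tuple):
        raise TypeError("quantizer received a tuple, not an activation")
    return [round(v * 4) / 4 for v in values]  # crude fake quantization to a 0.25 grid


def quantize_output(output, quantizer):
    """Handle both TE <= 2.14 (single output) and TE 2.15+ ((out, new_workspace))."""
    if isinstance(output, tuple):
        # Quantize only the activation; pass workspace metadata through verbatim.
        return (quantizer(output[0]), *output[1:])
    return quantizer(output)


activation = [0.1, 0.3, 0.6]
print(quantize_output(activation, toy_output_quantizer))          # [0.0, 0.25, 0.5]
print(quantize_output((activation, None), toy_output_quantizer))  # ([0.0, 0.25, 0.5], None)
```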

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

May 13, 2026
c897fbe

0.44.0rc5


May 13, 2026
c897fbe

0.44.0rc4

fix(te-plugin): make _Linear arg indexing robust to TE signature changes (#1473)

### What does this PR do?

Type of change: Bug fix

ModelOpt's `te_quantized_linear_fn` and `te_grouped_quantized_linear_fn`
read `weight` / `inp` from hard-coded positions in `args`. Two TE
signature changes broke this scheme:

- **TE 1.x → 2.0:** dropped the legacy `weight_fp8` slot between
`weight` and `inp`. ModelOpt handled this with an `if Version("2.0") <=
_TE_VERSION:` branch + a duplicate else branch.
- **TE 2.14 → 2.15:** inserted `weight_workspace` between `weight` and
`inp` at the `_Linear.forward` call site ([TE 2.15 linear.py
L1663](https://github.com/NVIDIA/TransformerEngine/blob/release_v2.15/transformer_engine/pytorch/module/linear.py#L1663)).
Unhandled by ModelOpt — `args[idx + 1]` resolved to `None` (workspace is
None outside FP8), which then crashed `TensorQuantizer.forward` on
`inputs.numel()` with `AttributeError: 'NoneType' object has no
attribute 'numel'`. Surfaced as a regression in Megatron-Bridge after
the TE 2.15 bump alongside ModelOpt 0.44.0rc3.
- **TE 2.10:** `_GroupedLinear.forward`'s second positional slot was
renamed `m_splits` → `non_tensor_args` (tuple wrapping). ModelOpt had a
separate `Version("2.10")` gate for this.

Replace all three version gates with **parameter-name introspection** of
the live `_Linear.forward` / `_GroupedLinear.forward` signature. The
parameter names (`weight`, `inp`, `m_splits`, `non_tensor_args`) have
been stable across TE 1.x, 2.x, and 2.15+; only their relative positions
shift. The new code reads the live signature via
`inspect.signature(...).parameters`, locates `weight`/`inp` by name, and
mutates only those positions in a list copy of `args` — everything
between (e.g. TE 2.15's `weight_workspace`) and after passes through
verbatim. The dual-branch code in `te_quantized_linear_fn` collapses to
a single path.
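The name-based lookup can be sketched with `inspect.signature` against stand-in functions that mirror two rows of the signature table in this PR. The stand-ins, `find_weight_and_inp`, and `quantize_args` are hypothetical; the real plugin inspects TE's own `_Linear.forward`. The sketch assumes `args` holds the positional arguments after `ctx`.

```python
import inspect


# Stand-ins mirroring the TE 2.0-2.14 and TE 2.15.x signatures from the table.
def forward_te_2_14(ctx, weight, inp, bias, *rest):
    pass


def forward_te_2_15(ctx, weight, weight_workspace, inp, bias, *rest):
    pass


def find_weight_and_inp(forward_fn) -> tuple:
    """Return the positions of `weight` and `inp` in `args`, located by name."""
    names = [n for n in inspect.signature(forward_fn).parameters if n != "ctx"]
    return names.index("weight"), names.index("inp")


def quantize_args(forward_fn, args, weight_q, input_q) -> tuple:
    """Replace only the weight and input; everything else passes through."""
    w_idx, i_idx = find_weight_and_inp(forward_fn)
    new_args = list(args)  # copy, so the caller's args are untouched
    new_args[w_idx] = weight_q(args[w_idx])
    new_args[i_idx] = input_q(args[i_idx])
    return tuple(new_args)


print(find_weight_and_inp(forward_te_2_14))  # (0, 1)
print(find_weight_and_inp(forward_te_2_15))  # (0, 2)
args = ("W", None, "X", "b")  # weight, weight_workspace, inp, bias
print(quantize_args(forward_te_2_15, args, lambda w: "q" + w, lambda x: "q" + x))
# ('qW', None, 'qX', 'b')
```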

### Usage

No public API change. PTQ continues to work transparently across all
supported TE versions:

```python
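import modelopt.torch.quantization as mtq

# `model` is a TE-based model and `forward_loop` a calibration loop, both defined by the caller.
# Works on TE 1.x, 2.0-2.14, 2.15.x, and 2.16+; no version flag needed.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)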
```

### Testing

Existing TE plugin tests
(`tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py`)
exercise both the `_forward` (no-grad calibration) and `_apply`
(grad-enabled training) paths of `te_quantized_linear_fn` for
`te.pytorch.Linear` — they would have caught the original TE 2.15
regression on a CI matrix entry pinned to TE 2.15. Verified trace
correctness across:

| TE version | `_Linear.forward` signature | `_te_linear` weight→inp gap | `_GroupedLinear.forward` second slot |
|---|---|---|---|
| 1.x | `(ctx, weight, weight_fp8, inp, …)` | 1 | n/a |
| 2.0–2.14 | `(ctx, weight, inp, bias, …)` | 0 | `m_splits` (2.0–2.9), `non_tensor_args` (2.10–2.14) |
| 2.15.x | `(ctx, weight, weight_workspace, inp, …)` | 1 | `non_tensor_args` |
| 2.16+ (main) | `(ctx, weight, inp, bias, fwd_args)` | 0 | `non_tensor_args` |

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).

Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅ <!--- Public API unchanged;
broadens the range of TE versions that work (TE 2.15.x now supported, TE
1.x still supported via the same introspection path). -->
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A <!--- Only
adds a stdlib `inspect` import. -->
- Did you write any new necessary tests?: Existing tests sufficient
<!--- Bug fix is covered by existing `test_transformer_engine.py` for
whatever single TE version CI exercises. A multi-version TE matrix is
the right next step but is out of scope for this PR. -->

### Additional Information
Triggered by failing tests in Megatron-Bridge
(NVIDIA-NeMo/Megatron-Bridge#3783) after bumping ModelOpt 0.44.0rc2 →
0.44.0rc3 together with a Megatron-LM bump that pulls in TE 2.15. ModelOpt
rc2 had the same latent bug; it just wasn't exercised until TE 2.15 became
the runtime version.


## Summary by CodeRabbit

* **Refactor**
  * Improved Transformer Engine quantization plugin robustness by using runtime parameter inspection instead of version-based branching, ensuring compatibility across TE versions without requiring manual updates.


---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

May 12, 2026
50e112e

0.44.0rc3

Add deprecation warning for GradNAS
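A deprecation warning of this kind is typically raised with Python's standard `warnings` module. A minimal sketch, assuming a hypothetical `gradnas_search` entry point and a hypothetical message; the actual GradNAS entry point and wording in ModelOpt may differ.

```python
import warnings


def gradnas_search(*args, **kwargs):
    """Hypothetical GradNAS entry point that warns before running."""
    warnings.warn(
        "GradNAS is deprecated and may be removed in a future release.",  # hypothetical wording
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return "searched"  # placeholder for the real search result
```

`DeprecationWarning` is hidden by default outside `__main__` and test runners, so callers who want to see it can enable it with `warnings.simplefilter("default", DeprecationWarning)`.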

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

May 11, 2026
1b5d448

0.44.0rc2

[Cherry-pick] PRs #1352 #1351 #1330 #1354 #1355 #1360 #1342 #1324 #1340 #1368 #1373 #1359 #1361 #1325 #1369 #1370 #1371 #1375 #1386 #1353 #1356 #1390 (#1385)

## Cherry-picked PRs

- #1352
- #1351
- #1330
- #1354
- #1355
- #1360
- #1342
- #1324
- #1340
- #1368
- #1373
- #1359
- #1361
- #1325
- #1369
- #1370
- #1371
- #1375
- #1386
- #1353
- #1356
- #1390


## Summary by CodeRabbit

* **New Features**
  * Added Python 3.14 support (basic unit tests verified; production defaults on Python 3.12)
  * Added Windows CUDA 13.x installation guidance
  * Introduced LLM ONNX export utilities with quantization support
  * Extended Medusa mode support in speculative decoding pipeline

* **Bug Fixes**
  * Fixed FP8 quantization for vision transformer multi-head attention
  * Improved MoE expert handling in quantization calibration and inference
  * Enhanced ONNX graph utilities for FP8 weight transformation

* **Documentation**
  * Comprehensive Minitron pruning + distillation + quantization + vLLM tutorials with ablation studies
  * Megatron data preparation guide for tokenization workflows
  * Puzzletron distillation results and cross-reference updates


---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: vipandya <vipandya@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Jenny Chen <jennifchen@nvidia.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: ynankani <ynankani@nvidia.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: vishalpandya1990 <vishalpandya1990@gmail.com>
Co-authored-by: dthienan-nv <dmoodie@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Hrishith Thadicherla <99313418+hthadicherla@users.noreply.github.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: kaix-nv <kaix@nvidia.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>

May 5, 2026
cc06062

0.45.0dev

fix: PTQ 1GPU, export PP divisibility, hidden states conversations key (#1293)

## Summary
- **megatron_lm_ptq.yaml**: Switch Qwen3-8B PTQ to a single GPU for L40
clusters (TP=1, all tasks).
- **quantize.sh**: Automatically find the largest PP that divides the
model's `num_hidden_layers` for the export step. Qwen3-8B has 36 layers,
which is not divisible by 8, causing an `AssertionError` on 8-GPU nodes.
- **compute_hidden_states_trtllm.py**: Use `messages` with a
`conversations` fallback, matching the HF version. Fixes
`KeyError: 'conversations'` when data uses the OpenAI `messages` format.
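The `messages`-first lookup with a `conversations` fallback can be sketched as follows. `get_conversation` is a hypothetical helper; the actual script may handle records that have neither key differently.

```python
def get_conversation(record: dict) -> list:
    """Prefer the OpenAI-style `messages` field, falling back to `conversations`."""
    if "messages" in record:
        return record["messages"]
    if "conversations" in record:
        return record["conversations"]
    # Neither key present: fail with a message naming both accepted fields.
    raise KeyError("record has neither 'messages' nor 'conversations'")
```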

## Test plan
- [x] Qwen3-8B PTQ runs on single L40 GPU
- [x] Export PP auto-selects valid divisor (36 layers → PP=6 on 8 GPUs,
PP=4 on 4 GPUs, PP=1 on 1 GPU)
- [x] EAGLE3 offline pipeline reads data with `messages` field
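The export-PP rule can be sketched as picking the largest divisor of the layer count that does not exceed the GPU count, which reproduces the test-plan numbers above. `quantize.sh` itself is a shell script; this Python version and the name `pick_export_pp` are hypothetical illustrations.

```python
def pick_export_pp(num_hidden_layers: int, num_gpus: int) -> int:
    """Largest PP <= num_gpus that evenly divides num_hidden_layers."""
    for pp in range(min(num_gpus, num_hidden_layers), 0, -1):
        if num_hidden_layers % pp == 0:
            return pp
    return 1  # only reached for non-positive inputs; PP=1 always divides


# Qwen3-8B has 36 layers.
for gpus in (8, 4, 1):
    print(gpus, pick_export_pp(36, gpus))
# 8 6
# 4 4
# 1 1
```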

🤖 Generated with [Claude Code](https://claude.com/claude-code)


## Summary by CodeRabbit

* **New Features**
  * Dataset input handling now supports multiple field formats for enhanced compatibility.

* **Bug Fixes**
  * Optimized GPU resource allocation during model quantization with improved pipeline parallelism computation.
  * Updated quantization configuration for more efficient resource utilization.


Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Apr 20, 2026
355c6b7

0.44.0rc1

[Release-fix] Pin transformers<5.6 in release branch

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Apr 20, 2026
8d2f99f
