Add LAQ NVFP4 export support by realAsma · Pull Request #1847 · NVIDIA/Model-Optimizer

realAsma · 2026-06-28T15:45:29Z

Summary

Adds LAQ export support for NVFP4 unified HF export.

Export packs FP4 weights using the LAQ pre scale and serializes the LAQ post scale for dequantization.
Supports the LAQ learnable modes covered by the LAQ config and tests, including tied pre/post scales and quantize_pre_scale=False.
Carries over the LAQ quantize_pre_scale config plumbing and the dtype/FSDP2 test updates from the LAQ branch worktree base.
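
For readers unfamiliar with the pre/post scale split, the following is a minimal pure-Python sketch of the idea described above: quantize with one scale, pack two 4-bit codes per byte, and dequantize with a separately stored scale. It is not the ModelOpt implementation. The helper names (encode_fp4, decode_fp4, pack_block, unpack_block), the single scalar scale per block, and the low-nibble-first packing order are all assumptions made for illustration.

```python
# Illustrative sketch only; names and packing order are assumptions,
# not taken from modelopt.

# Magnitudes representable in FP4 E2M1, indexed by their 3-bit code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_fp4(x: float) -> int:
    """Return the 4-bit code (sign bit plus 3-bit magnitude) nearest to x."""
    sign = 0b1000 if x < 0 else 0
    mag = min(abs(x), E2M1_VALUES[-1])  # saturate at the largest magnitude
    # On ties, min() keeps the first (smaller-magnitude) candidate.
    code = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
    return sign | code


def decode_fp4(code: int) -> float:
    """Map a 4-bit code back to its real value."""
    value = E2M1_VALUES[code & 0b0111]
    return -value if code & 0b1000 else value


def pack_block(weights, pre_scale):
    """Quantize with the pre scale, then pack two codes per byte."""
    codes = [encode_fp4(w / pre_scale) for w in weights]
    if len(codes) % 2:
        codes.append(0)  # pad odd-length blocks with a zero code
    # Assumed layout: even index in the low nibble, odd index in the high one.
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_block(packed, post_scale, n):
    """Unpack n codes and dequantize them with the post scale."""
    codes = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    return [decode_fp4(c) * post_scale for c in codes[:n]]
```

When the pre and post scales are tied, passing the same value to pack_block and unpack_block reproduces ordinary scale-based quantization; with distinct values the dequantized weights are rescaled by post_scale / pre_scale relative to that case.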

Testing

pre-commit run --files modelopt/torch/export/quant_utils.py modelopt/torch/export/unified_export_hf.py modelopt/torch/quantization/config.py modelopt/torch/quantization/model_calib.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py tests/unit/torch/quantization/test_laq.py tests/gpu/torch/quantization/test_laq_cuda.py tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py tests/gpu/torch/quantization/test_fsdp2.py
pytest_pwd tests/unit/torch/quantization/test_laq.py tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py -q
Qwen3-8B LAQ export validation on omniml-a8.nvidia.com with CUDA_VISIBLE_DEVICES=1,2,3,4, using recipe nvfp4_laq_post_unquantized_pre_scale-mse_init-fp8_kv.yml with quantize_pre_scale: false. The export produced config.json, hf_quant_config.json, .quant_summary.txt, tokenizer files, and one 5.2 GB model.safetensors shard under /home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/agent_art/laq_qwen3_8b_export_a8_20260628T1815/qwen3_8b_laq_post_unquantized_pre_scale_export.
Reran the same Qwen3-8B LAQ export on omniml-a8.nvidia.com with CUDA_VISIBLE_DEVICES=1,2,3,4 after running source ~/.bashrc and conda activate black. The log confirms python=/home/akuriparambi/anaconda3/envs/black/bin/python, modelopt_file=/home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/modelopt/__init__.py, and pythonpath_head=/home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export. The export succeeded and produced hf_quant_config.json, .quant_summary.txt, tokenizer/config files, and one 5.5 GB model.safetensors under /home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/agent_art/laq_qwen3_8b_export_a8_black_20260629T0005/qwen3_8b_laq_post_unquantized_pre_scale_export.
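
The artifact check in the export runs above could be scripted roughly as follows. This is a hypothetical helper, not part of the PR: the function name missing_export_files is invented here, and the file list covers only the names mentioned in the runs above (tokenizer files are left out because their names are not given). It also assumes a single model.safetensors file, as in these runs.

```python
from pathlib import Path

# File names taken from the validation runs above; tokenizer files are
# omitted because their exact names are not listed.
EXPECTED_FILES = (
    "config.json",
    "hf_quant_config.json",
    ".quant_summary.txt",
    "model.safetensors",
)


def missing_export_files(export_dir):
    """Return the expected export files that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every listed artifact is present; the helper does not inspect file contents or sizes.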

Notes

The first a8 attempt reused a stale shared modelopt_cuda_ext_fp8 Torch extension cache and failed with "CUDA error: no kernel image is available for execution on the device". The passing runs used artifact-local TORCH_EXTENSIONS_DIR values and TORCH_CUDA_ARCH_LIST=8.9 so that the FP8 extension was rebuilt for Ada.
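
One way to reproduce the workaround above is to build a per-run environment before launching the export. The sketch below is an assumption about how that could be done, not the script used for these runs: the function name extension_build_env and the torch_extensions subdirectory name are hypothetical, while TORCH_EXTENSIONS_DIR and TORCH_CUDA_ARCH_LIST are the PyTorch environment variables named in the note.

```python
import os


def extension_build_env(artifact_dir, cuda_arch="8.9", base_env=None):
    """Return an environment that isolates the Torch extension cache per run."""
    env = dict(os.environ if base_env is None else base_env)
    # Artifact-local cache so a stale shared build is never reused.
    # The "torch_extensions" subdirectory name is a hypothetical choice.
    env["TORCH_EXTENSIONS_DIR"] = os.path.join(artifact_dir, "torch_extensions")
    # Force the extension to be compiled for the target architecture
    # (8.9 corresponds to Ada, as in the passing runs).
    env["TORCH_CUDA_ARCH_LIST"] = cuda_arch
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when starting the export command, which leaves the parent shell's environment untouched.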

Signed-off-by: realAsma <akuriparambi@nvidia.com>

copy-pr-bot · 2026-06-28T15:45:33Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai · 2026-06-28T15:45:35Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7263985c-fe05-4e89-82a2-c9cc999209af

Add LAQ NVFP4 export support

30e8b91

Signed-off-by: realAsma <akuriparambi@nvidia.com>

realAsma wants to merge 1 commit into asma/laq-algorithm from asma/laq-export-support
