Add LAQ NVFP4 export support#1847

Draft
realAsma wants to merge 1 commit into asma/laq-algorithm from asma/laq-export-support
Conversation

@realAsma realAsma commented Jun 28, 2026

Contributor

Summary

Adds LAQ export support for NVFP4 unified HF export.

  • Export packs FP4 weights with the LAQ pre scale and serializes the LAQ post scale for dequantization.
  • Supports LAQ learnable modes covered by the LAQ config/tests, including tied pre/post and quantize_pre_scale=False.
  • Carries the LAQ quantize_pre_scale config plumbing and dtype/FSDP2 test updates from the LAQ branch worktree base.
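
To make the roles of the two scales concrete, here is a minimal pure-Python sketch of the flow described above: quantize with a pre scale, pack two FP4 codes per byte, and dequantize with a post scale. It is not the modelopt export code. The helper names (`quantize_fp4`, `pack_nibbles`, `dequantize_fp4`), the single-block layout, and the low-nibble-first packing order are all assumptions made for illustration; only the FP4 E2M1 value grid is standard.

```python
# Illustrative only: hypothetical helpers, not the modelopt export code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 E2M1 magnitudes


def quantize_fp4(weights, pre_scale):
    """Map each weight to a 4-bit code: sign in bit 3, magnitude index in bits 0-2."""
    codes = []
    for w in weights:
        scaled = abs(w) / pre_scale
        # Nearest representable magnitude; values above 6.0 saturate at index 7.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - scaled))
        sign = 8 if w < 0 else 0
        codes.append(sign | idx)
    return codes


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first (assumed order)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length input with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; returns the first `count` codes."""
    codes = []
    for b in packed:
        codes.extend((b & 0x0F, b >> 4))
    return codes[:count]


def dequantize_fp4(packed, count, post_scale):
    """Reconstruct values as sign * E2M1 magnitude * post_scale."""
    out = []
    for c in unpack_nibbles(packed, count):
        mag = E2M1_VALUES[c & 0x7] * post_scale
        out.append(-mag if c & 0x8 else mag)
    return out


weights = [0.9, -2.1, 0.0, 6.3]
pre_scale = 1.0    # used only when choosing the FP4 codes
post_scale = 1.05  # the scale that would be serialized for dequantization
packed = pack_nibbles(quantize_fp4(weights, pre_scale))
restored = dequantize_fp4(packed, len(weights), post_scale)
```

In the tied pre/post case the two scales would be the same value; in this sketch they differ, so the reconstruction uses a scale other than the one that selected the codes.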

Testing

  • pre-commit run --files modelopt/torch/export/quant_utils.py modelopt/torch/export/unified_export_hf.py modelopt/torch/quantization/config.py modelopt/torch/quantization/model_calib.py modelopt/torch/quantization/nn/modules/tensor_quantizer.py tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py tests/unit/torch/quantization/test_laq.py tests/gpu/torch/quantization/test_laq_cuda.py tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py tests/gpu/torch/quantization/test_fsdp2.py
  • pytest_pwd tests/unit/torch/quantization/test_laq.py tests/unit/torch/quantization/test_nvfp4_static_export_cpu.py -q
  • Qwen3-8B LAQ export validation on omniml-a8.nvidia.com, CUDA_VISIBLE_DEVICES=1,2,3,4, recipe nvfp4_laq_post_unquantized_pre_scale-mse_init-fp8_kv.yml with quantize_pre_scale: false: exported config.json, hf_quant_config.json, .quant_summary.txt, tokenizer files, and one 5.2 GB model.safetensors shard under /home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/agent_art/laq_qwen3_8b_export_a8_20260628T1815/qwen3_8b_laq_post_unquantized_pre_scale_export.
  • Reran the same Qwen3-8B LAQ export on omniml-a8.nvidia.com, CUDA_VISIBLE_DEVICES=1,2,3,4, after running source ~/.bashrc and conda activate black. The log confirms python=/home/akuriparambi/anaconda3/envs/black/bin/python, modelopt_file=/home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/modelopt/__init__.py, and pythonpath_head=/home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export. The export succeeded and produced hf_quant_config.json, .quant_summary.txt, tokenizer/config files, and one 5.5 GB model.safetensors under /home/scratch.akuriparambi_coreai/Model-Optimizer-LAQ-export/agent_art/laq_qwen3_8b_export_a8_black_20260629T0005/qwen3_8b_laq_post_unquantized_pre_scale_export.
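
The interpreter and import-path confirmation in the rerun can be reproduced with a check along the following lines. This is a sketch, not the script that produced the log: the function name `describe_environment` is hypothetical, and reading `pythonpath_head` as `sys.path[0]` is an assumption. The standard-library `json` package stands in for `modelopt` so the snippet runs anywhere.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and where a module would be imported from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "module_file": spec.origin if spec is not None else None,
        # Assumption: the head of sys.path is what the log calls pythonpath_head.
        "pythonpath_head": sys.path[0] if sys.path else None,
    }


# "json" is a stand-in; in the worktree the module of interest would be "modelopt".
info = describe_environment("json")
for key, value in info.items():
    print(f"{key}={value}")
```

Logging these three values before an export makes it easier to spot a run that picked up a different checkout or conda environment than intended.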

Notes

The first a8 attempt reused a stale shared modelopt_cuda_ext_fp8 Torch extension cache and failed with CUDA error: no kernel image is available for execution on the device. Passing runs used artifact-local TORCH_EXTENSIONS_DIR values and TORCH_CUDA_ARCH_LIST=8.9 so the FP8 extension rebuilt for Ada.
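
A minimal sketch of that workaround, assuming the variables are set in the same Python process before any extension is JIT-built. The helper name `isolate_torch_extension_build` and the `torch_extensions` subdirectory are hypothetical; `TORCH_EXTENSIONS_DIR` and `TORCH_CUDA_ARCH_LIST` are the environment variables named above.

```python
import os
import tempfile


def isolate_torch_extension_build(artifact_dir, cuda_arch="8.9"):
    """Point the Torch extension cache at a run-local directory and pin the CUDA arch."""
    ext_dir = os.path.join(artifact_dir, "torch_extensions")  # hypothetical layout
    os.makedirs(ext_dir, exist_ok=True)
    # A fresh, run-local cache avoids reusing a stale shared build.
    os.environ["TORCH_EXTENSIONS_DIR"] = ext_dir
    # 8.9 matches the value used in the passing runs.
    os.environ["TORCH_CUDA_ARCH_LIST"] = cuda_arch
    return ext_dir


# A temporary directory stands in for the per-run artifact directory.
artifact_dir = tempfile.mkdtemp(prefix="laq_export_")
ext_dir = isolate_torch_extension_build(artifact_dir)
print(f"TORCH_EXTENSIONS_DIR={ext_dir}")
```

The same effect can be had by exporting both variables in the shell before launching the export; what matters is that they are in place before the extension is compiled.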

Signed-off-by: realAsma <akuriparambi@nvidia.com>
copy-pr-bot Bot commented Jun 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai Bot commented Jun 28, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7263985c-fe05-4e89-82a2-c9cc999209af

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands.
