[Refactor] Extract model-specific logic in export lib #1828
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #1828 +/- ##
==========================================
- Coverage 77.36% 77.16% -0.21%
==========================================
Files 513 534 +21
Lines 56889 57365 +476
==========================================
+ Hits 44012 44265 +253
- Misses 12877 13100 +223
@dataclass
class ModelSpec:
trtllm export path is deprecated.
@@ -60,6 +60,13 @@
    RgLruConfig,
)
from .model_config_utils import pad_weights
What does this PR do?
Type of change: Refactor / code health
Extracts model-specific logic out of the generic export code into a new model-family registry, modelopt/torch/export/modeling/. Each family is described declaratively (per-model data plus optional behavioral hooks); the export engine resolves a family's ModelSpec and reads from it instead of branching on model names. An unmatched lookup returns None, so the engine falls back to its original path; the migration is incremental and behavior-preserving.
Migrated so far:
- pre_quant_scale fusion rules.
- ModelHooks + ExportContext (dependency-injected to keep modeling/ cycle-free): decoder sub-module placement for Gemma2/3 layernorms, Gemma3 q/k-norms, and MLlama self/cross attention.
Shared tables (HF_CONFIG_MAP) and generic algorithms stay in the engine. See modelopt/torch/export/MODEL_SPECIFIC_REFACTOR.md for the inventory, design, and migration plan.
Not yet migrated
Planned for follow-ups (these still use the engine's default path):
- build_moe: per-family MoE router / experts / shared-expert construction (Llama/Phi3, DBRX, DeepSeek, Qwen).
- unwrap_decoder_layer: DBRX/ExaOne/Deci module-tree unwrap, plus the head_is_first_dim flag (Bloom/Falcon/Phi3Small/InternLM) as data.
- tensorrt_llm_utils: positional-embedding type, MPT alibi, RecurrentGemma, DBRX clip_qkv, Phi3-MoE sparse-mixer (mostly data).
- unified_export_hf/moe_utils: MoE expert export branches (Llama4/GptOss/DBRX/iterable) are a separate engine and would need HF-side seams; VLM language-tower extraction currently lives in model_utils.
Deferred (low value): embed √-scale (Gemma1-only + version-gated) and norm+1 (mixes the generic LayerNorm1P).
Intentionally kept in the engine: HF_CONFIG_MAP (a shared alias table, not a per-model branch), generic algorithms (_GATE_UP_PAIRS, expert-amax fallback), the ChatGLM/Phi3 fused gate/fc chunk-swap (a fused-weight reshape), and speculative-decoding export (already modular under plugins/hf_spec_*).
Usage
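A minimal sketch of the lookup-with-fallback pattern described above, for readers who want a concrete picture. It is not the PR's actual API: only the name ModelSpec and the "unmatched lookup returns None" behavior come from the description. The field names, the register and resolve_spec helpers, the class-name matching key, and the fusion pairs are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical per-family record; every field name here is illustrative."""

    name: str
    # Assumption: families are matched by a regex on the model class name.
    class_pattern: str
    # Assumption: fusion rules are (source, target) sub-module name pairs.
    pre_quant_scale_fusion: tuple[tuple[str, str], ...] = ()


# Hypothetical registry: a plain list searched in registration order.
_REGISTRY: list[ModelSpec] = []


def register(spec: ModelSpec) -> ModelSpec:
    """Add a family spec to the registry and return it unchanged."""
    _REGISTRY.append(spec)
    return spec


def resolve_spec(model_class_name: str) -> ModelSpec | None:
    """Return the first matching spec, or None when no family matches."""
    for spec in _REGISTRY:
        if re.search(spec.class_pattern, model_class_name):
            return spec
    return None


# Placeholder for the engine's pre-refactor default behavior.
DEFAULT_FUSION_RULES: tuple[tuple[str, str], ...] = (("o_proj", "v_proj"),)


def fusion_rules_for(model_class_name: str) -> tuple[tuple[str, str], ...]:
    """Engine-side read: use the spec if one resolves, else the original path."""
    spec = resolve_spec(model_class_name)
    if spec is None:
        return DEFAULT_FUSION_RULES
    return spec.pre_quant_scale_fusion


# Illustrative registration; the rule contents are made up for the example.
register(
    ModelSpec(
        name="example_family",
        class_pattern=r"^ExampleFamily",
        pre_quant_scale_fusion=(("down_proj", "up_proj"),),
    )
)
```

Because the engine only takes the new branch when a spec resolves, families that have not been migrated keep going through the original code, which is what makes an incremental migration possible.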
Testing
pytest tests/unit/torch/export/ passes (78 passed; test_export_diffusers.py skipped because of a pre-existing CUDA/glibc collection error unrelated to this change).
Before your PR is "Ready for review"
tests/unit/torch/export/
Additional Information
Refactor only: no change to exported checkpoints. Design/rationale: modelopt/torch/export/MODEL_SPECIFIC_REFACTOR.md.
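For reviewers, the dependency-injection idea mentioned in the description (hooks receive an ExportContext so that modeling/ never imports the engine) might look roughly like the sketch below. Only the names ModelHooks and ExportContext come from the PR text. The place_submodule method, the build_layernorm callable, the Gemma2LikeHooks class, and the place_with_fallback helper are hypothetical, and the design document may define these differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ExportContext:
    """Engine-provided callables, passed in so hooks need no engine imports."""

    # Assumption: the engine exposes a layernorm builder as a plain callable.
    build_layernorm: Callable[[Any], dict]


class ModelHooks:
    """Base hooks; returning False means "not handled, use the default path"."""

    def place_submodule(
        self, name: str, module: Any, layer_config: dict, ctx: ExportContext
    ) -> bool:
        return False


class Gemma2LikeHooks(ModelHooks):
    """Hypothetical family hook that places extra layernorm sub-modules."""

    # Assumption: these attribute names are handled by the family hook.
    _NORM_SLOTS = ("pre_feedforward_layernorm", "post_feedforward_layernorm")

    def place_submodule(self, name, module, layer_config, ctx):
        if name not in self._NORM_SLOTS:
            return False
        # The hook calls back into the engine only through the injected context.
        layer_config[name] = ctx.build_layernorm(module)
        return True


def place_with_fallback(
    hooks: Optional[ModelHooks],
    name: str,
    module: Any,
    layer_config: dict,
    ctx: ExportContext,
) -> str:
    """Engine side: try the family hook first, then the original logic."""
    if hooks is not None and hooks.place_submodule(name, module, layer_config, ctx):
        return "hook"
    # Stand-in for the engine's original, name-branching placement code.
    layer_config.setdefault("default_path", []).append(name)
    return "default"
```

The import direction is the point of the design: the engine imports the hooks, and the hooks see the engine only as the callables it hands over, so no import cycle can form.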