
[Refactor] Extract model specific logics in export lib#1828

Draft
h-guo18 wants to merge 8 commits into main from haoguo/export-modelinglib

@h-guo18 h-guo18 commented Jun 25, 2026


What does this PR do?

Type of change: Refactor / code health

Extracts model-specific logic from the generic export code into a new model-family registry, modelopt/torch/export/modeling/. Each family is described declaratively (per-model data plus optional behavioral hooks); the export engine resolves a family's ModelSpec and reads from it instead of branching on model names. An unmatched lookup returns None, in which case the engine falls back to its original path, so the migration is incremental and behavior-preserving.
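The fallback-first lookup can be sketched as below. This is a minimal illustration rather than the PR's code: the ModelSpec stand-in carries only the fields that appear in the Usage section, and `_REGISTRY`, `lookup`, and matching on the MoE block class name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    """Stand-in for the PR's ModelSpec; only fields seen in the Usage section."""

    name: str
    moe_block_names: tuple[str, ...] = ()
    expert_linear_names: tuple[str, ...] = ()
    has_iterable_experts: bool = False


# Hypothetical registry keyed by family name.
_REGISTRY: dict[str, ModelSpec] = {}


def register(spec: ModelSpec) -> ModelSpec:
    """Add a family spec; reject duplicates so families cannot shadow each other."""
    if spec.name in _REGISTRY:
        raise ValueError(f"family {spec.name!r} is already registered")
    _REGISTRY[spec.name] = spec
    return spec


def lookup(module_class_name: str) -> Optional[ModelSpec]:
    """Return the spec that lists this MoE block class name, else None."""
    for spec in _REGISTRY.values():
        if module_class_name in spec.moe_block_names:
            return spec
    return None  # unmatched: the caller keeps using its original path
```

Returning None instead of raising is what lets families move over one at a time: a caller that receives None simply runs the code it ran before the refactor.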

Migrated so far:

  • Data: MoE expert names, iterable-expert flag, activation override, shared-embedding flag, MLP keyword roles, and AWQ pre_quant_scale fusion rules.
  • Hooks (ModelHooks + ExportContext, dependency-injected to keep modeling/ cycle-free): decoder sub-module placement for Gemma2/3 layernorms, Gemma3 q/k-norms, and MLlama self/cross attention.
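The hook mechanism in the list above could look roughly like the following. It is a sketch under stated assumptions: only the names ModelHooks and ExportContext come from the PR description, while the field names, the `build_layernorm` callback, the `place` function, and the config keys are hypothetical. The point it illustrates is dependency injection: the family hook receives engine services through the context object, so a family file never has to import the engine.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ExportContext:
    """Engine services handed to hooks (hypothetical field)."""

    build_layernorm: Callable[[Any], dict]


@dataclass
class ModelHooks:
    """Optional behavior overrides; None means 'use the engine default'."""

    place_decoder_submodule: Optional[
        Callable[[str, Any, dict, ExportContext], bool]
    ] = None


def gemma_place_submodule(
    name: str, module: Any, layer_cfg: dict, ctx: ExportContext
) -> bool:
    """Illustrative family hook: claim the extra feed-forward layernorms only."""
    if name in ("pre_feedforward_layernorm", "post_feedforward_layernorm"):
        layer_cfg[name] = ctx.build_layernorm(module)
        return True  # handled by the family hook
    return False  # let the engine's generic placement run


def place(
    name: str,
    module: Any,
    layer_cfg: dict,
    hooks: Optional[ModelHooks],
    ctx: ExportContext,
) -> str:
    """Engine side: try the family hook first, then the default path."""
    hook = hooks.place_decoder_submodule if hooks is not None else None
    if hook is not None and hook(name, module, layer_cfg, ctx):
        return "hook"
    layer_cfg.setdefault("generic", []).append(name)
    return "default"
```

Having the hook return a boolean keeps the default path authoritative: a hook can only claim the sub-modules it knows about, and everything else flows through unchanged.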

Shared tables (HF_CONFIG_MAP) and generic algorithms stay in the engine. See modelopt/torch/export/MODEL_SPECIFIC_REFACTOR.md for the inventory, design, and migration plan.

Not yet migrated

Planned for follow-ups (these still use the engine's default path):

  • build_moe — per-family MoE router / experts / shared-expert construction (Llama/Phi3, DBRX, DeepSeek, Qwen).
  • unwrap_decoder_layer — DBRX/ExaOne/Deci module-tree unwrap, plus the head_is_first_dim flag (Bloom/Falcon/Phi3Small/InternLM) as data.
  • TRT-LLM target-config extras (tensorrt_llm_utils) — positional-embedding type, MPT alibi, RecurrentGemma, DBRX clip_qkv, Phi3-MoE sparse-mixer (mostly data).
  • HF path (unified_export_hf / moe_utils) — MoE expert export branches (Llama4/GptOss/DBRX/iterable) are a separate engine and would need HF-side seams; VLM language-tower extraction currently lives in model_utils.

Deferred (low value): embed √-scale (Gemma1-only + version-gated) and norm+1 (mixes the generic LayerNorm1P).

Intentionally kept in the engine: HF_CONFIG_MAP (a shared alias table, not a per-model branch), generic algorithms (_GATE_UP_PAIRS, expert-amax fallback), the ChatGLM/Phi3 fused gate/fc chunk-swap (a fused-weight reshape), and speculative-decoding export (already modular under plugins/hf_spec_*).

Usage

# Adding a model family is a small declarative file; no engine edits.
# modelopt/torch/export/modeling/families/<family>.py
register(
    ModelSpec(
        name="qwen_moe",
        moe_block_names=("Qwen3MoeSparseMoeBlock",),
        expert_linear_names=("gate_proj", "down_proj", "up_proj"),
        has_iterable_experts=True,
    )
)
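On the engine side, reading from a registered spec instead of branching on model names might look like this. The sketch is hypothetical: `expert_linear_names` as a function, the trimmed ModelSpec stand-in, and the legacy branch are all invented for illustration and do not reproduce the engine's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    """Stand-in with only the fields this sketch reads."""

    name: str
    expert_linear_names: tuple[str, ...] = ()


def _legacy_expert_linear_names(module: object) -> tuple[str, ...]:
    """Invented placeholder for name-based branching in the engine."""
    if "Mixtral" in type(module).__name__:
        return ("w1", "w2", "w3")
    raise NotImplementedError(type(module).__name__)


def expert_linear_names(
    module: object, spec: Optional[ModelSpec]
) -> tuple[str, ...]:
    """Prefer declarative data; fall back when no family matched."""
    if spec is not None and spec.expert_linear_names:
        return spec.expert_linear_names
    return _legacy_expert_linear_names(module)
```

With this shape, supporting a new family means registering data, while families that have no spec yet keep hitting the legacy function.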

Testing

  • pytest tests/unit/torch/export/ passes (78 passed; test_export_diffusers.py skipped — pre-existing CUDA/glibc collection error unrelated to this change).
  • Each migration step verified for behavioral equivalence against the legacy branches; fallback-first means un-migrated families are byte-for-byte unaffected.
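One way to check behavioral equivalence for a migrated step is to run the legacy and migrated code paths over the same inputs and compare results. The helper below is a generic sketch, not the PR's test code; the two activation functions and the family names are toy stand-ins.

```python
from typing import Callable, Iterable


def assert_equivalent(
    legacy: Callable[[str], object],
    migrated: Callable[[str], object],
    cases: Iterable[str],
) -> int:
    """Run both paths on each case and fail on the first divergence."""
    checked = 0
    for case in cases:
        expected, actual = legacy(case), migrated(case)
        assert actual == expected, f"{case}: {actual!r} != {expected!r}"
        checked += 1
    return checked


def legacy_activation(model_type: str) -> str:
    """Toy legacy path that branches on the model name."""
    if model_type == "family_a":
        return "act_x"
    return "act_default"


# Toy migrated path: table lookup first, legacy path for anything unmatched.
_OVERRIDES = {"family_a": "act_x"}


def migrated_activation(model_type: str) -> str:
    override = _OVERRIDES.get(model_type)
    return override if override is not None else legacy_activation(model_type)
```

Including inputs that have no table entry in the case list matters as much as the migrated ones, since those exercise the fallback that un-migrated families rely on.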

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (behavior-preserving; fallback-first)
  • If you copied code from any other sources or added a new PIP dependency: N/A
  • Did you write any new necessary tests?: N/A — behavior-preserving refactor covered by existing tests/unit/torch/export/
  • Did you update Changelog?: N/A (internal refactor, no public API change)
  • Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

Refactor only — no change to exported checkpoints. Design/rationale: modelopt/torch/export/MODEL_SPECIFIC_REFACTOR.md.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

copy-pr-bot Bot commented Jun 25, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@h-guo18 h-guo18 changed the title [Refactor] Extract model specific logics in export Jun 25, 2026

coderabbitai Bot commented Jun 25, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1f7520a8-570a-432d-a5dd-0b18214bd98f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 62.96296% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.16%. Comparing base (4093664) to head (8ddd26c).
⚠️ Report is 5 commits behind head on main.

Files with missing lines                             Patch %   Lines
modelopt/torch/export/layer_utils.py                 14.28%    24 Missing ⚠️
modelopt/torch/export/modeling/families/gemma.py     33.33%    16 Missing ⚠️
modelopt/torch/export/modeling/registry.py           50.00%    16 Missing ⚠️
modelopt/torch/export/modeling/families/mllama.py    53.84%    6 Missing ⚠️
modelopt/torch/export/modeling/hooks.py              72.72%    3 Missing ⚠️
modelopt/torch/export/quant_utils.py                 25.00%    3 Missing ⚠️
modelopt/torch/export/model_config_export.py         33.33%    2 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1828      +/-   ##
==========================================
- Coverage   77.36%   77.16%   -0.21%     
==========================================
  Files         513      534      +21     
  Lines       56889    57365     +476     
==========================================
+ Hits        44012    44265     +253     
- Misses      12877    13100     +223     

Flag    Coverage Δ
unit    54.69% <62.96%> (+0.09%) ⬆️

h-guo18 added 7 commits June 25, 2026 20:23
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>


@dataclass
class ModelSpec:


The TRT-LLM export path is deprecated.

@@ -60,6 +60,13 @@
RgLruConfig,
)
from .model_config_utils import pad_weights


HF export only.

