Skip to content

🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 #42255

Merged
ArthurZucker merged 25 commits into
huggingface:mainfrom
zucchini-nlp:rope-params
Nov 28, 2025
Merged

🚨 Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 #42255
ArthurZucker merged 25 commits into
huggingface:mainfrom
zucchini-nlp:rope-params

Conversation

@zucchini-nlp

@zucchini-nlp zucchini-nlp commented Nov 18, 2025

Copy link
Copy Markdown
Member

What does this PR do?

To finalize the work on rope config, I am moving rotary_partial_emb to rope parameter dict as well. Along with it, I did some clean-up on standardization because we can make a few assumptions with the models we have

Note, PR is breaking BC completely and users will no longer have access to config.rope_theta since I pop it from config kwargs manually. That way is more clear imo than having two rope thetas in different places

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [WIP] Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 26, 2025
Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment on lines 85 to 91
def get_standardized_rope_params(config):
"""
Helper to standardize the config's rope params field by ensuring the params are defined for each
later type. For old model the fn will duplicate a single rope param in each layer type (backward compatibility)
"""
rope_parameters = getattr(config, "rope_parameters", None)
layer_types = getattr(config, "layer_types", None)
if rope_theta is None:
rope_theta = getattr(config, "rope_theta", None)
rope_parameters = getattr(config, "rope_parameters", {})

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could have been simplified if we make a few assumption, and we can make assumptions because only 2 models have a nested rope parameterization

Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/models/apertus/configuration_apertus.py Outdated
@vasqu vasqu mentioned this pull request Nov 27, 2025
5 tasks

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2 cents 😄

Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/modeling_rope_utils.py Outdated
Comment thread src/transformers/models/apertus/configuration_apertus.py Outdated
Comment thread src/transformers/models/efficientloftr/configuration_efficientloftr.py Outdated

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general if we can put stuff in PreTrainedConfig I am also happy, but fine this way as well

Comment thread src/transformers/models/apertus/configuration_apertus.py Outdated
zucchini-nlp and others added 6 commits November 27, 2025 16:19
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…loftr.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RotaryEmbeddingConfigMixin is my Christmas gift! ty its a lot better

Comment thread src/transformers/models/apertus/configuration_apertus.py Outdated
@zucchini-nlp

Copy link
Copy Markdown
Member Author

run-slow: llama, gemma3, qwen2, qwen2_vl, mistral, mixtral, modernbert, llava

@zucchini-nlp zucchini-nlp changed the title Move rotary_partial_emb to RopeParams and delete unnecessary code 🔪 Nov 28, 2025
@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon, cohere, cohere2, csm, cwm, dbrx, deepseek_v2, deepseek_v3, dia, diffllama

@github-actions

Copy link
Copy Markdown
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/gemma3", "models/llama", "models/llava", "models/mistral", "models/mixtral", "models/modernbert", "models/qwen2", "models/qwen2_vl"]
quantizations: []

@zucchini-nlp

Copy link
Copy Markdown
Member Author

Doc-builder and weight tying tests will fail but are not related

@github-actions

Copy link
Copy Markdown
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@ArthurZucker ArthurZucker merged commit 078ff68 into huggingface:main Nov 28, 2025
20 of 25 checks passed
sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
… 🔪 (huggingface#42255)

* tmp

* batch push

* maybe better pop and break, and we'll have one theta per config in the rope dict

* update a few models?

* fix tests that are easu first

* dont overwrite if already present!!!

* partial rotary factor

* more fixes to the god of fixes

* setdefault

* fix copies

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update src/transformers/models/efficientloftr/configuration_efficientloftr.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* attempt one

* update all models

* fix tests

* fix tests

* oops

* fix slow tests with nested rope models

* fix copies

* deal with  circular import and move the mixin to base config class

* fix copies

* fix a few tests

* update the migration guide

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@fxmarty-amd fxmarty-amd mentioned this pull request Dec 22, 2025
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
… 🔪 (huggingface#42255)

* tmp

* batch push

* maybe better pop and break, and we'll have one theta per config in the rope dict

* update a few models?

* fix tests that are easu first

* dont overwrite if already present!!!

* partial rotary factor

* more fixes to the god of fixes

* setdefault

* fix copies

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update src/transformers/models/efficientloftr/configuration_efficientloftr.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* attempt one

* update all models

* fix tests

* fix tests

* oops

* fix slow tests with nested rope models

* fix copies

* deal with  circular import and move the mixin to base config class

* fix copies

* fix a few tests

* update the migration guide

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
jaybe1234 added a commit to jaybe1234/sglang that referenced this pull request Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants