Skip to content

🚨 [v5] Remove relative position embeddings (for bert like models)#41170

Merged
vasqu merged 9 commits into
huggingface:mainfrom
vasqu:remove-relative-positions-bert-likes
Oct 6, 2025
Merged

🚨 [v5] Remove relative position embeddings (for bert like models)#41170
vasqu merged 9 commits into
huggingface:mainfrom
vasqu:remove-relative-positions-bert-likes

Conversation

@vasqu

@vasqu vasqu commented Sep 25, 2025

Copy link
Copy Markdown
Collaborator

These embedding types are barely used and make the modeling files just more complex without justifying their existence. Position embedding types still exist in a few models; this PR just addresses the relative_key(_query) ones.

Some stats:

  • None of the slow tests use them except bert
  • The respective models in those tests together have less than 2k downloads in the last month

cc @hmellor this should remove any clashes with the kwargs you encountered in vLLM :D

@vasqu

vasqu commented Sep 25, 2025

Copy link
Copy Markdown
Collaborator Author

run-slow: flava, instructblib, mra

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly due to me forgetting to update them in my bert refactor PR --> big diff because the whole refactor is included (same for the roberta example)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: Only includes the changes here now

@github-actions

Copy link
Copy Markdown
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/flava', 'models/mra']
quantizations: [] ...

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu

vasqu commented Sep 25, 2025

Copy link
Copy Markdown
Collaborator Author

run-slow: instructblip

@github-actions

Copy link
Copy Markdown
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/instructblip']
quantizations: [] ...

@vasqu vasqu force-pushed the remove-relative-positions-bert-likes branch from 6046d27 to 0dbd18b Compare September 25, 2025 18:49
@vasqu vasqu marked this pull request as ready for review September 26, 2025 11:14
@vasqu

vasqu commented Sep 26, 2025

Copy link
Copy Markdown
Collaborator Author

run-slow: bert, roberta, albert, mra, instructblip, blip_2, flava

@github-actions

Copy link
Copy Markdown
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/albert', 'models/bert', 'models/blip_2', 'models/flava', 'models/instructblip', 'models/mra', 'models/roberta']
quantizations: [] ...

@vasqu

vasqu commented Sep 26, 2025

Copy link
Copy Markdown
Collaborator Author

Failing slow tests are the same as in main 👀

@vasqu vasqu added the for_v5? label Sep 30, 2025

@zucchini-nlp zucchini-nlp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, super nice clean-up! 🧼

return embeddings


def eager_attention_forward(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think now we can copy bert from llama or another big model group? 👀 Keeping less sources of truth makes it easier to submit PRs

@vasqu vasqu Oct 1, 2025

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make a follow-up PR for that, would like to sync bert and bart instead tho since llama would indicate causal masks which is not the case here + unnecessary gqa dependency from llama

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #41248 for the sync

Comment on lines -505 to -520
if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
seq_length = hidden_states.size()[1]
position_ids_l = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(-1, 1)
position_ids_r = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(1, -1)
distance = position_ids_l - position_ids_r
positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
positional_embedding = positional_embedding.to(dtype=query_layer.dtype) # fp16 compatibility

if self.position_embedding_type == "relative_key":
relative_position_scores = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
attention_scores = attention_scores + relative_position_scores
elif self.position_embedding_type == "relative_key_query":
relative_position_scores_query = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
relative_position_scores_key = torch.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to see it, I assumed that BLIP models use relative positions haha. Now it can support attention implementation API in qformer 🙌🏻

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it's honestly a bit baffling how many models have this while not using it at all 👀

@ArthurZucker ArthurZucker left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longdue ! Thanks

@github-actions

github-actions Bot commented Oct 3, 2025

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: albert, align, altclip, bert, bert_generation, big_bird, blip, blip_2, bridgetower, bros, camembert, canine, chinese_clip, clap, data2vec, dpr

@vasqu

vasqu commented Oct 6, 2025

Copy link
Copy Markdown
Collaborator Author

Merging as nothing has changed, just forgot to merge 😅

@vasqu vasqu merged commit c27b67f into huggingface:main Oct 6, 2025
25 checks passed
@vasqu vasqu deleted the remove-relative-positions-bert-likes branch October 6, 2025 12:21
@vasqu vasqu mentioned this pull request Oct 9, 2025
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
…uggingface#41170)

* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants