Fix dynamic RoPE not resetting inv_freq when layer_type is None#46624
Merged
zucchini-nlp merged 3 commits intoJun 15, 2026
Merged
Conversation
dynamic_frequency_update reads self.max_seq_len_cached but writes the
updated length to f"{layer_type}_max_seq_len_cached" (lines 110, 118).
When layer_type is None this targets a stray None_max_seq_len_cached
attribute, so self.max_seq_len_cached is never updated: the reset branch
never fires and inv_freq stays scaled after a long sequence instead of
resetting for the following short sequences.
Use prefix for both setattr calls so the write matches the read at
line 91. The layer_type is not None path is unchanged. Add a regression
test driving a long then short sequence.
Contributor
Author
|
Note on the current failing The following failures are downstream of that ( |
zucchini-nlp
approved these changes
Jun 15, 2026
| torch.testing.assert_close(inv_freq, default_inv_freq / factor) | ||
| torch.testing.assert_close(inv_freq, EXPECTED_INV_FREQ) | ||
|
|
||
| def test_dynamic_rope_resets_after_long_sequence(self): |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Contributor
|
CI Dashboard: View test results in Grafana |
Merged
via the queue into
huggingface:main
with commit Jun 15, 2026
63ddf94
119 of 121 checks passed
Contributor
Author
|
Thanks @zucchini-nlp! That was a sneaky one. Appreciate the quick review and merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
On the
layer_type=Nonepath ofdynamic_frequency_update(src/transformers/modeling_rope_utils.py), the cached length is read fromself.max_seq_len_cachedbut written withsetattr(self, f"{layer_type}_max_seq_len_cached", ...), which creates a strayNone_max_seq_len_cachedattribute.self.max_seq_len_cachedis therefore never updated after a long sequence, so the reset branch (max_seq_len_cached > self.original_max_seq_len) cannot run when a short sequence follows: the rotary embedding keeps the dynamically-scaledinv_freqinstead of restoring the original one.The other buffer/attribute writes in
dynamic_frequency_updatealready useprefix(""whenlayer_type is None,f"{layer_type}_"otherwise). This switches the two cached-length writes toprefixas well, so the write matches the read. Thelayer_type is not Nonepath is unchanged, sinceprefix == f"{layer_type}_"there.Added a regression test that runs a dynamic RoPE embedding through a long sequence and then a short one, checking that the cache length is updated and
inv_freqresets.Code Agent Policy
Before submitting
Who can review?
@zucchini-nlp