[RoPE] explicit factor > implicit factor in YaRN #40320
gante merged 1 commit into huggingface:main from
Conversation
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
Here the implicit factor takes precedence, which shouldn't happen (validation is added below, in the validation function).
# values to compute the default attention scaling factor, instead of using `factor`.
if "original_max_position_embeddings" in config.rope_scaling:
    original_max_position_embeddings = config.rope_scaling["original_max_position_embeddings"]
    factor = config.max_position_embeddings / original_max_position_embeddings
this deletes factor and it seems to still be used below with attention_factor 👀
@zucchini-nlp factor is defined a few lines above (L219). The line deleted here is a redefinition, where factor is implicitly derived from other parameters.
We should use explicit parameterization and warn when the defined parameters don't match as a whole (which is what this PR does 🤗 )
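To illustrate the precedence being discussed, here is a minimal standalone sketch of the explicit-over-implicit behaviour. The helper name `resolve_yarn_factor` and the use of the `warnings` module are illustrative assumptions, not the PR's actual code (which uses `logger.warning_once` inside transformers' RoPE utilities):

```python
import math
import warnings


def resolve_yarn_factor(max_position_embeddings: int, rope_scaling: dict) -> float:
    """Sketch: always use the explicitly set `factor`, but warn when it
    disagrees with the factor implied by the context-length fields."""
    factor = rope_scaling.get("factor", 1.0)  # explicit parameterization
    original = rope_scaling.get("original_max_position_embeddings")
    if original is not None:
        # implicit factor = post-yarn context length / pre-yarn context length
        implicit_factor = max_position_embeddings / original
        if not math.isclose(implicit_factor, factor):
            warnings.warn(
                f"explicit factor {factor} != implicit factor {implicit_factor}; "
                "using the explicit factor"
            )
    return factor  # the explicit value always wins


# Mismatched config: explicit factor 4.0 vs implicit 8192 / 4096 = 2.0
print(resolve_yarn_factor(8192, {"factor": 4.0, "original_max_position_embeddings": 4096}))  # → 4.0
```

With this shape, a mismatched config still runs (important for existing Hub checkpoints), but the user is told which value was kept and why.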
if implicit_factor != factor:
    logger.warning_once(
        f"The explicitly set RoPE scaling factor (config.rope_scaling['factor'] = {factor}) does not match "
        "the ratio implicitly set by other parameters (implicit factor = "
        "post-yarn context length / pre-yarn context length = "
        "config.max_position_embeddings / config.rope_scaling['original_max_position_embeddings'] = "
        f"{implicit_factor}). Using the explicit factor ({factor}) in YaRN. This may cause unexpected "
        "behaviour in model usage, please correct the 'max_position_embeddings' fields in the model config."
    )
should we throw an error instead, if users explicitly set mismatching config values?
Agreed in theory, but it can be breaking for some models on the Hub 😢 As such, I believe a warning is more adequate.
@zucchini-nlp I see likes in my replies, so I'm assuming you're happy with the PR :p merging
I think this PR is the root cause of #40461. In
What does this PR do?
Fixes #38224
This PR: