Merged
Pull Request Overview
This pull request fixes configuration parameters for the UME_large model in the ModernBERT module to align with the intended architecture specifications.
Key changes:
- Corrected the attention head count so that the hidden size is evenly divisible by it
- Adjusted intermediate and hidden sizes to match proper model dimensions
```diff
-    "num_attention_heads": 25,
-    "intermediate_size": 6400,
-    "hidden_size": 1600,
+    "num_attention_heads": 24,
+    "intermediate_size": 6912,
+    "hidden_size": 1728,
```
The change from 25 to 24 attention heads ensures the hidden_size (1728) is evenly divisible by num_attention_heads (24), which is required for multi-head attention to work correctly. This fixes a potential runtime error where head_dim would not be an integer.
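The divisibility requirement described above can be sketched with a small check (a minimal illustration; `check_head_dim` is a hypothetical helper, not part of the ModernBERT code):

```python
def check_head_dim(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head dimension, failing fast when the split is uneven."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads

# Updated UME_large values: each of the 24 heads gets a 72-dim slice.
print(check_head_dim(1728, 24))  # 72

# The new hidden_size with the old head count would fail:
# check_head_dim(1728, 25) raises ValueError, since 1728 / 25 = 69.12
```

This is the kind of integer head_dim that multi-head attention implementations assume when they reshape the hidden states into per-head slices.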
taylormjs approved these changes on Aug 1, 2025
Description
This pull request includes a configuration update for the `UME_large` model in the `ModernBERT` module. The changes adjust several parameters to better align with the intended architecture.

Configuration updates for the `UME_large` model:
- `src/lobster/model/modern_bert/_modern_bert_configuration.py`: Updated `UME_large` model parameters:
  - `num_attention_heads` from 25 to 24
  - `intermediate_size` from 6400 to 6912
  - `hidden_size` from 1600 to 1728

Type of Change
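The three updated values fit together: a quick sanity check (a hypothetical sketch; the dict name and layout are illustrative, not the actual file contents) shows that the hidden size now splits evenly across heads and that the update preserves the 4x ratio between `intermediate_size` and `hidden_size` that the old values (6400 = 4 x 1600) also had:

```python
# Hypothetical mirror of the updated UME_large parameters (illustrative only).
ume_large = {
    "num_attention_heads": 24,
    "intermediate_size": 6912,
    "hidden_size": 1728,
}

# hidden_size must split evenly across attention heads (1728 / 24 = 72).
assert ume_large["hidden_size"] % ume_large["num_attention_heads"] == 0

# The FFN expansion factor stays at 4x (6912 = 4 * 1728), matching the
# ratio of the old values (6400 = 4 * 1600).
assert ume_large["intermediate_size"] == 4 * ume_large["hidden_size"]
```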