Conversation
taylormjs
approved these changes
Jul 10, 2025
Collaborator
taylormjs
left a comment
We may want to move the metadata method from the dgeb adapter to UME. Other than that, the logging looks solid.
slurm/scripts/eval_dgeb_ume.sh
Outdated
    uv run lobster_dgeb_eval \
        ume-mini-base-12M \
        --modality dna \
        --tasks ec_dna_classification \
Collaborator
Just one task by default?
Contributor
Author
updated to loop through all tasks
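For reference, a rough sketch of what "loop through all tasks" could look like, written in Python as a dry run that only builds and prints the commands. The command name, model name, `--modality`, and `--tasks` come from the snippet above; the second task name is a placeholder, and `build_commands` is a hypothetical helper, not something in the repository.

```python
import shlex

MODEL = "ume-mini-base-12M"  # model name from the script under review
MODALITY = "dna"

# Only ec_dna_classification appears in this PR; the second entry is a
# placeholder standing in for whatever other tasks the script covers.
TASKS = ["ec_dna_classification", "another_dna_task"]


def build_commands(model, modality, tasks):
    """Return one lobster_dgeb_eval invocation (as an argv list) per task."""
    return [
        ["uv", "run", "lobster_dgeb_eval", model,
         "--modality", modality, "--tasks", task]
        for task in tasks
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands(MODEL, MODALITY, TASKS):
        print(shlex.join(cmd))
```

Running one task per invocation keeps a failure in one task from hiding the results of the others, at the cost of reloading the model each time.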
    # Validate the loaded model
    logger.info("Validating loaded model configuration...")
    total_params = sum(p.numel() for p in model.parameters())
Collaborator
What about making a metadata property method within the UME class and calling that throughout UME & dgeb_adapter? That might remove some redundancies
Contributor
Author
metadata is just there to satisfy what BioSeqTransformer expects when we wrap it; I think most of the actual properties are already contained in the model config.
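One way to reduce the redundancy discussed in this thread, sketched under assumptions: a shared helper that counts parameters, checks the count against an expected range, and returns a summary dict that both the model class and the adapter could log. The function names, the dict keys, and the numeric ranges are all hypothetical; only the model names and the `sum(p.numel() ...)` expression come from this PR.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical expected ranges (low, high); the real bounds would come
# from the project, not from this sketch.
EXPECTED_PARAM_RANGES = {
    "ume-mini-base-12M": (10_000_000, 14_000_000),
    "ume-small-base-90M": (80_000_000, 100_000_000),
}


def count_parameters(model) -> int:
    """Sum numel() over model.parameters(), as in the diff above."""
    return sum(p.numel() for p in model.parameters())


def check_parameter_count(model_name, total_params, ranges=EXPECTED_PARAM_RANGES) -> bool:
    """Return True if the count is in range; warn and return False otherwise."""
    bounds = ranges.get(model_name)
    if bounds is None:
        # Assumption: an unknown model name is not treated as a failure.
        logger.warning("No expected parameter range for %s; skipping check", model_name)
        return True
    low, high = bounds
    in_range = low <= total_params <= high
    if not in_range:
        logger.warning(
            "%s has %d parameters, outside expected range [%d, %d]",
            model_name, total_params, low, high,
        )
    return in_range


def build_metadata(model_name, model, embed_dim, num_layers) -> dict:
    """Collect the properties that the logging in this PR reports."""
    total_params = count_parameters(model)
    return {
        "model_name": model_name,
        "embed_dim": embed_dim,
        "num_layers": num_layers,
        "total_params": total_params,
        "param_count_ok": check_parameter_count(model_name, total_params),
    }
```

The helper only relies on `parameters()` and `numel()`, so it would work with any `torch.nn.Module` without importing torch itself.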
Description
This pull request introduces enhancements to logging and model configuration validation in the UME model's codebase, along with updates to supported model variants. The changes aim to improve transparency during model loading, debugging, and validation processes. Additionally, a new task has been added to the evaluation script.

Logging Enhancements:
src/lobster/evaluation/dgeb_adapter.py: Added detailed logging for model configurations during loading and creation, including embedding dimensions, number of layers, and total parameters. This helps track model properties during runtime.
src/lobster/model/_ume.py: Introduced logging for model loading, checkpoint validation, and parameter count checks against expected ranges. This ensures proper model initialization and provides warnings for mismatches.
src/lobster/model/_utils_checkpoint.py: Enhanced logging for checkpoint loading and retry mechanisms, including detailed error handling for corrupted checkpoints.

Model Configuration Updates:
src/lobster/model/_ume.py: Added support for a new model variant, ume-small-base-90M, expanding the range of available pretrained models.

Evaluation Script Update:
slurm/scripts/eval_dgeb_ume.sh: Included the ec_dna_classification task in the evaluation script for DNA modality, broadening the scope of tasks being evaluated.

Type of Change
Testing
Checklist