    )

    # Convert to numpy
    batch_embeddings = batch_embeddings.detach().cpu().numpy()
TODO: run on gpu as well
* rope tests * skip test --------- Co-authored-by: freyn6 <freyn6@gene.com>
taylormjs left a comment
Looks great overall! Left some minor comments
    --output-dir OUTPUT_DIR       # Optional: results directory (default: dgeb_results)
    --batch-size BATCH_SIZE       # Optional: encoding batch size (default: 32)
    --max-seq-length MAX_LENGTH   # Optional: max sequence length (default: 1024)
    --use-flash-attn              # Optional: enable flash attention
We may only want one of these two flags: `--use-flash-attn` or `--no-flash-attn`.
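One way to collapse the two flags into a single definition, sketched with the standard library's `argparse.BooleanOptionalAction` (Python 3.9+). This assumes the CLI is built on `argparse`, which the diff does not confirm; `build_parser` and the default of `False` are illustrative, not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flash-attention option is sketched here.
    parser = argparse.ArgumentParser(prog="lobster_dgeb_eval")
    # BooleanOptionalAction registers both --use-flash-attn and
    # --no-use-flash-attn from this single definition.
    parser.add_argument(
        "--use-flash-attn",
        action=argparse.BooleanOptionalAction,
        default=False,  # assumed default, not taken from the PR
        help="Enable flash attention",
    )
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--no-use-flash-attn"])
print(args.use_flash_attn)
```

Note that with this approach the negative form is spelled `--no-use-flash-attn` rather than `--no-flash-attn`.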
src/lobster/evaluation/README.md
Outdated
    ### Performance Tips

    - **Batch Size**: Increase `--batch-size` for faster evaluation on GPU (try 64-128)
    - **Sequence Length**: Reduce `--max-seq-length` if memory is limited (try 512)
We should consider indicating whether any tasks require longer sequence lengths, especially for DNA tasks. I'm having a flashback to the AAV task, where all sequence variation was after index 512.
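To illustrate the concern, a small sketch of a check that flags inputs whose sequences become indistinguishable after truncation. `distinct_after_truncation` and the toy sequences are hypothetical and not part of the PR or of DGEB.

```python
def distinct_after_truncation(sequences: list[str], max_len: int) -> int:
    """Count how many distinct sequences remain after cutting each to max_len."""
    return len({seq[:max_len] for seq in sequences})


# Toy example: two sequences that differ only after index 512.
shared_prefix = "A" * 512
variants = [shared_prefix + "CGT", shared_prefix + "TTG"]

full = distinct_after_truncation(variants, max_len=1024)
cut = distinct_after_truncation(variants, max_len=512)
if cut < full:
    print(f"Truncation to 512 collapses {full} distinct sequences into {cut}")
```

A check along these lines could be run per task before recommending a reduced `--max-seq-length` in the README.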
    Embeddings of shape [batch_size, num_layers, embedding_dim].
    """
    # For now, use the high-level embed_sequences method which gives us aggregated embeddings
    # TODO: In the future, we could implement proper layer-wise extraction by calling
Agreed! This is good for now
    logger = logging.getLogger(__name__)


    class UMEAdapter(BioSeqTransformer):
Maybe change to UMEAdapterDGEB or something more specific? That would help in case we add more of these, and avoid confusion if we add LoRA adapters.
    output_path : Path
        Path to save the report.
    """
    report_path = output_path / "evaluation_report.md"
Should this include a timestamp to avoid overwriting reports? I suppose that could also be in the output_path
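A minimal sketch of the timestamped variant, using only `pathlib` and `datetime`. The helper name `timestamped_report_path` and the `%Y%m%d_%H%M%S` format are assumptions for illustration; the PR itself writes to a fixed `evaluation_report.md`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_report_path(output_path: Path, now: Optional[datetime] = None) -> Path:
    """Return a report path such as evaluation_report_20250101_120000.md."""
    # Allow the caller to inject a time so the result is reproducible in tests.
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return output_path / f"evaluation_report_{stamp}.md"


report_path = timestamped_report_path(Path("dgeb_results"))
print(report_path)
```

Putting the timestamp in `output_path` instead, as suggested above, would keep all artifacts of one run together without changing the file name.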
    # Get available tasks
    all_tasks = dgeb.get_all_task_names()
    assert len(all_tasks) > 0, "Should find some tasks"
Assert len(all_tasks) == num_tasks instead of > 0? Just to be sure we're capturing all of them. Same for the protein and DNA tasks.
kept this as > 0 to be robust to dgeb dataset updates
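A possible middle ground between the two positions, sketched below: assert that a known subset of tasks is present rather than pinning the exact count, so that new upstream tasks do not break the test. `missing_tasks` and the task names are placeholders; in the real test the available list would come from `dgeb.get_all_task_names()`.

```python
def missing_tasks(available: list[str], required: set[str]) -> set[str]:
    """Return the required task names that are absent from the available list."""
    return required - set(available)


# Placeholder names for illustration only, not real DGEB task names.
available = ["task_a", "task_b", "task_c"]
required = {"task_a", "task_b"}

absent = missing_tasks(available, required)
assert not absent, f"Expected tasks are missing: {sorted(absent)}"
```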
    assert max_diff > 1e-6, f"Rotary embedding did not modify the tensor. Max diff: {max_diff}"


    def test_rotary_embedding_positional_invariance():
@ncfrey Just saw the merge conflicts in uv.lock
    from pathlib import Path

    # Add the evaluation module to the path
    sys.path.insert(0, str(Path(__file__).parent.parent))
Summary
This PR adds DGEB (Diverse Genomic Embedding Benchmark) evaluation integration for UME models, enabling standardized benchmarking of biological sequence models on DNA and protein tasks.
Key Features
- `Ume.from_pretrained()` method
- `lobster_dgeb_eval` command for easy evaluation runs

Implementation Details
- `BioSeqTransformer` interface

Files Added
- `src/lobster/evaluation/dgeb_adapter.py` - Core adapter implementation
- `src/lobster/evaluation/dgeb_runner.py` - Evaluation orchestration
- `src/lobster/evaluation/README.md` - Comprehensive documentation
- `src/lobster/cmdline/dgeb_eval.py` - CLI entry point
- `tests/lobster/evaluation/test_dgeb_integration.py` - Complete test suite

Usage
Testing
All tests pass including:
Test Plan