Conversation
Major Features:
- Add comprehensive DGEBEvaluationCallback for UME and ESM models
- Implement ESMAdapterDGEB for direct ESM model evaluation without checkpoints
- Add shared pooling utilities for consistent embedding aggregation
- Enhance MoleculeACE linear probe with better model compatibility

Core Components:
- DGEBEvaluationCallback: unified callback supporting both UME (checkpoint-based) and ESM (direct) evaluation workflows
- ESMAdapterDGEB: DGEB-compatible adapter for ESM models with proper masked pooling
- Shared pooling utilities: mean/max/cls/last pooling with attention masking
- Enhanced error handling and graceful task failure recovery

Improvements:
- Better embedding extraction across different model types
- Improved linear probe callbacks with enhanced input processing
- Updated DGEB runners with better error handling and reporting
- Comprehensive test coverage for new ESM adapter functionality
…CE callback with internal implementation
```python
import lightning as L
import numpy as np
import torch
from lobster.transforms import Transform
```
```python
def apply_dgeb_pooling(
    token_embeddings: torch.Tensor,
    attention_mask: torch.Tensor,
    pool_type: Literal["mean", "max", "cls", "last"] = "mean",
```
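The body of the helper isn't shown in this excerpt; a minimal sketch of what masked pooling along the lines of this signature could look like (shapes and behavior are assumptions from the signature above, not the actual implementation):

```python
import torch
from typing import Literal


def apply_dgeb_pooling(
    token_embeddings: torch.Tensor,   # (batch, seq_len, hidden)
    attention_mask: torch.Tensor,     # (batch, seq_len), 1 for real tokens
    pool_type: Literal["mean", "max", "cls", "last"] = "mean",
) -> torch.Tensor:
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    if pool_type == "mean":
        # Sum real-token embeddings, divide by the number of real tokens
        return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    if pool_type == "max":
        # Mask out padding with -inf before taking the max
        masked = token_embeddings.masked_fill(mask == 0, float("-inf"))
        return masked.max(dim=1).values
    if pool_type == "cls":
        # First token (e.g. CLS/BOS)
        return token_embeddings[:, 0, :]
    if pool_type == "last":
        # Last real (unmasked) token of each sequence
        lengths = attention_mask.sum(dim=1).long() - 1
        batch_idx = torch.arange(token_embeddings.size(0))
        return token_embeddings[batch_idx, lengths, :]
    raise ValueError(f"Unsupported pool_type: {pool_type}")
```

The key point is that every strategy respects the attention mask, so padded positions never leak into the pooled embedding.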
possibly define an Enum for pooling types across the library?
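One way to do that (hypothetical sketch, not current library code) is a string-valued enum, so existing call sites that pass plain strings keep working:

```python
from enum import Enum


class PoolType(str, Enum):
    """Pooling strategies shared across DGEB adapters (hypothetical sketch)."""

    MEAN = "mean"
    MAX = "max"
    CLS = "cls"
    LAST = "last"


# Because PoolType subclasses str, existing string arguments still compare equal,
# and PoolType("mean") canonicalizes a raw string into the enum member.
assert PoolType("mean") is PoolType.MEAN
assert PoolType.MEAN == "mean"
```

The `str` mixin makes the migration non-breaking: `Literal["mean", ...]` annotations can be swapped for `PoolType` incrementally.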
```diff
-        pooled = torch.stack([layer_hidden[i, l, :] for i, l in enumerate(lengths)], dim=0)
-    else:
-        raise ValueError(f"Unsupported pool_type: {self.pool_type}")
+    pooled = apply_dgeb_pooling(layer_hidden, attention_mask, self.pool_type)
```
nice! i like having the pooling logic self-contained

Yeah, it makes it easy to transfer to esm_dgeb_adapter, for example
```diff
- # Extract key metrics from results
+ # Extract key metrics from results with error handling for individual tasks
```
good call - are you seeing failed tasks frequently?

Not frequently, but enough that I wanted a report of which tasks failed
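A hedged sketch of the per-task recovery pattern being discussed (the function and task names here are illustrative, not the actual DGEB runner API):

```python
def run_tasks_with_recovery(tasks, evaluate_fn):
    """Run each task independently; collect results plus a report of failures."""
    results, failed = {}, {}
    for name, task in tasks.items():
        try:
            results[name] = evaluate_fn(task)
        except Exception as exc:  # keep going: one bad task shouldn't sink the run
            failed[name] = repr(exc)
    if failed:
        # Surface which tasks failed instead of silently dropping them
        print(f"{len(failed)} task(s) failed: {sorted(failed)}")
    return results, failed
```

Isolating each task in its own try/except is what turns "one task crashed the whole evaluation" into "the run finishes and names the tasks that failed."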
```python
def __init__(
    self,
    module: L.LightningModule,
    modality: Literal["protein", "dna"] = "protein",
```
can we harmonize this with UME's modality types?
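One possible shape for that harmonization, assuming UME exposes (or gains) a string-valued modality enum — all names below are hypothetical, not UME's actual API:

```python
from enum import Enum
from typing import Literal


class Modality(str, Enum):
    # Hypothetical stand-in for a shared UME modality type
    AMINO_ACID = "protein"
    NUCLEOTIDE = "dna"


def to_ume_modality(modality: Literal["protein", "dna"]) -> Modality:
    """Translate the DGEB-style string literal into the shared enum."""
    return Modality(modality)


assert to_ume_modality("protein") is Modality.AMINO_ACID
```

The adapter could keep accepting the DGEB literals at its boundary and normalize to the shared type internally, so neither side's public interface has to change.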
| """ | ||
|
|
||
| # Create a minimal tokenizer object with required attributes | ||
| class MinimalTokenizer: |
maybe DummyTokenizer then, since this isn't used?

Yeah, good point. I had it as DummyTokenizer before but changed it because of Monday "vibes"
Updates to DGEB, CaLM and MoleculeACE (public) callbacks.