T/callback update#183

Merged
taylormjs merged 19 commits into main from t/callback-update
Aug 12, 2025

Conversation

taylormjs (Collaborator) commented Aug 8, 2025

Updates to the DGEB, CaLM, and MoleculeACE (public) callbacks:

  • Add a DGEB callback that is consistent with the other linear probe callbacks and works with Hydra configs
  • LinearProbeCallback now handles mean pooling correctly (it previously included pad tokens in the average)
  • The MoleculeACE callback now matches lobster_internal
  • The CaLM callback now uses LinearProbeCallback's embed method instead of its own
  • DGEB callback implemented for ESM

Major Features:

  • Add comprehensive DGEBEvaluationCallback for UME and ESM models
  • Implement ESMAdapterDGEB for direct ESM model evaluation without checkpoints
  • Add shared pooling utilities for consistent embedding aggregation
  • Enhance MoleculeACE linear probe with better model compatibility

Core Components:

  • DGEBEvaluationCallback: Unified callback supporting both UME (checkpoint-based) and ESM (direct) evaluation workflows
  • ESMAdapterDGEB: DGEB-compatible adapter for ESM models with proper masked pooling
  • Shared pooling utilities: mean/max/cls/last pooling with attention masking
  • Enhanced error handling and graceful task failure recovery
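
The shared pooling utilities described above can be sketched roughly as follows. This is a hypothetical illustration (the function name `apply_pooling` and its exact signature are assumptions, not the PR's actual code); the key point is that every strategy respects the attention mask, so pad tokens never leak into the pooled embedding:

```python
import torch


def apply_pooling(token_embeddings, attention_mask, pool_type="mean"):
    """Masked pooling sketch: token_embeddings is (batch, seq, hidden),
    attention_mask is (batch, seq) with 1 for real tokens and 0 for padding."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (B, L, 1)
    if pool_type == "mean":
        # Sum only non-pad positions, then divide by the true sequence lengths
        summed = (token_embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1)
        return summed / counts
    if pool_type == "max":
        # Push pad positions to -inf so they can never win the max
        masked = token_embeddings.masked_fill(mask == 0, float("-inf"))
        return masked.max(dim=1).values
    if pool_type == "cls":
        return token_embeddings[:, 0, :]  # first token
    if pool_type == "last":
        # Index of the last real (non-pad) token per sequence
        lengths = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(token_embeddings.size(0))
        return token_embeddings[batch_idx, lengths, :]
    raise ValueError(f"Unsupported pool_type: {pool_type}")
```

Masked mean pooling is the fix called out in the summary: without the mask, pad embeddings are averaged in and short sequences get systematically diluted.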

Improvements:

  • Better embedding extraction across different model types
  • Improved linear probe callbacks with enhanced input processing
  • Updated DGEB runners with better error handling and reporting
  • Comprehensive test coverage for new ESM adapter functionality

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

Taylor Joren added 12 commits August 4, 2025 20:38
@taylormjs taylormjs marked this pull request as ready for review August 11, 2025 21:46
@taylormjs taylormjs requested a review from ncfrey August 11, 2025 21:46
import lightning as L
import numpy as np
import torch
from lobster.transforms import Transform
Contributor

we made these changes in #166 for consistency

def apply_dgeb_pooling(
token_embeddings: torch.Tensor,
attention_mask: torch.Tensor,
pool_type: Literal["mean", "max", "cls", "last"] = "mean",
Contributor

possibly define an Enum for pooling types across the library?
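
A minimal sketch of what the suggested shared Enum could look like (the name `PoolingType` is hypothetical, not anything in the PR). Using a `str` mixin keeps backward compatibility with the existing `"mean"`-style string values in configs:

```python
from enum import Enum


class PoolingType(str, Enum):
    """Hypothetical library-wide pooling type; str mixin keeps string configs working."""
    MEAN = "mean"
    MAX = "max"
    CLS = "cls"
    LAST = "last"


# A plain config string round-trips cleanly, and comparisons against
# existing string literals still hold.
assert PoolingType("mean") is PoolingType.MEAN
assert PoolingType.MEAN == "mean"
```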

pooled = torch.stack([layer_hidden[i, l, :] for i, l in enumerate(lengths)], dim=0)
else:
raise ValueError(f"Unsupported pool_type: {self.pool_type}")
pooled = apply_dgeb_pooling(layer_hidden, attention_mask, self.pool_type)
Contributor

nice! i like having the pooling logic self-contained

Collaborator Author

Yeah, it makes it easy to transfer to esm_dgeb_adapter, for example

}

# Extract key metrics from results
# Extract key metrics from results with error handling for individual tasks
Contributor

good call - are you seeing failed tasks frequently?

Collaborator Author

Not frequently, but often enough that I wanted a report of which tasks failed

def __init__(
self,
module: L.LightningModule,
modality: Literal["protein", "dna"] = "protein",
Contributor

can we harmonize this with UME's modality types?

"""

# Create a minimal tokenizer object with required attributes
class MinimalTokenizer:
Contributor

maybe DummyTokenizer then since this isn't used?

Collaborator Author

Yeah, good point. I had it as DummyTokenizer before but changed it because of Monday "vibes"
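
The stub tokenizer under discussion could look roughly like this (a hypothetical sketch; the exact attributes DGEB's adapter interface expects are assumptions). The point of the `DummyTokenizer` name is that the object only satisfies attribute checks and is never actually invoked:

```python
class DummyTokenizer:
    """Placeholder tokenizer: carries the attributes an adapter interface
    might check for, but fails loudly if anyone tries to tokenize with it."""

    def __init__(self, pad_token_id=0, model_max_length=1024):
        self.pad_token_id = pad_token_id
        self.model_max_length = model_max_length

    def __call__(self, *args, **kwargs):
        raise NotImplementedError("DummyTokenizer is a stub and should never be called")
```

Failing loudly on `__call__` makes the "this isn't used" assumption self-enforcing: if a code path ever does tokenize through it, the error surfaces immediately instead of silently producing garbage.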

@taylormjs taylormjs merged commit 8dbe3f2 into main Aug 12, 2025
4 checks passed
@taylormjs taylormjs deleted the t/callback-update branch August 12, 2025 18:54