Conversation
- Add default species selection (hsapiens, ecoli, scerevisiae) when none specified
- Improve error handling with logger.error instead of logger.debug
- Add detailed logging for task processing and completion
- Update docstring to reflect default species behavior
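A minimal sketch of the default-species fallback described in the commit message above (the function and parameter names are illustrative assumptions, not the repo's actual API):

```python
DEFAULT_SPECIES = ("hsapiens", "ecoli", "scerevisiae")


def resolve_species(species=None):
    """Fall back to the default species tuple when none are specified.

    Hypothetical helper illustrating the commit message; the real
    implementation in the repository may differ.
    """
    return tuple(species) if species else DEFAULT_SPECIES
```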
…ings match/case, logging
- LinearProbe: configurable classification_threshold, LR max_iter, simplified multilabel proba
- Dataset: clean prints/len, logging
- Hydra: expose options incl. classification_threshold
- lint + callback tests pass
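The configurable `classification_threshold` mentioned above could look roughly like this for the multilabel case (a sketch; the function name and the 0.5 default are assumptions, not the callback's actual code):

```python
def apply_threshold(probabilities, threshold=0.5):
    """Binarize per-label probabilities for multilabel classification.

    Sketch only: the real LinearProbe callback may thread the
    configurable threshold through differently.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]
```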
```python
from lobster.datasets import CalmPropertyDataset

from ._linear_probe_callback import LinearProbeCallback
from ._peer_utils import convert_numpy_to_python
```
nit: some of these utils seem more general than just for PEER so perhaps they can live elsewhere
It's a good nit. I'll do some reorganizing in a separate PR to avoid this one getting too much bigger
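For context on the utility being discussed: a function named `convert_numpy_to_python` plausibly converts NumPy scalars and arrays into builtin Python types (e.g. so metric dicts can be JSON-serialized). A duck-typed sketch under that assumption, not the repo's actual code:

```python
def convert_numpy_to_python(obj):
    """Recursively convert NumPy-like objects to builtin Python types.

    Duck-typed sketch: 0-d array-likes expose .item(), arrays expose
    .tolist(); containers are walked recursively. Plain Python values
    pass through unchanged.
    """
    if hasattr(obj, "item") and getattr(obj, "ndim", None) == 0:
        return obj.item()  # numpy scalar -> int/float/bool
    if hasattr(obj, "tolist"):
        return obj.tolist()  # numpy array -> nested lists
    if isinstance(obj, dict):
        return {k: convert_numpy_to_python(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [convert_numpy_to_python(v) for v in obj]
    return obj
```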
```python
try:
    train_embeddings, train_targets = self._get_embeddings(module, train_loader, modality="nucleotide")
    test_embeddings, test_targets = self._get_embeddings(module, test_loader, modality="nucleotide")
    if self.use_cross_validation:
```
nit: would self.use_cross_validation=False be the same as 1-fold CV?
They would only differ in the test split, which would be an issue if we have dedicated test splits: 1-fold CV would merge train + test and then do an i.i.d. random split. Also, linear_probe_callback uses sklearn's KFold, which requires 2+ folds.
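To make the distinction concrete, a toy sketch (all names hypothetical): with `use_cross_validation=False` the dataset's dedicated test split is respected, while a hypothetical "1-fold CV" would pool train + test and re-split i.i.d. at random, discarding the dedicated test set:

```python
import random


def dedicated_split(train_idx, test_idx):
    # use_cross_validation=False: keep the dataset's dedicated test split intact
    return list(train_idx), list(test_idx)


def one_fold_cv_split(train_idx, test_idx, seed=0):
    # Hypothetical 1-fold CV: merge train + test, then draw an i.i.d.
    # random split of the same sizes, losing the dedicated test set
    pooled = list(train_idx) + list(test_idx)
    rng = random.Random(seed)
    rng.shuffle(pooled)
    cut = len(train_idx)
    return pooled[:cut], pooled[cut:]
```

(And as noted above, sklearn's `KFold` refuses `n_splits=1` outright, so the callback could not express this case anyway.)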
```diff
     return probe

-    def _evaluate_probe(self, probe, embeddings: Tensor, targets: Tensor) -> dict[str, float]:
+    def _evaluate_probe(
```
this is a long function 😅 might be out of scope of this MR but I wonder whether we should turn callbacks into Metrics which would implement the core logic and then the callback just calls metrics
for now, we could probably just refactor the parts where we get metrics into a helper func
That's a great idea! Just going to make the metric calls into helpers for this MR. We can carve this out into a separate module for metrics in another
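The helper-extraction idea agreed on above could look roughly like this (a sketch with illustrative names; the real callback presumably computes richer torchmetrics, not this toy accuracy):

```python
def _compute_classification_metrics(predictions, targets):
    """Hypothetical helper carved out of _evaluate_probe.

    Each metric block in the long function becomes one call like this,
    so the caller just assembles a dict of results per task.
    """
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return {"accuracy": correct / len(targets)}


def evaluate_probe_sketch(predictions, targets):
    # After the refactor, _evaluate_probe reduces to dispatching
    # to helpers and merging their metric dicts
    metrics = {}
    metrics.update(_compute_classification_metrics(predictions, targets))
    return metrics
```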
do we need more tests for the new parts of the code?
- Applied ruff check --fix to resolve 21 linting errors
- Applied ruff format to ensure consistent code formatting
- Updated callbacks, constants, datasets, and test files
Description
Type of Change