Skip to content

cDNA comparison#188

Merged
taylormjs merged 12 commits intomainfrom
t/calm-patch-all-tests
Sep 3, 2025
Merged

cDNA comparison#188
taylormjs merged 12 commits intomainfrom
t/calm-patch-all-tests

Conversation

@taylormjs
Copy link
Collaborator

Description

  • Fix all calm tasks, including function and species-specific tasks
  • Match eval done by calm paper for more fair comparison
  • Tests

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring

Taylor Joren and others added 10 commits August 11, 2025 23:45
- Add default species selection (hsapiens, ecoli, scerevisiae) when none specified
- Improve error handling with logger.error instead of logger.debug
- Add detailed logging for task processing and completion
- Update docstring to reflect default species behavior
…ings match/case, logging; LinearProbe: configurable classification_threshold, LR max_iter, simplified multilabel proba; Dataset: clean prints/len, logging; Hydra: expose options incl. classification_threshold; lint + callback tests pass
@taylormjs taylormjs marked this pull request as ready for review September 2, 2025 20:01
from lobster.datasets import CalmPropertyDataset

from ._linear_probe_callback import LinearProbeCallback
from ._peer_utils import convert_numpy_to_python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some of these utils seem more general than just for PEER so perhaps they can live elsewhere

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good nit. I'll do some reorganizing in a separate PR to avoid this one getting too much bigger

try:
train_embeddings, train_targets = self._get_embeddings(module, train_loader, modality="nucleotide")
test_embeddings, test_targets = self._get_embeddings(module, test_loader, modality="nucleotide")
if self.use_cross_validation:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would self.use_cross_validation=False be the same as 1-fold CV?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would only differ in the test split, which would be an issue if we have dedicated test splits. 1-fold CV would merge the train + test and then do an iid random split. Also, linear_probe_callback uses sklearn's KFold, which requires 2+ folds

return probe

def _evaluate_probe(self, probe, embeddings: Tensor, targets: Tensor) -> dict[str, float]:
def _evaluate_probe(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a long function 😅 might be out of scope of this MR but I wonder whether we should turn callbacks into Metrics which would implement the core logic and then the callback just calls metrics

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, we could probably just refactor the parts where we get metrics into a helper func

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea! Just going to make the metric calls into helpers for this MR. We can carve this out into a separate module for metrics in another

@karinazad
Copy link
Collaborator

do we need more tests for the new parts of the code?

- Applied ruff check --fix to resolve 21 linting errors
- Applied ruff format to ensure consistent code formatting
- Updated callbacks, constants, datasets, and test files
@taylormjs taylormjs merged commit de1df9b into main Sep 3, 2025
4 checks passed
@taylormjs taylormjs deleted the t/calm-patch-all-tests branch September 3, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants