
Ume evaluation #87

Merged
karinazad merged 42 commits into main from ume-eval on May 28, 2025

Conversation


@karinazad karinazad commented May 20, 2025

Added UMAP Visualization Callback and Generic Evaluation Framework

What's New

  • Added a UMAP visualization callback that produces dimensionality-reduction plots of model embeddings
  • Created generic model evaluation framework that works with any compatible callbacks
  • Implemented evaluate_model_with_callbacks function for evaluating models with multiple callbacks
  • Added markdown report generation that formats results from different callback types
  • Added documentation explaining how to create evaluation-compatible callbacks
  • Created README for callbacks directory showing how to implement the evaluation interface
  • Created README for evaluation directory explaining the framework usage and report format

Technical Details

  • Callbacks must inherit from lightning.Callback and implement an evaluate method
  • The evaluation framework automatically detects if callbacks need dataloaders
  • Evaluation results are aggregated into a single comprehensive markdown report
  • Supported return types include metrics dictionaries, file paths, and nested structures
  • Command-line interface available via lobster_eval command
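The callback contract above can be sketched as follows. This is a minimal illustration, not the lobster API: `EmbeddingNormCallback`, the `requires_dataloader` flag, and the stand-in `Callback` base class are all hypothetical, and the real framework requires inheriting from `lightning.Callback`.

```python
class Callback:
    """Stand-in for lightning.Callback so this sketch is self-contained."""


class EmbeddingNormCallback(Callback):
    """Hypothetical evaluation-compatible callback that reports the mean
    L2 norm of model embeddings. Only the `evaluate` method is required
    by the framework; everything else here is illustrative."""

    # The framework reportedly detects whether a callback needs a dataloader;
    # an attribute like this is one plausible mechanism (assumption).
    requires_dataloader = True

    def evaluate(self, model, dataloader=None):
        norms = []
        for batch in dataloader:
            for vector in batch:
                norms.append(sum(x * x for x in vector) ** 0.5)
        # Return a metrics dictionary -- one of the supported return types.
        return {"mean_embedding_norm": sum(norms) / len(norms)}
```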

How to Use

  • Create callbacks that implement the evaluate method
  • Use evaluate_model_with_callbacks to run evaluation with multiple callbacks
  • Results are automatically formatted into a structured markdown report
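The flow above might look like the following sketch. `run_evaluation` and `format_report` are illustrative stand-ins, not the actual `evaluate_model_with_callbacks` signature (which this PR page does not show); the assumption is only that each callback exposes `evaluate` and returns a metrics dictionary or a file path.

```python
def run_evaluation(model, callbacks, dataloader=None):
    """Illustrative stand-in for evaluate_model_with_callbacks."""
    results = {}
    for cb in callbacks:
        name = type(cb).__name__
        # The real framework auto-detects dataloader needs; checking a
        # flag on the callback is one plausible mechanism (assumption).
        if getattr(cb, "requires_dataloader", False):
            results[name] = cb.evaluate(model, dataloader=dataloader)
        else:
            results[name] = cb.evaluate(model)
    return results


def format_report(results):
    """Render results (metrics dicts or file paths) as a markdown report."""
    lines = ["# Evaluation Report"]
    for name, value in results.items():
        lines.append(f"## {name}")
        if isinstance(value, dict):
            lines.extend(f"- {k}: {v}" for k, v in value.items())
        else:
            # e.g. the path to a generated UMAP plot
            lines.append(f"Artifact: {value}")
    return "\n".join(lines)
```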

@karinazad karinazad changed the title from "[Draft] Ume evaluation" to "Ume evaluation" on May 20, 2025
@ncfrey ncfrey requested a review from Copilot May 20, 2025 18:29
Copilot AI left a comment


Pull Request Overview

This PR introduces a new model evaluation framework alongside a UMAP visualization callback to simplify model evaluation with structured markdown reports. Key changes include:

  • Addition of UMAP visualization callback configuration and implementation.
  • Creation of a generic evaluation framework that aggregates callback evaluations into a markdown report.
  • Updates to command-line interfaces and documentation for evaluation and callback usage.

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/lobster/hydra_config/evaluate.yaml | New evaluation configuration file |
| src/lobster/evaluation/_evaluate_model_with_callbacks.py | Implementation of the evaluation framework |
| src/lobster/evaluation/README.md | Documentation for using the evaluation framework |
| src/lobster/cmdline/evaluate.py | Command-line entry point for model evaluation |
| src/lobster/cmdline/__init__.py | Updates for including the new evaluation command |
| src/lobster/callbacks/_peer_evaluation_callback.py | Type annotation updates in the PEER evaluation callback |
| src/lobster/callbacks/__init__.py | Export of the new UMAP visualization callback |
| src/lobster/callbacks/README.md | Documentation for creating evaluation callbacks |
| src/lobster/__init__.py | Public API and module imports update |
| pyproject.toml | Dependency and entry point updates |
| docs/CONTRIBUTORS.md | Removal of outdated contributor list |
| README.md | Installation and usage instructions update |

@karinazad (Collaborator, Author) commented:

relevant code is here

```
results : dict[str, Any]
    Dictionary of callback name -> evaluation results, which can be a
    metrics dictionary or paths to generated files
issues : list[str]
```
A contributor replied:

not sure what this refers to or where it comes from

@karinazad (Collaborator, Author) replied:

the idea was that if one evaluation fails, we want to complete the rest and just note all problems at the end of a report -- but we might as well just skip it instead and remove issues

@karinazad replied:

removed it and it's just logging issues now
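The log-and-continue behavior described in this thread might look like the following sketch. It is illustrative, not the merged implementation: `run_callbacks_safely` is a hypothetical name, and the assumption is only that a failed `evaluate` call is logged and skipped rather than recorded in an `issues` list.

```python
import logging

logger = logging.getLogger(__name__)


def run_callbacks_safely(model, callbacks):
    """Run each callback's evaluate; log failures and continue, instead
    of collecting an `issues` list in the report (illustrative sketch)."""
    results = {}
    for cb in callbacks:
        name = type(cb).__name__
        try:
            results[name] = cb.evaluate(model)
        except Exception:
            # logger.exception records the traceback at ERROR level,
            # so the failure is visible without aborting the run.
            logger.exception("Evaluation failed for %s; skipping", name)
    return results
```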

@karinazad karinazad merged commit 69b2956 into main May 28, 2025
5 checks passed
@karinazad karinazad deleted the ume-eval branch May 28, 2025 16:53