Conversation
Pull Request Overview
This PR introduces a generic model evaluation framework, alongside a UMAP visualization callback, that aggregates results into structured markdown reports. Key changes include:
- Addition of UMAP visualization callback configuration and implementation.
- Creation of a generic evaluation framework that aggregates callback evaluations into a markdown report.
- Updates to command-line interfaces and documentation for evaluation and callback usage.
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| src/lobster/hydra_config/evaluate.yaml | New evaluation configuration file |
| src/lobster/evaluation/_evaluate_model_with_callbacks.py | Implementation of the evaluation framework |
| src/lobster/evaluation/README.md | Documentation for using the evaluation framework |
| src/lobster/cmdline/evaluate.py | Command-line entry point for model evaluation |
| src/lobster/cmdline/__init__.py | Updates for including the new evaluation command |
| src/lobster/callbacks/_peer_evaluation_callback.py | Type annotation updates in the PEER evaluation callback |
| src/lobster/callbacks/__init__.py | Export of the new UMAP visualization callback |
| src/lobster/callbacks/README.md | Documentation for creating evaluation callbacks |
| src/lobster/__init__.py | Public API and module imports update |
| pyproject.toml | Dependency and entry point updates |
| docs/CONTRIBUTORS.md | Removal of outdated contributor list |
| README.md | Installation and usage instructions update |
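The callback-based framework reviewed above could look like the following minimal sketch. The class name, the `evaluate` method, and its signature are illustrative assumptions for this review, not the actual lobster API:

```python
from typing import Any, Callable, Iterable


class DummyEmbeddingCallback:
    """Illustrative evaluation callback (hypothetical, not the lobster API).

    A real callback would embed the data and compute metrics or write
    artifacts (e.g. a UMAP plot); this one returns a placeholder metric.
    """

    def evaluate(
        self, model: Callable[[Any], float], dataloader: Iterable[Any]
    ) -> dict[str, float]:
        # Score every batch with the model and aggregate into a metrics dict,
        # which the framework would then render into the markdown report.
        scores = [model(x) for x in dataloader]
        return {"mean_score": sum(scores) / len(scores)}
```

The framework would call `evaluate` on each configured callback and merge the returned dictionaries (or file paths) into one report.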
The relevant code is here:
    results : dict[str, Any]
        Dictionary of callback name -> evaluation results, which can be a metrics
        dictionary or paths to generated files
    issues : list[str]
Not sure what `issues` refers to or where it comes from.
The idea was that if one evaluation fails, we still want to complete the rest and note all problems at the end of the report -- but we might as well just skip the failed one and remove `issues`.
Removed it; failures are just logged now.
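The agreed-upon pattern could be sketched as follows. The function name and the per-callback `evaluate` method are assumptions for illustration, not the actual implementation; the point is that a failing callback is logged and skipped rather than collected into an `issues` list:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def evaluate_model_with_callbacks(callbacks, model, dataloader) -> dict[str, Any]:
    """Hypothetical sketch: run each callback's evaluation, logging failures
    so that one broken evaluation does not abort the rest of the report."""
    results: dict[str, Any] = {}
    for callback in callbacks:
        name = type(callback).__name__
        try:
            results[name] = callback.evaluate(model, dataloader)
        except Exception:
            # Log the traceback and continue with the remaining callbacks;
            # the failed callback simply does not appear in the report.
            logger.exception("Evaluation failed for callback %s", name)
    return results
```

With this shape, the markdown report is built only from `results`, and any problems are visible in the logs instead of a dedicated `issues` field.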
Added UMAP Visualization Callback and Generic Evaluation Framework
What's New
Technical Details
How to Use