Conversation
Pull Request Overview
This PR introduces a generic model evaluation framework, alongside a UMAP visualization callback, that aggregates results into structured markdown reports. Key changes include:
- Addition of UMAP visualization callback configuration and implementation.
- Creation of a generic evaluation framework that aggregates callback evaluations into a markdown report.
- Updates to command-line interfaces and documentation for evaluation and callback usage.
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| src/lobster/hydra_config/evaluate.yaml | New evaluation configuration file |
| src/lobster/evaluation/_evaluate_model_with_callbacks.py | Implementation of the evaluation framework |
| src/lobster/evaluation/README.md | Documentation for using the evaluation framework |
| src/lobster/cmdline/evaluate.py | Command-line entry point for model evaluation |
| src/lobster/cmdline/__init__.py | Updates for including the new evaluation command |
| src/lobster/callbacks/_peer_evaluation_callback.py | Type annotation updates in the PEER evaluation callback |
| src/lobster/callbacks/__init__.py | Export of the new UMAP visualization callback |
| src/lobster/callbacks/README.md | Documentation for creating evaluation callbacks |
| src/lobster/__init__.py | Public API and module imports update |
| pyproject.toml | Dependency and entry point updates |
| docs/CONTRIBUTORS.md | Removal of outdated contributor list |
| README.md | Installation and usage instructions update |
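The callback-based framework reviewed above could look like the following minimal sketch. The class name, the `evaluate` method, and its signature are illustrative assumptions for this review, not the actual lobster API:

```python
from typing import Any, Callable, Iterable


class DummyEmbeddingCallback:
    """Illustrative evaluation callback (hypothetical, not the lobster API).

    A real callback would embed the data and compute metrics or write
    artifacts (e.g. a UMAP plot); this one returns a placeholder metric.
    """

    def evaluate(
        self, model: Callable[[Any], float], dataloader: Iterable[Any]
    ) -> dict[str, float]:
        # Score every batch with the model and aggregate into a metrics dict,
        # which the framework would then render into the markdown report.
        scores = [model(x) for x in dataloader]
        return {"mean_score": sum(scores) / len(scores)}
```

The framework would call `evaluate` on each configured callback and merge the returned dictionaries (or file paths) into one report.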
The relevant code is here:
    results : dict[str, Any]
        Dictionary of callback name -> evaluation results, which can be a metrics
        dictionary or paths to generated files
    issues : list[str]
Not sure what `issues` refers to or where it comes from.
The idea was that if one evaluation fails, we still want to complete the rest and note all problems at the end of the report -- but we might as well just skip the failed one and remove `issues`.
Removed it; failures are just logged now.
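The agreed-upon pattern could be sketched as follows. The function name and the per-callback `evaluate` method are assumptions for illustration, not the actual implementation; the point is that a failing callback is logged and skipped rather than collected into an `issues` list:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def evaluate_model_with_callbacks(callbacks, model, dataloader) -> dict[str, Any]:
    """Hypothetical sketch: run each callback's evaluation, logging failures
    so that one broken evaluation does not abort the rest of the report."""
    results: dict[str, Any] = {}
    for callback in callbacks:
        name = type(callback).__name__
        try:
            results[name] = callback.evaluate(model, dataloader)
        except Exception:
            # Log the traceback and continue with the remaining callbacks;
            # the failed callback simply does not appear in the report.
            logger.exception("Evaluation failed for callback %s", name)
    return results
```

With this shape, the markdown report is built only from `results`, and any problems are visible in the logs instead of a dedicated `issues` field.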
Added UMAP Visualization Callback and Generic Evaluation Framework
What's New
Technical Details
How to Use