CVPR 2026
Xuefei (Julie) Wang, Kai A. Horstmann, Ethan Lin, Jonathan Chen, Alexander R. Farhang, Sophia Stiles,
Atharva Sehgal, Jonathan Light, David Van Valen, Yisong Yue, Jennifer J. Sun
Caltech · Cornell · UT Austin · RPI
AI agents (AutoGen + GPT-4.1/o3/Llama 3.3-70B) autonomously write and iteratively optimize preprocessing/postprocessing code for scientific image analysis. A function bank accumulates all generated solutions with metrics, feeding the best and worst back into prompts to guide exploration. Simple agent designs consistently outperform human-expert baselines across three production-level biomedical imaging tasks.
| Task | Tool | Metric | Expert | Agent (best) |
|---|---|---|---|---|
| Spot detection | Polaris / DeepCell | F1 | 0.841 | > 0.841 |
| Cell segmentation | Cellpose 3 (cyto3) | AP @ IoU 0.5 | 0.402 | > 0.402 |
| Medical segmentation | MedSAM | NSD + DSC | 0.820 | > 0.820 |
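The function bank described above can be pictured as a small store of generated solutions and their scores. The following is a minimal sketch under stated assumptions, not the repository's implementation: the names `BankEntry`, `FunctionBank`, `best`, `worst`, and `prompt_section` are hypothetical, and scores are assumed to be "higher is better", as with F1, AP, and NSD + DSC.

```python
from dataclasses import dataclass, field


@dataclass
class BankEntry:
    """One generated solution and the score it achieved (higher is better)."""
    source: str   # generated function source code
    score: float  # task metric, e.g. F1 or AP @ IoU 0.5


@dataclass
class FunctionBank:
    """Hypothetical in-memory function bank (illustrative only)."""
    entries: list[BankEntry] = field(default_factory=list)

    def add(self, source: str, score: float) -> None:
        # Every solution is kept, good or bad, so both ends can be shown later.
        self.entries.append(BankEntry(source, score))

    def best(self, n: int) -> list[BankEntry]:
        return sorted(self.entries, key=lambda e: e.score, reverse=True)[:n]

    def worst(self, n: int) -> list[BankEntry]:
        return sorted(self.entries, key=lambda e: e.score)[:n]

    def prompt_section(self, n_top: int, n_worst: int) -> str:
        # Build the text block that would be pasted into the next prompt.
        lines = ["# Best so far"]
        lines += [f"score={e.score:.3f}\n{e.source}" for e in self.best(n_top)]
        lines += ["# Worst so far"]
        lines += [f"score={e.score:.3f}\n{e.source}" for e in self.worst(n_worst)]
        return "\n".join(lines)


bank = FunctionBank()
bank.add("def preprocess(img): return img", 0.40)
bank.add("def preprocess(img): return img / img.max()", 0.55)
bank.add("def preprocess(img): return img * 0", 0.31)
bank.add("def preprocess(img): return img - img.mean()", 0.62)

print([e.score for e in bank.best(2)])   # [0.62, 0.55]
print([e.score for e in bank.worst(1)])  # [0.31]
```

Keeping the worst entries alongside the best gives the model negative examples to steer away from, which is the exploration signal the summary above refers to.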
# Install uv if needed: curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && source .venv/bin/activate
# Pick the extra for your task:
uv pip install -e ".[cellpose]" # Cell segmentation
uv pip install -e ".[polaris]" # Spot detection
uv pip install -e ".[medsam]" # Medical segmentation
uv pip install -e ".[dev]" # Development / tests

Using pip instead of uv
python -m venv .venv && source .venv/bin/activate
pip install -e ".[cellpose]" # or .[polaris], .[medsam]

export OPENAI_API_KEY="sk-..."
# Task-specific:
export DEEPCELL_ACCESS_TOKEN="..." # Polaris only

python main.py \
--dataset /path/to/data \
--experiment_name cellpose_segmentation \
--gpu_id 0 \
--random_seed 42 \
-k 3 \
--num_optim_iter 20 \
--history_threshold 5

Results are saved to {experiment_name}/{timestamp}/preprocessing_func_bank.json.
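Because each run writes into its own timestamped folder, locating the most recent function bank is a common first step. The helper below is a sketch under assumptions: `latest_func_bank` is a hypothetical name, and it assumes the timestamp folder names sort chronologically as plain strings. It makes no assumption about the JSON schema and only returns the path.

```python
from pathlib import Path
from typing import Optional


def latest_func_bank(experiment_dir: str) -> Optional[Path]:
    """Return the function bank file of the newest run, or None if there is none.

    Assumes timestamp folder names sort chronologically as strings.
    """
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    candidates = sorted(root.glob("*/preprocessing_func_bank.json"))
    return candidates[-1] if candidates else None


# Prints a path if a run exists under this folder, otherwise None.
print(latest_func_bank("cellpose_segmentation"))
```

The returned path can then be read with `json.load` from the standard library.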
- Download data: see the data utilities in utils/ (cellpose_data.py, spotdetection_data.py, medsam_data.py).
- Task-specific setup:
  - MedSAM: download medsam_vit_b.pth (instructions) and pass --checkpoint_path.
  - Polaris: set DEEPCELL_ACCESS_TOKEN (instructions).
  - Cellpose: comment out fill_holes_and_remove_small_masks in dynamics.resize_and_compute_masks.
- Run experiments with the appropriate --experiment_name (spot_detection, cellpose_segmentation, or medSAM_segmentation).
- Analyze trajectories:
  python figs/{task_name}_analyze_trajectories.py --data_path /path/to/data
  This creates analysis_results/ under each result folder with test-set evaluations and plots.
Integrate a custom scientific workflow by implementing four components:
| Component | File | What to implement |
|---|---|---|
| Tool wrapper | src/{task}.py | __init__(), predict(), evaluate() |
| Prompts | prompts/{task}_prompts.py | Inherit TaskPrompts; implement get_template_replacements(), get_task_details(), get_pipeline_metrics_info() |
| Expert baseline | prompts/{task}_expert_postprocessing.py.txt | Reference solution for comparison |
| Registry entry | main.py → TASK_CONFIGS | Prompt class, sampling function, extra kwargs |
See docs/adding_a_task.md for a full walkthrough.
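To make the tool-wrapper row concrete, here is a toy wrapper exposing the three methods named in the table. Everything beyond the method names is an assumption for illustration: the class name `ThresholdTool`, the argument and return types, and the Dice metric are hypothetical and do not reflect the repository's actual signatures, which are covered in docs/adding_a_task.md.

```python
class ThresholdTool:
    """Toy tool wrapper with __init__(), predict(), and evaluate().

    Hypothetical stand-in: a real wrapper would call Cellpose, Polaris, or MedSAM.
    Images are flat lists of floats to keep the example dependency-free.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def predict(self, images: list[list[float]]) -> list[list[bool]]:
        # "Inference": mark every pixel above the threshold as foreground.
        return [[value > self.threshold for value in img] for img in images]

    def evaluate(self, preds: list[list[bool]], labels: list[list[bool]]) -> dict[str, float]:
        # Mean Dice coefficient over all images (1.0 when both masks are empty).
        scores = []
        for pred, label in zip(preds, labels):
            inter = sum(1 for p, t in zip(pred, label) if p and t)
            denom = sum(pred) + sum(label)
            scores.append(2.0 * inter / denom if denom else 1.0)
        return {"dice": sum(scores) / len(scores)}


tool = ThresholdTool(threshold=0.5)
preds = tool.predict([[0.9, 0.1, 0.8, 0.2]])
print(tool.evaluate(preds, [[True, False, False, False]]))
```

Returning metrics as a dictionary keeps the wrapper usable when a task reports more than one number, as the medical segmentation task does with NSD and DSC.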
All arguments
Core:
| Flag | Default | Description |
|---|---|---|
| --dataset / -d | none | Path to dataset |
| --experiment_name | none | Task name (e.g. cellpose_segmentation) |
| --checkpoint_path | none | Model checkpoint (MedSAM only) |
| --gpu_id | 0 | GPU device ID |
| --random_seed | 42 | Random seed |
| --num_optim_iter | 20 | Total optimization iterations |
| --llm_model | gpt-4.1 | LLM to use |
| --max_round | 20 | Max conversation rounds per iteration |
| --cache_seed | 4 | AutoGen cache seed |
| -k | 3 | Function pairs per iteration |
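The way --num_optim_iter, -k, and --random_seed interact can be sketched as a plain loop. This is an illustration under assumptions rather than the code in main.py: `propose_candidates` and `evaluate` are hypothetical stubs that stand in for the LLM agent and the imaging pipeline, and each candidate number stands in for one generated function pair.

```python
import random


def propose_candidates(k: int, rng: random.Random) -> list[float]:
    """Stub for the agent: returns k candidate settings instead of generated code."""
    return [rng.uniform(0.0, 1.0) for _ in range(k)]


def evaluate(candidate: float) -> float:
    """Stub for running the pipeline: the score peaks when candidate == 0.7."""
    return 1.0 - abs(candidate - 0.7)


def optimize(num_optim_iter: int = 20, k: int = 3, random_seed: int = 42) -> list[dict]:
    rng = random.Random(random_seed)  # a fixed seed makes the run repeatable
    bank = []
    for iteration in range(num_optim_iter):
        for candidate in propose_candidates(k, rng):
            # Every candidate is recorded with its score, good or bad.
            bank.append({"iteration": iteration,
                         "candidate": candidate,
                         "score": evaluate(candidate)})
    return bank


bank = optimize()
best = max(bank, key=lambda entry: entry["score"])
print(len(bank))  # 60 entries: 20 iterations x 3 candidates
print(best["iteration"], round(best["score"], 3))
```

With the defaults in the table, a full run therefore evaluates 60 candidates before any optional hyperparameter search.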
Function bank:
| Flag | Default | Description |
|---|---|---|
| --n_top | 3 | Top-performing functions shown in prompt |
| --n_worst | 3 | Worst-performing functions shown in prompt |
| --n_last | 0 | Most recent functions shown in prompt |
| --history_threshold | 0 | Iterations before including function bank history |
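One plausible reading of how these four flags combine is sketched below. The function `select_history` is hypothetical, and two details are assumptions rather than documented behavior: history is withheld while the iteration index is below --history_threshold, and iterations are counted from zero.

```python
def select_history(entries, iteration, n_top=3, n_worst=3, n_last=0, history_threshold=0):
    """Pick which bank entries to show in the next prompt.

    entries: list of (name, score) tuples in insertion order; higher score is better.
    """
    if iteration < history_threshold:
        # Assumed gating: no history is shown during the first few iterations.
        return {"top": [], "worst": [], "last": []}
    descending = sorted(entries, key=lambda e: e[1], reverse=True)
    ascending = sorted(entries, key=lambda e: e[1])
    return {
        "top": descending[:n_top],
        "worst": ascending[:n_worst],
        # Guard n_last == 0, because entries[-0:] would return the whole list.
        "last": entries[-n_last:] if n_last > 0 else [],
    }


entries = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(select_history(entries, iteration=2, history_threshold=5))
print(select_history(entries, iteration=5, n_top=2, n_worst=1, n_last=1, history_threshold=5))
```

When the bank holds fewer entries than n_top plus n_worst, the same entry can appear in both groups; a real implementation may want to deduplicate.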
AutoML (Optuna):
| Flag | Default | Description |
|---|---|---|
| --hyper_optimize | off | Enable hyperparameter search |
| --n_hyper_optimize | 3 | Functions to optimize |
| --n_hyper_optimize_trials | 24 | Optuna trials per function |
| --hyper_optimize_interval | 5 | Run AutoML every N iterations |
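A quick way to budget an AutoML-enabled run is to count how many Optuna trials the defaults imply. The arithmetic below is a sketch under assumptions: the helper names are hypothetical, iterations are counted from 1, and a search is assumed to fire whenever the iteration number is a multiple of --hyper_optimize_interval.

```python
def automl_iterations(num_optim_iter: int = 20, hyper_optimize_interval: int = 5) -> list[int]:
    """1-based iteration numbers after which a hyperparameter search would run."""
    return [i for i in range(1, num_optim_iter + 1) if i % hyper_optimize_interval == 0]


def total_trials(num_optim_iter: int = 20, hyper_optimize_interval: int = 5,
                 n_hyper_optimize: int = 3, n_hyper_optimize_trials: int = 24) -> int:
    """Searches x functions per search x trials per function."""
    rounds = len(automl_iterations(num_optim_iter, hyper_optimize_interval))
    return rounds * n_hyper_optimize * n_hyper_optimize_trials


print(automl_iterations())  # [5, 10, 15, 20]
print(total_trials())       # 288 = 4 searches x 3 functions x 24 trials
```

Under these assumptions the defaults add 288 pipeline evaluations on top of the agent's own candidates, which is worth weighing against available GPU time.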
# Unit tests (no GPU/data required)
python -m pytest tests/ -v
# Integration tests (require data + task packages)
python -m tests.test_cellpose_segmentation --data_path /path/to/data
python -m tests.test_spotdetection
python -m tests.test_medsam_segmentation

@inproceedings{wang2026simple,
title={Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization},
author={Wang, Xuefei and Horstmann, Kai A. and Lin, Ethan and Chen, Jonathan and Farhang, Alexander R. and Stiles, Sophia and Sehgal, Atharva and Light, Jonathan and Van Valen, David and Yue, Yisong and Sun, Jennifer J.},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026},
note={to appear}
}

Apache License 2.0
