A cooperative framework for generalist-specialist collaboration in medical AI.
GSCo is an innovative medical AI framework that achieves better performance through Generalist-Specialist collaboration. This project consists of three main components:
- MedDr (Generalist Foundation Model): An open-source medical generalist foundation model.
- Specialist Models: Task-specific expert models optimized for medical image classification and report generation.
- GSCo Framework: A collaborative framework that combines the strengths of Generalist and Specialist models.
Key features:
- Medical Multimodal: Supports medical image analysis and report generation.
- Collaborative Inference: Enhances performance through Generalist-Specialist cooperation.
- Modular Design: Supports different specialist models and datasets.
We build our model on InternVL. Please refer to INSTALLATION.md to prepare the environment.
- Demo:
  python3 demo.py
- Model Training:
  sh train.sh
- Model Evaluation:
  # Evaluate MedDr foundation model
  sh inference_meddr.sh
  # Generate Specialist predictions
  sh inference_specialist.sh
  # Evaluate GSCo collaborative framework
  sh inference_gsco.sh
Download the checkpoint and change the model_path in demo.py. The demo finishes in 5 seconds on an H800 GPU.
python3 demo.py
We follow the InternVL format to prepare the data. For example:
{
"id": 0,
"study_id": 50414267,
"subject_id": 10000032,
"split": "train",
"image": "files/p10/p10000032/s50414267/02aa804e-bde0afdd-112c0b34-7bc16630-4e384014.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nYou are a helpful medical assistant. Your task is report generation. You are given a chest x-ray image and you are required to generate a summary report about the image."
},
{
"from": "gpt",
"value": "There is no focal consolidation, pleural effusion or pneumothorax. Bilateral nodular opacities that most likely represent nipple shadows. The cardiomediastinal silhouette is normal. Clips project over the left lung, potentially within the breast. The imaged upper abdomen is unremarkable. Chronic deformity of the posterior left sixth and seventh ribs are noted."
}
]
}
We also provide an example file, data/meddr.json, for your reference.
More details about the datasets involved are in DATASET.md.
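Before training, it can help to sanity-check that each record matches the conversation layout shown in the example above. The following is a minimal sketch, not part of this repository: the function name `validate_record`, the choice of required keys, and the rule that the first turn must contain the `<image>` placeholder are assumptions inferred from the single example record.

```python
# Hypothetical helper (not shipped with GSCo): checks one record against the
# layout of the example above. Required keys and rules are assumptions.
REQUIRED_KEYS = ("id", "image", "conversations")
VALID_SPEAKERS = ("human", "gpt")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # Top-level keys seen in the example record.
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")

    turns = record.get("conversations") or []
    if "conversations" in record and not turns:
        problems.append("no conversation turns")

    # Each turn needs a known speaker and a string value.
    for index, turn in enumerate(turns):
        speaker = turn.get("from")
        if speaker not in VALID_SPEAKERS:
            problems.append(f"turn {index}: unexpected speaker {speaker!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {index}: 'value' must be a string")

    # In the example, the first turn carries the <image> placeholder.
    if turns:
        first_value = turns[0].get("value")
        if not (isinstance(first_value, str) and "<image>" in first_value):
            problems.append("first turn lacks the <image> placeholder")

    return problems
```

An empty list means the record passed every check; otherwise each string names one problem, so the helper can be run over a whole data file and the messages collected per record.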
You need at least two 80GB GPUs (e.g., NVIDIA H800 GPU) to train the Generalist Foundation Model.
sh train.sh
We provide the checkpoint of MedDr here.
For the specialist classification models, we follow the training process in Rethinking Model Prototyping MedMNIST+ and extend it to more datasets. Training can be completed on a single 24GB GPU (e.g., NVIDIA RTX 4090 GPU).
Please find the checkpoints of the specialist models on GSCo_Specialist.
We employ R2GenGPT as our specialist model for medical report generation. We use the Official Checkpoint of R2GenGPT.
In this section, we evaluate the performance of MedDr, our generalist foundation model that can handle various medical imaging tasks.
We provide the test data of the IU-XRay dataset for reproduction.
Please download the data and change the path of the dataset accordingly.
The metafile used in this demo is data/iu_mrg_meta.jsonl.
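To inspect the metafile before running the evaluation, a small reader is enough. This sketch assumes, based only on the `.jsonl` extension, that the file is in JSON Lines form (one JSON object per line); the function name `read_jsonl` is hypothetical, and no assumption is made about which fields each entry holds.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the file position so a bad entry is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err


# Example use (path taken from the text above):
#   entries = list(read_jsonl("data/iu_mrg_meta.jsonl"))
#   print(len(entries), "entries")
```

Reading lazily with a generator keeps memory use flat for large metafiles, and the line number in the error message points directly at a malformed entry.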
You need at least one 80GB GPU (e.g., NVIDIA H800 GPU) to evaluate the Generalist Foundation Model.
sh inference_meddr.sh
Before evaluating GSCo, please generate predictions from Specialist models with the following script.
sh inference_specialist.sh
You can modify the script parameters to adapt to different datasets and model architectures:
- DATASET: Specify the dataset to use
- ARCH: Specify the model architecture
- Additional parameters can be configured in src/config/config.yaml
You can find the PCam200 data curated for the specialist model on PCam200 (password: GSCo). The metafile and the results for the PCam200 dataset can be found on PCam200 Google Drive.
After generating Specialist predictions, you can evaluate the performance of GSCo. GSCo achieves better medical AI performance through a Generalist-Specialist collaborative framework.
The metafile used in this demo is data/pcam200_meta.json.
You can find the data of PCam200 on PCam200 (password: GSCo).
You need at least one 80GB GPU (e.g., NVIDIA H800 GPU) to evaluate the GSCo Framework.
sh inference_gsco.sh
- InternVL: Thanks for their efforts in the open-source community. InternVL is a highly valuable work that contributes significantly to the VLM domain.
- Rethinking Model Prototyping MedMNIST+: Thanks for their implementation of training and evaluation on MedMNIST+.
- R2GenGPT: Thanks for their great work in the medical report generation task.
If you find this work helpful, please consider citing:
@article{he2026generalizable,
title={Towards generalizable AI in medicine via Generalist-Specialist Collaboration},
author={Sunan He and Yuxiang Nie and Hongmei Wang and Shu Yang and Yihui Wang and Zhiyuan Cai and Zhixuan Chen and Yingxue Xu and Luyang Luo and Huiling Xiang and Xi Lin and Mingxiang Wu and Yifan Peng and George Shih and Ziyang Xu and Xian Wu and Qiong Wang and Ronald Cheong Kin Chan and Xiaohui Duan and Varut Vardhanabhuti and Winnie Chiu Wing Chu and Yefeng Zheng and Pranav Rajpurkar and Kang Zhang and Hao Chen},
journal={Nature Biomedical Engineering},
year={2026},
doi={10.1038/s41551-026-01653-3},
url={https://www.nature.com/articles/s41551-026-01653-3}
}