MAC_Bench

🚀 MAC - A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

🌟 Features

Two Task Types: Image-to-Text and Text-to-Image understanding
Advanced Methods: DAD (Description and Deduction) methodology with multiple variants
Multiple Models: Support for GPT-4o, Qwen2.5-VL, Step-1V, Gemini, and more
Scientific Focus: Real scientific journal covers from Nature, Science, Cell, etc.

🚀 Quick Start

One-Click Installation

git clone https://github.com/mhjiang0408/MAC_Bench.git
cd MAC_Bench
chmod +x setup.sh
./setup.sh

The setup script automatically:

✅ Creates environment from environment.yml
✅ Installs CLI dependencies
✅ Downloads dataset from Hugging Face
✅ Sets up and verifies CLI tools

Three-Step Usage

1. Create Configuration

mac config template --output config.yaml --type example
# Edit config.yaml with your API keys

2. Run Experiment

mac run --config config.yaml

3. Analyze Results

mac analyze experiment/results/

⚙️ Configuration

Create your configuration file:

mac config template --output config.yaml --type basic

Example configuration:

models:
  - name: gpt-4o
    api_base: https://api.openai.com/v1
    api_key: sk-your-api-key
    prompt_template: Config/prompt_template/4_choice_template.json
    # For text2image tasks, set prompt_template to Config/prompt_template/4_choice_template_given_cover_story.json
    resume: false
    resume_path: None
    num_workers: 4

data:
  data_path: MAC_Bench/image2text_info.csv
  output_folder: ./experiment/results/
  scaling_factor: 1.0
  num_options: 4
  type: image2text
  # For text2image tasks, set type to text2image
  random_seed: 42
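
The comments in the example configuration name two settings that differ for text2image runs. A minimal sketch of that switch follows, assuming the configuration has already been loaded into a plain Python dict. The helper name `to_text2image` is hypothetical and not part of the `mac` CLI, and `data_path` is left untouched because the text2image CSV file name is not given in this README.

```python
import copy

# Template path taken from the comment in the example configuration.
TEXT2IMAGE_TEMPLATE = (
    "Config/prompt_template/4_choice_template_given_cover_story.json"
)


def to_text2image(config: dict) -> dict:
    """Return a copy of an image2text config adjusted for text2image runs.

    Hypothetical helper: it changes only the two fields the README calls out.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict unmodified
    for model in updated.get("models", []):
        model["prompt_template"] = TEXT2IMAGE_TEMPLATE
    updated.setdefault("data", {})["type"] = "text2image"
    # NOTE: data_path probably also needs to point at the text2image CSV.
    # Set it by hand, since its file name is not stated in this README.
    return updated


if __name__ == "__main__":
    base = {
        "models": [{
            "name": "gpt-4o",
            "prompt_template": "Config/prompt_template/4_choice_template.json",
        }],
        "data": {"data_path": "MAC_Bench/image2text_info.csv",
                 "type": "image2text"},
    }
    print(to_text2image(base)["data"]["type"])
```

Working on a copy keeps the original image2text settings available, so both task types can be written out as separate configuration files.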

🎯 CLI Commands

Main Commands

| Command | Purpose | Example |
|---------|---------|---------|
| `mac run` | Run experiments | `mac run --config config.yaml` |
| `mac analyze` | Analyze results | `mac analyze experiment/results/` |
| `mac status` | Check system status | `mac status --detailed` |
| `mac config` | Manage configurations | `mac config validate config.yaml` |

Common Options

For mac run:

--config config.yaml - Configuration file
--models gpt-4o - Run specific model
--scaling-factor 0.01 - Use 1% of data for testing
--dry-run - Preview without running
--verbose - Detailed output

For mac analyze:

--output reports/ - Output directory
--format html - Report format (json/csv/html/all)
--compare exp2.csv - Compare experiments
--detailed - Include detailed analysis
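
When runs are scripted, the `mac run` options listed earlier in this section can be assembled programmatically. The sketch below only builds and prints the argument list, using flags documented in this README. The function name `build_run_command` is an illustrative assumption, not something the project provides.

```python
import shlex


def build_run_command(config, models=(), scaling_factor=None,
                      dry_run=False, verbose=False):
    """Assemble an argv list for `mac run` from options shown in this README."""
    argv = ["mac", "run", "--config", str(config)]
    for name in models:
        # --models is passed once per model, as in the Compare Models workflow.
        argv += ["--models", name]
    if scaling_factor is not None:
        argv += ["--scaling-factor", str(scaling_factor)]
    if dry_run:
        argv.append("--dry-run")
    if verbose:
        argv.append("--verbose")
    return argv


if __name__ == "__main__":
    cmd = build_run_command("config.yaml", models=["gpt-4o"],
                            scaling_factor=0.01, dry_run=True)
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems if the list is later handed to `subprocess.run`.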

🔬 Example Workflows

Quick Test Run

# Test with 1% of data
mac run --config config.yaml --scaling-factor 0.01 --verbose

Full Experiment

# Run all models from config
mac run --config config.yaml

# Analyze with comprehensive reports
mac analyze experiment/results/ --output reports/ --format all

Compare Models

# Run specific models
mac run --config config.yaml --models gpt-4o --models qwen-vl-max

# Compare results
mac analyze results1.csv --compare results2.csv --plot

📊 Understanding Tasks

Image-to-Text Task

Input: Scientific journal cover image
Question: "Which of the following options best describe the cover image?"
Options: 4 text descriptions (A, B, C, D)
Goal: Select the most accurate description

Text-to-Image Task

Input: Journal cover story text
Question: "Which image best describes the cover story?"
Options: 4 candidate images (A, B, C, D)
Goal: Select the matching image
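
To make the four-option format concrete, here is a small sketch of how one correct option could be shuffled among three distractors and labeled A to D. It is not the benchmark's actual construction code: how MAC selects distractors is not described in this README, and the function name, placeholder strings, and use of a seed of 42 (borrowed from `random_seed` in the example configuration) are assumptions for illustration.

```python
import random
import string


def build_question(correct, distractors, seed=42):
    """Shuffle one correct option among distractors and label them A, B, ...

    Returns (labelled_options, answer_letter).
    """
    options = [correct, *distractors]
    rng = random.Random(seed)  # local RNG so the shuffle is reproducible
    rng.shuffle(options)
    letters = string.ascii_uppercase[:len(options)]
    labelled = dict(zip(letters, options))
    answer = letters[options.index(correct)]
    return labelled, answer


if __name__ == "__main__":
    labelled, answer = build_question(
        "placeholder correct description",
        ["placeholder distractor 1", "placeholder distractor 2",
         "placeholder distractor 3"],
    )
    for letter, text in labelled.items():
        print(f"{letter}. {text}")
    print("Answer:", answer)
```

The same shape applies to both tasks: for Image-to-Text the options are descriptions, and for Text-to-Image they would be references to candidate images.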

📁 Project Structure

MAC_Bench/
├── mac                          # CLI entry point
├── setup.sh                     # One-click installation
├── download_dataset.py          # Dataset download script
├── environment.yml              # Conda environment
├── requirements-cli.txt         # CLI dependencies
│
├── mac_cli/                     # CLI implementation
│   ├── commands/                # CLI commands
│   ├── core/                    # Core functionality  
│   └── utils/                   # Utilities
│
├── Config/                      # Configuration files
│   └── prompt_template/         # Prompt templates
│
├── Dataset/                     # Dataset construction scripts
├── experiment/                  # Experiment code
│   ├── method/                  # CoVR implementations
│   └── understanding/           # Task implementations
│
├── utils/                       # Core utilities
├── MAC_Bench/                   # Downloaded dataset
└── experiment/results/          # Experiment outputs

🐛 Troubleshooting

System Check

mac status --detailed  # Check what's wrong

Common Issues

Environment Problems:

conda env update -f environment.yml

Missing Dependencies:

pip install -r requirements-cli.txt

Dataset Download Issues:

python download_dataset.py  # Manual download

API Connection Problems:

mac status --check-apis --config config.yaml
# Check your API keys in config.yaml

📚 Dataset Information

The MAC_Bench dataset is available on 🤗 Hugging Face (mhjiang0408/MAC_Bench) and contains:

Source Journals: Nature, Science, Cell, ACS Central Science
Cover Images: High-resolution scientific journal covers
Cover Stories: Corresponding textual descriptions
Task Variants: Image2Text and Text2Image understanding
Size: 10,000+ samples across multiple journals

Download Dataset

The dataset is automatically downloaded during setup, but you can also download it manually:

# Via Hugging Face
from datasets import load_dataset
dataset = load_dataset("mhjiang0408/MAC_Bench")

# Or via download script
python download_dataset.py
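
After downloading, it can help to check which columns the task CSV actually contains before editing a configuration. The sketch below reads only the header row, so it makes no assumption about column names. The path is the `data_path` from the example configuration, UTF-8 encoding is an assumption, and the helper name `csv_columns` is hypothetical.

```python
import csv
from pathlib import Path


def csv_columns(path):
    """Return the header row of a CSV file as a list of column names."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        return next(reader, [])  # empty list if the file has no rows


if __name__ == "__main__":
    info = Path("MAC_Bench/image2text_info.csv")
    if info.exists():
        print(csv_columns(info))
    else:
        print(f"{info} not found; run setup.sh or download_dataset.py first")
```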

💡 Advanced Usage

Performance Optimization

# More workers for faster processing
mac run --config config.yaml --workers 8

# Resume interrupted experiments
mac run --config config.yaml --resume

Custom Analysis

# Group results by journal
mac analyze results/ --group-by journal

# Generate only JSON reports
mac analyze results/ --format json --no-plot
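
For readers who want to post-process results outside the CLI, grouping by journal amounts to computing accuracy per group. The following is a minimal sketch of that idea, not the implementation behind `--group-by`. The result-row keys `journal`, `answer`, and `prediction` are assumptions, since the output schema is not documented here, and the demo rows are made up.

```python
from collections import defaultdict


def accuracy_by(rows, group_key="journal",
                answer_key="answer", prediction_key="prediction"):
    """Compute per-group accuracy from an iterable of result dicts.

    All three key names are assumptions; adjust them to the real columns.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = row[group_key]
        total[group] += 1
        if row[prediction_key] == row[answer_key]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Made-up rows purely to exercise the function; not benchmark results.
    demo = [
        {"journal": "Nature", "answer": "A", "prediction": "A"},
        {"journal": "Nature", "answer": "B", "prediction": "C"},
        {"journal": "Cell", "answer": "D", "prediction": "D"},
    ]
    print(accuracy_by(demo))
```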

🤝 Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

📖 Citation

If you use MAC_Bench in your research, please cite our paper:

@misc{jiang2025maclivebenchmarkmultimodal,
      title={MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding}, 
      author={Mohan Jiang and Jin Gao and Jiahao Zhan and Dequan Wang},
      year={2025},
      eprint={2508.15802},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.15802}, 
}
