
mhjiang0408/MAC_Bench


MAC_Bench

arXiv Hugging Face License: MIT

🚀 MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding

🌟 Features

  • Two Task Types: Image-to-Text and Text-to-Image understanding
  • Advanced Methods: DAD (Description and Deduction) methodology with multiple variants
  • Multiple Models: Support for GPT-4o, Qwen2.5-VL, Step-1V, Gemini, and more
  • Scientific Focus: Real scientific journal covers from Nature, Science, Cell, etc.
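The README names DAD (Description and Deduction) without detailing it. Read literally, the name suggests a two-stage pipeline: a multimodal model first describes the cover image in text, and a model then picks among the options using that description alone. The sketch below encodes only that reading. describe and deduce are hypothetical callables standing in for model calls, and the actual MAC_Bench variants may work differently.

```python
# Hypothetical "describe, then deduce" pipeline in the spirit of DAD.
# describe/deduce stand in for a vision-model call and a language-model call;
# this is an illustration of the idea, not the MAC_Bench implementation.
from typing import Callable, Sequence


def describe_then_deduce(
    image_ref: str,
    options: Sequence[str],
    describe: Callable[[str], str],
    deduce: Callable[[str], str],
) -> str:
    # Stage 1: turn the cover image into a textual description.
    description = describe(image_ref)
    # Stage 2: reason over text only, choosing the option that fits the description.
    labeled = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"Cover description:\n{description}\n\n"
        f"Options:\n{labeled}\n"
        "Which option best matches the description? Reply with one letter."
    )
    return deduce(prompt)
```

Splitting the task this way lets the second stage run on a text-only model, at the cost of losing any visual detail the description omits.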

🚀 Quick Start

One-Click Installation

git clone https://github.com/mhjiang0408/MAC_Bench.git
cd MAC_Bench
chmod +x setup.sh
./setup.sh

The setup script automatically:

  • ✅ Creates the Conda environment from environment.yml
  • ✅ Installs the CLI dependencies
  • ✅ Downloads the dataset from Hugging Face
  • ✅ Sets up and verifies the CLI tools

Three-Step Usage

1. Create Configuration

mac config template --output config.yaml --type example
# Edit config.yaml with your API keys

2. Run Experiment

mac run --config config.yaml

3. Analyze Results

mac analyze experiment/results/

βš™οΈ Configuration

Create your configuration file:

mac config template --output config.yaml --type basic

Example configuration:

models:
  - name: gpt-4o
    api_base: https://api.openai.com/v1
    api_key: sk-your-api-key
    prompt_template: Config/prompt_template/4_choice_template.json
    # If you are testing text2image tasks, you need to set prompt_template to Config/prompt_template/4_choice_template_given_cover_story.json
    resume: false
    resume_path: None
    num_workers: 4

data:
  data_path: MAC_Bench/image2text_info.csv
  output_folder: ./experiment/results/
  scaling_factor: 1.0
  num_options: 4
  type: image2text
  # If you are testing text2image tasks, you need to set type to text2image
  random_seed: 42
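Before a long run, it can help to catch the two settings that the comments above say must change together for text2image tasks. The sketch below is a hypothetical pre-flight check, not the logic behind mac config validate. It assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and it also assumes that scaling_factor must lie in (0, 1].

```python
# Hypothetical consistency check for a parsed config.yaml. Key names mirror the
# example config; the rules are inferred from its comments, not from MAC_Bench code.
from typing import Any, Dict, List

TEMPLATES = {
    "image2text": "Config/prompt_template/4_choice_template.json",
    "text2image": "Config/prompt_template/4_choice_template_given_cover_story.json",
}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    data = cfg.get("data", {})
    task = data.get("type")
    if task not in TEMPLATES:
        problems.append(f"data.type must be one of {sorted(TEMPLATES)}, got {task!r}")
    factor = data.get("scaling_factor", 1.0)
    if not 0 < factor <= 1:  # assumed valid range: a fraction of the dataset
        problems.append(f"data.scaling_factor must be in (0, 1], got {factor}")
    for model in cfg.get("models", []):
        name = model.get("name", "<unnamed>")
        # The prompt template has to match the task type (see the YAML comments).
        if task in TEMPLATES and model.get("prompt_template") != TEMPLATES[task]:
            problems.append(f"{name}: prompt_template does not match data.type={task}")
        if str(model.get("api_key", "")).startswith("sk-your"):
            problems.append(f"{name}: api_key is still the placeholder")
    return problems
```

Run against the example config as written, it flags only the placeholder API key.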

🎯 CLI Commands

Main Commands

Command        Purpose                  Example
mac run        Run experiments          mac run --config config.yaml
mac analyze    Analyze results          mac analyze experiment/results/
mac status     Check system status      mac status --detailed
mac config     Manage configurations    mac config validate config.yaml

Common Options

For mac run:

  • --config config.yaml - Configuration file
  • --models gpt-4o - Run specific model
  • --scaling-factor 0.01 - Use 1% of data for testing
  • --dry-run - Preview without running
  • --verbose - Detailed output

For mac analyze:

  • --output reports/ - Output directory
  • --format html - Report format (json/csv/html/all)
  • --compare exp2.csv - Compare against another experiment's results
  • --detailed - Include detailed analysis

🔬 Example Workflows

Quick Test Run

# Test with 1% of data
mac run --config config.yaml --scaling-factor 0.01 --verbose
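The README does not spell out how the scaling factor chooses its 1% of samples. Given the random_seed field in the config, one plausible reading is a seeded random subset. The hypothetical helper below illustrates that idea and is not taken from the MAC_Bench code.

```python
# Hypothetical illustration of a scaling factor: keep a reproducible random
# fraction of the benchmark rows, selected with a fixed seed.
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(rows: Sequence[T], scaling_factor: float, seed: int = 42) -> List[T]:
    """Keep about len(rows) * scaling_factor rows (at least one), in original order."""
    if not 0 < scaling_factor <= 1:
        raise ValueError("scaling_factor must be in (0, 1]")
    if not rows:
        return []
    k = max(1, round(len(rows) * scaling_factor))
    rng = random.Random(seed)  # private generator: reproducible, global state untouched
    picked = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in picked]
```

Because the seed is fixed, repeated quick tests see the same subset, which keeps results comparable across runs.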

Full Experiment

# Run all models from config
mac run --config config.yaml

# Analyze with comprehensive reports
mac analyze experiment/results/ --output reports/ --format all

Compare Models

# Run specific models
mac run --config config.yaml --models gpt-4o --models qwen-vl-max

# Compare results
mac analyze results1.csv --compare results2.csv --plot
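A comparison between two runs is most informative on the samples that both runs answered. The sketch below assumes results CSVs with id, answer and prediction columns (hypothetical names; check the files your run actually writes). It reports each run's accuracy plus the counts of items that only one run got right.

```python
# Hypothetical paired comparison of two result files keyed by sample id.
# Column names ("id", "answer", "prediction") are assumptions, not MAC_Bench's schema.
import csv
from typing import Dict, Iterable


def load_correctness(lines: Iterable[str]) -> Dict[str, bool]:
    """Map sample id -> whether the prediction matched the gold answer."""
    return {
        row["id"]: row["prediction"].strip().upper() == row["answer"].strip().upper()
        for row in csv.DictReader(lines)
    }


def compare(a: Dict[str, bool], b: Dict[str, bool]) -> Dict[str, float]:
    """Accuracy of each run plus discordant counts on the shared sample ids."""
    shared = sorted(a.keys() & b.keys())  # ignore ids that only one run covered
    if not shared:
        raise ValueError("the two runs have no sample ids in common")
    n = len(shared)
    return {
        "n": n,
        "acc_a": sum(a[k] for k in shared) / n,
        "acc_b": sum(b[k] for k in shared) / n,
        "only_a": sum(a[k] and not b[k] for k in shared),  # a right, b wrong
        "only_b": sum(b[k] and not a[k] for k in shared),  # b right, a wrong
    }
```

To use it, open each file with newline="" and pass the file objects to load_correctness. The only_a and only_b counts are the discordant pairs that McNemar's test uses to judge whether a gap between two models on the same items is larger than chance.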

📊 Understanding Tasks

Image-to-Text Task

  • Input: Scientific journal cover image
  • Question: "Which of the following options best describe the cover image?"
  • Options: 4 text descriptions (A, B, C, D)
  • Goal: Select the most accurate description

Text-to-Image Task

  • Input: Journal cover story text
  • Question: "Which image best describes the cover story?"
  • Options: 4 candidate images (A, B, C, D)
  • Goal: Select the matching image
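Both tasks reduce to a four-option multiple-choice question, so evaluation needs two small pieces: rendering lettered options and mapping a free-form model reply back to a letter. The sketch below is illustrative only. The real prompts live in the JSON files under Config/prompt_template/, and the reply-parsing heuristics here are assumptions, not MAC_Bench's scoring rules.

```python
# Hypothetical helpers for a lettered multiple-choice item.
import re
import string
from typing import List, Optional


def build_prompt(question: str, options: List[str]) -> str:
    """Render a question followed by options labeled A, B, C, ..."""
    letters = string.ascii_uppercase[: len(options)]
    lines = [question] + [f"{letter}. {text}" for letter, text in zip(letters, options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def parse_choice(reply: str, num_options: int = 4) -> Optional[str]:
    """Return the chosen letter, or None if the reply names no valid option."""
    valid = f"[A-{string.ascii_uppercase[num_options - 1]}]"
    # 1) Explicit phrasing such as "Answer: B" or "the answer is (C)".
    m = re.search(rf"[Aa]nswer(?: is|:)?\s*\(?({valid})\b", reply)
    if m:
        return m.group(1)
    # 2) A reply that starts with the letter, e.g. "B", "(B)" or "B. The image ...".
    m = re.match(rf"\s*\(?({valid})(?:[).:]|\s*$)", reply)
    return m.group(1) if m else None
```

Replies that match neither pattern return None, which a scorer would typically count as incorrect.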

πŸ“ Project Structure

MAC_Bench/
├── mac                          # CLI entry point
├── setup.sh                     # One-click installation
├── download_dataset.py          # Dataset download script
├── environment.yml              # Conda environment
├── requirements-cli.txt         # CLI dependencies
│
├── mac_cli/                     # CLI implementation
│   ├── commands/                # CLI commands
│   ├── core/                    # Core functionality
│   └── utils/                   # Utilities
│
├── Config/                      # Configuration files
│   └── prompt_template/         # Prompt templates
│
├── Dataset/                     # Dataset construction scripts
├── experiment/                  # Experiment code
│   ├── method/                  # CoVR implementations
│   └── understanding/           # Task implementations
│
├── utils/                       # Core utilities
├── MAC_Bench/                   # Downloaded dataset
└── experiment/results/          # Experiment outputs

πŸ› Troubleshooting

System Check

mac status --detailed  # Check what's wrong

Common Issues

Environment Problems:

conda env update -f environment.yml

Missing Dependencies:

pip install -r requirements-cli.txt

Dataset Download Issues:

python download_dataset.py  # Manual download

API Connection Problems:

mac status --check-apis --config config.yaml
# Check your API keys in config.yaml
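If mac status reports a connection failure, a quick independent check is to list models on the endpoint using the same api_base and api_key. Many OpenAI-compatible services expose GET {api_base}/models with a Bearer token, but not every provider does, so treat this as a diagnostic sketch rather than part of the mac CLI.

```python
# Hypothetical connectivity probe for an OpenAI-compatible endpoint.
# Uses only the standard library; not the mac CLI's implementation.
import json
import urllib.error
import urllib.request


def build_models_request(api_base: str, api_key: str) -> urllib.request.Request:
    """GET {api_base}/models, authenticated with a Bearer token."""
    url = api_base.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def probe(api_base: str, api_key: str, timeout: float = 10.0) -> str:
    """Return a short status string instead of raising."""
    req = build_models_request(api_base, api_key)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
        return f"OK: {len(body.get('data', []))} models visible"
    except urllib.error.HTTPError as err:  # must precede URLError (it is a subclass)
        return f"HTTP {err.code}: check api_key for {api_base}"
    except urllib.error.URLError as err:  # DNS, proxy, or TLS problems
        return f"unreachable: {err.reason}"
```

An HTTP 401 typically indicates an invalid key. An "unreachable" result points instead at the network or the api_base URL.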

📚 Dataset Information

The MAC_Bench dataset is available on 🤗 Hugging Face and contains:

  • Source Journals: Nature, Science, Cell, ACS Central Science
  • Cover Images: High-resolution scientific journal covers
  • Cover Stories: Corresponding textual descriptions
  • Task Variants: Image2Text and Text2Image understanding
  • Size: 10,000+ samples across multiple journals

Download Dataset

The dataset is downloaded automatically during setup, but you can also fetch it manually, either from Python with the Hugging Face datasets library:

from datasets import load_dataset
dataset = load_dataset("mhjiang0408/MAC_Bench")

or from the shell with the bundled download script:

python download_dataset.py

💡 Advanced Usage

Performance Optimization

# More workers for faster processing
mac run --config config.yaml --workers 8

# Resume interrupted experiments
mac run --config config.yaml --resume
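Both options address the same cost: every sample is a remote API call. A minimal sketch of the idea, assuming results are keyed by a sample id, first drops ids already present in an earlier output (which is what resume and resume_path suggest) and then issues the remaining calls from a thread pool sized like num_workers. The query callable is a hypothetical stand-in for a model request, not a MAC_Bench function.

```python
# Hypothetical resume-then-parallelize loop; query() stands in for a model API call.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def pending(all_ids: Iterable[str], done_ids: Iterable[str]) -> List[str]:
    """Ids still to be processed, in their original order."""
    done = set(done_ids)
    return [i for i in all_ids if i not in done]


def run(ids: List[str], query: Callable[[str], str], workers: int = 4) -> Dict[str, str]:
    """Call query(id) for every id using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = list(pool.map(query, ids))  # map keeps results in input order
    return dict(zip(ids, answers))
```

Threads rather than processes fit here because each call spends most of its time waiting on the network. Raising the worker count beyond what the provider's rate limit allows tends to produce errors rather than speed.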

Custom Analysis

# Group results by journal
mac analyze results/ --group-by journal

# Generate only JSON reports
mac analyze results/ --format json --no-plot
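Grouping by journal amounts to computing accuracy separately within each group. The sketch below assumes rows with journal, answer and prediction fields (hypothetical column names). It is meant to show the calculation, not the report format that mac analyze produces.

```python
# Hypothetical per-group accuracy over result rows (e.g. from csv.DictReader).
import csv
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_group(rows: Iterable[Mapping[str, str]], key: str = "journal") -> Dict[str, float]:
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        # Case- and whitespace-insensitive letter match counts as a hit.
        hits[group] += row["prediction"].strip().upper() == row["answer"].strip().upper()
    return {g: hits[g] / totals[g] for g in sorted(totals)}


# Usage, assuming a results file with the columns named above:
# with open("results.csv", newline="") as f:
#     print(accuracy_by_group(csv.DictReader(f)))
```

Reporting per-journal accuracy alongside the overall number helps reveal whether a model's score is driven by a few journals with many samples.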

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


📖 Citation

If you use MAC_Bench in your research, please cite our paper:

@misc{jiang2025maclivebenchmarkmultimodal,
      title={MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding}, 
      author={Mohan Jiang and Jin Gao and Jiahao Zhan and Dequan Wang},
      year={2025},
      eprint={2508.15802},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.15802}, 
}

About

[COLM2025] MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding
