Skip to content

adithya7/scaling-mds

Repository files navigation

Scaling Multi-document Summarization

Code for our NAACL 2025 paper, Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches.

Setup

See requirements.txt.

Data

See artifacts/misc for dataset loading scripts and the instructions to download the datasets.

Experiments

We provide slurm scripts for downloading models, running prediction and scoring system generated summaries.

Downloading models

sbatch bash_scripts/download.slurm

Generating summaries

# for Llama-3.1-8B-Instruct on SummHay
# full context
bash bash_scripts/pred.sh Llama31_8B SummHay test
# hierarchical
bash bash_scripts/pred.sh Llama31_8B_Hierarchical SummHay test
# incremental
bash bash_scripts/pred.sh Llama31_8B_Incremental SummHay test
# retrieval-augmented using SFR-Embedding_2
bash bash_scripts/pred.sh Llama31_8B_RAG_SFR SummHay test

See src/configs/ for the full list of retrievers, summarizers and datasets used in our experiments.

Scoring system generated summaries

We compute ROUGE and A3CU metrics.

# for Llama-3.1-8B-Instruct on SummHay
# full context
bash bash_scripts/score.sh Llama31_8B SummHay test

System Predictions and Human Eval

We share our system predictions and human evaluation data in this Google Drive folder.

License

This project is licensed under the MIT License. See the LICENSE file.

Reference

If you find this work useful, please consider citing our NAACL paper:

@misc{pratapa-mitamura-2025-scaling,
    title={Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches},
    author={Adithya Pratapa and Teruko Mitamura},
    year={2025},
    eprint={2502.06617},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.06617},
}

About

NAACL 2025 paper on 'Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches'

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors