Code for our NAACL 2025 paper, Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches.
See requirements.txt for the required Python packages.
See artifacts/misc for dataset loading scripts and the instructions to download the datasets.
We provide Slurm scripts for downloading models, running prediction, and scoring system-generated summaries.
sbatch bash_scripts/download.slurm
# for Llama-3.1-8B-Instruct on SummHay
# full context
bash bash_scripts/pred.sh Llama31_8B SummHay test
# hierarchical
bash bash_scripts/pred.sh Llama31_8B_Hierarchical SummHay test
# incremental
bash bash_scripts/pred.sh Llama31_8B_Incremental SummHay test
# retrieval-augmented using SFR-Embedding_2
bash bash_scripts/pred.sh Llama31_8B_RAG_SFR SummHay test
See src/configs/ for the full list of retrievers, summarizers and datasets used in our experiments.
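The hierarchical and incremental settings named in the commands above differ mainly in how intermediate summaries are combined. The sketch below illustrates that control flow only; it is not the code in this repository. `toy_summarizer`, `hierarchical_summary`, `incremental_summary` and the `group_size` parameter are hypothetical names, and the truncation-based summarizer is a stand-in for an LLM call, assumed here so the example runs without a model.

```python
from typing import Callable, List

# A summarizer maps input text to a (shorter) summary string.
Summarizer = Callable[[str], str]


def toy_summarizer(text: str, max_words: int = 20) -> str:
    """Hypothetical stand-in for an LLM call: keep the first max_words words."""
    return " ".join(text.split()[:max_words])


def hierarchical_summary(
    docs: List[str], summarize: Summarizer, group_size: int = 2
) -> str:
    """Summarize each document, then merge summaries in groups until one is left."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not docs:
        return ""
    level = [summarize(doc) for doc in docs]
    while len(level) > 1:
        level = [
            summarize("\n".join(level[i:i + group_size]))
            for i in range(0, len(level), group_size)
        ]
    return level[0]


def incremental_summary(docs: List[str], summarize: Summarizer) -> str:
    """Carry a running summary and update it with one document at a time."""
    summary = ""
    for doc in docs:
        summary = summarize((summary + "\n" + doc).strip())
    return summary


if __name__ == "__main__":
    example_docs = ["alpha beta gamma", "delta epsilon", "zeta"]
    print(hierarchical_summary(example_docs, toy_summarizer))
    print(incremental_summary(example_docs, toy_summarizer))
```

In this sketch the hierarchical variant issues more summarizer calls than the incremental one, but the per-document calls within a level are independent of each other, while the incremental variant is strictly sequential.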
We compute ROUGE and A3CU metrics.
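As a rough illustration of what a ROUGE-style score measures, the sketch below computes a unigram-overlap F1 (in the spirit of ROUGE-1) between a reference and a predicted summary. It is a simplified approximation under stated assumptions, not the scoring code used by score.sh: `tokenize` and `rouge1_f1` are hypothetical helpers, tokenization is a plain lowercase alphanumeric split, and no stemming is applied. A3CU is not sketched here.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(reference: str, prediction: str) -> float:
    """Unigram-overlap F1 between a reference and a predicted summary."""
    ref_counts = Counter(tokenize(reference))
    pred_counts = Counter(tokenize(prediction))
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("the cat sat on the mat", "the cat"))
```

For reported numbers, use the repository's scoring script rather than this sketch, since established ROUGE implementations differ in tokenization and stemming.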
# for Llama-3.1-8B-Instruct on SummHay
# full context
bash bash_scripts/score.sh Llama31_8B SummHay test
We share our system predictions and human evaluation data in this Google Drive folder.
This project is licensed under the MIT License. See the LICENSE file.
If you find this work useful, please consider citing our NAACL paper:
@misc{pratapa-mitamura-2025-scaling,
title={Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches},
author={Adithya Pratapa and Teruko Mitamura},
year={2025},
eprint={2502.06617},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.06617},
}