colm-2024-paper-code

Code for the paper "Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models"

Here we provide the code to replicate the COLM'24 paper "Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models".

The code is organized as follows.

The files run_tabular_experiments.py, run_time_series_experiments.py, and run_statistical_experiments.py run the different experiments, that is, they send the queries to the LLM.
The LLM queries are saved to disk and analyzed in Jupyter Notebooks. These are contained in the notebooks folder. The notebooks generate the figures and tables in the paper.
The memorization tests can be performed directly with the tabmemcheck package; see the notebook memorization-tests.ipynb.
The datasets folder contains the datasets as CSV files.
The preprocessing folder contains notebooks that create the ACS Income, ACS Travel and ICU datasets.
The config folder contains prompt configurations and YAML files that specify the different dataset transforms.
The environment used to run the experiments is specified in environment.yml.
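Before running the experiments, it can help to confirm that a local checkout has the layout described above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the two constant lists are hypothetical, and the split into folders and files is inferred from the description above.

```python
from pathlib import Path

# Entries named in the description above. These lists are an assumption
# made for illustration; they are not read from the repository itself.
EXPECTED_DIRS = ["config", "datasets", "notebooks", "preprocessing"]
EXPECTED_FILES = [
    "run_tabular_experiments.py",
    "run_time_series_experiments.py",
    "run_statistical_experiments.py",
    "environment.yml",
]


def missing_entries(repo_root):
    """Return the expected folders and files that are absent under repo_root."""
    root = Path(repo_root)
    # Folders first, then files, each in the order listed above.
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries(".")` from the repository root should return an empty list when every entry is present; any names it returns point to an incomplete checkout.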

The data to replicate our results (LLM queries and responses) is available here.

Citation

@article{bordt2024colm,
  title={Elephants Never Forget: Memorization and Learning of Tabular Data in
  Large Language Models},
  author={Bordt, Sebastian and Nori, Harsha and Rodrigues, Vanessa and Nushi, Besmira and Caruana, Rich},
  journal={Conference on Language Modeling (COLM)},
  year={2024}
}

Folders and files

config
datasets
notebooks
preprocessing
results
README.md
environment.yml
run_statistical_experiments.py
run_tabular_experiments.py
run_time_series_experiments.py
statutils.py
tabular_queries.py
