analysis-bots/WikiTabGen

Generating Tables from the Parametric Knowledge of Language Models

This repository contains WikiTabGen, a benchmark for evaluating how well LLMs generate tables on demand.

The benchmark includes 100 tables curated and processed from the WikiTables Project. The tables vary in length, width, amount of numerical data, and popularity.

Leaderboard

This is our current leaderboard. It measures each LLM and prompting method's ability to generate correct data in the key columns, the non-key columns, and the table overall, ranked by Overall F1:

Rank  LLM           Method        Keys F1  Non-Keys F1  Overall F1
1     GPT-4o        Row-by-row    53.5%    13.8%        20.8%
2     Llama3.1-70B  Full-Table    49.9%    13.1%        20.0%
3     GPT-4         Row-by-row    53.7%    12.2%        19.6%
4     Llama3.1-70B  Row-by-row    50.0%    12.2%        19.0%
5     GPT-4         Cell-by-cell  53.7%    11.1%        18.6%
6     GPT-4         Full-Table    43.8%    11.5%        17.5%
7     GPT-4o        Full-Table    40.3%    10.5%        16.3%
8     GPT-3.5       Full-Table    46.4%    9.6%         16.1%
9     GPT-3.5       Cell-by-cell  49.4%    7.6%         14.6%
10    GPT-3.5       Row-by-row    49.4%    7.2%         14.3%
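The leaderboard reports F1 separately for key and non-key columns. The exact matching rules used by the evaluation notebook are not described in this README, so the following is only a minimal sketch: it assumes exact matching after a hypothetical normalize step, and counts a non-key cell as correct only when its key, column, and value all match. The table representation and the helper names (normalize, f1, table_scores) are illustrative assumptions, not the repository's code, and the sketch does not attempt the Overall F1 column.

```python
from typing import Dict, Set, Tuple

# Hypothetical table representation: key value -> {non-key column -> cell value}.
Table = Dict[str, Dict[str, str]]


def normalize(value: str) -> str:
    # Hypothetical normalization: trim whitespace and case-fold before exact matching.
    return value.strip().casefold()


def f1(predicted: Set, reference: Set) -> float:
    # Set-based F1: true positives are items present in both sets.
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0  # also avoids dividing by zero when a set is empty
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def table_scores(generated: Table, reference: Table) -> Tuple[float, float]:
    # Keys F1: compare the sets of normalized key values.
    key_f1 = f1({normalize(k) for k in generated}, {normalize(k) for k in reference})

    # Non-keys F1: a cell counts only if its key, column and value all match.
    def cells(table: Table) -> Set[Tuple[str, str, str]]:
        return {
            (normalize(key), column, normalize(value))
            for key, row in table.items()
            for column, value in row.items()
        }

    return key_f1, f1(cells(generated), cells(reference))


reference = {
    "France": {"Capital": "Paris"},
    "Spain": {"Capital": "Madrid"},
    "Italy": {"Capital": "Rome"},
}
generated = {
    "france": {"Capital": "Paris"},
    "Spain": {"Capital": "Barcelona"},
    "Portugal": {"Capital": "Lisbon"},
}
key_f1, nonkey_f1 = table_scores(generated, reference)
print(f"Keys F1: {key_f1:.3f}, Non-keys F1: {nonkey_f1:.3f}")  # Keys F1: 0.667, Non-keys F1: 0.333
```

In this toy example two of three keys are recovered (Keys F1 = 2/3), but only one non-key cell is correct (Non-keys F1 = 1/3). A gap of this kind between the two columns also shows up throughout the leaderboard.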

Usage

Example notebooks for GPT-3.5 covering all three prompting methods (full table, row-by-row, and cell-by-cell) are available in the example_notebooks folder. Set your OpenAI API key (openai.api_key) in the Imports section of each notebook. After a successful run, a results folder is created. It contains a tables subfolder with the generated tables in CSV format and a result.json file that logs the prompts and LLM responses.
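The three prompting methods differ mainly in how much of the table each LLM request asks for. The sketch below only builds prompt strings to make that difference concrete. The prompt wording is hypothetical, and the sketch assumes that the first column is the key column and that the key values are already known (for example, from an earlier request that lists them). The actual prompts are in the example notebooks.

```python
from typing import List

# Hypothetical prompt templates; the real prompts live in the example notebooks.
# Assumes the first column is the key column and that the key values were
# already obtained (e.g. by an earlier request listing the table's keys).


def full_table_prompts(title: str, columns: List[str]) -> List[str]:
    # Full table: a single request for the whole table.
    return [f"Generate the table '{title}' with columns {', '.join(columns)} in CSV format."]


def row_by_row_prompts(title: str, columns: List[str], keys: List[str]) -> List[str]:
    # Row-by-row: one request per key, asking for all non-key values of that row.
    non_key = ", ".join(columns[1:])
    return [f"In the table '{title}', give {non_key} for '{key}'." for key in keys]


def cell_by_cell_prompts(title: str, columns: List[str], keys: List[str]) -> List[str]:
    # Cell-by-cell: one request per (key, non-key column) pair.
    return [
        f"In the table '{title}', what is the {column} of '{key}'?"
        for key in keys
        for column in columns[1:]
    ]


title = "Countries of Southern Europe"
columns = ["Country", "Capital", "Population"]
keys = ["Italy", "Spain", "Portugal"]

counts = {
    "full-table": len(full_table_prompts(title, columns)),
    "row-by-row": len(row_by_row_prompts(title, columns, keys)),
    "cell-by-cell": len(cell_by_cell_prompts(title, columns, keys)),
}
print(counts)  # {'full-table': 1, 'row-by-row': 3, 'cell-by-cell': 6}
```

With k keys and c non-key columns, this sketch issues 1, k, and k × c requests respectively (not counting any key-listing request). Finer granularity means more API calls per table.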

Evaluation

To compute the evaluation metrics for your experiment, run the notebook example_notebooks/Metrics_calculation.ipynb. Set tables_folder to the folder containing the LLM-generated CSV tables, and set result_folder to the folder where the metrics report should be saved. The notebook computes the metrics and saves the report as a CSV file in result_folder.
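To illustrate the folder-in, report-out workflow that the notebook follows, here is a minimal sketch that scores only the key column of each generated table. It is not the notebook's code. It assumes a hypothetical reference_folder containing ground-truth CSVs with the same file names as the generated ones, treats the first CSV column as the key, uses exact matching after lowercasing, and writes a hypothetical metrics.csv into result_folder.

```python
import csv
import os


def key_set(path):
    # Assumes the first column of each CSV holds the key values; skips the header row.
    with open(path, newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh))
    return {row[0].strip().lower() for row in rows[1:] if row}


def f1(pred, ref):
    # Set-based F1 over exact matches.
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(ref)
    return 2 * p * r / (p + r)


def evaluate_folder(tables_folder, reference_folder, result_folder):
    # Pairs files by name: tables_folder/X.csv is scored against reference_folder/X.csv.
    os.makedirs(result_folder, exist_ok=True)
    report_path = os.path.join(result_folder, "metrics.csv")  # hypothetical file name
    scores = {}
    for name in sorted(os.listdir(tables_folder)):
        if not name.endswith(".csv"):
            continue
        ref_path = os.path.join(reference_folder, name)
        if not os.path.exists(ref_path):
            continue  # no matching reference table, so skip it
        scores[name] = f1(key_set(os.path.join(tables_folder, name)), key_set(ref_path))
    with open(report_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["table", "keys_f1"])
        for name, score in scores.items():
            writer.writerow([name, f"{score:.4f}"])
    return scores
```

For example, evaluate_folder("results/tables", "reference_tables", "metrics") would score each generated table against a same-named file in the hypothetical reference_tables folder. Use the notebook for the official Keys, Non-Keys, and Overall F1 values.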

More

If you encounter any errors or observe unexpected behavior, please report the issue to us.

About

A benchmark dataset for LLM-based generation of tabular data
