DataNarrative 📊📖

Welcome to DataNarrative, a dataset and benchmark for automated data-driven storytelling using visualizations and text. This repository contains structured datasets from diverse sources including Tableau, Pew, and GapMinder for training and evaluating LLMs and VLMs in visual-textual narrative generation.

📁 Repository Structure

├── Train
│   ├── Pew
│   │   ├── pew_train_images_final/
│   │   └── pew_train.zip
│   ├── Tableau
│   │   ├── tab_001/
│   │   ├── tab_005/
│   │   ├── ... (more tab folders)
│   │   └── tableau_train.json
├── Test
│   ├── GapMinder
│   │   ├── gap_001/
│   │   ├── gap_002/
│   │   ├── ... (more gap folders)
│   │   └── gapminder_test.json
│   ├── Pew
│   │   ├── multiColumn/
│   │   ├── singleColumn/
│   │   └── pew_test.json
│   ├── Tableau
│   │   ├── tab_002/
│   │   ├── tab_003/
│   │   ├── ... (more tab folders)
│   │   └── tableau_test.json

Dataset Overview

1. Train Set

Pew:
- Contains images in pew_train_images_final/.
- A zipped version of the metadata is available as pew_train.zip.
Tableau:
- Folders named tab_001, tab_005, ..., each containing:
  - Chart images
  - Tableau workbooks
  - Associated datasets
  - Metadata linked via tableau_train.json
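
Because the Pew training metadata ships as an archive, it has to be unpacked before use. The following is a minimal sketch using Python's standard zipfile module. The helper names are illustrative, and the contents of pew_train.zip are not documented here, so the sketch lists whatever the archive holds rather than assuming a file name.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path: Path) -> list[str]:
    """Return the member names stored in the archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def extract_archive(zip_path: Path, dest_dir: Path) -> list[Path]:
    """Extract every member into dest_dir and return the extracted file paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(p for p in dest_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Path taken from the repository tree; adjust it to your local checkout.
    pew_zip = Path("Train/Pew/pew_train.zip")
    if pew_zip.exists():
        print(list_archive(pew_zip))
```

Listing the members first is a cheap way to confirm what the archive contains before extracting it next to the image folder.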

2. Test Set

GapMinder:
- Contains folders like gap_001, gap_002, ..., with charts and data.
- Metadata is provided in gapminder_test.json.
Pew:
- multiColumn/ and singleColumn/: Contain test charts.
- Metadata and narrative intents are stored in pew_test.json.
Tableau:
- Folders like tab_002, tab_003, ..., similar in structure to the training Tableau set.
- Metadata is available in tableau_test.json.

File Access and Metadata 🔑

Test Sets

Metadata: The .json files (e.g., pew_test.json, gapminder_test.json) contain comprehensive metadata for each data point. Key fields include:
- topic_name: The general topic of the article/data (e.g., "Politics & Policy").
- topic_link: URL to the topic page on the source website.
- intent: The title or main subject of the specific article or visualization.
- article_link: Direct URL to the source article.
- paragraph_table_pair: An array linking narrative paragraphs (paragraph) to associated data visualizations. Each entry contains:
  - table_id or table_path: Identifier for the source data file.
  - chart_image: Path to the corresponding chart image (e.g., multiColumn/imgs/128.png).
  - table: Raw text data associated with the chart.
  - chart_type: Type of the visualization (e.g., "bar", "line").
  - title: Title of the chart.
  - vis_spec: Visualization specifications (often in a structured format like JSON within the string).
  - gem_table, gpt_table: Processed or alternative table formats generated by Gemini-1.5-Pro and GPT-4o, respectively (present in Pew data).
Chart Images: Located in the imgs/ subdirectories within pew/ or gap/. These are typically .png files named according to their table_id.
Data Files: Located in the data/ subdirectories within pew/ or gap/. These are usually .txt or .csv files containing the raw data used for the charts.
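
The field list above can be turned into a flat view with one row per paragraph/chart pair. The sketch below makes several assumptions not stated in this README: that each .json file holds a list of records at the top level, that the paragraph text is stored under a paragraph key inside each paragraph_table_pair entry, and that vis_spec is sometimes a JSON-encoded string. The function names and the sample record are hypothetical and serve only as illustration.

```python
import json
from typing import Any, Iterator


def load_records(json_path: str) -> list[dict[str, Any]]:
    """Load a metadata file; a single top-level object is wrapped in a list."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data if isinstance(data, list) else [data]


def parse_vis_spec(raw: Any) -> Any:
    """Decode vis_spec when it is a JSON string; otherwise return it unchanged."""
    if isinstance(raw, str):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return raw
    return raw


def iter_pairs(records: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield one flat dict per paragraph/chart pair."""
    for record in records:
        for pair in record.get("paragraph_table_pair", []):
            yield {
                "topic_name": record.get("topic_name"),
                "intent": record.get("intent"),
                "paragraph": pair.get("paragraph"),  # assumed key location
                # The field list names either table_id or table_path.
                "table_ref": pair.get("table_id", pair.get("table_path")),
                "chart_image": pair.get("chart_image"),
                "chart_type": pair.get("chart_type"),
                "vis_spec": parse_vis_spec(pair.get("vis_spec")),
            }


# Hypothetical record shaped after the field list; not real dataset content.
SAMPLE_RECORDS = [
    {
        "topic_name": "Politics & Policy",
        "intent": "Hypothetical article title",
        "paragraph_table_pair": [
            {
                "paragraph": "Hypothetical narrative paragraph.",
                "table_id": "128",
                "chart_image": "multiColumn/imgs/128.png",
                "chart_type": "bar",
                "vis_spec": '{"mark": "bar"}',
            }
        ],
    }
]

for row in iter_pairs(SAMPLE_RECORDS):
    print(row["chart_type"], row["chart_image"], row["vis_spec"])
```

To run this against the real data, replace SAMPLE_RECORDS with load_records("Test/Pew/pew_test.json") and check the actual keys first, since the layout assumed here may differ.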

Train Sets

Metadata: The .json files (e.g., tableau_train.json) contain metadata for training stories. Key fields include:
- tab_id: Unique identifier for the Tableau story (e.g., "tab_001").
- topic_name: The general topic (e.g., "Environment", "Economy").
- story_link: A link to a Google Drive folder containing associated files.
- intent: The title of the Tableau story.
- paragraph_table_pair: Links narrative paragraphs (paragraph) to visualizations. Includes:
  - table_path: Path to the data file.
  - chart_image: Path to the chart image.
  - chart_type: Type of visualization.
  - vis_spec: Visualization specifications.
Files: Located within each tab_* folder:
- imgs/: Contains chart images (.png).
- workbooks/: Contains Tableau workbook files (.twbx).
- data/: Contains the source data files (often .csv).
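
A quick way to get a feel for the training metadata is to tally chart types and group stories by topic. The sketch below assumes that tableau_train.json decodes to a list of story records with the fields listed above. The function names and the two sample stories are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any


def chart_type_counts(stories: list[dict[str, Any]]) -> Counter:
    """Count chart_type values across all paragraph_table_pair entries."""
    counts: Counter = Counter()
    for story in stories:
        for pair in story.get("paragraph_table_pair", []):
            counts[pair.get("chart_type", "unknown")] += 1
    return counts


def stories_by_topic(stories: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each topic_name to the tab_id values of its stories."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for story in stories:
        grouped[story.get("topic_name", "unknown")].append(story.get("tab_id", ""))
    return dict(grouped)


# Hypothetical stories shaped after the field list; not real dataset content.
SAMPLE_STORIES = [
    {
        "tab_id": "tab_001",
        "topic_name": "Environment",
        "paragraph_table_pair": [
            {"chart_type": "bar"},
            {"chart_type": "line"},
        ],
    },
    {
        "tab_id": "tab_005",
        "topic_name": "Economy",
        "paragraph_table_pair": [{"chart_type": "bar"}],
    },
]

print(chart_type_counts(SAMPLE_STORIES))
print(stories_by_topic(SAMPLE_STORIES))
```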

How to Access Files

Each tab_* folder contains:
- .png chart images.
- .csv data files and .twbx Tableau workbook files.
- Metadata that links the narrative to visualizations.
JSON Files (tableau_test.json, gapminder_test.json, pew_test.json, pew_train.json) contain relevant mappings:
- Topic categories
- Chart-specific narrative intents
- Data paths for visualization and narrative context
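
To see what a given split actually contains on disk, the story folders can be indexed by file extension. This is a minimal sketch using pathlib. The function name is hypothetical, and the only layout it assumes is that story folders share a common prefix such as tab_ or gap_, as shown in the repository tree.

```python
from collections import defaultdict
from pathlib import Path


def index_story_folders(split_dir: Path, prefix: str = "tab_") -> dict[str, dict[str, list[Path]]]:
    """Map each story folder name to its files, grouped by lowercase suffix."""
    index: dict[str, dict[str, list[Path]]] = {}
    folders = sorted(
        p for p in split_dir.iterdir() if p.is_dir() and p.name.startswith(prefix)
    )
    for folder in folders:
        by_suffix: dict[str, list[Path]] = defaultdict(list)
        # Search recursively so files in nested subfolders are found too.
        for file in sorted(folder.rglob("*")):
            if file.is_file():
                by_suffix[file.suffix.lower()].append(file)
        index[folder.name] = dict(by_suffix)
    return index


if __name__ == "__main__":
    # Path taken from the repository tree; adjust it to your local checkout.
    split = Path("Train/Tableau")
    if split.is_dir():
        for name, files in index_story_folders(split).items():
            print(name, {suffix: len(paths) for suffix, paths in files.items()})
```

Passing prefix="gap_" with the Test/GapMinder directory should index the GapMinder folders in the same way.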

Example Usage

An example implementation of the multi-step LLM-agent framework with a GPT agent is provided in the notebook story_generation_multi_agent_framework.ipynb at the repository root.

For a detailed description of the framework, please refer to the paper.

📬 Contact

For any queries, feel free to reach out at saidulis@yorku.ca ✉️

Citation 📜

If you use this dataset in your research, please cite the following paper:

@inproceedings{islam-etal-2024-datanarrative,
    title = "{D}ata{N}arrative: Automated Data-Driven Storytelling with Visualizations and Texts",
    author = "Islam, Mohammed Saidul  and
      Laskar, Md Tahmid Rahman  and
      Parvez, Md Rizwan  and
      Hoque, Enamul  and
      Joty, Shafiq",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1073/",
    doi = "10.18653/v1/2024.emnlp-main.1073",
    pages = "19253--19286",
}
