Data Manifests

This directory releases lightweight split manifests for the SkillOpt paper splits. The manifests are not full runnable benchmark payloads. To evaluate a benchmark, first materialize the full examples from the raw data source (required for every benchmark except ALFWorld, whose manifest is usable as released), then point --split_dir at the split directory listed below.

In this README, "coverage" describes which part of the upstream benchmark the manifest references. It does not mean the released manifest directory contains the full runnable examples.

Layout

Every released manifest directory uses the same file layout:

data/<benchmark>_<manifest_type>/
|-- split_manifest.json
|-- train/items.json
|-- val/items.json
`-- test/items.json

split_manifest.json records source metadata, split counts, and item fields. Each items.json contains only stable IDs or source-path hints.
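The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the SkillOpt code: it assumes each items.json is a JSON array of objects, and the helper names load_manifest_items and split_counts are hypothetical.

```python
import json
from pathlib import Path

# Split names follow the directory layout shown above.
SPLITS = ("train", "val", "test")


def load_manifest_items(manifest_dir):
    """Return {split: list of item dicts} for one manifest directory.

    Assumption: every <split>/items.json holds a JSON array of objects.
    """
    root = Path(manifest_dir)
    items = {}
    for split in SPLITS:
        with open(root / split / "items.json", encoding="utf-8") as f:
            items[split] = json.load(f)
    return items


def split_counts(items):
    """Return item counts ordered as (train, val, test)."""
    return tuple(len(items[split]) for split in SPLITS)
```

Calling split_counts(load_manifest_items("data/searchqa_id_split")) is a quick way to compare a checkout against the counts published in this README.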

Released Splits

| Manifest directory | Benchmark | Counts | Coverage | Raw data source | split_dir |
| --- | --- | --- | --- | --- | --- |
| searchqa_id_split/ | SearchQA | 400 / 200 / 1400 | Official HF dataset IDs | lucadiliello/searchqa | data/searchqa_split |
| livemathematicianbench_id_split/ | LiveMathematicianBench | 35 / 18 / 124 | Four official monthly files | LiveMathematicianBench/LiveMathematicianBench | data/livemathematicianbench_split |
| docvqa_id_split/ | DocVQA | 107 / 53 / 374 | 10% subset of validation | lmms-lab/DocVQA | data/docvqa/splits |
| officeqa_id_split/ | OfficeQA | 50 / 24 / 172 | OfficeQA Full | databricks/officeqa | data/officeqa_split |
| spreadsheetbench_id_split/ | SpreadsheetBench | 80 / 40 / 280 | SpreadsheetBench Verified 400 | KAKA22/SpreadsheetBench | data/spreadsheetbench_split |
| alfworld_path_split/ | ALFWorld | 39 / 18 / 134 | ALFWorld json_2.1.1 paths | alfworld/alfworld | data/alfworld_path_split |

Counts are ordered as train / val / test.

Direct Use

Only alfworld_path_split/ can be used directly as --split_dir from this release, because the ALFWorld loader reads gamefile and task_type from the split items.

This does not mean the ALFWorld raw data is included. You still need to download ALFWorld separately with alfworld-download and set $ALFWORLD_DATA to the data root containing json_2.1.1.

The other manifest directories are lookup manifests. They intentionally omit full example fields such as questions, answers, contexts, images, or task instructions. Materialize those benchmarks into the split_dir paths listed above before running SkillOpt.

Lookup Keys

The manifests are sufficient to locate the corresponding raw examples after the raw data has been downloaded or otherwise made available:

| Benchmark | Manifest lookup key |
| --- | --- |
| SearchQA | Match items.json[].id to the key field in lucadiliello/searchqa. |
| LiveMathematicianBench | Open source_file, then match no; the manifest id is <month>:<no>. |
| DocVQA | Match questionId within the official DocVQA validation split; image_path records the expected local image path. |
| OfficeQA | Match uid in officeqa_full.csv; source_files and source_docs identify the supporting document. |
| SpreadsheetBench | Match id; spreadsheet_path identifies the referenced spreadsheet directory. |
| ALFWorld | Resolve gamefile relative to $ALFWORLD_DATA. |

Manifest Item Examples

SearchQA:

{
  "id": "221c83e6630f4e7983da48fa28da1882"
}

LiveMathematicianBench:

{
  "id": "202602:22",
  "month": "202602",
  "no": 22,
  "paper_link": "http://arxiv.org/abs/2602.10700v1",
  "source_file": "data/202602/qa_202602_final.json"
}

DocVQA:

{
  "id": "50877",
  "questionId": "50877",
  "docId": "14724",
  "image_path": "data/docvqa_images/q50877_d14724.png",
  "source_split": "validation"
}

OfficeQA:

{
  "id": "UID0002",
  "uid": "UID0002",
  "category": "easy",
  "source_files": "treasury_bulletin_1944_01.txt"
}

SpreadsheetBench:

{
  "id": "32438",
  "spreadsheet_path": "spreadsheet/32438",
  "instruction_type": "Cell-Level Manipulation"
}

ALFWorld:

{
  "id": "train:0000",
  "gamefile": "json_2.1.1/train/.../game.tw-pddl",
  "task_type": "look_at_obj_in_light"
}

Benchmark Notes

SearchQA

searchqa_id_split/ is an ID-only manifest. Each released id exactly matches the key field in lucadiliello/searchqa.

Materialized examples must include the fields consumed by the SearchQA environment, including:

question
context
answers
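The id-to-key match can be sketched as a small join. This is an illustration under assumptions, not the SkillOpt materialization script: raw_rows stands for any iterable of dict-like rows from lucadiliello/searchqa (however you load it), and materialize_by_key is a hypothetical helper name. How the selected rows are written into the split_dir is not covered here.

```python
def materialize_by_key(manifest_items, raw_rows, key_field="key"):
    """Select the raw rows whose key matches a manifest id, in manifest order.

    manifest_items: list of {"id": ...} objects from one items.json.
    raw_rows: iterable of dict-like raw examples (assumed to carry the
              question, context, and answers fields the environment needs).
    """
    by_key = {row[key_field]: row for row in raw_rows}
    missing = [item["id"] for item in manifest_items if item["id"] not in by_key]
    if missing:
        # Fail loudly: a silent gap would change the split counts.
        raise KeyError(
            f"{len(missing)} manifest ids not found in raw data, e.g. {missing[0]}"
        )
    return [by_key[item["id"]] for item in manifest_items]
```

Raising on a missing id, rather than skipping it, keeps the materialized split the same size as the released manifest.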

LiveMathematicianBench

livemathematicianbench_id_split/ was generated from these raw files:

data/202511/qa_202511_final.json
data/202512/qa_202512_final.json
data/202601/qa_202601_final.json
data/202602/qa_202602_final.json

The manifest stores IDs in the loader format:

<month>:<no>
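A minimal sketch of splitting that ID and locating the raw entry follows. It assumes the month is six digits, as in the file names above, and that each raw file is a list of objects carrying a no field; the function names parse_lmb_id and find_by_no are hypothetical.

```python
import re

# <month>:<no>, with month written as six digits (e.g. 202602).
_ID_PATTERN = re.compile(r"(\d{6}):(\d+)", re.ASCII)


def parse_lmb_id(item_id):
    """Split a manifest id such as '202602:22' into ('202602', 22)."""
    match = _ID_PATTERN.fullmatch(item_id)
    if match is None:
        raise ValueError(f"not a <month>:<no> id: {item_id!r}")
    return match.group(1), int(match.group(2))


def find_by_no(raw_entries, no):
    """Return the first raw entry whose 'no' equals no, or None.

    Assumption: raw_entries is the parsed content of one monthly file.
    Values are compared as strings so that 22 and '22' both match.
    """
    for entry in raw_entries:
        if str(entry.get("no")) == str(no):
            return entry
    return None
```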

Materialized examples must include:

question
choices
correct_choice
theorem_type
theorem
sketch
paper_link

DocVQA

docvqa_id_split/ records docvqa_validation_10pct: a 10% subset sampled from the official DocVQA validation split.

source_split: validation
docvqa_validation_10pct: train=107, val=53, test=374

Each manifest item contains the question and document IDs plus image location metadata. Materialized examples must provide question, either answer or ground_truth, and an image_path that resolves locally.
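Before a run, the DocVQA requirements can be checked with a short validator. This is a sketch under assumptions: image_path is taken to be relative to a root directory you pass in, and docvqa_problems is a hypothetical helper, not part of the SkillOpt loader.

```python
from pathlib import Path


def docvqa_problems(example, root="."):
    """Return a list of human-readable problems for one materialized example.

    An empty list means the example has a question, an answer or
    ground_truth field, and an image_path that exists under root.
    """
    problems = []
    if not example.get("question"):
        problems.append("missing question")
    if "answer" not in example and "ground_truth" not in example:
        problems.append("missing answer or ground_truth")
    image_path = example.get("image_path")
    if not image_path:
        problems.append("missing image_path")
    elif not (Path(root) / image_path).is_file():
        problems.append(f"image not found: {image_path}")
    return problems
```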

OfficeQA

officeqa_id_split/ records the split over OfficeQA Full (officeqa_full.csv). The official OfficeQA CSVs are gated on Hugging Face, so materialization requires authorized access.

Each manifest item contains uid and category, plus source_files and source_docs hints. Materialized examples must include question and either ground_truth or answer.
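Once you have authorized access to officeqa_full.csv, the uid match can be sketched with the standard csv module. This assumes the CSV has a header row containing a uid column; the other column names are not assumed, and both function names are hypothetical.

```python
import csv


def index_csv_by_uid(csv_path, uid_field="uid"):
    """Read a CSV with a header row and index its rows by uid."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[uid_field]: row for row in csv.DictReader(f)}


def select_rows(manifest_items, rows_by_uid):
    """Return raw rows for the manifest items, in manifest order.

    Raises KeyError if a manifest uid is absent from the CSV.
    """
    return [rows_by_uid[item["uid"]] for item in manifest_items]
```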

SpreadsheetBench

spreadsheetbench_id_split/ records the split over SpreadsheetBench Verified 400, from spreadsheetbench_verified_400.tar.gz.

Each manifest item contains task identity metadata such as id, spreadsheet_path, and instruction_type. Materialization must also place the referenced spreadsheet directories at:

data/spreadsheetbench_verified_400

ALFWorld

alfworld_path_split/ records gamefile paths relative to $ALFWORLD_DATA. The source payload is json_2.1.1, which must be downloaded separately with alfworld-download.

This manifest can be used directly as --split_dir after $ALFWORLD_DATA points to the local ALFWorld data root containing json_2.1.1.
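Resolving a gamefile against $ALFWORLD_DATA can be sketched as follows. The helper resolve_gamefile is hypothetical and only illustrates the path join; the actual ALFWorld loader may differ.

```python
import os
from pathlib import Path


def resolve_gamefile(item, data_root=None):
    """Join a manifest item's gamefile onto the ALFWorld data root.

    data_root defaults to the ALFWORLD_DATA environment variable, which
    should point at the directory that contains json_2.1.1.
    """
    root = data_root or os.environ.get("ALFWORLD_DATA")
    if not root:
        raise RuntimeError("ALFWORLD_DATA is not set and no data_root was given")
    return Path(root).expanduser() / item["gamefile"]
```

Checking resolve_gamefile(item).is_file() for every item before a run catches a missing or misplaced json_2.1.1 download early.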