Data Manifests

This directory releases lightweight split manifests for the SkillOpt paper splits. The manifests are not full runnable benchmark payloads. To evaluate a benchmark, first materialize the full examples from the raw data source (required for every benchmark except ALFWorld; see Direct Use), then point --split_dir at the split_dir path listed under Released Splits.

In this README, "coverage" describes which part of the upstream benchmark the manifest references. It does not mean the released manifest directory contains the full runnable examples.

Layout

Every released manifest directory uses the same file layout:

data/<benchmark>_<manifest_type>/
|-- split_manifest.json
|-- train/items.json
|-- val/items.json
`-- test/items.json

split_manifest.json records source metadata, split counts, and item fields. Each items.json contains only stable IDs or source-path hints.
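
As a rough illustration of the layout above, the following sketch loads one manifest directory into memory. It assumes each items.json holds a JSON array of item objects (as the examples later in this README suggest); the helper name load_manifest_dir is hypothetical and not part of SkillOpt.

```python
import json
from pathlib import Path

# Split names taken from the directory layout above.
SPLITS = ("train", "val", "test")


def load_manifest_dir(manifest_dir):
    """Return (split_manifest, {split_name: items}) for one manifest directory.

    Hypothetical helper: assumes items.json is a JSON array of objects.
    """
    root = Path(manifest_dir)
    with open(root / "split_manifest.json", encoding="utf-8") as f:
        split_manifest = json.load(f)

    items = {}
    for split in SPLITS:
        with open(root / split / "items.json", encoding="utf-8") as f:
            items[split] = json.load(f)
    return split_manifest, items
```

For example, `load_manifest_dir("data/searchqa_id_split")` would return the source metadata together with the three ID lists, provided the release is checked out at that relative path.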

Released Splits

| Manifest directory | Benchmark | Counts | Coverage | Raw data source | `split_dir` |
|---|---|---|---|---|---|
| `searchqa_id_split/` | SearchQA | 400 / 200 / 1400 | Official HF dataset IDs | lucadiliello/searchqa | `data/searchqa_split` |
| `livemathematicianbench_id_split/` | LiveMathematicianBench | 35 / 18 / 124 | Four official monthly files | LiveMathematicianBench/LiveMathematicianBench | `data/livemathematicianbench_split` |
| `docvqa_id_split/` | DocVQA | 107 / 53 / 374 | 10% subset of validation | lmms-lab/DocVQA | `data/docvqa/splits` |
| `officeqa_id_split/` | OfficeQA | 50 / 24 / 172 | OfficeQA Full | databricks/officeqa | `data/officeqa_split` |
| `spreadsheetbench_id_split/` | SpreadsheetBench | 80 / 40 / 280 | SpreadsheetBench Verified 400 | KAKA22/SpreadsheetBench | `data/spreadsheetbench_split` |
| `alfworld_path_split/` | ALFWorld | 39 / 18 / 134 | ALFWorld `json_2.1.1` paths | alfworld/alfworld | `data/alfworld_path_split` |

Counts are ordered as train / val / test.
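
A quick sanity check after downloading the manifests is to compare the number of items per split against the counts listed in this section. The sketch below hard-codes those counts; the function name check_counts and the in-memory shape of its second argument (a dict mapping split name to a list of items) are assumptions made for illustration.

```python
# Counts copied from the Released Splits table, ordered train / val / test.
EXPECTED_COUNTS = {
    "searchqa_id_split": (400, 200, 1400),
    "livemathematicianbench_id_split": (35, 18, 124),
    "docvqa_id_split": (107, 53, 374),
    "officeqa_id_split": (50, 24, 172),
    "spreadsheetbench_id_split": (80, 40, 280),
    "alfworld_path_split": (39, 18, 134),
}


def check_counts(manifest_name, items_by_split):
    """Return {split: (expected, actual)} for every split whose size differs.

    Hypothetical helper; an empty dict means all three splits match.
    """
    expected = dict(zip(("train", "val", "test"), EXPECTED_COUNTS[manifest_name]))
    actual = {split: len(items_by_split[split]) for split in expected}
    return {
        split: (expected[split], actual[split])
        for split in expected
        if expected[split] != actual[split]
    }
```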

Direct Use

Only alfworld_path_split/ can be used directly as --split_dir from this release, because the ALFWorld loader reads gamefile and task_type from the split items.

This does not mean the ALFWorld raw data is included. You still need to download ALFWorld separately with alfworld-download and set $ALFWORLD_DATA to the data root containing json_2.1.1.

The other manifest directories are lookup manifests. They intentionally omit full example fields such as questions, answers, contexts, images, or task instructions. Materialize those benchmarks into the split_dir paths listed above before running SkillOpt.

Lookup Keys

The manifests are sufficient to locate the corresponding raw examples after the raw data has been downloaded or otherwise made available:

| Benchmark | Manifest lookup key |
|---|---|
| SearchQA | Match `items.json[].id` to the `key` field in `lucadiliello/searchqa`. |
| LiveMathematicianBench | Open `source_file`, then match `no`; the manifest `id` is `<month>:<no>`. |
| DocVQA | Match `questionId` within the official DocVQA `validation` split; `image_path` records the expected local image path. |
| OfficeQA | Match `uid` in `officeqa_full.csv`; `source_files` and `source_docs` identify the supporting document. |
| SpreadsheetBench | Match `id`; `spreadsheet_path` identifies the referenced spreadsheet directory. |
| ALFWorld | Resolve `gamefile` relative to `$ALFWORLD_DATA`. |
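
Most of these lookups amount to a key join between manifest items and raw rows. The sketch below shows one way to do that join in memory; join_manifest is a hypothetical helper, and how the raw rows are loaded (Hugging Face dataset, CSV, JSON) is left to the caller. Keys are compared as strings on the assumption that upstream sources may store numeric IDs.

```python
def join_manifest(manifest_items, raw_rows, manifest_field, raw_field):
    """Attach raw example fields to manifest items, keeping manifest order.

    Hypothetical helper. Raises KeyError if any manifest item has no raw match.
    """
    # Index raw rows by their lookup key; the first occurrence wins.
    index = {}
    for row in raw_rows:
        index.setdefault(str(row[raw_field]), row)

    merged, missing = [], []
    for item in manifest_items:
        row = index.get(str(item[manifest_field]))
        if row is None:
            missing.append(item[manifest_field])
        else:
            # Manifest fields override raw fields that share a name.
            merged.append({**row, **item})

    if missing:
        raise KeyError(f"{len(missing)} manifest items not found, e.g. {missing[:3]}")
    return merged
```

For SearchQA the call would be `join_manifest(items, rows, "id", "key")`; for OfficeQA, `join_manifest(items, rows, "uid", "uid")`.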

Manifest Item Examples

SearchQA:

{
  "id": "221c83e6630f4e7983da48fa28da1882"
}

LiveMathematicianBench:

{
  "id": "202602:22",
  "month": "202602",
  "no": 22,
  "paper_link": "http://arxiv.org/abs/2602.10700v1",
  "source_file": "data/202602/qa_202602_final.json"
}

DocVQA:

{
  "id": "50877",
  "questionId": "50877",
  "docId": "14724",
  "image_path": "data/docvqa_images/q50877_d14724.png",
  "source_split": "validation"
}

OfficeQA:

{
  "id": "UID0002",
  "uid": "UID0002",
  "category": "easy",
  "source_files": "treasury_bulletin_1944_01.txt"
}

SpreadsheetBench:

{
  "id": "32438",
  "spreadsheet_path": "spreadsheet/32438",
  "instruction_type": "Cell-Level Manipulation"
}

ALFWorld:

{
  "id": "train:0000",
  "gamefile": "json_2.1.1/train/.../game.tw-pddl",
  "task_type": "look_at_obj_in_light"
}

Benchmark Notes

SearchQA

searchqa_id_split/ is an ID-only manifest. Each released id exactly matches the key field in lucadiliello/searchqa.

Materialized examples must provide the fields consumed by the SearchQA environment, including:

question
context
answers

LiveMathematicianBench

livemathematicianbench_id_split/ was generated from these raw files:

data/202511/qa_202511_final.json
data/202512/qa_202512_final.json
data/202601/qa_202601_final.json
data/202602/qa_202602_final.json

The manifest stores IDs in the loader format:

<month>:<no>
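
To make the ID format concrete, here is a small sketch that splits and rebuilds such IDs. The function names are hypothetical, and the check that month is a six-digit string is an assumption based on the four file names listed above.

```python
def parse_item_id(item_id):
    """Split '<month>:<no>' into (month, no), e.g. '202602:22' -> ('202602', 22)."""
    month, sep, no = item_id.partition(":")
    # Assumption: month is six digits (YYYYMM) and no is a non-negative integer.
    if not sep or len(month) != 6 or not month.isdigit() or not no.isdigit():
        raise ValueError(f"not a <month>:<no> id: {item_id!r}")
    return month, int(no)


def format_item_id(month, no):
    """Build the loader-format id from its two parts."""
    return f"{month}:{no}"


def id_is_consistent(item):
    """Check that a manifest item's id agrees with its month and no fields."""
    return item["id"] == format_item_id(item["month"], item["no"])
```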

Materialized examples must include:

question
choices
correct_choice
theorem_type
theorem
sketch
paper_link

DocVQA

docvqa_id_split/ records docvqa_validation_10pct: a 10% subset sampled from the official DocVQA validation split.

source_split: validation
docvqa_validation_10pct: train=107, val=53, test=374

Each manifest item contains question/document IDs plus image location metadata. Materialized examples must provide question, either answer or ground_truth, and an image_path that resolves to a local file.
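
A materialized DocVQA example can be checked against these requirements before running SkillOpt. The sketch below is only an illustration: docvqa_problems is a hypothetical helper, and resolving image_path against a base_dir argument is an assumption about how relative paths are interpreted.

```python
from pathlib import Path


def docvqa_problems(example, base_dir="."):
    """Return a list of human-readable problems; an empty list means the example looks usable."""
    problems = []
    if not example.get("question"):
        problems.append("missing question")
    # Either field name is accepted for the reference answer.
    if "answer" not in example and "ground_truth" not in example:
        problems.append("missing answer or ground_truth")

    image_path = example.get("image_path")
    if not image_path:
        problems.append("missing image_path")
    elif not (Path(base_dir) / image_path).is_file():
        problems.append(f"image not found: {image_path}")
    return problems
```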

OfficeQA

officeqa_id_split/ records the split over OfficeQA Full (officeqa_full.csv). The official OfficeQA CSVs are gated on Hugging Face, so materialization requires authorized access.

Each manifest item contains uid, category, source_files, and source_docs hints. Materialized examples must include question plus either ground_truth or answer.

SpreadsheetBench

spreadsheetbench_id_split/ records the split over SpreadsheetBench Verified 400, from spreadsheetbench_verified_400.tar.gz.

Each manifest item contains task identity metadata such as id, spreadsheet_path, and instruction_type. Materialization must also place the referenced spreadsheet directories at:

data/spreadsheetbench_verified_400

ALFWorld

alfworld_path_split/ records gamefile paths relative to $ALFWORLD_DATA. The source payload is json_2.1.1, which must be downloaded separately with alfworld-download.

This manifest can be used directly as --split_dir after $ALFWORLD_DATA points to the local ALFWorld data root containing json_2.1.1.
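
Before a run, it can help to confirm that every gamefile in the manifest actually resolves under $ALFWORLD_DATA. The following is a minimal sketch with hypothetical helper names; it only checks that files exist and does not call any ALFWorld API.

```python
import os
from pathlib import Path


def resolve_gamefile(gamefile, data_root=None):
    """Join a manifest gamefile path onto the ALFWorld data root."""
    root = data_root if data_root is not None else os.environ.get("ALFWORLD_DATA")
    if not root:
        raise RuntimeError("ALFWORLD_DATA is not set and no data_root was given")
    return Path(root) / gamefile


def missing_gamefiles(items, data_root=None):
    """Return the ids of manifest items whose gamefile does not exist locally."""
    return [
        item["id"]
        for item in items
        if not resolve_gamefile(item["gamefile"], data_root).is_file()
    ]
```

A non-empty result usually means that alfworld-download has not been run yet or that $ALFWORLD_DATA points at the wrong directory.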
