GCF vs TOON

GCF output is smaller on all 6 datasets, more accurate at scale (90.7% vs 68.5% across 10 models and 3 providers), and has five features TOON structurally cannot add. TOON's own official decoder rejects LLM-generated TOON output on 7 of 9 models tested. All token claims were tested on TOON's own benchmark, with its datasets and its tokenizer.

Feature comparison

| Feature | GCF | TOON |
|---|---|---|
| Tabular encoding (arrays of objects) | Yes | Yes |
| Positional fields (no field names per row) | Yes | Yes |
| Pipe-separated rows | Yes | Comma-separated |
| Nested object encoding | ## key sections + key=value | Indented key: value |
| Semi-uniform data (optional fields) | Native (inline nested when present) | Falls back to less efficient encoding |
| Local IDs for cross-referencing | Yes (@0, @1) | No |
| Edge/relationship encoding | Yes (@0<@1 calls, ~4 tokens/edge) | No (must repeat full identifiers, ~100 tokens/edge) |
| Session deduplication | Yes (92.7% savings by 5th call) | No |
| Delta encoding | Yes (81.2% savings on re-queries) | No |
| Distance grouping | Yes (## targets, ## related) | No |
| Graph-native (nodes + edges) | Yes (graph profile) | No |
| Generic data (any JSON) | Yes (generic profile) | Yes |
| Streaming encode | Yes (true zero-buffering, O(1) memory, [?] + trailer) | Output-side only (requires full value in memory) |
| Key folding (dotted paths) | No | Yes |
| LLM comprehension at 500 symbols | 90.7% avg (23 runs, 10 models) | 68.5% avg |
| LLM generation (output tokens) | 75% fewer than JSON | 40% fewer than JSON |
| Human-readable | Dense, agent-optimized | YAML-like, human-friendly |
| Zero dependencies | Yes | Yes |
| Language support | Go, TypeScript, Python, Rust, Swift, Kotlin | TypeScript, Go |
| MCP proxy (zero-code adoption) | Yes | Yes ("Tooner") |

Where GCF wins

1. Token efficiency on every data shape

Tested on TOON's own benchmark with their datasets and their tokenizer (gpt-tokenizer, o200k_base):

| Dataset | GCF (tokens) | TOON (tokens) | Winner |
|---|---|---|---|
| Semi-uniform event logs (2000 records) | 108,158 | 154,032 | GCF 30% smaller (TOON 42% larger) |
| E-commerce orders (500, nested items) | 61,593 | 73,246 | GCF 16% smaller (TOON 19% larger) |
| Employee records (2000 rows, flat) | 49,055 | 49,966 | GCF 2% smaller |
| Analytics time-series (365 days, flat) | 8,398 | 9,127 | GCF 8% smaller |
| GitHub repos (100 rows, flat) | 8,576 | 8,744 | GCF 2% smaller |
| Deeply nested config (small) | 616 | 618 | GCF 0.3% smaller |
| Mixed-structure total | 170,367 | 227,896 | GCF 25% smaller (TOON 34% larger) |
| Flat-only total | 66,029 | 67,837 | GCF 3% smaller |

GCF wins on all 6 datasets. TOON has no token efficiency advantage on any data shape.
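The percentages can be checked directly from the token counts. The sketch below assumes "X% smaller" means (TOON − GCF) / TOON and "X% larger" means (TOON − GCF) / GCF; the helper names `pct_smaller` and `pct_larger` are illustrative and not part of either library.

```python
# Token counts copied from the benchmark table: (GCF, TOON).
COUNTS = {
    "Semi-uniform event logs": (108_158, 154_032),
    "E-commerce orders": (61_593, 73_246),
    "Employee records": (49_055, 49_966),
    "Analytics time-series": (8_398, 9_127),
    "GitHub repos": (8_576, 8_744),
    "Deeply nested config": (616, 618),
}


def pct_smaller(gcf: int, toon: int) -> float:
    """Percent by which the GCF count is below the TOON count."""
    return (toon - gcf) / toon * 100


def pct_larger(gcf: int, toon: int) -> float:
    """Percent by which the TOON count exceeds the GCF count."""
    return (toon - gcf) / gcf * 100


if __name__ == "__main__":
    for name, (gcf, toon) in COUNTS.items():
        print(f"{name}: GCF {pct_smaller(gcf, toon):.1f}% smaller, "
              f"TOON {pct_larger(gcf, toon):.1f}% larger")
```

The two measures diverge as the gap grows. For the event logs, the same pair of counts gives roughly 30% smaller and 42% larger, so it matters which one a headline number refers to.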

GCF's largest advantage is on semi-uniform data (30% fewer tokens; put the other way, TOON's output is 42% larger) because TOON's tabular format requires all rows to have identical fields. When data is semi-uniform (e.g., event logs where some records have nested error objects), TOON falls back to its less efficient nested encoding for the entire array. GCF handles this natively: primitive fields encode positionally, and nested fields attach inline only when present.

Reproducible: blackwell-systems/toon@gcf-comparison

2. Edge encoding (the structural advantage)

TOON has no concept of references between records. Every relationship must spell out the full identifier of both endpoints:

TOON edges (repeated identifiers):

edges[3]{source,target,type}:
  github.com/org/repo/pkg.NewServer,github.com/org/repo/pkg.AuthMiddleware,calls
  github.com/org/repo/pkg.AuthMiddleware,github.com/org/repo/pkg.ValidateToken,calls
  github.com/org/repo/pkg.ValidateToken,github.com/org/repo/internal.TokenCache,references

GCF edges (local IDs):

## edges [3]
@0<@3 calls
@1<@0 calls
@6<@1 references

Same information. GCF: ~4 tokens per edge. TOON: ~30-100 tokens per edge depending on identifier length. This advantage grows with longer qualified names (common in Java/Go packages) and higher edge density (call graphs, dependency graphs).

This is a structural limitation of TOON. It cannot be fixed without adding a local-ID system, which would make it a different format.
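To make the local-ID idea concrete, here is a minimal sketch of how an encoder could map full identifiers to short IDs and emit one compact line per edge. It is not the GCF library's implementation. The function name `encode_edges`, the first-appearance ID assignment, and the reading of the arrow as `@target<@source` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, relationship type)


def encode_edges(edges: List[Edge]) -> Tuple[Dict[str, int], List[str]]:
    """Assign a local ID to each identifier and emit one short line per edge."""
    ids: Dict[str, int] = {}

    def local_id(name: str) -> int:
        # IDs are handed out in order of first appearance (an assumption).
        if name not in ids:
            ids[name] = len(ids)
        return ids[name]

    lines = [f"## edges [{len(edges)}]"]
    for source, target, kind in edges:
        src, dst = local_id(source), local_id(target)
        # Assumed reading of the notation: "@target<@source type".
        lines.append(f"@{dst}<@{src} {kind}")
    return ids, lines


if __name__ == "__main__":
    _, out = encode_edges([
        ("github.com/org/repo/pkg.NewServer", "github.com/org/repo/pkg.AuthMiddleware", "calls"),
        ("github.com/org/repo/pkg.AuthMiddleware", "github.com/org/repo/pkg.ValidateToken", "calls"),
    ])
    print("\n".join(out))
```

Each qualified name is spelled out once, where the record is declared. Every edge after that costs a few characters no matter how long the names are.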

3. Session deduplication (TOON can't do this)

In multi-turn LLM interactions, the same data appears across multiple tool responses. GCF tracks what's been sent and replaces known records with bare references:

Call 1: full declarations

GCF profile=graph tool=context_for_task symbols=15 edges=10 session=true
## targets
@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
@1 fn pkg.ValidateToken 0.72 lsp_resolved
...

Call 5: 92% bare references

GCF profile=graph tool=context_for_task symbols=22 edges=16 session=true
## targets
@0  # previously transmitted
@1  # previously transmitted
@2  # previously transmitted
@18 fn pkg.NewEndpoint 0.88 lsp_resolved
...

| Call | New records | Bare refs | Savings vs JSON |
|---|---|---|---|
| 1 | 100% | 0% | 84% (base GCF) |
| 2 | 35% | 65% | 89% |
| 3 | 20% | 80% | 91% |
| 5 | 8% | 92% | 92.7% |

TOON retransmits every record every time. It has no session concept. By the 5th tool call in a conversation, GCF is using 92.7% fewer tokens than JSON while TOON is still at ~69%.

This isn't a feature that can be bolted on. Session dedup requires the format to support bare references (@N # previously transmitted), which requires local IDs (@N), which TOON doesn't have.
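The dependency chain (local IDs, then bare references, then session dedup) can be sketched in a few lines. This is a hypothetical tracker, not the GCF library's API. The class name `SessionEncoder` and the record keys `kind`, `name`, `score`, and `source` are invented labels for the columns shown in the examples above.

```python
from typing import Dict, List, Set


class SessionEncoder:
    """Hypothetical tracker: full declaration once, bare reference afterwards."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}  # record name -> stable local ID
        self._sent: Set[int] = set()    # IDs already transmitted in full

    def encode(self, records: List[Dict[str, str]]) -> List[str]:
        lines = ["## targets"]
        for rec in records:
            # The same name keeps the same ID for the whole session.
            rid = self._ids.setdefault(rec["name"], len(self._ids))
            if rid in self._sent:
                lines.append(f"@{rid}  # previously transmitted")
            else:
                lines.append(
                    f"@{rid} {rec['kind']} {rec['name']} {rec['score']} {rec['source']}"
                )
                self._sent.add(rid)
        return lines
```

The savings depend entirely on IDs staying stable across calls. Without a per-session ID table there is nothing for a bare reference to point at.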

4. Delta encoding (TOON can't do this)

When the LLM re-queries and the data changed slightly, GCF sends only the diff:

GCF profile=graph tool=context_for_task delta=true base_root=aaa new_root=bbb savings=81%
## removed
fn pkg.OldHandler
## added
@0 fn pkg.NewHandler 0.85 rwr
## edges_removed
pkg.Router -> pkg.OldHandler calls
## edges_added
pkg.Router -> pkg.NewHandler calls

81.2% savings on re-queries in production. TOON must retransmit the entire payload even if one record changed.
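A rough sketch of the diffing step, assuming records are keyed by qualified name and edges are (source, target, type) triples. The function `encode_delta` is hypothetical. It omits the header line because the source does not specify how `base_root` and `new_root` are computed, and it treats a record as unchanged whenever its name is present in both snapshots.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relationship type)


def encode_delta(
    old_records: Dict[str, str],
    new_records: Dict[str, str],
    old_edges: Set[Edge],
    new_edges: Set[Edge],
) -> List[str]:
    """Emit only what changed between two snapshots.

    Records map a qualified name to its declaration text, e.g.
    {"pkg.OldHandler": "fn pkg.OldHandler"}.
    """
    lines: List[str] = []
    removed = sorted(set(old_records) - set(new_records))
    added = sorted(set(new_records) - set(old_records))
    if removed:
        lines.append("## removed")
        lines += [old_records[name] for name in removed]
    if added:
        lines.append("## added")
        # Fresh local IDs start at 0 within the delta (an assumption).
        lines += [f"@{i} {new_records[name]}" for i, name in enumerate(added)]
    for title, edges in (("edges_removed", old_edges - new_edges),
                         ("edges_added", new_edges - old_edges)):
        if edges:
            lines.append(f"## {title}")
            lines += [f"{s} -> {t} {kind}" for s, t, kind in sorted(edges)]
    return lines
```

Unchanged records and edges produce no output at all, which is where the savings on re-queries come from.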

5. Distance grouping (semantic structure)

GCF encodes how far each record is from the query center:

## targets       ← direct matches (distance 0)
@0 fn pkg.Auth 0.92 lsp
## related       ← one hop away (distance 1)
@3 fn pkg.Server 0.65 lsp
## extended      ← broader context (distance 2)
@6 type pkg.Cache 0.41 structural

The LLM immediately knows what's most relevant without scanning the entire payload. TOON encodes all records in a flat list with no semantic grouping.
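As an illustration, grouping by distance amounts to bucketing records under a section header instead of writing the distance into a column. The sketch below is hypothetical: the function `group_by_distance` and the rule for distances beyond 2 are assumptions, and only the three section names come from the example above.

```python
from typing import Dict, List, Tuple

# Section names taken from the example; the integer keys are hop counts.
SECTIONS = {0: "targets", 1: "related", 2: "extended"}

Record = Tuple[int, str, int]  # (local ID, declaration text, distance)


def group_by_distance(records: List[Record]) -> List[str]:
    """Emit records under the section header that matches their distance."""
    buckets: Dict[int, List[str]] = {d: [] for d in SECTIONS}
    for local_id, decl, distance in records:
        # Anything farther than the last section is clamped into it (an assumption).
        key = min(distance, max(SECTIONS))
        buckets[key].append(f"@{local_id} {decl}")
    lines: List[str] = []
    for distance in sorted(SECTIONS):
        if buckets[distance]:
            lines.append(f"## {SECTIONS[distance]}")
            lines += buckets[distance]
    return lines
```

The distance never appears as a number in the output. A reader, human or model, recovers it from which section a record sits in.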

LLM generation: TOON fails, GCF doesn't

28 generation runs across 9 models and 3 providers. Same data, same prompt structure, output validated through real decoders (including TOON's official toon-go library).

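| Model | GCF | TOON | JSON |
|---|---|---|---|
| Claude Opus 4.6 | 5/5 | 0/5 | 5/5 |
| Claude Sonnet 4.6 | 5/5 | 2-3/5 | 5/5 |
| GPT-5.5 | 4-5/5 | 1-2/5 | 5/5 |
| GPT-5.4 | 5/5 | 0/5 | 5/5 |
| Gemini 2.5 Pro | 5/5 | 1/5 | 5/5 |
| Gemini 3.1 Pro | 5/5 | 0/5 | 5/5 |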

[Figure: Generation validity by model]

TOON's official decoder rejects the output on 7 of 9 models. The failure is structural: TOON's flat columns require the model to encode semantic categories as integers. When told "this symbol is a target," the model writes target in the distance column. TOON's decoder expects 0. No model performs this mapping reliably unprompted.

GCF expresses distance through section placement (## targets, ## related). No integer mapping required. The format aligns with how LLMs naturally express grouped data.

[Figure: The distance label problem]

When TOON is given pre-encoded integers (walking the model through the mapping to compensate for the format's fragility), performance improves on some models but remains inconsistent. Even in the best case, TOON output is 28% larger than GCF.

GCF output is 63% smaller than JSON and 33% smaller than TOON at 100 symbols. See the full generation data for all runs.

TOON's comprehension benchmarks don't test at scale

TOON's retrieval accuracy benchmark uses datasets of 100 rows or fewer and reports a 1.4 percentage point accuracy improvement over JSON (76.4% vs 75.0%). At this scale, all formats perform similarly because JSON's structural noise hasn't yet overwhelmed the model's attention.

GCF's comprehension eval tests at 500 symbols with 200 edges across 10 models and 3 providers (Anthropic, OpenAI, Google):

| Format | Avg accuracy (10 models) | Tokens |
|---|---|---|
| GCF | 90.7% | 11,090 |
| TOON | 68.5% | 16,378 |
| JSON | 53.6% | 53,341 |

23 runs. GCF wins 22, ties 1, loses 0. Four models achieve 100% on GCF (Sonnet, Gemini 2.5 Pro, Gemini 3.1 Pro, Gemini 3.5 Flash). TOON never hits 100%. The difference between formats is invisible at 100 rows and undeniable at 500.
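The token counts for this eval also give each format's reduction relative to JSON by direct arithmetic. The results line up with the roughly 79% and 69% figures quoted elsewhere on this page, though that link is an inference from the numbers rather than something the eval output states. The helper `savings_vs` is illustrative only.

```python
# Token counts for the 500-symbol, 200-edge comprehension payload.
TOKENS = {"GCF": 11_090, "TOON": 16_378, "JSON": 53_341}


def savings_vs(baseline: str, fmt: str) -> float:
    """Percent fewer tokens than the baseline format."""
    return (1 - TOKENS[fmt] / TOKENS[baseline]) * 100


if __name__ == "__main__":
    print(f"GCF vs JSON:  {savings_vs('JSON', 'GCF'):.1f}% fewer tokens")
    print(f"TOON vs JSON: {savings_vs('JSON', 'TOON'):.1f}% fewer tokens")
    print(f"GCF vs TOON:  {savings_vs('TOON', 'GCF'):.1f}% fewer tokens")
```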

TOON publishes zero multi-model comprehension data and zero generation validity data.

Reproduce it yourself: git clone https://github.com/blackwell-systems/gcf-go && cd gcf-go/eval && GOWORK=off go test -run TestComprehension -v -timeout 0

"But GCF isn't human-readable"

Neither is protobuf. Neither are HTTP headers. Readability is a last-mile rendering concern, not a wire format property.

The agent reads GCF (cheap, 79% fewer tokens in the context window), does its work, then calls decode() at the end if a human needs to see the result. The context window savings are already banked. The decode costs one function call.

TOON optimizes for the case where a human is scanning the raw wire format. GCF optimizes for the case where an agent is consuming it and a human can view the decoded output if they need to. The second case is the common case. The first case is debugging.

Where TOON wins

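On token count, nowhere. GCF wins on all 6 datasets in TOON's own benchmark; the closest result is deeply nested configuration (616 vs 618, a 2-token difference). The feature table lists one capability that TOON has and GCF lacks, key folding (dotted paths), and TOON's YAML-like syntax is friendlier to a human reading the raw wire format. Streaming is not a TOON advantage: TOON's encodeLines() streams on the output side only (the full value must be in memory before encoding starts), whereas GCF's StreamEncoder is true input-side streaming with zero buffering and O(1) memory per row. See the streaming guide for the full comparison.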
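The difference between input-side and output-side streaming is easiest to see in code. Below is a minimal sketch of a deferred-count encoder written as a generator. It is not the StreamEncoder API. The function name `stream_encode`, the header layout, and the trailer syntax are invented for illustration. Only the `[?]` placeholder, the trailer idea, and the pipe-separated rows come from the text above.

```python
from typing import Iterable, Iterator, Sequence


def stream_encode(name: str, fields: Sequence[str],
                  rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield encoded lines one at a time without holding the rows in memory."""
    # "[?]" marks the row count as unknown when the header is written.
    yield f"## {name} [?] {'|'.join(fields)}"
    count = 0
    for row in rows:
        # Values are joined as-is; escaping of "|" inside values is not handled here.
        yield "|".join(str(value) for value in row)
        count += 1
    # Hypothetical trailer syntax: the count is supplied once the input ends.
    yield f"## end {name} [{count}]"
```

Because the count is written last, the encoder can consume a database cursor or a network stream row by row. A format that requires the count in the header has to see the whole input first.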

The bottom line

GCF covers the same ground as TOON (tabular encoding, positional fields, generic JSON), apart from key folding, and adds five things TOON structurally cannot add without becoming a different format:

  • Local IDs and edge encoding (requires @N references)
  • Session deduplication (requires bare references, which require local IDs)
  • Delta encoding (requires content-addressed identity)
  • Distance grouping (requires semantic section headers)
  • True streaming encode (requires deferred counts + trailer; TOON spec mandates upfront [N])

On TOON's own benchmark, GCF wins all 6 datasets.

The gap widens over time. First call: GCF uses 25% fewer tokens than TOON on the mixed-structure benchmark (put the other way, TOON's output is 34% larger). Fifth call: GCF saves 92.7% vs JSON while TOON is stuck at 69%. No format change can close that gap without adding session state, which requires local IDs, which requires a fundamental redesign.

Try both formats in the playground with your own data.

Get started in 5 minutes with any of 6 languages.