Skip to content

blackwell-systems/gcf-typescript

Repository files navigation

Blackwell Systems CI License

gcf-typescript

TypeScript implementation of GCF — the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.

79% fewer input tokens than JSON. 63% fewer output tokens. 90.7% average comprehension accuracy across 10 models and 3 providers (four models hit 100%). 1,300+ LLM evaluations. Zero training.

Docs: gcformat.com · Playground · GCF vs TOON

Install

npm install @blackwell-systems/gcf

Zero dependencies. TypeScript-first. Includes CLI. Don't want to change code? Use the MCP proxy for zero-code adoption.

CLI

npx @blackwell-systems/gcf encode < payload.json    # JSON to GCF
npx @blackwell-systems/gcf decode < payload.gcf     # GCF to JSON
npx @blackwell-systems/gcf stats  < payload.json    # token comparison
Payload: 50 symbols, 20 edges

  JSON  ██████████████████████████████  4,200 tokens
  GCF   ████████░░░░░░░░░░░░░░░░░░░░░░  1,150 tokens

  Savings: 73% fewer tokens with GCF

Or install globally: npm install -g @blackwell-systems/gcf then use gcf directly.

Library

Quick Start

import { encodeGeneric } from '@blackwell-systems/gcf';

const output = encodeGeneric({
  employees: [
    { id: 1, name: 'Alice', department: 'Engineering', salary: 95000 },
    { id: 2, name: 'Bob', department: 'Sales', salary: 72000 },
  ],
});

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Decode

import { decode } from '@blackwell-systems/gcf';

const p = decode(input);
console.log(p.tool, p.symbols.length, 'symbols', p.edges.length, 'edges');

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

import { Session, encodeWithSession } from '@blackwell-systems/gcf';

const sess = new Session();

const out1 = encodeWithSession(payload1, sess); // full declarations
const out2 = encodeWithSession(payload2, sess); // reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.

Streaming Encode

Write GCF output incrementally as symbols and edges arrive. Zero buffering, O(1) memory per row. Ideal for MCP servers that walk large graphs or paginate results:

import { StreamEncoder } from '@blackwell-systems/gcf';

const enc = new StreamEncoder(writer, 'context_for_task', { tokenBudget: 5000 });

// Symbols emit immediately as they're discovered.
enc.writeSymbol({ qualifiedName: 'pkg.Auth', kind: 'function', score: 0.95, provenance: 'lsp', distance: 0 });
enc.writeSymbol({ qualifiedName: 'pkg.Server', kind: 'function', score: 0.60, provenance: 'lsp', distance: 1 });

// Edges emit immediately too.
enc.writeEdge({ source: 'pkg.Server', target: 'pkg.Auth', edgeType: 'calls' });

// Close emits the ## _summary trailer with final counts.
enc.close();

Output:

GCF tool=context_for_task budget=5000
## targets
@0 fn pkg.Auth 0.95 lsp
## related
@1 fn pkg.Server 0.60 lsp
## edges [?]
@0<@1 calls
## _summary symbols=2 edges=1 sections=targets:1,related:1,edges:1

The writer is any object with a write(s: string) method (Node.js streams, web WritableStreams, or a simple callback). Standard decode() handles streaming output with no changes.

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

import { encodeDelta, type DeltaPayload } from '@blackwell-systems/gcf';

const delta: DeltaPayload = {
  tool: 'context_for_task',
  baseRoot: 'aaa111',
  newRoot: 'bbb222',
  removed: [{ qualifiedName: 'pkg.OldFunc', kind: 'function', score: 0, provenance: '', distance: 0 }],
  added: [{ qualifiedName: 'pkg.NewFunc', kind: 'function', score: 0.85, provenance: 'rwr', distance: 0 }],
  removedEdges: [],
  addedEdges: [],
  deltaTokens: 30,
  fullTokens: 200,
};

const output = encodeDelta(delta);

81.2% savings on re-queries where the pack changed slightly.

Generic Encoding

Encode any JS value (not just graph payloads) into GCF tabular format:

import { encodeGeneric } from '@blackwell-systems/gcf';

const output = encodeGeneric({
  employees: [
    { id: 1, name: 'Alice', department: 'Engineering', salary: 95000 },
    { id: 2, name: 'Bob', department: 'Sales', salary: 72000 },
  ],
});

Output:

## employees [2]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000

Works on objects, arrays, and primitives. Arrays of uniform objects get tabular rows. Nested objects use ## key section headers.

API

Function Description
encode(p: Payload): string Encode a graph payload to GCF text
encodeGeneric(data: unknown): string Encode any value to GCF tabular format
decode(input: string): Payload Parse GCF text back to a Payload
encodeWithSession(p: Payload, s: Session): string Encode with session deduplication
new StreamEncoder(w, tool, opts) Create a streaming encoder (zero-buffering)
encodeDelta(d: DeltaPayload): string Encode a delta (added/removed only)
new Session() Create a new session tracker

Types

Type Purpose
Payload Full GCF payload: tool, budget, symbols, edges, pack root
Symbol Graph node: qualified name, kind, score, provenance, distance
Edge Directed relationship: source, target, edge type
DeltaPayload Diff between two packs: added/removed symbols and edges
Session Tracker for multi-call deduplication
KIND_ABBREV / KIND_EXPAND Bidirectional kind abbreviation maps

Benchmarks

1,300+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.

GCF TOON JSON
Comprehension (23 runs, 10 models) 90.7% 68.5% 53.6%
Generation (28 runs, 9 models) 5/5 1.0/5 5.0/5
Input tokens (500 symbols) 11,090 16,378 53,341
Output tokens (100 symbols) 5,976 8,937 16,121

GCF wins all 6 datasets on TOON's own benchmark. Full results: gcformat.com/guide/benchmarks

Links

More links

License

MIT - Dayna Blackwell