gcf-kotlin

Kotlin/JVM implementation of GCF, the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.

79% fewer input tokens than JSON. 63% fewer output tokens. 90.7% average comprehension accuracy across 10 models and 3 providers (four models hit 100%). 1,300+ LLM evaluations. Zero training.

Docs: gcformat.com · Playground · GCF vs TOON

Install

Add the JitPack repository, then the dependency:

Gradle (Kotlin DSL)

repositories {
    maven("https://jitpack.io")
}

dependencies {
    implementation("com.github.blackwell-systems:gcf-kotlin:v0.5.0")
}

Gradle (Groovy)

repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.blackwell-systems:gcf-kotlin:v0.5.0'
}

Maven

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

<dependency>
    <groupId>com.github.blackwell-systems</groupId>
    <artifactId>gcf-kotlin</artifactId>
    <version>v0.5.0</version>
</dependency>

Don't want to change code? Use the MCP proxy for zero-code adoption.

Quick Start

import com.blackwellsystems.gcf.*

val output = encodeGeneric(mapOf(
    "employees" to listOf(
        mapOf("id" to 1, "name" to "Alice", "department" to "Engineering", "salary" to 95000),
        mapOf("id" to 2, "name" to "Bob", "department" to "Sales", "salary" to 72000),
    )
))

Output:

## employees [2]{department,id,name,salary}
Engineering|1|Alice|95000
Sales|2|Bob|72000

Graph Profile

val payload = Payload(
    tool = "context_for_task", tokenBudget = 5000, tokensUsed = 1847,
    symbols = listOf(
        Symbol(qualifiedName = "pkg.Auth", kind = "function", score = 0.78, provenance = "lsp", distance = 0),
        Symbol(qualifiedName = "pkg.Server", kind = "function", score = 0.54, provenance = "lsp", distance = 1),
    ),
    edges = listOf(Edge(source = "pkg.Server", target = "pkg.Auth", edgeType = "calls"))
)
val output = encode(payload)

Output:

GCF tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
## targets
@0 fn pkg.Auth 0.78 lsp
## related
@1 fn pkg.Server 0.54 lsp
## edges [1]
@0<@1 calls

Decode

val p = decode(input)
println("${p.tool} ${p.symbols.size} symbols ${p.edges.size} edges")

Throws DecodeException on invalid input.

Session Deduplication

Track transmitted symbols across multiple tool responses. Previously-sent symbols become bare references instead of full declarations:

val session = Session()

val out1 = encodeWithSession(payload1, session) // full declarations
val out2 = encodeWithSession(payload2, session) // reused symbols as "@N  # previously transmitted"

By the 5th call in a session: 92.7% token savings vs JSON.
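
To make the idea concrete, here is a minimal sketch of deduplication by name. ToySession and its emit method are hypothetical stand-ins written for this README, not the library's Session class, and the ID assignment order is an assumption. Only the "@N  # previously transmitted" reference form is taken from the comment above.

```kotlin
// Toy model of session deduplication. ToySession is hypothetical and is
// NOT the gcf-kotlin Session class; ID assignment order is an assumption.
class ToySession {
    private val ids = LinkedHashMap<String, Int>()

    // Returns a full declaration the first time a name is seen,
    // and a bare "@N" reference on every later call.
    @Synchronized
    fun emit(qualifiedName: String, kind: String): String {
        val known = ids[qualifiedName]
        if (known != null) return "@$known  # previously transmitted"
        val id = ids.size
        ids[qualifiedName] = id
        return "@$id $kind $qualifiedName"
    }
}

fun main() {
    val session = ToySession()
    println(session.emit("pkg.Auth", "fn"))   // first sighting: full declaration
    println(session.emit("pkg.Server", "fn")) // first sighting: full declaration
    println(session.emit("pkg.Auth", "fn"))   // repeat: bare reference
}
```

The savings come from the third call: a repeated symbol costs a short reference instead of its full declaration, so the benefit grows with every call that overlaps earlier ones.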

Streaming Encode

Write GCF output incrementally as symbols and edges arrive. Zero buffering, O(1) memory per row:

val enc = StreamEncoder(writer, "context_for_task", StreamOptions(tokenBudget = 5000))

enc.writeSymbol(Symbol(qualifiedName = "pkg.Auth", kind = "function", score = 0.95, provenance = "lsp", distance = 0))
enc.writeEdge(Edge(source = "pkg.Server", target = "pkg.Auth", edgeType = "calls"))
enc.close()  // emits ## _summary trailer

Output uses [?] deferred counts and a ## _summary trailer. The standard decode() handles streaming output with no changes. The encoder is thread-safe via @Synchronized.
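
The deferred-count idea can be sketched without the library. ToyStreamWriter below is a hypothetical illustration, and the `rows=N` trailer line is a placeholder: this README does not show the real ## _summary layout, so do not treat it as the wire format.

```kotlin
import java.io.StringWriter
import java.io.Writer

// Toy illustration of deferred counts. ToyStreamWriter and the "rows=N"
// trailer line are hypothetical, NOT the StreamEncoder wire format.
class ToyStreamWriter(private val out: Writer, section: String) {
    private var rows = 0

    init {
        // The row count is unknown up front, so the header carries "[?]".
        out.write("## $section [?]\n")
    }

    // Each row is written immediately; only a counter is retained.
    @Synchronized
    fun writeRow(row: String) {
        out.write(row + "\n")
        rows++
    }

    // The trailer supplies the count the header could not.
    @Synchronized
    fun close() {
        out.write("## _summary\nrows=$rows\n")
        out.flush()
    }
}

fun main() {
    val buffer = StringWriter()
    val writer = ToyStreamWriter(buffer, "edges")
    writer.writeRow("@0<@1 calls")
    writer.close()
    print(buffer)
}
```

Because nothing but the counter is held between calls, memory use stays constant no matter how many rows pass through.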

Delta Encoding

When the consumer already has a prior context pack, send only what changed:

val delta = DeltaPayload(
    tool = "context_for_task",
    baseRoot = "aaa111",
    newRoot = "bbb222",
    removed = listOf(Symbol(qualifiedName = "pkg.OldFunc", kind = "function")),
    added = listOf(Symbol(qualifiedName = "pkg.NewFunc", kind = "function", score = 0.85, provenance = "rwr")),
    deltaTokens = 30,
    fullTokens = 200
)

val output = encodeDelta(delta)

81.2% savings on re-queries where the pack changed slightly.
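
How the added and removed lists get computed is up to the caller. One simple approach, sketched here as an assumption, is a set difference on qualified names. ToySymbol and diffPacks are hypothetical helpers, not part of gcf-kotlin.

```kotlin
// Toy diff between two packs keyed by qualified name. ToySymbol and
// diffPacks are hypothetical helpers, NOT part of gcf-kotlin.
data class ToySymbol(val qualifiedName: String, val kind: String)

// Returns (added, removed): symbols only in `current`, and symbols only in `previous`.
fun diffPacks(
    previous: List<ToySymbol>,
    current: List<ToySymbol>,
): Pair<List<ToySymbol>, List<ToySymbol>> {
    val previousNames = previous.map { it.qualifiedName }.toSet()
    val currentNames = current.map { it.qualifiedName }.toSet()
    val added = current.filter { it.qualifiedName !in previousNames }
    val removed = previous.filter { it.qualifiedName !in currentNames }
    return added to removed
}

fun main() {
    val before = listOf(ToySymbol("pkg.Auth", "function"), ToySymbol("pkg.OldFunc", "function"))
    val after = listOf(ToySymbol("pkg.Auth", "function"), ToySymbol("pkg.NewFunc", "function"))
    val (added, removed) = diffPacks(before, after)
    println("added=${added.map { it.qualifiedName }} removed=${removed.map { it.qualifiedName }}")
}
```

Symbols present in both packs (pkg.Auth here) appear in neither list, which is where the savings come from.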

Generic Encoding

Encode any value (not just graph payloads) into GCF tabular format:

val data = mapOf(
    "employees" to listOf(
        mapOf("id" to 1, "name" to "Alice", "department" to "Engineering", "salary" to 95000),
        mapOf("id" to 2, "name" to "Bob", "department" to "Sales", "salary" to 72000),
    )
)
val output = encodeGeneric(data)

Output:

## employees [2]{department,id,name,salary}
Engineering|1|Alice|95000
Sales|2|Bob|72000

Works on maps, lists, and primitives. Arrays of uniform maps get tabular rows. Nested maps use ## key section headers.
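
For intuition about the tabular layout, the sketch below rebuilds the sample output by hand. toyTable is a hypothetical helper, not the library's encoder. Sorting the column names is inferred from the alphabetical header in the sample, and escaping of `|` or newlines inside values is deliberately left out.

```kotlin
// Toy re-creation of the tabular layout shown above. toyTable is a
// hypothetical helper, NOT the gcf-kotlin encoder; values are not escaped.
fun toyTable(name: String, rows: List<Map<String, Any?>>): String {
    // Sorted column names, matching the alphabetical header in the sample output.
    val columns = rows.flatMap { it.keys }.toSortedSet().toList()
    // The header states the section name, row count, and column order once.
    val header = "## $name [${rows.size}]{${columns.joinToString(",")}}"
    // Each row then carries values only, pipe-delimited in column order.
    val body = rows.map { row -> columns.joinToString("|") { row[it].toString() } }
    return (listOf(header) + body).joinToString("\n")
}

fun main() {
    val employees = listOf(
        mapOf("id" to 1, "name" to "Alice", "department" to "Engineering", "salary" to 95000),
        mapOf("id" to 2, "name" to "Bob", "department" to "Sales", "salary" to 72000),
    )
    println(toyTable("employees", employees))
}
```

Field names appear once in the header instead of once per row, which is the main source of the token savings over JSON for uniform arrays.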

API

| Function | Description |
|---|---|
| encode(payload: Payload): String | Encode a graph payload to GCF text |
| encodeGeneric(data: Any?): String | Encode any value to GCF tabular format |
| decode(input: String): Payload | Parse GCF text back to a Payload |
| encodeWithSession(payload: Payload, session: Session?): String | Encode with session deduplication |
| encodeDelta(delta: DeltaPayload): String | Encode a delta (added/removed only) |
| Session() | Create a new session tracker (thread-safe) |

Types

| Type | Purpose |
|---|---|
| Payload | Full GCF payload: tool, budget, symbols, edges, pack root |
| Symbol | Graph node: qualified name, kind, score, provenance, distance |
| Edge | Directed relationship: source, target, edge type |
| DeltaPayload | Diff between two packs: added/removed symbols and edges |
| Session | Thread-safe tracker for multi-call deduplication |
| Components | Score breakdown: blast radius, confidence, recency, distance |
| DecodeException | Thrown on invalid GCF input |
| kindAbbrev / kindExpand | Bidirectional kind abbreviation maps |

Benchmarks

1,300+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.

| Metric | GCF | TOON | JSON |
|---|---|---|---|
| Comprehension (23 runs, 10 models) | 90.7% | 68.5% | 53.6% |
| Generation (28 runs, 9 models) | 5/5 | 1.0/5 | 5.0/5 |
| Input tokens (500 symbols) | 11,090 | 16,378 | 53,341 |
| Output tokens (100 symbols) | 5,976 | 8,937 | 16,121 |

GCF wins all 6 datasets on TOON's own benchmark. Full results: gcformat.com/guide/benchmarks
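
The headline percentages follow directly from the token counts in the table. The short check below recomputes them; savingsPercent is a helper written for this README, not a library function.

```kotlin
import java.util.Locale

// Percentage of tokens saved relative to a baseline, to one decimal place.
// savingsPercent is a helper for this README, not part of gcf-kotlin.
fun savingsPercent(tokens: Int, baseline: Int): String =
    "%.1f".format(Locale.ROOT, 100.0 * (1.0 - tokens.toDouble() / baseline))

fun main() {
    println("input vs JSON:  ${savingsPercent(11_090, 53_341)}%") // 79.2%
    println("output vs JSON: ${savingsPercent(5_976, 16_121)}%")  // 62.9%
    println("input vs TOON:  ${savingsPercent(11_090, 16_378)}%") // 32.3%
    println("output vs TOON: ${savingsPercent(5_976, 8_937)}%")   // 33.1%
}
```

The first two lines match the 79% and 63% figures quoted at the top of this README after rounding.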

License

MIT - Dayna Blackwell