Structured codebase context for LLMs. Graphite turns JVM bytecode into a queryable program graph — so AI agents can understand your codebase without reading every file.
LLMs working with code face a fundamental constraint: context windows are finite, but codebases are not.
Dumping source files into a prompt is wasteful. Most tokens describe boilerplate, imports, and formatting — not the relationships that matter. An LLM trying to understand "what calls this method?" or "what constants flow into this API?" must read hundreds of files to answer questions that a graph can answer in milliseconds.
Graphite builds a program graph from compiled bytecode — nodes are program elements (methods, fields, constants, call sites), edges are relationships (dataflow, calls, type hierarchy). LLMs query the graph instead of reading source code.
Before Graphite: Feed 500 source files (~2M tokens) to find AB test IDs.
With Graphite: Query graph.callSites(pattern) → get 23 constants in 12 tokens.
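To make the node/edge vocabulary concrete, here is a minimal in-memory sketch of a program graph that answers "what calls this method?" by following incoming call edges. Every name in it (`ToyGraph`, `Node`, `Edge`, `EdgeKind`) is hypothetical and chosen for illustration. This is not Graphite's API or storage format.

```kotlin
// Hypothetical toy model of a program graph; not Graphite's API.
enum class EdgeKind { CALL, DATAFLOW, EXTENDS }

data class Node(val id: Int, val label: String, val name: String)
data class Edge(val from: Int, val to: Int, val kind: EdgeKind)

class ToyGraph(nodes: List<Node>, private val edges: List<Edge>) {
    private val byId = nodes.associateBy { it.id }

    // "What calls this method?": follow incoming CALL edges one hop back.
    fun callers(methodName: String): List<String> {
        val target = byId.values.firstOrNull { it.name == methodName } ?: return emptyList()
        return edges
            .filter { it.kind == EdgeKind.CALL && it.to == target.id }
            .mapNotNull { byId[it.from]?.name }
    }
}

fun main() {
    val graph = ToyGraph(
        nodes = listOf(
            Node(1, "Method", "UserService.save"),
            Node(2, "Method", "Repository.insert"),
            Node(3, "Method", "AuditLog.record"),
        ),
        edges = listOf(
            Edge(1, 2, EdgeKind.CALL),
            Edge(3, 2, EdgeKind.CALL),
        ),
    )
    println(graph.callers("Repository.insert")) // [UserService.save, AuditLog.record]
}
```

A real graph would index edges by target instead of scanning the whole edge list. The query shape stays the same, though: one hop over labeled edges replaces reading every file that might contain a call site.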
| Relationship | Example | LLM Use Case |
|---|---|---|
| Dataflow | `x = 42; foo(x)` → constant 42 flows to `foo` | Track config values, feature flags, API keys |
| Call graph | `UserService.save()` calls `Repository.insert()` | Understand execution paths without reading source |
| Type hierarchy | `AdminUser extends User implements Auditable` | Resolve polymorphism, find implementations |
| Annotations | `@GetMapping("/api/users")` on `listUsers()` | Discover endpoints, serialization rules, DI config |
| Lambda/method ref | `items.stream().map(User::getName)` | Trace functional pipelines |
| Resources | `config/application.yml` inside a fat JAR | Cross-reference code with config files |
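The Dataflow row underlies most of the questions in this document. As a rough sketch of how "which constants flow into this argument?" can be answered, the toy below records dataflow edges in reverse and walks them backwards from a call argument, collecting every constant it reaches. The node types (`IntConst`, `Local`, `CallArg`) and `DataflowGraph` are hypothetical. The toy also ignores control flow, so both values ever assigned to `x` are reported.

```kotlin
// Hypothetical toy backward slice over dataflow edges; not Graphite's
// DataFlowAnalysis. Node kinds and names are illustrative assumptions.
sealed interface DfNode
data class IntConst(val value: Int) : DfNode
data class Local(val name: String) : DfNode
data class CallArg(val callee: String, val index: Int) : DfNode

// An edge u -> v means "the value of u flows into v"; we store predecessors.
class DataflowGraph {
    private val preds = mutableMapOf<DfNode, MutableList<DfNode>>()

    fun flow(from: DfNode, to: DfNode) {
        preds.getOrPut(to) { mutableListOf() }.add(from)
    }

    // Walk edges backwards from `sink`, collecting every constant that reaches it.
    // The visited set keeps cycles (e.g. loop-carried variables) from looping forever.
    fun constantsReaching(sink: DfNode): Set<Int> {
        val visited = mutableSetOf<DfNode>()
        val stack = ArrayDeque(listOf(sink))
        val result = mutableSetOf<Int>()
        while (stack.isNotEmpty()) {
            val node = stack.removeLast()
            if (!visited.add(node)) continue
            if (node is IntConst) result.add(node.value)
            preds[node]?.forEach { stack.addLast(it) }
        }
        return result
    }
}

fun main() {
    // Models: x = 42; if (flag) x = 7; foo(x)
    val g = DataflowGraph()
    val x = Local("x")
    g.flow(IntConst(42), x)
    g.flow(IntConst(7), x)
    g.flow(x, CallArg("foo", 0))
    println(g.constantsReaching(CallArg("foo", 0))) // both 42 and 7 reach foo's first argument
}
```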
| Task | Raw Source | Graphite Query | Reduction |
|---|---|---|---|
| Find all AB test IDs | ~500 files, 2M tokens | `callSites` + `backwardSlice` → 23 results | 99.99% |
| Map REST endpoints | ~200 controllers, 800K tokens | `memberAnnotations` scan → structured list | 99.9% |
| Find dead code | Entire codebase, 5M tokens | `branchScopes` + `callSites` → dead paths | 99.99% |
| Resolve type hierarchy | ~100 files per type chain | `supertypes` / `subtypes` → direct answer | 99% |
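The Reduction column reads as the share of tokens an agent no longer has to send. The document does not state the formula. Assuming it is `1 - queryTokens / rawTokens`, the before/after figures quoted earlier (~2M tokens of source versus a 12-token answer) work out as follows:

```kotlin
import java.util.Locale

// Assumed definition: reduction = (1 - queryTokens / rawTokens) * 100.
fun reductionPercent(rawTokens: Long, queryTokens: Long): Double {
    require(rawTokens > 0) { "rawTokens must be positive" }
    return (1.0 - queryTokens.toDouble() / rawTokens) * 100.0
}

fun main() {
    // ~2M tokens of raw source vs. a 12-token query answer.
    val r = reductionPercent(rawTokens = 2_000_000, queryTokens = 12)
    println("%.4f%%".format(Locale.ROOT, r)) // 99.9994%
}
```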
Graphite uses Cypher (the industry-standard graph query language) for querying. The Cypher engine is in the graphite-cypher module, powered by an ANTLR-based openCypher parser.
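For readers new to Cypher, the pattern `(c:IntConstant)-[:DATAFLOW*]->(cs:CallSiteNode)` used in the examples asks for every integer constant that has a path of one or more `DATAFLOW` edges to a call site. The `=~` operator keeps only rows whose property matches the regular expression in full. The toy evaluator below spells out those semantics over an in-memory adjacency map. The node shapes (`IntConstant`, `Variable`, `CallSiteNode` as Kotlin classes) are assumptions for illustration, not the graphite-cypher executor.

```kotlin
// Toy evaluation of:
//   MATCH (c:IntConstant)-[:DATAFLOW*]->(cs:CallSiteNode)
//   WHERE cs.callee_class =~ 'com.example.*'
//   RETURN c.value, cs.callee_name
// Node shapes are hypothetical; this is not the graphite-cypher executor.
sealed interface GNode
data class IntConstant(val value: Int) : GNode
data class Variable(val name: String) : GNode
data class CallSiteNode(val calleeClass: String, val calleeName: String) : GNode

fun matchConstantsToCalls(
    dataflow: Map<GNode, List<GNode>>, // successors along DATAFLOW edges
    calleeClassRegex: Regex,
): List<Pair<Int, String>> {
    val rows = mutableListOf<Pair<Int, String>>()
    for (start in dataflow.keys.filterIsInstance<IntConstant>()) {
        // [:DATAFLOW*] means paths of length >= 1, so seed with direct successors.
        val visited = mutableSetOf<GNode>()
        val queue = ArrayDeque(dataflow[start].orEmpty())
        while (queue.isNotEmpty()) {
            val node = queue.removeFirst()
            if (!visited.add(node)) continue
            // =~ must match the whole string, like Regex.matches().
            if (node is CallSiteNode && calleeClassRegex.matches(node.calleeClass)) {
                rows += start.value to node.calleeName
            }
            queue.addAll(dataflow[node].orEmpty())
        }
    }
    return rows
}

fun main() {
    val x = Variable("x")
    val graph = mapOf<GNode, List<GNode>>(
        IntConstant(42) to listOf(x),
        x to listOf(CallSiteNode("com.example.ab.AbClient", "getOption")),
        IntConstant(7) to listOf(CallSiteNode("org.other.Lib", "call")),
    )
    println(matchConstantsToCalls(graph, Regex("com.example.*"))) // [(42, getOption)]
}
```

One simplification: the toy emits one row per (constant, call site) pair, whereas Cypher returns one row per distinct matching path.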
Tools like GitNexus, Aider, and most LLM code assistants use Tree-sitter for codebase understanding. Tree-sitter parses syntax — it sees text structure, not program semantics.
| Capability | Tree-sitter | Graphite |
|---|---|---|
| "What type is this variable?" | No: sees `var x = foo()` but can't resolve `foo`'s return type | Yes: full type resolution from bytecode |
| "What values flow into this parameter?" | No: can't cross method boundaries | Yes: inter-procedural backward slice |
| "Does this interface have implementations?" | Heuristic grep for class names | Yes: complete type hierarchy from class metadata |
| "What does this lambda actually call?" | No: `invokedynamic` is invisible in source | Yes: `MethodHandle` extraction from bootstrap args |
| "Is this field used via reflection/DI?" | No: annotation semantics are opaque | Yes: annotation values are queryable data |
| "What's the real type of `Object` fields?" | No: requires dataflow across methods | Yes: cross-method field assignment tracking |
| Controller inheritance | No: can't resolve inherited annotations | Yes: walks the type hierarchy for endpoint discovery |
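Two rows above come down to walking class metadata: finding the implementations of an interface, and resolving annotations declared on a superclass. Here is a minimal sketch. It assumes each class is described only by its direct supertypes and its own annotations. The `ClassInfo` and `Hierarchy` types are hypothetical and are not Graphite's `supertypes`/`subtypes` API.

```kotlin
// Hypothetical class metadata as it might be read from bytecode: direct
// supertypes (superclass + interfaces) and the class's own annotations.
data class ClassInfo(
    val name: String,
    val supertypes: List<String>,
    val annotations: Set<String> = emptySet(),
)

class Hierarchy(classes: List<ClassInfo>) {
    private val byName = classes.associateBy { it.name }

    // All transitive supertypes of `name`, nearest first (breadth-first).
    fun supertypes(name: String): List<String> {
        val seen = linkedSetOf<String>()
        val queue = ArrayDeque(byName[name]?.supertypes.orEmpty())
        while (queue.isNotEmpty()) {
            val next = queue.removeFirst()
            if (seen.add(next)) queue.addAll(byName[next]?.supertypes.orEmpty())
        }
        return seen.toList()
    }

    // "Does this interface have implementations?": every known class whose
    // transitive supertypes include `iface`.
    fun implementations(iface: String): List<String> =
        byName.keys.filter { iface in supertypes(it) }

    // Controller-style lookup: an annotation on the class or any supertype counts.
    fun hasInheritedAnnotation(name: String, annotation: String): Boolean =
        (listOf(name) + supertypes(name)).any { annotation in byName[it]?.annotations.orEmpty() }
}

fun main() {
    val h = Hierarchy(listOf(
        ClassInfo("User", emptyList()),
        ClassInfo("Auditable", emptyList()),
        ClassInfo("AdminUser", listOf("User", "Auditable")),
        ClassInfo("BaseController", emptyList(), setOf("RequestMapping")),
        ClassInfo("UserController", listOf("BaseController")),
    ))
    println(h.supertypes("AdminUser"))      // [User, Auditable]
    println(h.implementations("Auditable")) // [AdminUser]
    println(h.hasInheritedAnnotation("UserController", "RequestMapping")) // true
}
```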
The fundamental issue: Tree-sitter operates on syntax (one file at a time, no type resolution, no cross-file dataflow). Graphite operates on semantics (compiled bytecode with full type information, inter-procedural analysis, resolved generics).
For LLMs, this difference is critical. A syntax tree tells you what code looks like. A program graph tells you what code does.
```bash
# Install via Homebrew
brew tap johnsonlee/tap
brew install graphite graphite-explore

# Build a graph from your JAR
graphite build app.jar -o /data/app-graph --include com.example

# Query with Cypher
graphite query /data/app-graph \
  "MATCH (c:IntConstant)-[:DATAFLOW*]->(cs:CallSiteNode)
   WHERE cs.callee_class =~ 'com.example.*'
   RETURN c.value, cs.callee_name"

# JSON output (for LLM consumption)
graphite query --format json /data/app-graph \
  "MATCH (n:CallSiteNode) RETURN n.callee_name LIMIT 10"

# Launch the web UI
graphite-explore /data/app-graph --port 8080
```

```kotlin
import java.nio.file.Path
// Graphite imports omitted; package names depend on the module you use.

// Build graph from bytecode
val graph = JavaProjectLoader(LoaderConfig(
    includePackages = listOf("com.example")
)).load(Path.of("/path/to/app.jar"))

// Cypher query (requires the cypher module)
val result = graph.query("""
    MATCH (c:IntConstant)-[:DATAFLOW*]->(cs:CallSiteNode)
    WHERE cs.callee_class =~ 'com.example.*'
    RETURN c.value, cs.callee_name
""")
result.rows.forEach { row ->
    println("${row["c.value"]} -> ${row["cs.callee_name"]}")
}

// Programmatic query DSL
val results = Graphite.from(graph).query {
    findArgumentConstants {
        method {
            declaringClass = "com.example.ab.AbClient"
            name = "getOption"
        }
        argumentIndex = 0
    }
}

// Annotations, dataflow analysis
val annotations = graph.memberAnnotations("com.example.User", "name")
// nodeId: ID of the node to slice from (for example, a call-site argument)
val slice = DataFlowAnalysis(graph).backwardSlice(nodeId)
slice.constants() // all constant values that reach this node
```

```kotlin
import java.nio.file.Path

// Save to disk (WebGraph compressed format); `graph` is built as shown above
GraphStore.save(graph, Path.of("/data/app-graph"))

// Load: the strategy adapts to graph size
//   < 1M nodes  -> eager (all in heap, fastest queries)
//   >= 1M nodes -> mmap (nodes off heap, 75% less memory)
val dir = Path.of("/data/app-graph")
val loaded = GraphStore.load(dir)

// Or force a specific strategy
val inHeap = GraphStore.load(dir, GraphStore.LoadMode.EAGER)  // always in-heap
val mapped = GraphStore.load(dir, GraphStore.LoadMode.MAPPED) // always mmap
```

```kotlin
// List files packaged inside the JAR/WAR (including nested JARs)
graph.resources.list("**/*.yml").forEach { entry ->
    println(entry.path) // e.g., "config/application.yml"
}
```

```text
graphite/
├── graphite-core/      # Graph interface, nodes, edges, analysis
├── graphite-cypher/    # Cypher query engine (ANTLR parser + executor)
├── graphite-sootup/    # SootUp bytecode → graph builder
├── graphite-webgraph/  # WebGraph disk persistence (BVGraph + LAW tools)
├── graphite-query/     # CLI: build, query, Cypher
└── graphite-explore/   # CLI: web visualization
```
Graphs are persisted using the WebGraph ecosystem:
| Data | Format |
|---|---|
| Adjacency | BVGraph (2-4 bits/edge) |
| Edge labels | Byte array in BVGraph order |
| Strings | FrontCodedStringList (prefix compression) |
| Node data | Compact binary with string table indices |
| Metadata | Compact binary with string table indices |
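Front coding, the idea behind `FrontCodedStringList`, stores each string as the length of the prefix it shares with its predecessor plus the remaining suffix. Sorted lists of fully qualified names share long prefixes, so they compress well. Below is a simplified sketch with hypothetical `encode`/`decode` helpers. As far as its documentation describes, dsiutils' `FrontCodedStringList` also stores every `ratio`-th string in full, so random access need not decode from the beginning. This toy omits that.

```kotlin
// Simplified front coding: each entry = (length of the prefix shared with the
// previous string, remaining suffix). Sorting the input maximises sharing.
data class FrontCoded(val shared: Int, val suffix: String)

fun encode(strings: List<String>): List<FrontCoded> {
    var previous = ""
    return strings.map { s ->
        val shared = previous.commonPrefixWith(s).length
        previous = s
        FrontCoded(shared, s.substring(shared))
    }
}

fun decode(entries: List<FrontCoded>): List<String> {
    var previous = ""
    return entries.map { e ->
        val s = previous.substring(0, e.shared) + e.suffix
        previous = s
        s
    }
}

fun main() {
    val names = listOf(
        "com.example.User",
        "com.example.UserService",
        "com.example.UserService.save",
    )
    val coded = encode(names)
    coded.forEach(::println) // later entries store only "Service" and ".save"
    check(decode(coded) == names)
}
```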
| Capability | Description |
|---|---|
| Constant tracking | Direct, local variable, field, cross-class, enum |
| Auto-boxing | Transparent handling of `Integer.valueOf()` |
| Lambda / method ref | invokedynamic → actual target resolution |
| Functional dispatch | Callbacks, return values, fields, varargs, conditionals |
| Controller inheritance | Endpoint discovery follows class hierarchy |
| Generic type analysis | Nested generic structures such as `ApiResponse<PageData<User>>` |
| Branch reachability | Dead code via condition constant analysis |
| Annotations | Generic memberAnnotations() for any framework |
| Cypher queries | `graph.query("MATCH ...")`, full openCypher read grammar |
| Resource access | Files inside JAR/WAR/fat JAR (nested JARs) |
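Dead-code detection via condition constant analysis rests on a simple observation. When constant propagation shows that a branch condition is always false (for example, a feature flag hard-coded to `false`), the guarded code can never run. Here is a minimal sketch over a hypothetical mini-IR (`Cond`, `Stmt`, `If`, `Call`, `deadCalls`), far simpler than bytecode. It assumes constants have already been propagated into the conditions.

```kotlin
// Hypothetical mini-IR: a branch condition is either a known constant or unknown.
sealed interface Cond
data class Const(val value: Boolean) : Cond
data class Unknown(val expr: String) : Cond

sealed interface Stmt
data class Call(val callee: String) : Stmt
data class If(val cond: Cond, val then: List<Stmt>, val otherwise: List<Stmt> = emptyList()) : Stmt

// Collects calls that can never execute: those behind a constant-false
// condition (then-branch) or a constant-true condition (else-branch).
fun deadCalls(
    body: List<Stmt>,
    reachable: Boolean = true,
    out: MutableList<String> = mutableListOf(),
): List<String> {
    for (stmt in body) {
        when (stmt) {
            is Call -> if (!reachable) out += stmt.callee
            is If -> {
                val c = stmt.cond
                val thenLive = reachable && !(c is Const && !c.value)
                val elseLive = reachable && !(c is Const && c.value)
                deadCalls(stmt.then, thenLive, out)
                deadCalls(stmt.otherwise, elseLive, out)
            }
        }
    }
    return out
}

fun main() {
    // if (NEW_CHECKOUT /* = false */) newCheckout() else legacyCheckout()
    // if (user.isAdmin()) audit()
    val body = listOf(
        If(Const(false), then = listOf(Call("newCheckout")), otherwise = listOf(Call("legacyCheckout"))),
        If(Unknown("user.isAdmin()"), then = listOf(Call("audit"))),
    )
    println(deadCalls(body)) // [newCheckout]
}
```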
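Pluggable via the `GraphiteExtension` SPI (`ServiceLoader`):

```kotlin
class MyExtension : GraphiteExtension {
    override fun visit(sootClass: SootClass, context: GraphiteContext) {
        // Extract domain-specific metadata during graph building.
        // className, memberName, annotationFqn and values are placeholders
        // for whatever your extension derives from sootClass.
        context.addMemberAnnotation(className, memberName, annotationFqn, values)
    }
}
```

Register the implementation's fully qualified class name in `META-INF/services/io.johnsonlee.graphite.sootup.GraphiteExtension`.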
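The extension hook follows the familiar visitor-plus-`ServiceLoader` pattern. The builder presumably discovers the registered implementations and calls each one for every class it processes, and extensions write what they find into a shared context. The sketch below imitates that flow with hypothetical stand-ins (`ToyClass`, `ToyContext`, `ToyExtension`, `JsonFieldExtension`, `buildWith`) so it runs without SootUp or Graphite. In a real setup the extension list would come from `ServiceLoader` rather than being passed in by hand.

```kotlin
// Hypothetical stand-ins for SootClass / GraphiteContext / GraphiteExtension.
data class ToyClass(val name: String, val fieldAnnotations: Map<String, List<String>>)

class ToyContext {
    val memberAnnotations = mutableListOf<Triple<String, String, String>>()

    fun addMemberAnnotation(className: String, memberName: String, annotationFqn: String) {
        memberAnnotations += Triple(className, memberName, annotationFqn)
    }
}

interface ToyExtension {
    fun visit(toyClass: ToyClass, context: ToyContext)
}

// Example extension: records Jackson annotations found on fields.
class JsonFieldExtension : ToyExtension {
    override fun visit(toyClass: ToyClass, context: ToyContext) {
        for ((field, annotations) in toyClass.fieldAnnotations) {
            annotations
                .filter { it.startsWith("com.fasterxml.jackson") }
                .forEach { context.addMemberAnnotation(toyClass.name, field, it) }
        }
    }
}

// The builder calls every extension once per class. With real SPI wiring the
// list would come from ServiceLoader.load(ToyExtension::class.java).toList().
fun buildWith(classes: List<ToyClass>, extensions: List<ToyExtension>): ToyContext {
    val context = ToyContext()
    for (c in classes) extensions.forEach { it.visit(c, context) }
    return context
}

fun main() {
    val user = ToyClass("com.example.User", mapOf(
        "name" to listOf("com.fasterxml.jackson.annotation.JsonProperty"),
        "password" to listOf("javax.annotation.Nullable"),
    ))
    println(buildWith(listOf(user), listOf(JsonFieldExtension())).memberAnnotations)
}
```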
```kotlin
repositories {
    mavenCentral()
}

dependencies {
    implementation("io.johnsonlee.graphite:core:1.0.0-rc13")
    implementation("io.johnsonlee.graphite:sootup:1.0.0-rc13")
    // Optional: Cypher query support (graph.query("MATCH ..."))
    implementation("io.johnsonlee.graphite:cypher:1.0.0-rc13")
    // Optional: disk persistence (WebGraph format)
    implementation("io.johnsonlee.graphite:webgraph:1.0.0-rc13")
}
```

Connect LLMs to Graphite via the Model Context Protocol (MCP):
```bash
npx graphite-mcp
```

Configure it in Claude Code (`~/.claude/settings.json`):
```json
{
  "mcpServers": {
    "graphite": {
      "command": "npx",
      "args": ["graphite-mcp"],
      "env": { "GRAPHITE_URL": "http://localhost:8080" }
    }
  }
}
```

Start the Explorer first, since the MCP server reaches it at `GRAPHITE_URL`; LLMs can then query the graph:
```bash
# Start Explorer (the port matches GRAPHITE_URL in the MCP config)
graphite-explore /path/to/saved-graph --port 8080

# LLM can now use tools: cypher, nodes, methods, call_sites, annotations, etc.
```

Copyright 2026 Johnson Lee
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0