Summary
New engine-level GraphBatchImporter that provides 6-11x speedup over the standard edge creation API for bulk graph loading scenarios. Designed to close the performance gap with Neo4j's batch importer.
Problem
Creating edges one-at-a-time through the standard Vertex API (even with transaction batching) is bottlenecked by:
- Random I/O to edge segments (each edge touches source + destination vertex)
- Per-record overhead: lock acquisition, validation, event callbacks, page space search
- Unsorted access patterns causing page cache thrashing at scale
Solution
GraphBatchImporter buffers edges in flat primitive arrays, then flushes them in optimized sorted passes:
try (GraphBatchImporter importer = GraphBatchImporter.builder(db)
.withExpectedEdgeCount(1_000_000) // auto-tunes batch size
.withLightEdges(false)
.withWAL(false)
.build()) {
RID[] vertices = importer.createVertices("Person", 100_000);
for (Edge e : edgeSource)
importer.newEdge(vertices[e.src], "KNOWS", vertices[e.dst],
"weight", e.weight, "timestamp", e.ts);
}
// All edges connected (both OUT and IN) when close() returns
Key Optimizations
┌───────────────────────────┬───────────────┬──────────────────────────────────────────────────────────────────────────────────────┐
│ Optimization │ Impact │ Description │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Sorted flush │ ~2-3x │ Outgoing edges sorted by source vertex → sequential segment access │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Deferred incoming edges │ ~1.5x │ IN edges accumulated across flushes, connected in one sorted pass at close() │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Vectorized segment writes │ ~1.3x │ addManyAtEndDirect() writes N edges with single getUsed/setUsed pair │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Exact-sized segments │ ~1.2x │ Pre-compute VLQ bytes needed → no wasted space in segments │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Fill-then-persist │ ~1.2x │ Segments filled in memory before createRecord (1 I/O vs 2) │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Deferred vertex updates │ ~1.4x │ Head chunk pointers batch-updated at close() instead of per-edge │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Bulk record creation │ ~1.3x (props) │ LocalBucket.createRecordsBulk() bypasses per-record lock/validate/findAvailableSpace │
├───────────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Auto-tuned batch size │ ~1.5-2x │ withExpectedEdgeCount() sets optimal batch = edgeCount/5 (100K-5M range) │
└───────────────────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
Benchmark Results
500K edges, 50K vertices:
┌──────────────────────────────────┬───────────┬───────────┬─────────┐
│ Method │ Time (ms) │ Edges/sec │ Speedup │
├──────────────────────────────────┼───────────┼───────────┼─────────┤
│ Standard API (tx/1000) │ 12,162 │ 41K │ 1.0x │
├──────────────────────────────────┼───────────┼───────────┼─────────┤
│ Old GraphImporter (integration) │ 8,623 │ 58K │ 1.4x │
├──────────────────────────────────┼───────────┼───────────┼─────────┤
│ GraphBatchImporter (light edges) │ 2,031 │ 246K │ 6.0x │
├──────────────────────────────────┼───────────┼───────────┼─────────┤
│ Standard API + properties │ 14,267 │ 35K │ 1.0x │
├──────────────────────────────────┼───────────┼───────────┼─────────┤
│ GraphBatchImporter + properties │ 2,149 │ 233K │ 6.6x │
└──────────────────────────────────┴───────────┴───────────┴─────────┘
Scaling with batch size (500K edges):
┌────────────┬───────────┬─────────┐
│ Batch Size │ Edges/sec │ Speedup │
├────────────┼───────────┼─────────┤
│ 50K │ 175K │ 4.0x │
├────────────┼───────────┼─────────┤
│ 100K │ 320K │ 7.3x │
├────────────┼───────────┼─────────┤
│ 200K │ 366K │ 8.4x │
├────────────┼───────────┼─────────┤
│ 400K │ 421K │ 9.6x │
├────────────┼───────────┼─────────┤
│ 800K │ 504K │ 11.5x │
└────────────┴───────────┴─────────┘
10M edges, 1M vertices (best batch = 2M):
┌────────────┬───────────┬────────────────────┐
│ Batch Size │ Edges/sec │ Speedup │
├────────────┼───────────┼────────────────────┤
│ 2M │ 153K │ best at this scale │
└────────────┴───────────┴────────────────────┘
API
Builder Options
┌────────────────────────────────────┬─────────┬──────────────────────────────────────────────────────┐
│ Method │ Default │ Description │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withBatchSize(int) │ 100,000 │ Max edges buffered before auto-flush │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withExpectedEdgeCount(int) │ — │ Auto-tunes batch size to edgeCount/5 (100K-5M) │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withLightEdges(boolean) │ true │ Use light edges when no properties │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withBidirectional(boolean) │ true │ Connect IN edges (deferred to close()) │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withWAL(boolean) │ false │ Write-Ahead Logging during import │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withCommitEvery(int) │ 50,000 │ Edges per commit within a flush (0 = once per flush) │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withEdgeListInitialSize(int) │ 2048 │ Initial edge segment size in bytes │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withPreAllocateEdgeChunks(boolean) │ true │ Pre-allocate edge segments at vertex creation │
├────────────────────────────────────┼─────────┼──────────────────────────────────────────────────────┤
│ withParallelFlush(boolean) │ false │ Parallel edge connection (experimental) │
└────────────────────────────────────┴─────────┴──────────────────────────────────────────────────────┘
Key Methods
- createVertices(typeName, count) — Bulk vertex creation, returns RID array
- newEdge(srcRID, edgeType, dstRID, properties...) — Buffer an edge (auto-flushes at batch size)
- flush() — Manual flush of buffered edges
- close() — Flushes remaining edges, connects all deferred IN edges, batch-updates vertex head chunks
Files Changed
- New: engine/.../graph/GraphBatchImporter.java (1704 lines) — Main importer
- New: engine/.../graph/GraphBatchImporterTest.java (616 lines) — 8 unit tests
- New: integration/.../GraphImporterBenchmarkTest.java (445 lines) — 3-way benchmark
- Modified: engine/.../engine/LocalBucket.java — Added createRecordsBulk() for bulk page writes
- Modified: engine/.../graph/EdgeSegment.java — Added addAtEndDirect(), addManyAtEndDirect() interfaces
- Modified: engine/.../graph/MutableEdgeSegment.java — Implemented vectorized segment writes
Recommended Batch Sizes
┌─────────────┬─────────────────────────────┬──────────────────┐
│ Graph Size │ Recommended │ RAM for Importer │
├─────────────┼─────────────────────────────┼──────────────────┤
│ < 1M edges │ Use withExpectedEdgeCount() │ ~200MB │
├─────────────┼─────────────────────────────┼──────────────────┤
│ 1-5M edges │ 500K-1M │ ~300MB │
├─────────────┼─────────────────────────────┼──────────────────┤
│ 5-20M edges │ 1-2M │ ~500MB │
├─────────────┼─────────────────────────────┼──────────────────┤
│ 20M+ edges │ 2-5M │ ~1GB │
└─────────────┴─────────────────────────────┴──────────────────┘
Future Optimizations
- Pre-serialized edge template: For same-schema edges, pre-compute binary layout once, stamp values per edge
- Direct page cursor: Maintain page+offset cursor instead of per-record findAvailableSpace
- Buffer pool for serialization: Avoid copyOfContent() per edge by using a buffer pool
- Parallel sort+flush: Sort and connect edges across multiple threads partitioned by bucket
Summary
New engine-level GraphBatchImporter that provides 6-11x speedup over the standard edge creation API for bulk graph loading scenarios. Designed to close the performance gap with Neo4j's batch importer.
Problem
Creating edges one-at-a-time through the standard Vertex API (even with transaction batching) is bottlenecked by:
Solution
Key Optimizations
Benchmark Results
500K edges, 50K vertices:
Scaling with batch size (500K edges):
10M edges, 1M vertices (best batch = 2M):
API
Builder Options
Key Methods
Files Changed
Recommended Batch Sizes
Future Optimizations