Not everybody can embed Java to do fast batch graph imports, so we should add a new streaming HTTP endpoint that ingests large volumes of vertices and edges in CSV or JSONL format.
POST /api/v1/batch/{database}
Should support two input formats: JSONL (newline-delimited JSON) and CSV. Both are streamed — the server never loads the entire payload into memory, so you can push millions of records in a single request.
JSONL Format
{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}
CSV Format
@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25
---
@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020
In both formats, vertices come first, then edges. Vertices can have temporary IDs (@id) that edges reference via @from/@to. Edges can also reference existing database RIDs directly (e.g., #12:0).
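For instance, an edge can connect a newly created vertex to a record that already exists in the database (the #12:0 RID below is a hypothetical existing record, shown only to illustrate the syntax):

```jsonl
{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice"}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"#12:0"}
```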
Temporary ID Mapping
The response includes an idMapping object so you know what RIDs were assigned:
{
"verticesCreated": 2,
"edgesCreated": 1,
"elapsedMs": 42,
"idMapping": {"t1": "#9:0", "t2": "#9:1"}
}
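A client can use idMapping to translate its own temporary IDs into the assigned RIDs, for example to wire additional edges in a follow-up request. A minimal sketch (the helper name and record shapes are illustrative, not part of the API):

```python
def remap_edges(edges, id_mapping):
    """Replace temporary @from/@to IDs with the RIDs the server assigned.

    References already in RID form (starting with '#') are left untouched.
    """
    remapped = []
    for edge in edges:
        e = dict(edge)
        for key in ("@from", "@to"):
            ref = e[key]
            if not ref.startswith("#"):
                e[key] = id_mapping[ref]  # raises KeyError on unknown temp IDs
            remapped_ref = e[key]
        remapped.append(e)
    return remapped

# idMapping as returned in the response above
id_mapping = {"t1": "#9:0", "t2": "#9:1"}
edges = [{"@type": "edge", "@class": "KNOWS", "@from": "t1", "@to": "t2"}]
print(remap_edges(edges, id_mapping))
# → [{'@type': 'edge', '@class': 'KNOWS', '@from': '#9:0', '@to': '#9:1'}]
```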
Tuning via Query Parameters
All GraphBatch configuration options are exposed as query parameters:
| Parameter | Default | Description |
|---|---|---|
| batchSize | 100000 | Max edges buffered before auto-flush |
| lightEdges | false | Property-less edges stored as connectivity only (saves ~33% I/O) |
| wal | false | Enable Write-Ahead Logging for crash safety |
| parallelFlush | true | Parallelize edge connection across async threads |
| preAllocateEdgeChunks | true | Pre-allocate edge segments on vertex creation |
| edgeListInitialSize | 2048 | Initial segment size in bytes (64–8192) |
| bidirectional | true | Connect both outgoing and incoming edges |
| commitEvery | 50000 | Edges per sub-transaction within a flush |
| expectedEdgeCount | 0 | Hint for auto-tuning batch size |
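When combining several tuning options, it can be cleaner to build the query string programmatically than to hand-concatenate parameters. A sketch using Python's standard library (the option values are illustrative):

```python
from urllib.parse import urlencode

base = "http://localhost:2480/api/v1/batch/mydb"
options = {
    "lightEdges": "true",          # store property-less edges as connectivity only
    "wal": "false",                # trade crash safety for throughput
    "batchSize": 200000,           # buffer more edges before each flush
    "expectedEdgeCount": 1000000,  # hint for auto-tuning
}
url = f"{base}?{urlencode(options)}"
print(url)
# → http://localhost:2480/api/v1/batch/mydb?lightEdges=true&wal=false&batchSize=200000&expectedEdgeCount=1000000
```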
Examples
curl (JSONL):
curl -X POST "http://localhost:2480/api/v1/batch/mydb?lightEdges=true" \
-u root:password \
-H "Content-Type: application/x-ndjson" \
--data-binary @graph-data.jsonl
curl (CSV):
curl -X POST "http://localhost:2480/api/v1/batch/mydb" \
-u root:password \
-H "Content-Type: text/csv" \
--data-binary @graph-data.csv
Python:
import requests
data = (
'{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}\n'
'{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}\n'
'{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}\n'
)
resp = requests.post(
"http://localhost:2480/api/v1/batch/mydb?lightEdges=true",
auth=("root", "password"),
headers={"Content-Type": "application/x-ndjson"},
data=data,
)
print(resp.json())
# {'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}
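Because the endpoint streams, the client can stream too: requests sends a generator body with chunked transfer encoding, so large exports never have to be materialized in memory. A sketch (the record source and URL are illustrative):

```python
import json

def jsonl_stream(records):
    """Yield each record as one UTF-8 encoded JSONL line."""
    for record in records:
        yield (json.dumps(record) + "\n").encode("utf-8")

records = [
    {"@type": "vertex", "@class": "Person", "@id": "p1", "name": "Alice"},
    {"@type": "vertex", "@class": "Person", "@id": "p2", "name": "Bob"},
    {"@type": "edge", "@class": "KNOWS", "@from": "p1", "@to": "p2"},
]

# Passing the generator as the request body streams it chunked:
# resp = requests.post(
#     "http://localhost:2480/api/v1/batch/mydb",
#     auth=("root", "password"),
#     headers={"Content-Type": "application/x-ndjson"},
#     data=jsonl_stream(records),
# )

lines = list(jsonl_stream(records))
print(len(lines))  # → 3
```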
JavaScript (Node.js):
const resp = await fetch("http://localhost:2480/api/v1/batch/mydb", {
method: "POST",
headers: {
"Content-Type": "application/x-ndjson",
Authorization: "Basic " + btoa("root:password"),
},
body: [
'{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}',
'{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}',
'{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}',
].join("\n"),
});
console.log(await resp.json());
Tip: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single createVertices() call. Interleaving types forces smaller batches.
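One way to honor this tip on the client side is to sort vertex records by @class before emitting them, so consecutive records share a type. A sketch (the record shapes are illustrative):

```python
def order_for_batching(records):
    """Emit vertices grouped by @class, then edges, preserving relative order."""
    vertices = [r for r in records if r["@type"] == "vertex"]
    edges = [r for r in records if r["@type"] == "edge"]
    # sort() is stable, so records within the same class keep their input order
    vertices.sort(key=lambda r: r["@class"])
    return vertices + edges

records = [
    {"@type": "vertex", "@class": "Person", "@id": "p1"},
    {"@type": "vertex", "@class": "City", "@id": "c1"},
    {"@type": "vertex", "@class": "Person", "@id": "p2"},
    {"@type": "edge", "@class": "LIVES_IN", "@from": "p1", "@to": "c1"},
]
print([r["@class"] for r in order_for_batching(records)])
# → ['City', 'Person', 'Person', 'LIVES_IN']
```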
Tip: The endpoint is NOT atomic by design — GraphBatch commits internally in chunks for maximum throughput. Treat it as a bulk-loading operation, not a transactional one. The response tells you exactly how many records were committed.