Add a new GraphBatchLoad client-streaming gRPC RPC that exposes the same GraphBatch-based bulk graph loading as the HTTP POST /api/v1/batch/{database} endpoint (see #3675).
Motivation
The HTTP batch endpoint allows high-performance bulk loading of vertices and edges, but gRPC clients have no equivalent. The existing gRPC Insert RPCs (BulkInsert, InsertStream, InsertBidirectional) are generic record-insert operations with no graph awareness — no temporary ID mapping, no vertex→edge linking, no GraphBatch tuning parameters.
Design
RPC signature: rpc GraphBatchLoad (stream GraphBatchChunk) returns (GraphBatchResult)
Client-streaming: the client sends one or more GraphBatchChunk messages, then receives a single GraphBatchResult when the import completes.
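As an illustration of the client side, the following is a minimal sketch of how a caller might split a record sequence into chunks before streaming them. It uses plain dictionaries as stand-ins for the generated GraphBatchChunk message; the names chunk_records and chunk_size are hypothetical and not part of the proposed API. It follows the rule from the Protocol section that only the first chunk carries database and options.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Chunk = Dict[str, Any]  # stand-in for the generated GraphBatchChunk message


def _make_chunk(database: str, options: Optional[Dict[str, Any]],
                records: List[Dict[str, Any]], first: bool) -> Chunk:
    # Only the first chunk carries database/options; later chunks leave
    # them at their empty values, as an unset proto3 field would be.
    return {
        "database": database if first else "",
        "options": options if first else None,
        "records": records,
    }


def chunk_records(database: str,
                  records: Iterable[Dict[str, Any]],
                  chunk_size: int = 1000,
                  options: Optional[Dict[str, Any]] = None) -> Iterator[Chunk]:
    """Yield chunk dicts of at most chunk_size records each (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    batch: List[Dict[str, Any]] = []
    first = True
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield _make_chunk(database, options, batch, first)
            first = False
            batch = []
    # Flush the remainder; always emit at least one chunk so that the
    # database name is sent even for an empty record sequence.
    if batch or first:
        yield _make_chunk(database, options, batch, first)
```

With a real stub, a generator like this would typically be passed as the request iterator of the client-streaming call, after converting each dictionary to the generated message type.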
Proto messages
message GraphBatchOptions {
  int32 batch_size = 1; // default 100000
  bool light_edges = 2; // default false
  bool wal = 3; // default false
  optional bool parallel_flush = 4; // default true (unset = true)
  optional bool pre_allocate_edge_chunks = 5; // default true (unset = true)
  int32 edge_list_initial_size = 6; // default 2048
  optional bool bidirectional = 7; // default true (unset = true)
  int32 commit_every = 8; // default 50000
  int32 expected_edge_count = 9; // default 0
}

message GraphBatchRecord {
  enum Kind { VERTEX = 0; EDGE = 1; }
  Kind kind = 1;
  string type_name = 2;
  string temp_id = 3; // vertex temp ID (for edge references)
  string from_ref = 4; // edge source: temp ID or "#bucket:pos"
  string to_ref = 5; // edge target: temp ID or "#bucket:pos"
  map<string, GrpcValue> properties = 6;
}

message GraphBatchChunk {
  string database = 1;
  DatabaseCredentials credentials = 2;
  GraphBatchOptions options = 3;
  repeated GraphBatchRecord records = 4;
}

message GraphBatchResult {
  int64 vertices_created = 1;
  int64 edges_created = 2;
  int64 elapsed_ms = 3;
  map<string, string> id_mapping = 4; // temp_id → RID
}
Protocol
- First chunk must contain database (and optionally credentials and options)
- All VERTEX records must appear before any EDGE records (across all chunks)
- Vertices can have temporary IDs (temp_id) that edges reference via from_ref/to_ref
- Edges can also reference existing database RIDs directly (e.g., #12:0)
- The response includes an id_mapping of temp IDs to assigned RIDs
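The ordering and reference rules above can be sketched as two small checks. This is an illustrative model, not the server implementation: records are plain dictionaries, the helper names are hypothetical, and it assumes a reference beginning with "#" is an existing RID while anything else is a temp ID.

```python
from typing import Dict, Iterable


def check_vertices_before_edges(records: Iterable[Dict[str, str]]) -> None:
    """Raise ValueError if a VERTEX record appears after any EDGE record."""
    seen_edge = False
    for index, record in enumerate(records):
        if record["kind"] == "EDGE":
            seen_edge = True
        elif seen_edge:
            raise ValueError(f"record {index}: VERTEX after EDGE is not allowed")


def resolve_ref(ref: str, id_mapping: Dict[str, str]) -> str:
    """Turn a from_ref/to_ref value into a RID.

    Assumption: a leading '#' marks an existing RID such as '#12:0';
    any other value is looked up as a temp ID.
    """
    if ref.startswith("#"):
        return ref
    try:
        return id_mapping[ref]
    except KeyError:
        raise ValueError(f"unknown temp ID: {ref!r}") from None


# Usage: the RIDs in this mapping are made up for the example.
records = [
    {"kind": "VERTEX", "type_name": "Person", "temp_id": "a"},
    {"kind": "VERTEX", "type_name": "Person", "temp_id": "b"},
    {"kind": "EDGE", "type_name": "Knows", "from_ref": "a", "to_ref": "#12:0"},
]
check_vertices_before_edges(records)
mapping = {"a": "#1:0", "b": "#1:1"}
edge = records[2]
resolved = (resolve_ref(edge["from_ref"], mapping), resolve_ref(edge["to_ref"], mapping))
```

Because the ordering rule applies across all chunks, a check like this would need to run over the concatenated stream rather than over each chunk separately.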
Tuning parameters
All GraphBatch.Builder parameters from the HTTP endpoint are exposed via GraphBatchOptions:
| Parameter | Default | Description |
|---|---|---|
| batch_size | 100000 | Max edges buffered before auto-flush |
| light_edges | false | Property-less edges stored as connectivity only |
| wal | false | Enable Write-Ahead Logging |
| parallel_flush | true | Parallelize edge connection across async threads |
| pre_allocate_edge_chunks | true | Pre-allocate edge segments on vertex creation |
| edge_list_initial_size | 2048 | Initial segment size in bytes (64–8192) |
| bidirectional | true | Connect both outgoing and incoming edges |
| commit_every | 50000 | Edges per sub-transaction within a flush |
| expected_edge_count | 0 | Hint for auto-tuning batch size |
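To make the default handling concrete, here is a rough sketch of how effective option values could be derived from a partially filled GraphBatchOptions. Two points are assumptions rather than documented behavior: that an int32 value of 0 (the proto3 zero value) means "use the default", and that an out-of-range edge_list_initial_size is rejected rather than clamped. The function name effective_options is hypothetical; an unset optional bool is modeled as None.

```python
from typing import Any, Dict

# Defaults taken from the table in this section.
DEFAULTS: Dict[str, Any] = {
    "batch_size": 100000,
    "light_edges": False,
    "wal": False,
    "parallel_flush": True,
    "pre_allocate_edge_chunks": True,
    "edge_list_initial_size": 2048,
    "bidirectional": True,
    "commit_every": 50000,
    "expected_edge_count": 0,
}


def effective_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Merge caller-supplied options over the defaults (illustrative only)."""
    resolved = dict(DEFAULTS)
    for name, value in options.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown option: {name}")
        if value is None:
            continue  # unset optional field: keep the default
        if isinstance(DEFAULTS[name], bool):
            resolved[name] = bool(value)  # explicit true/false always wins
        elif value != 0:
            resolved[name] = int(value)  # assumption: 0 means "use default"
    size = resolved["edge_list_initial_size"]
    if not 64 <= size <= 8192:
        # Assumption: reject out-of-range sizes; the server might clamp instead.
        raise ValueError("edge_list_initial_size must be within 64..8192")
    return resolved
```

The optional keyword on the three true-by-default booleans is what lets a receiver distinguish "not set" from an explicit false, which a plain proto3 bool cannot express.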
Important notes
- The endpoint is NOT atomic by design (same as the HTTP batch endpoint). GraphBatch commits internally in chunks for maximum throughput.
- For very large batches with many temp IDs, the id_mapping response may exceed the default gRPC message size limit (4 MB). Callers should increase maxInboundMessageSize or avoid temp IDs when mapping is not needed.
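To gauge when the id_mapping could reach the 4 MB limit, the following sketch estimates the serialized size of one map entry using the standard protobuf wire encoding, where each map entry is a length-delimited nested message with the key in field 1 and the value in field 2. The helper names are hypothetical, and the estimate ignores the other GraphBatchResult fields and any gRPC framing overhead. It also assumes non-empty keys and values.

```python
def varint_len(n: int) -> int:
    """Number of bytes protobuf uses to encode a non-negative integer as a varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def map_entry_size(key: str, value: str, field_number: int = 4) -> int:
    """Approximate wire size of one map<string, string> entry.

    field_number defaults to 4, the id_mapping field in GraphBatchResult.
    """
    k = key.encode("utf-8")
    v = value.encode("utf-8")
    # Inner message: 1-byte tag + length varint + payload, for key and value.
    inner = (1 + varint_len(len(k)) + len(k)) + (1 + varint_len(len(v)) + len(v))
    # Outer field: tag for a length-delimited field (wire type 2) + length + body.
    outer_tag = varint_len((field_number << 3) | 2)
    return outer_tag + varint_len(inner) + inner


def max_entries(limit_bytes: int, key: str, value: str) -> int:
    """How many entries of this shape fit under limit_bytes."""
    return limit_bytes // map_entry_size(key, value)


DEFAULT_LIMIT = 4 * 1024 * 1024  # 4 MB default inbound message size
# Sample key/value shapes chosen for illustration only.
estimate = max_entries(DEFAULT_LIMIT, "tmp-000001", "#12:123456")
```

With 10-character temp IDs and 10-character RIDs, each entry costs 26 bytes under this model, so the limit is reached at roughly 160,000 mapped vertices.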
Related