
Batch HTTP endpoint #3675

@lvca

Description


Not everybody can use Java embedded to do fast batch graph imports, so we should add a new streaming HTTP endpoint to load large numbers of vertices and edges in CSV and JSONL format.

POST /api/v1/batch/{database}

Should support two input formats: JSONL (newline-delimited JSON) and CSV. Both are streamed — the server never loads the entire payload into memory, so you can push millions of records in a single request.
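On the client side, streaming means the payload can be produced lazily as well. A minimal sketch of what that could look like with a generator (the helper name `jsonl_stream` is hypothetical, not part of the endpoint):

```python
import json

def jsonl_stream(records):
    """Lazily encode records as newline-delimited JSON, one line each,
    so an HTTP client can upload them chunked instead of buffering all of them."""
    for record in records:
        yield (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")

# A client supporting chunked uploads can consume the generator directly,
# e.g. requests.post(url, data=jsonl_stream(rows), headers=...).
```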

JSONL Format

{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}

CSV Format

@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25
---
@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020
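For illustration, the two-section layout above can be generated with Python's csv module (the file layout is per this issue; nothing here is an ArcadeDB API):

```python
import csv
import io

# Build the vertex section, the '---' separator, then the edge section.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["@type", "@class", "@id", "name", "age"])
writer.writerow(["vertex", "Person", "t1", "Alice", 30])
writer.writerow(["vertex", "Person", "t2", "Bob", 25])
buf.write("---\n")
writer.writerow(["@type", "@class", "@from", "@to", "since"])
writer.writerow(["edge", "KNOWS", "t1", "t2", 2020])
payload = buf.getvalue()
```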

In both formats, vertices come first, then edges. Vertices can have temporary IDs (@id) that edges reference via @from/@to. Edges can also reference existing database RIDs directly (e.g., #12:0).
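A minimal JSONL payload mixing a temporary ID with an existing RID might look like this (the RID #12:0 is just a placeholder for a record already in the database):

```python
import json

records = [
    {"@type": "vertex", "@class": "Person", "@id": "t1", "name": "Alice"},
    # @to references a pre-existing record by RID instead of a temporary ID
    {"@type": "edge", "@class": "KNOWS", "@from": "t1", "@to": "#12:0"},
]
payload = "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"
```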

Temporary ID Mapping

The response includes an idMapping object so you know what RIDs were assigned:

{
  "verticesCreated": 2,
  "edgesCreated": 1,
  "elapsedMs": 42,
  "idMapping": {"t1": "#9:0", "t2": "#9:1"}
}
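Client code can then translate its temporary IDs into RIDs for follow-up queries. A sketch against the sample response above (the helper `rid_for` is hypothetical):

```python
response = {
    "verticesCreated": 2,
    "edgesCreated": 1,
    "elapsedMs": 42,
    "idMapping": {"t1": "#9:0", "t2": "#9:1"},
}

def rid_for(temp_id, resp):
    """Look up the RID assigned to a temporary ID, failing loudly if absent."""
    mapping = resp["idMapping"]
    if temp_id not in mapping:
        raise KeyError(f"temporary id {temp_id!r} was not in idMapping")
    return mapping[temp_id]
```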

Tuning via Query Parameters

All GraphBatch configuration options are exposed as query parameters:

Parameter              Default   Description
batchSize              100000    Max edges buffered before auto-flush
lightEdges             false     Property-less edges stored as connectivity only (saves ~33% I/O)
wal                    false     Enable Write-Ahead Logging for crash safety
parallelFlush          true      Parallelize edge connection across async threads
preAllocateEdgeChunks  true      Pre-allocate edge segments on vertex creation
edgeListInitialSize    2048      Initial segment size in bytes (64–8192)
bidirectional          true      Connect both outgoing and incoming edges
commitEvery            50000     Edges per sub-transaction within a flush
expectedEdgeCount      0         Hint for auto-tuning batch size
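Assembling the URL from these parameters is plain query-string work. A sketch using only parameter names from the table above (the non-default values are arbitrary examples):

```python
from urllib.parse import urlencode

params = {"lightEdges": "true", "batchSize": 500000, "commitEvery": 100000}
url = "http://localhost:2480/api/v1/batch/mydb?" + urlencode(params)
```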

Examples

curl (JSONL):

curl -X POST "http://localhost:2480/api/v1/batch/mydb?lightEdges=true" \
  -u root:password \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @graph-data.jsonl

curl (CSV):

curl -X POST "http://localhost:2480/api/v1/batch/mydb" \
  -u root:password \
  -H "Content-Type: text/csv" \
  --data-binary @graph-data.csv

Python:

import requests

data = (
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}\n'
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}\n'
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}\n'
)

resp = requests.post(
    "http://localhost:2480/api/v1/batch/mydb?lightEdges=true",
    auth=("root", "password"),
    headers={"Content-Type": "application/x-ndjson"},
    data=data,
)
print(resp.json())
# {'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}

JavaScript (Node.js):

const resp = await fetch("http://localhost:2480/api/v1/batch/mydb", {
  method: "POST",
  headers: {
    "Content-Type": "application/x-ndjson",
    Authorization: "Basic " + btoa("root:password"),
  },
  body: [
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}',
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}',
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}',
  ].join("\n"),
});
console.log(await resp.json());

Tip: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single createVertices() call. Interleaving types forces smaller batches.
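A pre-sort on @class achieves that grouping client-side. A minimal sketch (Python's sort is stable, so input order within each class is preserved):

```python
records = [
    {"@type": "vertex", "@class": "Person", "@id": "p1"},
    {"@type": "vertex", "@class": "City", "@id": "c1"},
    {"@type": "vertex", "@class": "Person", "@id": "p2"},
]

# Consecutive same-class vertices now form the longest possible runs.
grouped = sorted(records, key=lambda r: r["@class"])
```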

Tip: The endpoint is NOT atomic by design — GraphBatch commits internally in chunks for maximum throughput. Treat it as a bulk-loading operation, not a transactional one. The response tells you exactly how many records were committed.
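Because a failure mid-stream can leave a partial load committed, a client can at least compare the reported counts with what it sent. A sketch (the helper `load_complete` is hypothetical):

```python
def load_complete(resp, sent_vertices, sent_edges):
    """True only if the server reports committing every record that was sent."""
    return (resp["verticesCreated"] == sent_vertices
            and resp["edgesCreated"] == sent_edges)
```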
