
#3678 feat(grpc): add GraphBatchLoad RPC for bulk graph loading (#3680)

Merged
robfrank merged 3 commits into main from feat/3678-add-graph-batch-grpc
Mar 18, 2026

Conversation

@robfrank
Collaborator

Summary

  • Adds a new GraphBatchLoad client-streaming gRPC RPC that mirrors the HTTP POST /api/v1/batch/{database} endpoint (issue Batch HTTP endpoint #3675)
  • Introduces dedicated proto messages: GraphBatchOptions, GraphBatchRecord, GraphBatchChunk, GraphBatchResult
  • Supports streaming vertices and edges with temporary ID mapping and all GraphBatch tuning parameters
  • Includes 3 integration tests (vertex+edge, vertex-only, empty stream) — all 99 grpcw tests pass

Test plan

  • Proto compiles and generates Java sources (mvn generate-sources in grpc module)
  • grpcw module compiles cleanly (mvn clean compile)
  • graphBatchLoadVerticesAndEdges — streams 2 vertices + 1 edge, verifies counts and ID mapping
  • graphBatchLoadVerticesOnly — streams 2 vertices only, verifies 0 edges
  • graphBatchLoadEmptyStream — immediately completes, verifies empty result
  • Full grpcw test suite passes (99 tests, 0 failures)

🤖 Generated with Claude Code

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the gRPC API by introducing a client-streaming endpoint for efficient bulk loading of graph data. This new capability allows users to stream large volumes of vertices and edges, including support for temporary ID resolution and configurable batching options, thereby improving performance and flexibility for graph data ingestion.

Highlights

  • New gRPC Endpoint: Introduced a new client-streaming gRPC RPC named GraphBatchLoad to facilitate bulk loading of graph data, mirroring the existing HTTP POST /api/v1/batch/{database} endpoint.
  • Protocol Buffer Definitions: Added dedicated Protocol Buffer messages (GraphBatchOptions, GraphBatchRecord, GraphBatchChunk, GraphBatchResult) to support the new batch loading functionality, defining the structure for data transfer and configuration.
  • Graph Batch Processing Logic: Implemented the server-side logic for GraphBatchLoad, handling streaming of vertices and edges, managing temporary ID mapping, and applying various graph batch tuning parameters.
  • Comprehensive Integration Tests: Included three new integration tests to validate the GraphBatchLoad RPC, covering scenarios such as loading both vertices and edges, loading only vertices, and handling an empty input stream.


@claude

claude Bot commented Mar 18, 2026

Code Review

Good addition — the streaming design is sound and the integration tests cover the happy paths well. A few issues need addressing before merge:


🐛 Bug: Proto3 boolean defaults break configureGraphBatchOptions

Critical. In proto3, all boolean fields default to false. The fields parallel_flush, pre_allocate_edge_chunks, and bidirectional are documented as defaulting to true, but when a client sends a GraphBatchOptions message without setting these fields, they arrive as false, causing:

if (!opts.getParallelFlush())           // always true when field not set!
    builder.withParallelFlush(false);   // silently disables parallelism

A client that omits GraphBatchOptions is protected (early return), but any client that sends any option (e.g. just batch_size) will unexpectedly get parallelFlush=false, preAllocateEdgeChunks=false, and bidirectional=false.

Fix options:

  1. Use optional bool parallel_flush = 4; (proto3 optional tracks presence) and check opts.hasParallelFlush() before applying.
  2. Invert the semantics to negation fields: disable_parallel_flush, disable_pre_allocate, unidirectional.
  3. Use a sentinel int32: 0=unset, 1=false, 2=true.

🐛 Bug: resolveRef crashes on empty from_ref/to_ref

private RID resolveRef(final String ref, ...) {
    if (ref.charAt(0) == '#') {  // StringIndexOutOfBoundsException if ref is ""

Proto3 string fields default to "". If a client sends an edge record without setting from_ref or to_ref, this throws StringIndexOutOfBoundsException, which leaks as a confusing INTERNAL error rather than a clear validation error.

if (ref == null || ref.isEmpty())
    throw new IllegalArgumentException("Edge record is missing from_ref or to_ref");
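The two guards (reject empty refs, then dispatch between direct RIDs and temporary IDs) can be sketched in isolation. This is a minimal sketch, not ArcadeDB's implementation: the nested `Rid` record is a stand-in for `com.arcadedb.database.RID`, and the `#bucket:position` parsing follows the `#bucket:pos` format mentioned in the review.

```java
import java.util.Map;

public class ResolveRefSketch {
  // Stand-in for ArcadeDB's RID; real code would construct com.arcadedb.database.RID.
  public record Rid(int bucketId, long position) { }

  public static Rid resolveRef(final String ref, final Map<String, Rid> tempIdMap) {
    // Proto3 strings default to "", so an unset from_ref/to_ref arrives empty, not null.
    if (ref == null || ref.isEmpty())
      throw new IllegalArgumentException("Edge record is missing from_ref or to_ref");

    if (ref.charAt(0) == '#') {
      // Direct RID reference of the form "#bucket:position".
      final int colon = ref.indexOf(':');
      if (colon < 2 || colon == ref.length() - 1)
        throw new IllegalArgumentException("Malformed RID reference: " + ref);
      return new Rid(Integer.parseInt(ref.substring(1, colon)),
                     Long.parseLong(ref.substring(colon + 1)));
    }

    // Otherwise treat it as a temporary ID assigned earlier in the stream.
    final Rid rid = tempIdMap.get(ref);
    if (rid == null)
      throw new IllegalArgumentException("Unknown temporary ID: " + ref);
    return rid;
  }
}
```

With this shape, both failure modes surface as clear `IllegalArgumentException`s instead of a leaked `StringIndexOutOfBoundsException`.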

⚠️ Issue: closeQuietly on error may commit partial data

When onNext catches an exception, it calls resp.onError() then closeQuietly(batch). But GraphBatch.close() flushes buffered edges and connects incoming edges — partial data could be committed to the database even when returning an error to the client. Consider whether GraphBatch needs an abort()/rollback() path, or at minimum, document this as a known limitation.


⚠️ Issue: Unnecessary atomics in sequential callbacks

gRPC StreamObserver callbacks are guaranteed to be called sequentially, so AtomicBoolean inEdgePhase and AtomicReference<String> currentTypeRef don't need to be atomic. Plain boolean[] and String[] (mutable wrapper) or a small state object would have less overhead and be less misleading:

final boolean[] inEdgePhase = { false };
final String[] currentType = { null };

⚠️ Issue: getDatabase called with empty database name on non-first chunks

getDatabase is only called when batch == null, i.e. on the first chunk, which is correct. But if that first chunk omits the database field, the resulting getDatabase("") call fails with a generic error rather than a clear validation message, so this edge case warrants an explicit guard:

if (chunk.getDatabase().isEmpty())
    throw new IllegalArgumentException("First chunk must contain the database name");

📋 Design constraint not documented in proto

The "all vertices must come before edges" constraint is enforced in code but not documented in the proto comments. A client sending interleaved vertices and edges will get a confusing error mid-stream. Add this to the GraphBatchChunk or GraphBatchRecord comments:

// IMPORTANT: all VERTEX records must appear before any EDGE records across all chunks.
// Interleaving is not supported and will result in an error.

📋 Minor: elapsed_ms assertion too strict in tests

assertThat(result.getElapsedMs()).isGreaterThan(0);

This can flake for a very fast operation completing within the same millisecond. Prefer isGreaterThanOrEqualTo(0).


📋 Minor: Missing test coverage

  • No test for what happens when edges reference unknown temp IDs (error path)
  • No test for interleaved vertex/edge records in the same chunk (the constraint violation path)
  • No test for direct RID references in from_ref/to_ref (the #bucket:pos path)

These error paths are worth covering given the complexity of the state machine.


Positive notes

  • The proto message design is clean and well-structured
  • Flow control via call.disableAutoInboundFlowControl() + call.request(1) is the correct back-pressure pattern
  • flushVertexBatch batching before calling createVertices is a good performance optimization
  • The empty stream case is handled gracefully

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new client-streaming gRPC RPC, GraphBatchLoad, for bulk graph loading, mirroring the HTTP POST /api/v1/batch/{database} endpoint. It adds proto messages for options, records, chunks, and results; implements server-side handling of streamed vertices and edges with temporary ID mapping and all GraphBatch tuning parameters; and includes integration tests covering the vertex+edge, vertex-only, and empty-stream scenarios.

counts[1]++;
} else {
if (inEdgePhase.get())
throw new IllegalArgumentException("Vertex record received after edges. All vertices must appear before edges");

medium

This exception message could be improved to provide more context to the user. Consider including the database name or other relevant information to help the user identify the source of the error.

}
}
} catch (final Exception e) {
resp.onError(Status.INTERNAL.withDescription("graphBatchLoad: " + e.getMessage()).asException());

medium

Consider adding more context to the error message, such as the database name or the specific operation that failed, to aid in debugging.

}
} catch (final Exception e) {
resp.onError(Status.INTERNAL.withDescription("graphBatchLoad: " + e.getMessage()).asException());
closeQuietly(batchRef.get());

medium

Consider adding more context to the error message, such as the database name or the specific operation that failed, to aid in debugging.

Comment on lines +1779 to +1781
if (batch != null) {
try { batch.close(); } catch (final Exception ignored) { }
}

medium

The closeQuietly method catches Exception but doesn't log it. At least log the exception at the DEBUG level to understand why the batch is failing to close.

      try { batch.close(); } catch (final Exception e) { LogManager.instance().log(this, Level.DEBUG, "Error during GraphBatch close", e); }

@claude

claude Bot commented Mar 18, 2026

Code Review

Good addition — the feature mirrors the HTTP batch endpoint cleanly and the test coverage is solid. A few issues worth addressing before merging:


Bug: resolveRef passes null as database to RID constructor

// ArcadeDbGrpcService.java
return new RID(null, bucketId, position);

When null is passed, RID falls back to DatabaseContext.INSTANCE.getActiveDatabase() (thread-local). In a gRPC streaming handler there is no guarantee that the current thread has the database set in thread-local — especially in the test graphBatchLoadWithDirectRidReferences where only edges are streamed (no prior batch.createVertices() call that would set the context).

Fix: pass the database explicitly:

return new RID(dbRef.get(), bucketId, position);

Data consistency: closeQuietly commits partial data on error

When an exception is thrown mid-stream in onNext, the code calls resp.onError(...) then closeQuietly(batch). Since GraphBatch.close() flushes and commits whatever has been accumulated, partial data is persisted even on failure.

GraphBatch has no abort() / rollback path today, so a partial solution would be to at minimum document this behaviour prominently in the proto comment and method Javadoc so callers can plan accordingly. Longer term, adding an abort() to GraphBatch that rolls back without flushing would make the contract clean.


Spurious call.request(1) after sending error

} catch (final Exception e) {
    resp.onError(...);
    closeQuietly(batchRef.get());
    return;               // <-- return is in catch, but finally still runs
} finally {
    if (!cancelled.get())
        call.request(1);  // called even after resp.onError()
}

After resp.onError() the stream is terminated from the client's perspective. Requesting another message is harmless in practice, but it's semantically wrong and could log warnings in the gRPC runtime. Track the error state with a flag (alongside cancelled) and skip the request:

} finally {
    if (!cancelled.get() && !errorSent.get())
        call.request(1);
}

Dead variable dbRef

dbRef is set once in onNext but never read afterwards (unless the resolveRef fix above is adopted, which would make it live). If it stays unused, remove it; the Database db local is enough for initialization.


configureGraphBatchOptions cannot explicitly enable parallel_flush / bidirectional

if (opts.hasParallelFlush() && !opts.getParallelFlush())
    builder.withParallelFlush(false);

The condition only fires when the field is explicitly set to false, which is correct for the common case because the builder already defaults to true. However, this means sending parallel_flush: true is a complete no-op. That is fine today but it would be worth a comment explaining the asymmetry, e.g.:

// Only need to disable: builder defaults to true; setting true explicitly is a no-op
if (opts.hasParallelFlush() && !opts.getParallelFlush())

Potential gRPC message-size breach for large batches

GraphBatchResult.id_mapping returns all temp-id → RID pairs. For a million-vertex import that is a ~50 MB string map, easily exceeding the default 4 MB gRPC message limit. Consider:

  • Omitting the map from the result by default and having the client query for RIDs separately if needed, or
  • Documenting the limit and advising callers not to use temp IDs for very large batches.
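To see why the default 4 MB limit matters, here is a back-of-envelope wire-size estimate for the id_mapping map. The 20-byte average temp-ID key, 10-byte RID-string value, and ~4 bytes of per-entry protobuf tag/length overhead are illustrative assumptions, not measured values:

```java
public class IdMappingSizeEstimate {
  // Rough per-entry cost of a map<string, string> entry on the wire:
  // key bytes + value bytes + a few bytes of protobuf tags and length prefixes.
  public static long estimateBytes(final long entries, final int avgKeyLen, final int avgValueLen) {
    return entries * (avgKeyLen + avgValueLen + 4);
  }

  public static void main(final String[] args) {
    final long bytes = estimateBytes(1_000_000L, 20, 10);
    System.out.println(bytes / (1024 * 1024) + " MB"); // prints "32 MB", well past the 4 MB default limit
  }
}
```

Even with these conservative assumptions, a million-entry mapping is an order of magnitude above the default message cap, so either approach above (omit the map, or document the limit) is needed.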

Minor

  • private GrpcValue sv(final String s) in the test class is used in only two tests and is just an alias for stringValue(). Remove it and call stringValue() directly to keep the test helper surface minimal.
  • Proto comment says credentials is "required on first chunk" but there is no enforcement. The database field is validated; it would be consistent to validate credentials is non-default as well (or update the comment to say "optional if channel-level auth is used").
  • graphBatchLoadWithDirectRidReferences sends only edges in the first chunk, which means batch == null on entry to onNext and the batch is initialized with inEdgePhase[0] = false. The edge records then immediately trigger the transition path. The test passes today, but it relies on correct ordering detection being deferred; a short inline comment in the test would clarify this intent.

Test coverage: good

The six tests cover the happy path, vertex-only, empty stream, unknown temp ID, ordering violation, and direct RID references. That is comprehensive. Only suggestion: add an assertion on result.getEdgesCreated() in graphBatchLoadUnknownTempIdReturnsError to confirm no edges leaked into the database (i.e., verify the partial-data concern is at least observable in tests).

@codacy-production

codacy-production Bot commented Mar 18, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: -8.72% | Diff coverage: 66.72%

Coverage variation details

Commit | Coverable lines | Covered lines | Coverage
Common ancestor (ca9db5a) | 110408 | 81751 | 74.04%
Head (36109e0) | 141902 (+31494) | 92700 (+10949) | 65.33% (-8.72%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Scope | Coverable lines | Covered lines | Diff coverage
Pull request (#3680) | 580 | 387 | 66.72%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@claude

claude Bot commented Mar 18, 2026

Code Review

This PR bundles two distinct features: a new gRPC GraphBatchLoad RPC and several OpenCypher count push-down optimizations. They are unrelated and would benefit from being separate PRs, but I'll review both below.


gRPC GraphBatchLoad

Potential bug — mid-stream exception leaves transaction open

In ArcadeDbGrpcService.graphBatchLoad(), if an exception is thrown mid-stream (e.g., unknown temp ID, malformed RID), the catch block calls responseObserver.onError(...). However, it's not clear whether the database transaction started by GraphBatch is reliably rolled back in all error paths. If GraphBatch opens a transaction internally and the streaming context is aborted without a matching onCompleted, orphaned transactions could exhaust the transaction pool or leave the database in an inconsistent state. Please verify that GraphBatch is always cleaned up (e.g., via try-finally or try-with-resources).

Hardcoded batch size

The vertex flush threshold of 10_000 per type is hardcoded. Since this directly affects memory usage under load, it should be exposed as a named constant or a configurable option (e.g., GraphBatchOptions.batchSize). At minimum, extract it to a private static final int VERTEX_FLUSH_BATCH_SIZE = 10_000;.

"Vertices before edges" protocol constraint

The proto definition requires all vertex records to appear before edge records in the stream, and this is enforced server-side. This constraint is non-obvious and should be documented in the proto file itself (a comment on the GraphBatchLoad RPC or on the GraphBatchRecord message). Right now a client author has no way of knowing this unless they read the Java implementation.

GraphBatchResult missing vertex/edge counts

The HTTP POST /api/v1/batch endpoint returns detailed stats (vertices created, edges created, errors). The GraphBatchResult message currently returns only vertexCount and edgeCount with a string message. Consider adding an errorCount field or a repeated errors field so clients can distinguish partial failures.

Test coverage

The new tests (graphBatchLoadUnknownTempIdReturnsError, graphBatchLoadVertexAfterEdgeReturnsError) cover error paths, which is great. Missing:

  • A test for graphBatchLoadWithDirectRidReferences mixing both temp IDs and direct RIDs in the same stream.
  • A test verifying transaction isolation: if the stream fails mid-way, no vertices or edges should have been committed.

OpenCypher Count Push-Down Optimizations

PropertyAccessExpression — unguarded rid.asVertex()

if (variable instanceof RID rid) {
  final Object rawValue = rid.asVertex().get(propertyName);
  return convertFromStorage(rawValue);
}

rid.asVertex() can return null if the record has been deleted or if the RID points to a non-vertex record (e.g., an edge). This will throw a NullPointerException. There should be a null check:

final Vertex v = rid.asVertex();
if (v == null)
  return null;
final Object rawValue = v.get(propertyName);

Also: algorithm procedures may legitimately store edge RIDs. If an edge RID ends up in this branch, asVertex() returns null (or throws, depending on the implementation). It would be safer to use rid.getRecord() and check instanceof Vertex before casting, or handle the edge case explicitly.
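The instanceof-based alternative can be sketched with stand-in types. These nested `Record`/`Vertex` interfaces are placeholders for illustration, not ArcadeDB's actual `com.arcadedb.database` types:

```java
public class SafePropertyAccess {
  public interface Record { }
  public interface Vertex extends Record { Object get(String property); }

  // Resolve a property through a record reference, tolerating deleted records
  // and non-vertex records (e.g. an edge RID stored by an algorithm procedure).
  public static Object vertexProperty(final Record record, final String propertyName) {
    if (record instanceof Vertex v)
      return v.get(propertyName);
    return null; // deleted, missing, or not a vertex: treat the property as absent
  }
}
```

The pattern-matching instanceof handles null, deleted-record, and wrong-type cases in one branch, so no separate null check on an asVertex() result is needed.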

CountStarJoinStep — potential long overflow in degree product

For star-join patterns, the count is computed as the product of per-arm degrees:

count *= armDegree;  // (conceptually)

For a high-degree central node (e.g., 10M+ edges per arm), the product can overflow long. Consider adding an overflow guard:

if (count > Long.MAX_VALUE / armDegree) return Long.MAX_VALUE; // or throw
count *= armDegree;
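A self-contained version of that guard; saturatingMultiply is a hypothetical helper name, and the explicit zero check matters because the division-based guard must not divide by zero:

```java
public class SaturatingCount {
  // Multiply two non-negative counts, clamping at Long.MAX_VALUE instead of overflowing.
  public static long saturatingMultiply(final long count, final long armDegree) {
    if (count == 0 || armDegree == 0)
      return 0;
    if (count > Long.MAX_VALUE / armDegree) // safe: armDegree > 0 on this path
      return Long.MAX_VALUE;
    return count * armDegree;
  }
}
```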

AlgoPageRank tolerance regression

The tolerance in AlgoPageRankTest was relaxed from 1e-6 to 1e-4. A 100× regression in convergence precision between the CSR fast-path and the OLTP path is a sign of a real algorithmic difference that should be investigated rather than papered over with a looser assertion. Please add a comment explaining why the two paths produce different results and whether this is acceptable for production use.

callStep.countOnlyOptimization — shared empty Result object

The countOnlyOptimization path returns N references to a single shared EmptyResult or similar object. If downstream steps mutate the result object (which some pipeline steps do), this will produce subtle corruption bugs. Either use immutable result objects, or document clearly that results from the count-only path must not be mutated.

tryOptimizeStarCountStar — missing tests for edge cases

The new optimizer path handles star-join patterns with OPTIONAL MATCH. Tests cover the happy path (starJoinQ4PatternUsesOptimizedStep, starJoinQ7OptionalMatchUsesOptimizedStep), but missing:

  • A query where the central node has 0 edges (the max(1, degree) for OPTIONAL MATCH arms matters here).
  • A star-join with 3+ arms.
  • A pattern that looks like a star-join but should NOT use the optimized path (regression guard).

CypherExecutionPlan: tryOptimizeStarCountStar runs before the regular optimizer

Inserting a new fast-path before the existing optimizer dispatch is fragile. If the star-join pattern detection has a false positive (matches a query it shouldn't), it silently produces wrong results instead of falling through to the correct path. Consider adding an assert verifyCorrectness(plan) or at minimum a comment explaining the mutual exclusivity of tryOptimizeStarCountStar and the existing optimizations.


Minor / Style

  • RESULT_COUNT_HINT_VAR in CommandContext is a string constant used as a "magic variable" shared between procedures and CallStep. This coupling is fragile — a typo in any procedure silently disables the optimization with no warning. Consider using a typed wrapper or a dedicated interface (CountHintProvider) that procedures implement, instead of passing an untyped long through the context map.
  • Several test files were updated only to change node.name AS name in RETURN clauses; these appear correct, but it would be worth a comment in the tests or a brief note explaining that this is required because the new RID lazy-loading path is now triggered.
  • AlgoPageRankTest tolerance relaxation should include a // TODO: comment or a linked issue if the CSR/OLTP convergence difference is a known limitation.

Summary

Area | Severity | Item
gRPC | Medium | Transaction cleanup on mid-stream error
gRPC | Low | Hardcoded batch size 10,000
gRPC | Low | Missing protocol constraint doc in proto
OpenCypher | Medium | rid.asVertex() NPE in PropertyAccessExpression
OpenCypher | Medium | long overflow in star-join degree product
OpenCypher | Medium | PageRank precision regression (1e-6 → 1e-4)
OpenCypher | Medium | Shared mutable result in count-only path
OpenCypher | Low | Missing edge-case tests for star-join optimizer
General | Low | RESULT_COUNT_HINT_VAR magic-string coupling

The gRPC feature and the OpenCypher optimizations are both well-structured and show clear design intent. Addressing the NPE in PropertyAccessExpression and verifying transaction cleanup in graphBatchLoad are the most important fixes before merge.

🤖 Generated with Claude Code

robfrank and others added 3 commits March 18, 2026 16:58
…ading

Add a new GraphBatchLoad RPC to the gRPC module that exposes the same
GraphBatch-based bulk loading as the HTTP POST /api/v1/batch endpoint.
Supports streaming vertices and edges with temporary ID mapping, and
all GraphBatch tuning parameters (batchSize, lightEdges, WAL, etc.).

Closes #3678

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use optional bool for proto fields that default to true (parallel_flush,
  pre_allocate_edge_chunks, bidirectional) to fix proto3 default semantics
- Guard against empty from_ref/to_ref in resolveRef
- Guard against empty database name on first chunk
- Replace unnecessary AtomicBoolean/AtomicReference with plain arrays
- Document vertex-before-edge constraint in proto comments
- Add error-path tests (unknown temp ID, vertex-after-edge, direct RID refs)
- Fix elapsed_ms assertion to avoid potential flake

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Pass database explicitly to RID constructor in resolveRef instead of
  relying on thread-local (safer for gRPC thread pool)
- Skip call.request(1) after sending error to avoid spurious requests
- Document partial-commit behavior of closeQuietly (by design, same as HTTP)
- Document configureGraphBatchOptions asymmetry for true-default fields
- Update credentials proto comment to reflect optional channel-level auth
- Add message-size warning for large id_mapping responses
- Remove sv() alias, use stringValue() consistently in tests
- Add clarifying comment to direct RID reference test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robfrank robfrank force-pushed the feat/3678-add-graph-batch-grpc branch from 36109e0 to 425efec on March 18, 2026 at 15:58

@claude

claude Bot commented Mar 18, 2026

Code Review - Part 1 of 2

The prior reviews covered the main design issues well. This pass focuses on a few items not yet addressed.

Bug: Double-close risk on GraphBatch when transport error follows server error

When onNext throws (e.g. unknown temp ID), the server calls resp.onError() and then closeQuietly(batchRef.get()). The batch is now closed. However, if a transport error (client disconnect, network reset) arrives shortly after, gRPC will invoke the server-side onError callback on the request stream, which calls closeQuietly(batchRef.get()) on the already-closed batch.

GraphBatch.close() calls flush(), connectDeferredIncomingEdges(), and batchUpdateVertexHeadChunks(). If any of these are non-idempotent (edge buffer state not zeroed after close), a second call can double-commit deferred incoming edges or corrupt vertex head pointers.

Fix: null out the reference immediately after closing, so a second call is a no-op:

closeQuietly(batchRef.getAndSet(null));

Use this everywhere closeQuietly is currently called (onNext catch, onError, onCompleted catch).
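The getAndSet(null) pattern can be demonstrated with a plain AtomicReference and a stub resource. StubBatch and its call counter are illustrative stand-ins for GraphBatch, not the real class:

```java
import java.util.concurrent.atomic.AtomicReference;

public class IdempotentClose {
  public static final class StubBatch implements AutoCloseable {
    public int closeCalls = 0;
    @Override public void close() { closeCalls++; }
  }

  // Close the batch quietly; a null argument is a no-op, so whichever callback
  // wins the getAndSet(null) race closes the batch exactly once.
  public static void closeQuietly(final AutoCloseable batch) {
    if (batch == null)
      return;
    try { batch.close(); } catch (final Exception ignored) { }
  }

  public static void main(final String[] args) {
    final StubBatch stub = new StubBatch();
    final AtomicReference<StubBatch> batchRef = new AtomicReference<>(stub);
    closeQuietly(batchRef.getAndSet(null)); // first error path: closes the batch
    closeQuietly(batchRef.getAndSet(null)); // later transport error: sees null, no-op
    System.out.println(stub.closeCalls);    // prints 1
  }
}
```

Because the swap to null and the read happen atomically, a server error followed by a client disconnect cannot trigger a second close, even though both paths call the same helper.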


@claude

claude Bot commented Mar 18, 2026


How to contribute to ArcadeDB

Contribution License

ArcadeDB is licensed under the Apache License 2.0. By submitting a contribution (pull request, patch, or any other form), you agree that your contribution is licensed under the same terms, as stated in Section 5 of the Apache License 2.0. This means:

  • You keep your copyright. You are not transferring ownership of your code to anyone.
  • You grant a perpetual license. Your contribution is irrevocably available under Apache 2.0, including copyright and patent grants (Sections 2 and 3).
  • No CLA required. We deliberately do not use a Contributor License Agreement. This keeps the copyright distributed among all contributors, which structurally prevents any single entity from changing the project's license. See our Governance for details.

By submitting a contribution, you represent that you have the right to license it under the Apache License 2.0 — i.e., the work is your own, or you have permission from the copyright holder.

Did you find a bug?

  • Do not open a GitHub issue if the bug is a security vulnerability; instead, write to support -at- arcadedb.com.

  • Ensure the bug was not already reported by searching on GitHub under Issues.

  • If you're unable to find an open issue addressing the problem, open a new one. Be sure to include a title and clear description, as much relevant information as possible, and a code sample or an executable test case demonstrating the expected behavior that is not occurring. Importantly, note the ArcadeDB version, JVM type and version exhibiting the issue.

Did you write a patch that fixes a bug?

  • Open a new GitHub pull request with the patch.

  • Ensure the PR description clearly describes the problem and solution. Include the relevant issue number if applicable.

Did you fix whitespace, format code, or make a purely cosmetic patch?

Changes that are cosmetic in nature and do not add anything substantial to the stability, functionality, or testability will generally not be accepted.

Do you intend to add a new feature or change an existing one?

  • Suggest your change in the ArcadeDB Community and start writing code.

  • Do not open an issue on GitHub until you have collected positive feedback about the change. GitHub issues are primarily intended for bug reports and fixes.

Are you adding a new dependency?

When adding new third-party dependencies to ArcadeDB:

  1. Check License Compatibility - Ensure the license is compatible with Apache License 2.0

    • ✅ Compatible: Apache 2.0, MIT, BSD (2/3-Clause), EPL, UPL, EDL, public domain
    • ❌ Incompatible: GPL, AGPL, proprietary licenses without explicit permission
  2. Update Attribution Files:

    • Add the dependency to ATTRIBUTIONS.md in the appropriate section
    • If the dependency is Apache-licensed and includes a NOTICE file, incorporate required notices into the main NOTICE file
    • If introducing a new license type, add the full license text to the LICENSES/ directory
  3. Maintain SPDX Compliance - All new source files must include:

    • SPDX-FileCopyrightText: with copyright holder and years
    • SPDX-License-Identifier: with the SPDX license identifier (e.g., Apache-2.0)
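
As a concrete illustration, a new Java source file would start with a header like this (the years and copyright holder shown are placeholders; use your own):

```java
// SPDX-FileCopyrightText: 2021-present Arcade Data Ltd
// SPDX-License-Identifier: Apache-2.0
```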

For questions about licensing, please ask in the discussion forum before adding the dependency.

Prepare your environment

Pre-commit

This project uses pre-commit. Every developer should install it locally; please
follow the installation instructions for your operating system.

Developer guide

In order to use the latest distribution before the official release, you have to build it from the latest SNAPSHOT version in
the main branch on GitHub. All you need is:

  • Java v21 or later for the runtime
  • Apache Maven, to build the distribution
  • Docker, to build the container images

It's straightforward; just follow these steps:

(1) Download the sources

If you have git installed, execute the following command on a shell or command prompt:

$ git clone git@github.com:ArcadeData/arcadedb.git

With this command you have just cloned the main branch containing the latest updates. If you don't have git installed, you
can always download the latest sources
from GitHub as a zip file. After downloading, unzip the file into
a local folder.

(2) Now build the project using Maven.

Ensure you have Apache Maven installed on your computer and execute the following commands:

$ cd arcadedb
$ mvn clean install -DskipTests

The build should take less than a minute.

(3) Use the freshly built distribution

You will find the freshly built distribution under the directory package/target/arcadedb-<version>.dir/arcadedb-<version>/.

For example, if the latest version is 25.1.1-SNAPSHOT, the distribution will be available under the
directory package/target/arcadedb-25.1.1-SNAPSHOT.dir/arcadedb-25.1.1-SNAPSHOT/

Some build targets

Build and run default tests:

$ mvn clean install

To run the additional integration tests locally, use:

$ mvn clean install -P integration

Rebuild Docker images for e2e tests:

$ mvn clean install -DskipTests -Pdocker

Build and run in fail-at-end mode:

$ mvn clean install -fae

Print dependency tree:

$ mvn dependency:tree


@claude

claude Bot commented Mar 18, 2026

ArcadeDB Plugin Architecture

Overview

ArcadeDB supports a plugin architecture that allows extending the server functionality through isolated plugins. Each plugin runs in its own class loader, enabling plugins to have different versions of dependencies without conflicts.

Plugin Types

ArcadeDB includes the following built-in plugins:

  • Gremlin - Apache TinkerPop Gremlin graph traversal language support
  • PostgreSQL Wire Protocol - PostgreSQL protocol compatibility
  • MongoDB Wire Protocol - MongoDB query language compatibility
  • Redis Wire Protocol - Redis command compatibility
  • gRPC - gRPC protocol support

Architecture

Class Loading

The plugin system uses isolated class loaders with the following strategy:

  1. Server API Classes (com.arcadedb.*) - Loaded from parent class loader (shared across all plugins)
  2. Plugin Classes - Loaded from plugin's own JAR first (isolated)
  3. Other Classes - Fall back to parent class loader if not found in plugin JAR

This approach ensures:

  • Plugins can use different versions of third-party libraries
  • Server APIs are shared for consistency and communication
  • Memory efficiency through shared core classes
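
The delegation strategy above can be sketched as a custom class loader. This is an illustrative sketch only — the class and method names are hypothetical, not ArcadeDB's actual PluginClassLoader implementation:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch of the parent-first / child-first delegation described above.
public class PluginClassLoaderSketch extends URLClassLoader {

  public PluginClassLoaderSketch(final URL[] pluginJars, final ClassLoader parent) {
    super(pluginJars, parent);
  }

  // Server API classes are always resolved by the parent loader (shared across plugins).
  static boolean isServerApiClass(final String name) {
    return name.startsWith("com.arcadedb.");
  }

  @Override
  protected Class<?> loadClass(final String name, final boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        if (isServerApiClass(name)) {
          c = getParent().loadClass(name);     // 1. parent-first for server APIs
        } else {
          try {
            c = findClass(name);               // 2. child-first for plugin classes
          } catch (final ClassNotFoundException e) {
            c = getParent().loadClass(name);   // 3. fall back to the parent
          }
        }
      }
      if (resolve)
        resolveClass(c);
      return c;
    }
  }
}
```

Because only the `com.arcadedb.*` prefix is delegated parent-first, each plugin JAR can ship its own versions of third-party libraries without affecting other plugins.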

Components

PluginManager

  • Discovers plugins from lib/plugins/ directory
  • Manages plugin lifecycle (start, stop)
  • Coordinates plugin loading with server initialization

PluginClassLoader

  • Custom class loader that isolates plugin dependencies
  • Parent-first delegation for server API classes
  • Child-first delegation for plugin-specific classes

PluginDescriptor

  • Metadata container for each plugin
  • Tracks plugin state and lifecycle
  • Associates plugin with its class loader

Plugin Lifecycle

  1. Discovery - Scan lib/plugins/ directory for JAR files
  2. Class Loading - Create isolated class loader for each plugin JAR
  3. Service Discovery - Use Java ServiceLoader to find ServerPlugin implementations
  4. Configuration - Call configure() with server instance and configuration
  5. Starting - Call startService() based on installation priority
  6. Running - Plugin provides functionality
  7. Stopping - Call stopService() in reverse order
  8. Cleanup - Close class loaders and release resources

Installation Priorities

Plugins are started in phases based on their installation priority:

  1. BEFORE_HTTP_ON - Before HTTP server starts (default)
  2. AFTER_HTTP_ON - After HTTP server starts
  3. AFTER_DATABASES_OPEN - After databases are loaded
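
The phased start and reverse-order stop can be sketched as a pure ordering function. This is a simplified model with hypothetical names, not the real PluginManager:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: start plugins phase by phase, stop them in reverse order.
public class PluginPhasesSketch {
  enum Priority { BEFORE_HTTP_ON, AFTER_HTTP_ON, AFTER_DATABASES_OPEN }

  record PluginInfo(String name, Priority priority) {}

  // Start order: group plugins by phase, phases in declaration order.
  static List<String> startOrder(final List<PluginInfo> plugins) {
    final List<String> order = new ArrayList<>();
    for (final Priority phase : Priority.values())
      for (final PluginInfo p : plugins)
        if (p.priority() == phase)
          order.add(p.name());
    return order;
  }

  // Stop order is simply the reverse of start order.
  static List<String> stopOrder(final List<PluginInfo> plugins) {
    final List<String> order = startOrder(plugins);
    Collections.reverse(order);
    return order;
  }
}
```

For example, a BEFORE_HTTP_ON plugin always starts before an AFTER_HTTP_ON plugin and stops after it, regardless of discovery order.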

Creating a Plugin

1. Implement ServerPlugin Interface

package com.example.myplugin;

import com.arcadedb.ContextConfiguration;
import com.arcadedb.server.ArcadeDBServer;
import com.arcadedb.server.ServerPlugin;

public class MyPlugin implements ServerPlugin {
  private ArcadeDBServer server;
  private ContextConfiguration configuration;

  @Override
  public void configure(ArcadeDBServer arcadeDBServer, ContextConfiguration configuration) {
    this.server = arcadeDBServer;
    this.configuration = configuration;
    // Initialize your plugin configuration
  }

  @Override
  public void startService() {
    // Start your plugin services
    System.out.println("MyPlugin started!");
  }

  @Override
  public void stopService() {
    // Stop your plugin services and clean up resources
    System.out.println("MyPlugin stopped!");
  }

  @Override
  public INSTALLATION_PRIORITY getInstallationPriority() {
    return INSTALLATION_PRIORITY.AFTER_HTTP_ON;
  }
}

2. Create Service Provider Configuration

Create file: src/main/resources/META-INF/services/com.arcadedb.server.ServerPlugin

Content:

com.example.myplugin.MyPlugin

3. Build Plugin JAR

mvn clean package

4. Deploy Plugin

Copy the plugin JAR to the lib/plugins/ directory in your ArcadeDB installation:

cp target/myplugin-1.0.0.jar $ARCADEDB_HOME/lib/plugins/

5. Start ArcadeDB

The plugin will be automatically discovered and loaded when ArcadeDB starts:

cd $ARCADEDB_HOME
bin/server.sh

Check the logs for:

[INFO] Discovered plugin: myplugin from myplugin-1.0.0.jar
[INFO] - myplugin plugin started

Plugin Dependencies

Server API Dependencies

Plugin POMs should include server dependencies with provided scope:

<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-server</artifactId>
    <version>${arcadedb.version}</version>
    <scope>provided</scope>
</dependency>

Plugin-Specific Dependencies

Plugin-specific dependencies use normal compile scope and will be packaged with the plugin:

<dependency>
    <groupId>com.example</groupId>
    <artifactId>my-library</artifactId>
    <version>1.0.0</version>
    <scope>compile</scope>
</dependency>

Building Distributions with Plugins

Maven Assembly

The Maven assembly descriptor automatically places plugin JARs in lib/plugins/:

<dependencySet>
    <outputDirectory>lib/plugins</outputDirectory>
    <includes>
        <include>com.arcadedb:arcadedb-gremlin</include>
        <include>com.arcadedb:arcadedb-postgresw</include>
        <include>com.arcadedb:arcadedb-mongodbw</include>
        <include>com.arcadedb:arcadedb-redisw</include>
        <include>com.arcadedb:arcadedb-grpcw</include>
    </includes>
    <useTransitiveDependencies>false</useTransitiveDependencies>
</dependencySet>

Advanced Topics

Accessing Server Resources

Plugins have full access to the ArcadeDB server instance:

@Override
public void configure(ArcadeDBServer arcadeDBServer, ContextConfiguration configuration) {
    this.server = arcadeDBServer;

    // Access databases
    ServerDatabase db = server.getDatabase("mydb");

    // Access HTTP server for custom endpoints
    HttpServer httpServer = server.getHttpServer();

    // Access security
    ServerSecurity security = server.getSecurity();
}

Thread Context Class Loader

The PluginManager automatically sets the thread context class loader during plugin operations:

  • During configure() - Set to plugin's class loader
  • During startService() - Set to plugin's class loader
  • During stopService() - Set to plugin's class loader

This ensures proper class loading for frameworks that use the thread context class loader.
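
The save/set/restore pattern around each lifecycle call can be sketched as follows (the helper name is hypothetical; only the pattern is the point):

```java
import java.util.function.Supplier;

// Illustrative sketch of setting the thread context class loader around a
// plugin lifecycle call, restoring the previous loader afterwards.
public class TcclSketch {

  static <T> T withContextClassLoader(final ClassLoader pluginLoader, final Supplier<T> action) {
    final Thread thread = Thread.currentThread();
    final ClassLoader previous = thread.getContextClassLoader();
    thread.setContextClassLoader(pluginLoader);   // e.g. before configure()
    try {
      return action.get();                        // run the lifecycle method
    } finally {
      thread.setContextClassLoader(previous);     // always restored, even on failure
    }
  }
}
```

The `finally` block guarantees the previous loader is restored even if the plugin throws, so one misbehaving plugin cannot leak its class loader into later calls on the same thread.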

HTTP Endpoint Registration

Plugins can register custom HTTP endpoints:

@Override
public void registerAPI(HttpServer httpServer, PathHandler routes) {
    routes.addExactPath("/api/myplugin", exchange -> {
        exchange.getResponseHeaders().put(Headers.CONTENT_TYPE, "application/json");
        exchange.getResponseSender().send("{\"status\":\"ok\"}");
    });
}

Troubleshooting

Plugin Not Discovered

Check that:

  1. Plugin JAR is in lib/plugins/ directory
  2. META-INF/services/com.arcadedb.server.ServerPlugin file exists
  3. Service file contains correct plugin class name
  4. Plugin class implements ServerPlugin interface

ClassNotFoundException

If you see ClassNotFoundException for server classes:

  • Ensure server dependencies use provided scope
  • Check that class is in com.arcadedb.* package

If you see ClassNotFoundException for plugin classes:

  • Ensure dependency is included with compile scope
  • Check that JAR contains the required class

Plugin Conflicts

If two plugins have conflicting dependencies:

  • This is the main benefit of isolated class loaders
  • Each plugin can use its own version
  • Ensure server API classes match across all plugins

Migration from Legacy Plugin Loading

The new plugin system is backward compatible with the legacy configuration-based loading. Both systems can coexist:

Legacy Method (still supported)

arcadedb.server.plugins=gremlin:com.arcadedb.server.gremlin.GremlinServerPlugin

New Method (recommended)

  1. Place plugin JAR in lib/plugins/
  2. Include META-INF/services file
  3. No configuration needed

Best Practices

  1. Use Provided Scope - Server dependencies should always use provided scope
  2. Clean Shutdown - Implement proper cleanup in stopService()
  3. Thread Safety - Make plugin implementations thread-safe
  4. Logging - Use LogManager.instance().log() for consistent logging
  5. Error Handling - Handle exceptions gracefully, don't crash the server
  6. Resource Management - Close all resources in stopService()
  7. Configuration - Use ContextConfiguration for plugin settings

Examples

See the built-in plugins for complete examples:

  • gremlin/ - Complex plugin with custom graph manager
  • postgresw/ - Network protocol plugin
  • mongodbw/ - Query language compatibility plugin
  • redisw/ - Simple protocol plugin
  • grpcw/ - gRPC service plugin



@claude

claude Bot commented Mar 18, 2026

Code Review

The prior reviews covered the main design issues well. This pass focuses on a few items not yet addressed.

Bug: Double-close risk on GraphBatch when transport error follows server error

When onNext throws (e.g. unknown temp ID), the server calls resp.onError() and then closeQuietly(batchRef.get()). The batch is now closed. However, if a transport error (client disconnect, network reset) arrives shortly after, gRPC will invoke the server-side onError callback on the request stream, which calls closeQuietly(batchRef.get()) on the already-closed batch.

GraphBatch.close() calls flush(), connectDeferredIncomingEdges(), and batchUpdateVertexHeadChunks(). If any of these are non-idempotent (edge buffer state not zeroed after close), a second call can double-commit deferred incoming edges or corrupt vertex head pointers.

Fix: null out the reference immediately after closing, so a second call is a no-op:

closeQuietly(batchRef.getAndSet(null));

Use this everywhere closeQuietly is currently called (onNext catch, onError, onCompleted catch).
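
A self-contained sketch of why getAndSet(null) makes the close idempotent — here Batch is a stand-in for GraphBatch, and the counter only exists to observe how many times close() actually runs:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in demo: two racing callbacks, but close() runs exactly once.
public class IdempotentCloseSketch {
  static final AtomicInteger CLOSES = new AtomicInteger();

  static class Batch implements AutoCloseable {
    @Override
    public void close() { CLOSES.incrementAndGet(); }
  }

  static void closeQuietly(final AutoCloseable c) {
    if (c == null)
      return;
    try {
      c.close();
    } catch (final Exception ignored) {
      // quiet by design
    }
  }

  public static void main(final String[] args) {
    final AtomicReference<Batch> batchRef = new AtomicReference<>(new Batch());
    closeQuietly(batchRef.getAndSet(null)); // first caller wins and closes
    closeQuietly(batchRef.getAndSet(null)); // later callbacks see null: no-op
    System.out.println("closes=" + CLOSES.get()); // prints closes=1
  }
}
```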

Missing validation: type_name is never checked for blank

If a client sends a GraphBatchRecord without setting type_name (proto3 default ""), the call propagates to GraphBatch.createVertices("", ...) and fails with a cryptic schema exception or NPE deep inside the engine. A guard at the record-processing level produces a clear error:

if (rec.getTypeName().isEmpty())
  throw new IllegalArgumentException("GraphBatchRecord is missing type_name");

Minor: Test query assertion fragile for multi-page results

assertThat(queryResp.getResultsList().get(0).getRecordsList()).hasSize(2);

getResultsList().get(0) assumes a single-page response. For 2 records this is fine in practice, but the intent is clearer if aggregated across all pages:

final long total = queryResp.getResultsList().stream()
    .mapToLong(p -> p.getRecordsCount()).sum();
assertThat(total).isEqualTo(2);

Positive notes

  • The proto3 optional bool fix is correct and makes configureGraphBatchOptions sound
  • The resolveRef null/empty guard is the right fix and produces a clear error message
  • The errorSent[0] flag correctly prevents call.request(1) after an error
  • The six integration tests now cover all stated edge cases (happy path, vertices-only, empty stream, unknown temp ID, ordering violation, direct RID references)
  • Flow control via disableAutoInboundFlowControl + request(1) is the correct back-pressure pattern

@robfrank robfrank linked an issue Mar 18, 2026 that may be closed by this pull request
@robfrank robfrank changed the title feat(grpc): add GraphBatchLoad RPC for bulk graph loading #3678 feat(grpc): add GraphBatchLoad RPC for bulk graph loading Mar 18, 2026
@robfrank robfrank merged commit 2d0afdc into main Mar 18, 2026
20 of 23 checks passed
@codacy-production

codacy-production Bot commented Mar 18, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: -15.68%
Diff coverage: 0.00%

Coverage variation details:
                                  Coverable lines   Covered lines    Coverage
Common ancestor commit (ca9db5a)  110408            81751            74.04%
Head commit (e240369)             141900 (+31492)   82824 (+1073)    58.37% (-15.68%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:
                        Coverable lines   Covered lines   Diff coverage
Pull request (#3680)    138               0               0.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@codecov

codecov Bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 68.57143% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.62%. Comparing base (d3e0447) to head (425efec).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
.../com/arcadedb/server/grpc/ArcadeDbGrpcService.java 68.57% 29 Missing and 15 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3680      +/-   ##
==========================================
- Coverage   65.63%   65.62%   -0.01%     
==========================================
  Files        1557     1557              
  Lines      111125   111265     +140     
  Branches    23299    23333      +34     
==========================================
+ Hits        72933    73019      +86     
- Misses      28354    28384      +30     
- Partials     9838     9862      +24     

☔ View full report in Codecov by Sentry.

Development

Successfully merging this pull request may close these issues.

Add GraphBatchLoad client-streaming RPC to gRPC module

1 participant