Skip to content

Graph Analytical View (GAV) — CSR-based OLAP acceleration#3618

Merged
lvca merged 70 commits intomainfrom
grap-olap
Mar 15, 2026
Merged

Graph Analytical View (GAV) — CSR-based OLAP acceleration#3618
lvca merged 70 commits intomainfrom
grap-olap

Conversation

@lvca
Copy link
Copy Markdown
Member

@lvca lvca commented Mar 11, 2026

Issue #3617

Introduces Graph Analytical View (GAV), a CSR (Compressed Sparse Row) based analytical overlay on top of the OLTP graph engine. GAV provides 50-70x speedup for graph traversals by maintaining a columnar,
cache-friendly adjacency index in memory alongside the transactional graph.

Key Features

Core GAV Engine (com.arcadedb.graph.olap)

  • GraphAnalyticalView — main entry point with builder pattern
  • CSRAdjacencyIndex + CSRBuilder — double-indexed (forward + backward) sorted neighbor lists
  • NodeIdMapping — per-bucket RID↔internal ID mapping with binary search (~8 bytes/entry)
  • ColumnStore / Column / DictionaryEncoding — columnar property storage with null bitmaps
  • DeltaCollector + DeltaOverlay — SYNCHRONOUS mode captures TX deltas via listeners, applies overlay with zero stale window
  • GraphAlgorithms — PageRank, ArticleRank, connected components, etc. running on CSR
  • SIMD-accelerated vector operations via jdk.incubator.vector with scalar fallback

Update Modes

  • OFF — marks STALE on commit, requires manual rebuild
  • SYNCHRONOUS — overlay-based, no stale window
  • ASYNCHRONOUS — async rebuild on commit

Query Planner Integration

  • GraphTraversalProvider interface + GraphTraversalProviderRegistry — SPI in engine module for query planner to discover GAV providers
  • Cypher optimizer path: GAVExpandAll / GAVExpandInto operators when provider is ready and edge variable not captured
  • Cypher traditional path: MatchRelationshipStep fast path uses GAV/CSR for vertex-only traversal
  • Smart GAV eligibility: computeHopEdgeTrackingNeeds() analyzes per-hop edge type overlap; isEdgeVariableReferenced() detects unused edge variables across all clauses (RETURN, WITH, WHERE, ORDER BY, UNWIND, SET,
    DELETE)
  • Edge type short-circuit: returns immediately if edge type doesn't exist in schema
  • shortestPath() and move() SQL functions also leverage GAV when available

SQL Commands

  • CREATE GRAPH ANALYTICAL VIEW name ON database ...
  • ALTER GRAPH ANALYTICAL VIEW name ...
  • DROP GRAPH ANALYTICAL VIEW name
  • REBUILD GRAPH ANALYTICAL VIEW name
  • Schema persistence via Schema.getExtension() / setExtension() + auto-restore on database open

Graph Algorithms (50+ procedures refactored to use GAV)

  • All algo.* procedures now extend AbstractAlgoProcedure for consistent GAV-accelerated neighbor iteration
  • PageRank, ArticleRank, PersonalizedPageRank got dedicated CSR-optimized paths

Studio UI

  • New "Graph Analytical View" panel in database page
  • Create, rebuild, alter, drop views from the UI
  • Status display (NOT_BUILT, BUILDING, READY, STALE)

Server Integration

  • Auto-restore GAVs on server database open
  • Auto-close GAVs on database close
  • HTTP API endpoints for CRUD operations

Tests

  • GraphAnalyticalViewTest — 2,445 lines covering builder, update modes, delta overlay, persistence, algorithms
  • CSRPerformanceTest, GraphOlapVectorOpsTest, GraphAlgorithmsTest
  • MatchRelationshipStepProfilingTest — 13 tests for GAV usage in query plans (fast path, standard path, edge uniqueness, unused variables)
  • HTTPGraphAnalyticalViewIT — server integration tests
  • GraphOLAPBenchmark — performance benchmarks
  • All 5,604 OpenCypher TCK + custom tests pass (1 pre-existing failure unrelated to GAV)

Comment thread studio/src/main/resources/static/js/studio-database.js Fixed
@lvca lvca changed the title Graph olap -> Graph Analytical View perf: enable GAV/CSR fast path for common Cypher relationship traversals Mar 11, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting /gemini summary.

@lvca lvca changed the title perf: enable GAV/CSR fast path for common Cypher relationship traversals Graph Analytical View (GAV) — CSR-based OLAP acceleration Mar 11, 2026
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Mar 14, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-9.68% 77.97%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (dbe0eb9) 105372 78502 74.50%
Head commit (5453b82) 140399 (+35027) 91010 (+12508) 64.82% (-9.68%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3618) 4776 3724 77.97%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Code Review: Graph Analytical View (GAV)

Impressive and ambitious feature. The SPI abstraction is clean, the delta overlay design is clever, and the test coverage is thorough. Issues found, by severity:


HIGH SEVERITY

1. compacting flag never reset on RejectedExecutionException - GraphAnalyticalView.applyDelta()

After compacting.compareAndSet(false, true), if the executor submit throws RejectedExecutionException (executor shut down concurrently), the exception is likely swallowed in the post-commit callback. The compacting flag stays true permanently, silently disabling all future compaction for this view instance. Fix: wrap the submit in try/catch and call compacting.set(false) in the catch.

2. O(N^2) compaction cost per commit - DeltaOverlay.merge()

merge() deep-copies the entire overlay state on every SYNCHRONOUS commit, even for single-edge transactions. For N commits before the 10000-edge compaction threshold, total work is O(N^2). Under sustained write rates this causes progressive latency and GC pressure. Consider a lazy-fold structure or a lower compaction threshold.


MEDIUM SEVERITY

3. closeExecutor() holds class lock for 30 seconds - GraphAnalyticalView

The synchronized(GraphAnalyticalView.class) block includes awaitTermination(30, SECONDS), blocking buildAsync() and onRelevantCommit() callers for up to 30 seconds during shutdown. Null the static field inside the lock and drain outside it.

4. GAV disabled for all multi-hop anonymous patterns - MatchRelationshipStep.canUseFastPath()

Removing isInternalAnonymousVar blocks GAV for MATCH (a)-[*2..4]->(b) patterns. Space-prefixed anonymous variables do not capture edge data; they only track uniqueness. Consider whether uniqueness tracking can use internal dense IDs instead of full edge objects to re-enable CSR acceleration.

5. HashMap autoboxes int on every edge in NodeIdMapping.getGlobalId()

bucketIdToIdx.get(rid.getBucketId()) autoboxes int on every call - O(E) boxing during CSRBuilder Phase B. Since ArcadeDB bucket IDs are small contiguous integers, a plain int array lookup table would eliminate all boxing and align with the zero-GC goal.

6. One RID allocation per neighbor in the hot query loop - NodeIdMapping.getRID()

provider.getRID(neighbors) allocates one RID per result row. For analytics returning millions of neighbors this contradicts the zero-GC design goal. Lazy materialization (allocate RID only when it escapes the step) would help significantly.


LOW SEVERITY

7. propertySampleSize not volatile but read from background threads

Safe today (written before async work starts) but JMM does not formally guarantee visibility. Making it final (injected via constructor) would make the guarantee explicit.

8. Subtle JMM ordering not documented - MatchRelationshipStep.resolveGavProvider()

The two separate volatile writes (gavProvider then gavProviderResolved) are correct per JMM: a reader seeing gavProviderResolved == true is guaranteed to see the preceding write to gavProvider. This reasoning is non-obvious; a comment explaining the happens-before chain would prevent a future regression.

9. ConcurrentHashMap inside GraphAnalyticalViewRegistry adds no value

All access is guarded by synchronized(REGISTRY). The inner ConcurrentHashMap could be a plain HashMap.

10. Self-loop dedup relies on undocumented OLTP invariant - MatchRelationshipStep.getVertices()

The invariant that each self-loop produces exactly 2 entries is not asserted or referenced. If the OLTP storage layer changes, selfLoopSeen % 2 == 0 silently produces wrong results.

11. WeakHashMap safety-net claim misleading - GraphTraversalProviderRegistry

Providers hold strong references back to the database key so the WeakHashMap behaves as a regular HashMap during normal operation. The Javadoc should clarify the safety net only activates after all providers are unregistered.


MINOR

  • DeltaOverlay constructor takes BitSet by reference without defensive copy; future callers could mutate a live overlay.
  • NodeIdMapping.bucketIdxForNaturalId() returns 0 when numBuckets==0 without a guard.
  • DeltaOverlay.packEdge() sign-extension for src is correct but non-obvious; a short comment would prevent a future incorrect fix.
  • Static EXECUTOR has no shutdown hook fallback; if closeExecutor() is never called in test isolation, the virtual-thread executor leaks for JVM lifetime.

Overall this is high-quality work. Main concerns before merging: compaction flag stuck-state bug (1), quadratic merge cost (2), and per-neighbor RID allocation (6) which causes GC pressure contradicting the zero-GC design principle.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

PR Review: Graph Analytical View (GAV) — CSR-based OLAP acceleration

This is an impressive and well-designed feature. The CSR-backed OLAP overlay is a solid approach to bridging OLTP and analytical workloads. The architecture, documentation, and test coverage are all notably thorough. Below are observations across correctness, performance, and style.


✅ Strengths

  • Snapshot pattern: The single volatile Snapshot reference in GraphAnalyticalView is a clean, correct approach to ensuring readers always see a consistent state across concurrent writes.
  • Thread-safe delta tracking: Using a ConcurrentHashMap keyed by thread ID in DeltaCollector (instead of ThreadLocal) correctly avoids the well-known ThreadLocal leak in long-lived thread pools.
  • SIMD with graceful fallback: GraphOlapVectorOpsProvider properly catches LinkageError and falls back to the scalar implementation — essential since jdk.incubator.vector requires explicit JVM flags.
  • Compaction buffering: The pendingDeltas pattern in applyDelta() to re-apply transaction deltas after a CSR rebuild is subtle but correct.
  • Comprehensive test coverage: 2,445 lines of GAV tests including delta overlay, persistence, and concurrent update modes.
  • Security: All new SQL DDL statements (CREATE/ALTER/DROP/REBUILD GRAPH ANALYTICAL VIEW) correctly call database.checkPermissionsOnDatabase(SecurityDatabaseUser.DATABASE_ACCESS.UPDATE_SCHEMA).

🐛 Potential Bugs

1. useWhenStale is not volatile (thread-safety issue)

GraphAnalyticalView.java, line 187:

private       boolean    useWhenStale;

This field is read by the query planner thread via isReady() (line 829) but written by user code via setUseWhenStale() (line 841). Without volatile, changes are not guaranteed to be visible across threads, creating a data race. It should be declared volatile boolean useWhenStale.

2. Import ordering in LocalDatabase.java

Lines 56–59 insert GAV-related imports out of alphabetical order within the com.arcadedb.graph package block:

import com.arcadedb.graph.Vertex;
import com.arcadedb.graph.VertexInternal;
import com.arcadedb.graph.GraphTraversalProviderRegistry;   // ← out of order
import com.arcadedb.graph.olap.GraphAnalyticalView;
...

These should be sorted alphabetically with the other com.arcadedb.graph.* imports above Vertex and VertexInternal.

3. DeltaOverlay.merge() unchecked array creation

DeltaOverlay.java uses Map<String, Object>[] which requires an unchecked cast from a generic array. While annotated with @SuppressWarnings("unchecked"), this pattern is fragile and can cause heap pollution. Consider using a List<Map<String, Object>> instead of an array to avoid this entirely.


⚠️ Design / Performance Considerations

4. DeltaOverlay memory growth between compactions

Each merge() call creates new copies of all internal maps (addedEdgesPerType, deletedEdgesPerType, neighbor indexes, etc.). Under high write throughput, the overlay can grow significantly between compaction cycles (default threshold: 10,000 delta edges). Consider whether an incremental/in-place update strategy could reduce GC pressure here, even if the result is mutable (since it's swapped atomically anyway).

5. Virtual thread executor shutdown timing

GraphAnalyticalView.closeExecutor() waits 30 seconds for async builds to finish before calling shutdownNow(). The 30-second window applies to all databases simultaneously (the executor is shared). On JVM shutdown with many in-progress builds, this could add 30 seconds to the shutdown time. Consider making the timeout configurable via GlobalConfiguration, or documenting the shutdown latency clearly in the javadoc.

6. onRelevantCommit() resets readyLatch inside a synchronized block but does NOT use buildQueued

buildAsync() (line 287) uses buildQueued.compareAndSet to avoid queuing duplicate builds. But onRelevantCommit() (line 1110) uses a different guard (compacting.compareAndSet). If an ASYNCHRONOUS view has an in-progress rebuild triggered by onRelevantCommit(), a direct call to buildAsync() could queue a second build. The two guards should be unified or cross-checked.

7. Property sampling may silently exclude late-appearing properties

CSRBuilder samples only the first propertySampleSize (default: 100) vertices to detect property types. Vertices that appear later in bucket order and have properties not seen in the sample are silently excluded from the columnar store. This is documented, but consider at minimum logging a WARNING when a property is observed outside the sample window.


📝 Minor / Style

  • GraphAnalyticalView has a public no-arg constructor (line 218) left for "backward compatibility" but this appears to be a new class with no prior API — consider removing it to avoid confusion.
  • FetchFromSchemaGraphAnalyticalViewsStep — good addition for the SELECT metadata path; confirm it has a test covering the STALE status in results.
  • The GAV_USE_WHEN_STALE config defaults to false which is the safe default. The javadoc is clear. Worth noting in release notes that this is a per-database configuration knob.
  • CSRScanOperator with BOTH direction throws IllegalArgumentException mentioning a NeighborView parameter that doesn't exist in the API — the error message should be updated to reflect actual usage.

Summary

The core design is sound and the implementation is careful about the tricky concurrency scenarios (snapshot atomicity, delta buffering during compaction, SIMD fallback). The main actionable items before merge are:

  1. Fix: Make useWhenStale volatile
  2. Fix: Sort imports in LocalDatabase.java
  3. Minor: Fix CSRScanOperator error message for BOTH direction
  4. Consider: Reducing DeltaOverlay memory churn between compactions

Great work overall — this is a well-thought-out feature with real performance impact.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

GAV Code Review test

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Graph Analytical View (GAV) - Code Review

This is a substantial and well-engineered feature addition. The architecture is sound and the implementation shows deep understanding of both graph OLAP patterns and ArcadeDB internals.

Note: Replacing the test comment above with this full review.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Graph Analytical View (GAV) Code Review (1/4 - Architecture)

STRENGTHS: CSR double-indexed layout ideal for OLAP. Snapshot atomic-swap gives lock-free reads. hasAnyProviders fast-path avoids sync cost. DeltaCollector thread-ID-keyed ConcurrentHashMap avoids ThreadLocal leaks. Schema extension persistence is clean.

CONCERNS:
(1) jdk.incubator.vector is incubating and requires --add-modules jdk.incubator.vector at compile+runtime. Not visible in diff -- if missing from engine pom.xml, SIMD silently falls back to scalar. Please verify.
(2) closeExecutor() shared across all databases -- the 30-second drain on last-DB-close can stall server shutdown if an ASYNC build is running. Document as known limitation or use per-database executors.
(3) Three volatile fields in MatchRelationshipStep (gavProvider, gavProviderResolved, gavProviderDebug) are tagged single-threaded but declared volatile. Remove volatiles or adopt double-checked locking for clarity.
(4) DeltaOverlay raw Map-array forces @SuppressWarnings on entire constructors. Narrow scope to just the cast.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Graph Analytical View (GAV) Code Review (2/4 - Thread Safety and Performance)

THREAD SAFETY:
(5) restoreAll() calls clearAll(database) then rebuilds. If two threads open the same database path concurrently this creates a race between clearAll and subsequent register calls in the builder. Verify the LocalDatabase write lock covers the entire open() including restoreAll.
(6) DeltaCollector.scheduleSyncCallback captures delta by reference in the lambda (not a snapshot). The comment explains the frozen copy is made at commit time, which is correct, but this relies on record event callbacks for a single transaction context being called from a single thread. Assert or document this assumption explicitly.

PERFORMANCE:
(7) GraphTraversalProviderRegistry.getProviders() creates a new ArrayList snapshot inside a synchronized block on every call. This allocation occurs on every query traversal when a provider is registered. CopyOnWriteArrayList is already thread-safe for reads -- return an unmodifiable view outside the synchronized block to eliminate the per-query allocation.
(8) GraphAlgorithms.parallelForRange() creates new Thread[] and spawns daemon threads on every invocation. For algorithms called repeatedly (e.g., PageRank in a procedure loop), thread-creation overhead is significant. The shared GAV virtual-thread executor would be a better fit here.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Graph Analytical View (GAV) Code Review (3/4 - Correctness and Code Style)

CORRECTNESS:
(9) CSRAdjacencyIndex has no bounds checking on nodeId. outDegree(nodeId) will throw ArrayIndexOutOfBoundsException if nodeId >= nodeCount with no informative error message. Add a defensive check or assert in dev mode to help diagnose edge cases with overflow nodes from DeltaOverlay.
(10) The bwdToFwd map in Snapshot is stored but its usage is not visible in the reviewed diff. If currently unused, remove it to avoid dead code.
(11) Schema short-circuit in MatchRelationshipStep is now lazy (checked after first pull from predecessor). A test covering MERGE followed by MATCH on the same type in the same query would be a valuable regression guard.

CODE STYLE (per CLAUDE.md):
(12) Import ordering in GraphAnalyticalView.java: java.util.HashMap appears out of alphabetical order relative to other java.util imports.
(13) The hardcoded DEFAULT_AVG_DEGREE = 10.0 in CypherOptimizer will make GAV-backed expand appear cheaper than OLTP expand even when actual average degree is below 10 (sparse graphs). GAV could be selected even when OLTP is faster. Please track as a known limitation and consider a configurable override via GlobalConfiguration.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Graph Analytical View (GAV) Code Review (4/4 - Tests, Security, Summary)

TEST COVERAGE:
(14) CSRPerformanceTest appears to belong in the performance/ package (like the existing GraphOLAPBenchmark) to exclude it from regular CI runs.
(15) MatchRelationshipStepProfilingTest introspects internal counters (fastPathCount, gavPathCount). These are useful for verifying the optimization is triggered but fragile against refactoring. Consider complementing with behavioral tests that assert correct query results.
(16) No test covers concurrent reads during ASYNCHRONOUS rebuild -- a STALE-mode query should return correct pre-commit data, not corrupted partial state.

SECURITY: No concerns identified. HTTP endpoints for GAV CRUD will be protected by the existing DatabaseAbstractHandler authentication/authorization layer.

SUMMARY - Key action items before merge:

  • CRITICAL: Verify --add-modules jdk.incubator.vector is in engine pom.xml compiler and runtime arguments.
  • IMPORTANT: Clarify volatile fields in MatchRelationshipStep -- remove if single-threaded, or document why needed.
  • IMPORTANT: Move CSRPerformanceTest to performance/ package.
  • MINOR: Document the shared-executor 30-second shutdown stall as a known operational limitation.
  • MINOR: Track DEFAULT_AVG_DEGREE = 10.0 cost model assumption as a follow-up for statistics-based estimation.

Overall: very well done. The CSR + delta-overlay design and query planner integration are particularly impressive pieces of work.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Mar 15, 2026

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
-9.38% 77.79%
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (9212b12) 105406 78666 74.63%
Head commit (5e66cdd) 140465 (+35059) 91661 (+12995) 65.26% (-9.38%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#3618) 4808 3740 77.79%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Code Review — Graph Analytical View (GAV) CSR-based OLAP Acceleration

This is a substantial and architecturally sound feature. The core design — immutable Snapshot published via a single volatile write, delta overlay for SYNCHRONOUS mode, per-bucket CSR alignment with ArcadeDB's storage model — is well thought-out. The comments below are ranked by severity.


🔴 Bug — useWhenStale missing volatile

File: GraphAnalyticalView.java:187

private       boolean    useWhenStale;        // ← not volatile
private volatile UpdateMode updateMode;        // ← volatile (correct)

useWhenStale is read from query-planner threads via isReady()/isStale() but written by user code (e.g. ALTER GRAPH ANALYTICAL VIEW). Without volatile, the JVM can cache the stale value in a register indefinitely. Should be volatile like updateMode.


🔴 Bug — isConnectedOLTP() BOTH-direction logic is incorrect

File: GAVExpandInto.java:130-144

private boolean isConnectedOLTP(final Vertex source, final Vertex target) {
    final Vertex.DIRECTION arcadeDirection = direction.toArcadeDirection();
    for (final Edge edge : source.getEdges(arcadeDirection, edgeTypes)) {
        // When arcadeDirection == BOTH, this evaluates to edge.getOutVertex() (the else-branch)
        final Vertex other = arcadeDirection == Vertex.DIRECTION.OUT ? edge.getInVertex() : edge.getOutVertex();
        if (other.getIdentity().equals(target.getIdentity()))
            return true;
        if (arcadeDirection == Vertex.DIRECTION.BOTH) {
            // This re-checks both endpoints — so `other` above only checked the OUT vertex
            final Vertex out = edge.getOutVertex();
            final Vertex in = edge.getInVertex();
            if (out.getIdentity().equals(target.getIdentity()) || in.getIdentity().equals(target.getIdentity()))
                return true;
        }
    }
    return false;
}

For BOTH direction: the other variable resolves to edge.getOutVertex() (since the ternary has no BOTH branch). The first if short-circuits on the out-vertex only; then the BOTH block re-checks both. The first check is redundant for BOTH direction. More importantly, the source.getEdges(BOTH, ...) returns edges where source can be either IN or OUT, so checking only edge.getOutVertex() first may miss valid connections. The logic should be simplified: for BOTH, always check both endpoints; remove the confused initial other check.

This operator is also 0% covered by tests (per codecov), so this bug would go undetected.


🟠 Performance — O(N²) DeltaOverlay growth in SYNCHRONOUS mode

File: DeltaOverlay.java:125-215

merge() deep-copies every collection from the previous overlay on every committed transaction:

final Map<RID, Integer> newOverflowIds = new HashMap<>(overflowNodeIds);         // O(overflow nodes)
final List<RID> overflowRIDsList = new ArrayList<>(Arrays.asList(overflowIdToRID)); // O(overflow nodes)
for (final var entry : addedEdgesPerType.entrySet())
    newAddedEdges.put(entry.getKey(), new ArrayList<>(entry.getValue()));          // O(total added edges)

After k transactions in SYNCHRONOUS mode, the i-th copy costs O(i × edges/tx), making the total O(k² × edges/tx) before compaction. For a 10,000-edge compaction threshold with 10 edges/tx, you copy ~500K edge entries across 1,000 transactions.

A persistent/structural approach (e.g. a chain of delta layers without copying, or an append-only edge list), or at minimum a smaller compaction threshold default, would help. As-is, SYNCHRONOUS mode can cause sustained GC pressure on high write-throughput workloads — directly contrary to the project's zero-GC goal.


🟠 Performance — getProviders() allocates on every traversal

File: GraphTraversalProviderRegistry.java:85-93

synchronized (REGISTRY) {
    final CopyOnWriteArrayList<GraphTraversalProvider> list = REGISTRY.get(key);
    return list != null ? Collections.unmodifiableList(new ArrayList<>(list)) : Collections.emptyList();
}

This is called on the hot query path. Since the inner list is already a CopyOnWriteArrayList whose iterator snapshot is thread-safe outside the lock, wrapping it again in new ArrayList<> allocates a new object on every traversal. findProvider() (also in this file) handles iteration without an extra allocation — getProviders() should do the same or return the COWAL directly.


🟠 Missing — GAVExpandInto has zero test coverage

File: GAVExpandInto.java — 0% per codecov

The ExpandInto operator handles the MATCH (a)-[r]-(b) WHERE id(a)=$x AND id(b)=$y pattern. Without tests:

  • The BOTH-direction bug above is invisible
  • OLTP fallback path (when one endpoint is outside the GAV mapping) is untested
  • Edge variable not captured path is untested

Please add at least one test verifying (a) CSR fast path, (b) OLTP fallback, (c) BOTH direction.


🟠 Design — closeExecutor() holds class lock for 30 s

File: GraphAnalyticalView.java:120-134

public static void closeExecutor() {
    synchronized (GraphAnalyticalView.class) {
        // ...
        if (!exec.awaitTermination(30, TimeUnit.SECONDS))  // holds class lock for up to 30 s
            exec.shutdownNow();
    }
}

Any thread that calls getExecutor() during this window blocks for up to 30 seconds. If multiple databases close concurrently (e.g. during server shutdown), each will contend on this lock. Consider releasing the class lock before awaitTermination:

public static void closeExecutor() {
    final ExecutorService exec;
    synchronized (GraphAnalyticalView.class) {
        exec = EXECUTOR;
        if (exec == null || exec.isShutdown()) return;
        exec.shutdown();
        EXECUTOR = null;
    }
    // wait outside the lock
    try {
        if (!exec.awaitTermination(30, TimeUnit.SECONDS))
            exec.shutdownNow();
    } catch (final InterruptedException e) {
        exec.shutdownNow();
        Thread.currentThread().interrupt();
    }
}

🟡 Potential issue — onRelevantCommit() and applyDelta() use different guards

File: GraphAnalyticalView.java:1118-1162 and 1200-1266

onRelevantCommit() (ASYNCHRONOUS mode) guards with compacting.compareAndSet(false, true). applyDelta() (SYNCHRONOUS mode) also guards compaction with compacting.compareAndSet(false, true). These flags are shared, but the guards mean different things in each context:

  • In ASYNCHRONOUS: compacting prevents duplicate full rebuilds
  • In SYNCHRONOUS: compacting prevents duplicate compaction tasks

If the update mode is changed from ASYNCHRONOUS to SYNCHRONOUS at runtime while a rebuild is in progress, the compacting guard may misfire. Consider whether buildQueued and compacting should be merged into a single state machine to make transitions explicit and safe.


🟡 Potential dead code — bwdToFwd in Snapshot

File: GraphAnalyticalView.java:154

final Map<String, int[]>  bwdToFwd;

This field is stored in every Snapshot but the search agent could not find usage sites. If it's unused, it adds memory overhead per snapshot swap. If it's intended for a future feature, a TODO comment would clarify intent.


🟡 Minor — import ordering inconsistency

File: GraphAnalyticalView.java:36-50

Imports are not in alphabetical order (e.g. java.util.HashMap appears after java.util.List). The project's Prettier config enforces formatting for files with @format pragma — check whether this file should have that pragma.


✅ What's done well

  • Snapshot atomicity: the single volatile Snapshot reference is the right pattern for publishing compound CSR state without readers seeing partial updates.
  • NodeIdMapping design: replacing HashMap<Integer, Integer> with sorted long[] arrays and binary search is exactly the right trade-off — avoids autoboxing on the hot path and reduces memory by ~4× per node.
  • DeltaOverlay secondary indexes: the neighbor index and deleted-edge count index are O(1) lookups built once per merge, correctly amortizing the cost.
  • RejectedExecutionException handling: all three async submission sites (buildAsync, onRelevantCommit, applyDelta) correctly reset their guard flags in the catch block — no permanent lock-out.
  • Test coverage: 2,445-line GraphAnalyticalViewTest covering builder, persistence, delta overlay, and algorithms is impressive. The profiling tests for query plan selection are a nice touch.

@lvca lvca merged commit 19f5635 into main Mar 15, 2026
8 of 13 checks passed
@lvca lvca deleted the grap-olap branch March 15, 2026 20:29
@claude
Copy link
Copy Markdown

claude Bot commented Mar 15, 2026

Code Review: Graph Analytical View (GAV) — CSR-based OLAP acceleration

This is a large and impressive feature adding a CSR-based OLAP overlay to ArcadeDB. The architecture is well-thought-out, with clean separation between the OLTP graph, the analytical layer, and the query planner integration. Below are observations grouped by severity.


Critical / Correctness Issues

1. Silent delta loss on buffer overflow (GraphAnalyticalView.applyDelta())

When pendingDeltas exceeds MAX_PENDING_DELTAS, the buffer is set to null. Subsequent calls to applyDelta() skip buffering entirely due to the if (pendingDeltas != null) guard, silently discarding committed transaction changes without any warning. A stale view is expected in OFF mode, but in SYNCHRONOUS mode this means the overlay diverges silently from reality. Consider logging a warning when deltas are dropped, and explicitly transitioning to STALE state in this scenario.

2. Mutable collections in DeltaOverlay private constructor

The private constructor accepts HashMap, HashSet, ArrayList etc. and stores them without defensive copying. The merge() method constructs new collections from the old ones — if the caller retains a reference and mutates after construction, overlay state can be corrupted. Since the class is package-private this is manageable, but worth an explicit comment or assertion that the inputs are not retained by the caller post-construction.

3. NodeIdMapping.addNode() vs compact() race

addNode() checks if (positionsBuilder == null) without synchronization. If compact() is called concurrently from another thread, positionsBuilder can become null between the null-check and the array access, causing a NullPointerException or data corruption. This is only relevant during the build phase if builds can run concurrently — likely safe today, but fragile for future changes.

4. CSRAdjacencyIndex — no bounds validation on degree/offset accessors

outDegree(nodeId), inDegree(nodeId), outOffset(nodeId), etc. access fwdOffsets[nodeId + 1] and bwdOffsets[nodeId + 1] without checking that nodeId < nodeCount. An invalid nodeId will throw ArrayIndexOutOfBoundsException with no useful message. A precondition check would improve debuggability significantly.


Concurrency / Threading Issues

5. GraphAnalyticalView.setUpdateMode() — listener registration gap

setUpdateMode() synchronizes on this, but the interleaved sequence of unregisterChangeListeners() -> async build -> registerChangeListeners() creates a window where committed transactions are not captured. Transactions during the build period could produce TxDeltas that arrive after unregister but before register, causing those changes to be missed in SYNCHRONOUS mode.

6. Global executor shutdown timeout

closeExecutor() uses a 30-second drain timeout that applies across all databases. If one database closes with a long-running build in progress, another database closing simultaneously may have its builds forcibly terminated. Consider per-database executor instances, or at minimum document this shared-pool behavior.

7. hasAnyProviders flag in GraphTraversalProviderRegistry

The fast-path volatile flag is read without holding synchronized(REGISTRY). While the volatile guarantees visibility, the flag can be transiently stale. This will not cause correctness issues (an empty list just means no CSR acceleration), but a comment explaining why the TOCTOU is acceptable would help future maintainers.


Performance Considerations

8. NodeIdMapping — binary search for bucket index on every lookup

bucketIdxForNaturalId() does a binary search on bucketBase for every call to getGlobalId(). For queries doing millions of vertex lookups this adds O(log m) per lookup where m is the number of buckets. The bucketIdToIdx[] array already exists for O(1) lookup — verify it is used on the hot path and the binary search is only a fallback.

9. Unbounded bucketIdToIdx growth for sparse bucket IDs

If a database uses sparse bucket IDs (e.g., bucket ID = 1,000,000), bucketIdToIdx is allocated as a 1M-entry array of mostly -1 values (~4 MB per view). With multiple GAVs this could add up. A HashMap fallback for sparse bucket ranges would be safer.

10. DeltaOverlay.buildNeighborIndex() — unbounded array growth

Neighbor arrays in the delta index start at size 5 and double on overflow. For a node with many new edges in a single transaction, repeated doubling can cause significant temporary allocations. Consider using an initial capacity hint based on the degree observed during the TxDelta scan.


Design / Code Quality

11. GraphAnalyticalView.closeExecutor() is static with global state

The shared executor EXECUTOR is a static field. If users embed multiple ArcadeDB instances in the same JVM, all databases share one executor pool, and closeExecutor() shuts it down for everyone. A reference-counted or per-isolate design would be safer.

12. DeltaOverlay — packEdge() magic constant

((long) src << 32) | (tgt & 0xFFFFFFFFL) — the 0xFFFFFFFFL mask is correct but unexplained. A named constant like UNSIGNED_INT_MASK with a comment explaining why it is needed (tgt can be negative as a signed int) would improve maintainability.

13. NodeIdMapping.applyReordering() — no permutation validation

If oldToNewMapping has duplicates or is otherwise invalid, newToOld[oldToNewMapping[i]] = i silently produces an inconsistent state. An assertion in debug/test builds would catch bugs early.

14. DB_AFTER_OPEN callback error handling

In LocalDatabase.open(), the DB_AFTER_OPEN callback failure is caught and logged at SEVERE, but execution continues. For a failure during open, this could leave the database in an inconsistent state. Document why continuing is safe, or propagate the exception to abort the open.

15. Grammar: new keywords not added to identifier rule

VERTEX, EDGE, REBUILD, and ANALYTICAL are used in the new DDL grammar but not added to the identifier fallback rule (only MODE, GRAPH, AUTO, PROPERTIES, COMPACTION, THRESHOLD were added). If any of these strings appear as user identifiers in queries (e.g., a property or type named vertex or edge), the parser will fail on existing queries. Each new keyword should either be added to identifier or confirmed safe to not appear in identifier positions.


Test Coverage

The test suite is thorough — GraphAnalyticalViewTest at 2,445 lines is excellent. A few additional scenarios worth covering:

  • Concurrent modification during rebuild — commit transactions while an async rebuild is in progress, verify the final view reflects all changes
  • Delta loss scenario — explicitly test the MAX_PENDING_DELTAS overflow path to verify STALE transition rather than silent divergence
  • Sparse bucket IDs — verify NodeIdMapping handles non-contiguous bucket IDs without excessive memory usage
  • Multi-GAV interactions — two GAVs on the same database sharing GraphTraversalProviderRegistry should not interfere with each other

Summary

High-quality, well-architected feature with good test coverage. Priority items to address:

  1. Silent delta loss in SYNCHRONOUS mode (item 1) — correctness issue
  2. Missing keywords in identifier rule (item 15) — could break existing queries with vertex/edge as identifiers
  3. Memory unboundedness for sparse bucket IDs (item 9)
  4. Concurrency edge cases in listener registration (item 5) and NodeIdMapping build phase (item 3)

The SPI design via GraphTraversalProvider, the builder pattern for GraphAnalyticalView, and the query planner integration (both optimizer and traditional paths) are all well-structured. The 50-70x traversal speedup claim is credible given the CSR architecture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants