Skip to content

fix: Cypher -> SET n:Label on same node across row fanout must be idempotent#4025

Merged
robfrank merged 3 commits intomainfrom
fix/4017-set-label-idempotent
Apr 29, 2026
Merged

fix: Cypher -> SET n:Label on same node across row fanout must be idempotent#4025
robfrank merged 3 commits intomainfrom
fix/4017-set-label-idempotent

Conversation

@robfrank
Copy link
Copy Markdown
Collaborator

Summary

Fixes #4017

  • SET n:Label crashed when the same node appeared on multiple rows (e.g. cartesian MATCH (n:Person) MATCH (m:Person) WHERE n.age > m.age SET n:Employee). Row 1 deleted the old vertex and created a new one with a fresh RID; row 2 still held the deleted vertex reference and failed with "Record not found" or "Duplicated key".
  • Added a labelReplacements: Map<RID, Vertex> field to the ResultSet anonymous class in SetStep.syncPull(), alongside the existing writtenDocs map. On each call to applyLabels, if the incoming vertex's RID was already replaced on a prior row, the current row's variable is redirected to the already-replaced vertex and the existing type-equality check returns early (idempotent). Otherwise the normal delete+create flow runs and the replacement is recorded for future rows.

Test plan

  • setLabelIdempotentOnSameNodeAcrossRowFanout - exact reproducer from the issue (single label, cartesian MATCH)
  • setMultipleLabelsIdempotentOnSameNodeAcrossRowFanout - stronger case with two labels (SET n:Employee:Checked)
  • Full OpenCypher* suite: 5406 tests, 0 failures, 0 regressions

🤖 Generated with Claude Code

…mpotent

When the same node appears on multiple rows (e.g. cartesian MATCH), repeated
SET n:Label crashed: row 1 deleted the old vertex and created a new one with a
fresh RID; row 2 still held the deleted vertex and failed with "Record not found"
or "Duplicated key".

Fix: add a labelReplacements map (Map<RID,Vertex>) to the ResultSet anonymous
class in SetStep.syncPull(), alongside the existing writtenDocs map. On each call
to applyLabels, look up the incoming vertex's RID. If a prior replacement exists,
redirect the row's variable to the already-replaced vertex and let the existing
type-equality check return early (idempotent). Otherwise proceed with the normal
delete+create flow and record old RID → new vertex for future rows.

Fixes #4017

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 28, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 8 complexity

Metric Results
Complexity 8

View in Codacy

🟢 Coverage 77.27% diff coverage · -8.63% coverage variation

Metric Results
Coverage variation -8.63% coverage variation
Diff coverage 77.27% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (9a5b0e1) 120132 88786 73.91%
Head commit (48d049a) 151423 (+31291) 98839 (+10053) 65.27% (-8.63%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4025) 22 17 77.27%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Code Review - PR #4025: Cypher SET n:Label idempotency on row fanout

Overview

This PR fixes a real crash bug (issue #4017) where SET n:Label on nodes that appear multiple times in a cartesian row fanout would fail because the first row deleted and recreated the vertex with a new RID, leaving subsequent rows holding a stale (deleted) reference. The fix adds a labelReplacements: Map<RID, Vertex> map alongside the existing writtenDocs map to redirect stale references to the already-replaced vertex.

The approach is sound and consistent with the existing writtenDocs pattern. The fix is minimal and targeted.


Correctness

Looks good:

  • originalRid is captured before the redirect, so labelReplacements.put(originalRid, newVertex) always maps from the original RID to the latest replacement, even if the vertex was already replaced once (chained replacement via row 3+).
  • propagateUpdateToSameNodeAliases is correctly called with the (now-replaced) vertex so all aliases in the current result row are updated.
  • The early-return idempotency check uses the redirected vertex (the new one), so its getTypeName() is current.

Potential issue - interaction between writtenDocs and labelReplacements:

If a SET clause combines a property assignment and a label assignment on the same variable in that order - e.g. SET n.role = 'emp', n:Employee - and the same node appears on multiple rows:

  • Row 1: applyPropertySet saves RID1 into writtenDocs[RID1]. Then applyLabels deletes RID1, creates RID2, records labelReplacements[RID1] = RID2.
  • Row 2: applyPropertySet calls resolveLatestDoc, finds RID1 in writtenDocs, updates the result to point to the deleted mutableRID1, then attempts mutableDoc.save() on it - which would likely fail.

The PR description only covers pure label-fanout. If the combined case is also broken before this PR, that should be noted; if it was working before, it now regresses. A test or explicit exclusion comment would help clarify this. The safest fix would be to remove writtenDocs[RID1] (or insert writtenDocs[RID2]) after a label replacement, but this may be out of scope for this PR.


Test Coverage

Good:

  • setLabelIdempotentOnSameNodeAcrossRowFanout - exact reproducer.
  • setMultipleLabelsIdempotentOnSameNodeAcrossRowFanout - multi-label variant.
  • Both tests assert on the full ordered result, not just absence of exception.

Minor - inaccurate javadoc comment in test:

// The cartesian MATCH produces Alice twice and Charlie twice (age > peer)

Alice only qualifies once (age 30 > Bob's 25). The cartesian product yields the pairs (Alice, Bob), (Charlie, Bob), (Charlie, Alice) - so Alice appears once and Charlie twice. The assertion containsExactly("Alice", "Charlie", "Charlie") is correct; only the comment is wrong.

Missing test - edge preservation under fanout:
The existing tests check that the replaced vertex has the right name but do not verify that edges are preserved correctly after the label replacement. A test with connected vertices (e.g., a Person with a WORKS_AT edge) would give stronger confidence that the delete+copy-edges+create flow is correct for the fanout case specifically.


Code Quality

Comment cleanup is good: The old comment explaining why label SET was not covered by writtenDocs has been removed cleanly, and the new field comment explains the new mechanism concisely.

Consistent with existing style: The new map follows the same anonymous-ResultSet-field pattern as writtenDocs. The final keyword is used correctly. No unnecessary abstraction added.

Minor - applyLabels signature is now long:

private void applyLabels(final SetClause.SetItem item, final Result result,
    final Map<RID, Vertex> labelReplacements) {

The two-line signature is fine given the project's printWidth: 160 style. No action needed.


Performance

The HashMap lookup/put is O(1) and only executed on the LABELS branch. No regression for the common single-row case. For very large fanouts with many distinct label-changed nodes the map grows, but this mirrors the writtenDocs pattern already accepted in the codebase.


Summary

  • Bug fix is correct for the stated scenario (pure label SET + row fanout).
  • One potential concern around the combined property+label SET order interaction; worth a comment or test to clarify whether it is in scope.
  • One inaccurate javadoc comment in the first test (Alice appears once, not twice).
  • Edge-preservation test under the fanout case would strengthen confidence.

Overall this is a solid, targeted fix. Recommend addressing the test comment and considering a note on the writtenDocs/labelReplacements interaction before merging.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a labelReplacements map to SetStep to track vertex replacements occurring during label updates, which is necessary because ArcadeDB replaces the vertex (changing its RID) when labels are modified. This ensures that subsequent rows in a fanout operation correctly reference the new vertex instance. The feedback highlights the need to pre-resolve all vertex aliases in a row to prevent property updates from failing on deleted records and recommends a recursive lookup in the replacement map to handle scenarios where a vertex is replaced multiple times.

Comment on lines 131 to 132
if (setClause == null || setClause.isEmpty())
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The labelReplacements map should be used to pre-resolve all vertex aliases in the current row before processing any SetItem. This is critical because other SET operations (like property updates) rely on resolveLatestDoc, which currently only checks writtenDocs. If a node was replaced by a label change in a previous row, its RID has changed, and resolveLatestDoc will fail to find the latest state or attempt to modify a deleted record. By pre-resolving all aliases at the start of the row, we ensure that all subsequent operations (properties, maps, and labels) observe the correct, latest vertex instance.

    if (setClause == null || setClause.isEmpty())
      return;

    if (!labelReplacements.isEmpty()) {
      for (final String propName : result.getPropertyNames()) {
        final Object propValue = result.getProperty(propName);
        if (propValue instanceof Vertex v) {
          Vertex replacement = labelReplacements.get(v.getIdentity());
          if (replacement != null) {
            Vertex next;
            while ((next = labelReplacements.get(replacement.getIdentity())) != null)
              replacement = next;
            ((ResultInternal) result).setProperty(propName, replacement);
          }
        }
      }
    }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented. Added the pre-resolution block at the start of applySetOperations (before the transaction begin) that walks all result properties, looks up any Vertex value whose RID appears in labelReplacements, and chain-traverses to the current head. This ensures resolveLatestDoc in property/map SET operations always sees the live vertex rather than the deleted original, even when upstream steps re-deliver the old RID on a new row. Commit: 48d049a.

Comment on lines +326 to +330
final Vertex prior = labelReplacements.get(originalRid);
if (prior != null) {
propagateUpdateToSameNodeAliases(result, vertex, prior);
vertex = prior;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The lookup in labelReplacements must be recursive to handle cases where a node is replaced multiple times within the same query (e.g., multiple label changes across different rows). Without recursion, if a node was replaced from RID A to B, and then from B to C, a later row seeing RID A would only redirect to the deleted RID B, causing a crash when attempting to load its labels. Additionally, if the pre-resolution logic is added to applySetOperations, this block becomes redundant for the initial variable resolution but still serves as a safety check for multiple label updates in the same row.

Suggested change
final Vertex prior = labelReplacements.get(originalRid);
if (prior != null) {
propagateUpdateToSameNodeAliases(result, vertex, prior);
vertex = prior;
}
Vertex prior = labelReplacements.get(originalRid);
if (prior != null) {
Vertex next;
while ((next = labelReplacements.get(prior.getIdentity())) != null)
prior = next;
propagateUpdateToSameNodeAliases(result, vertex, prior);
vertex = prior;
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the chain traversal as suggested. The loop follows prior.getIdentity() through labelReplacements until it finds the head, so a vertex replaced A->B->C across rows is always resolved to C. Same pattern also applied inside the pre-resolution block added to applySetOperations. Commit: 48d049a.

robfrank and others added 2 commits April 28, 2026 21:15
… writtenDocs entry on label replace, add edge-preservation fanout test

- Fix inaccurate javadoc in setLabelIdempotentOnSameNodeAcrossRowFanout: Alice
  qualifies for one pair (Alice>Bob), not twice
- Pass writtenDocs into applyLabels so the stale MutableDocument entry for the
  old RID is removed after vertex replacement, preventing subsequent rows from
  reading the replaced document's pre-replacement state
- Add setLabelPreservesEdgesAcrossRowFanout test: verifies that edges copied
  during the first label-replacement are still present after the second fanout
  hit redirects to the already-replaced vertex

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in SetStep

- At the start of each row in applySetOperations, walk labelReplacements to
  update any result property that still holds a deleted (old-RID) vertex so
  that property-SET operations via resolveLatestDoc observe the live vertex
- In applyLabels, follow the replacement chain recursively so multi-step
  replacements (A->B->C) are resolved to the current head regardless of how
  many label changes have been applied in prior rows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Code Review - fix: Cypher SET n:Label idempotency across row fanout

Overview

This PR fixes a real crash: SET n:Label on a cartesian-MATCH result set deleted a vertex on row 1, then row 2 tried to operate on the same (now-deleted) vertex and threw Record not found or Duplicated key. The fix introduces a labelReplacements: Map<RID, Vertex> in the ResultSet anonymous class (mirroring the existing writtenDocs pattern) to redirect later rows to the already-replaced vertex.

The approach is sound and follows existing conventions well.


Correctness

The core fix is correct. When prior != null:

  1. propagateUpdateToSameNodeAliases updates the current result row's aliases to point to the replacement.
  2. vertex = prior redirects the local reference.
  3. The type-equality check vertex.getTypeName().equals(newTypeName) short-circuits because prior already carries the new type - clean idempotency.

One subtle concern - writtenDocs not seeded with the new vertex RID: After newVertex.save(), the new vertex is not added to writtenDocs under its new RID. If a subsequent row in the same fanout does a property SET on the (now-replaced) vertex, resolveLatestDoc won't find it in writtenDocs via the new RID. The result alias was updated by propagateUpdateToSameNodeAliases, so the property SET would still operate on newVertex - but the cross-row accumulation for the new RID is untracked. Low risk in practice but worth considering:

// After newVertex.save() and before vertex.delete():
writtenDocs.put(newVertex.getIdentity(), newVertex.modify()); // optional, for combined SET n.prop+n:Label safety

Pre-existing Issue (not introduced here, but surfaced)

Edge properties are silently dropped during label replacement. Lines 354-357 copy edge topology but not edge properties:

for (final Edge edge : vertex.getEdges(Vertex.DIRECTION.OUT))
    newVertex.newEdge(edge.getTypeName(), edge.getVertex(Vertex.DIRECTION.IN));

Any properties stored on the original edges are lost. This existed before this PR, but the new fanout test (setLabelPreservesEdgesAcrossRowFanout) only verifies that the edge exists, not that its properties survive. A follow-up issue/test would be useful.


Minor Observations

allLabels.contains(label) is O(n) on a List (line 336-337). Label lists are tiny in practice so this is harmless, but a LinkedHashSet would be idiomatic for deduplication while preserving insertion order:

final Set<String> allLabelsSet = new LinkedHashSet<>(existingLabels);
allLabelsSet.addAll(item.getLabels());
final List<String> allLabels = new ArrayList<>(allLabelsSet);

vertex = prior reassignment after pattern-match variable (line 329). Reassigning a variable introduced by instanceof pattern matching is valid Java 16+ but slightly surprising to read. Consider a shadowing rename for clarity - though this is purely stylistic and doesn't affect correctness.

Old comment removed appropriately. The previous comment saying "Label SET is intentionally not covered by the writtenDocs pattern" is gone - good, since the fix now does interact with writtenDocs (via writtenDocs.remove(originalRid)).


Test Coverage

Three new tests - good coverage:

  • setLabelIdempotentOnSameNodeAcrossRowFanout - single label, cartesian MATCH (direct reproducer)
  • setMultipleLabelsIdempotentOnSameNodeAcrossRowFanout - multiple labels in one SET
  • setLabelPreservesEdgesAcrossRowFanout - verifies edges survive the fanout path

The acknowledged gap (combined SET n.prop = v, n:Label across fanout) doesn't need to be fixed now, but tracking it in a follow-up issue would be helpful so it isn't forgotten.


Summary

The fix is well-targeted, follows existing patterns, is well-tested, and resolves the crash. The edge-property loss is pre-existing and out of scope. The writtenDocs seeding note above is the only functional improvement worth considering before merge; everything else is minor polish.

Verdict: Approve with optional minor suggestions.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Code Review - PR #4025: fix: Cypher SET n:Label idempotency on row fanout

Overview

This PR correctly fixes a crash bug (#4017): during a cartesian MATCH, SET n:Label deleted+recreated the vertex on row 1 with a new RID; row 2 still held the old (deleted) reference and threw Record not found or Duplicated key. The fix adds a labelReplacements: Map<RID, Vertex> alongside the existing writtenDocs map to redirect later rows to the already-replaced vertex.

The approach mirrors the existing writtenDocs pattern well.


Issues

1. Duplicated chain-traversal logic (minor refactor suggestion)

The same "walk the replacement chain to its head" loop appears verbatim in two places:

  • applySetOperations (pre-resolution pass)
  • applyLabels (local resolution)

Extracting it avoids future drift:

private Vertex resolveLatestVertex(final Vertex vertex, final Map<RID, Vertex> replacements) {
  Vertex current = vertex;
  Vertex next;
  while ((next = replacements.get(current.getIdentity())) != null)
    current = next;
  return current;
}

2. New vertex RID not seeded into writtenDocs (low-risk gap)

After newVertex.save(), writtenDocs only has the old RID removed (writtenDocs.remove(originalRid)). If a combined SET n.prop = v, n:Label query hits the same fanout, the new vertex RID isn't in writtenDocs, so the cross-row accumulation path won't find it. The alias update via propagateUpdateToSameNodeAliases makes property SET still land on the right object, but it is untracked. The existing comment acknowledges ordering-dependent behaviour here - consider a follow-up issue to track it.

3. Test javadoc comment is inaccurate

In setLabelIdempotentOnSameNodeAcrossRowFanout:

// Alice qualifies for one pair (n=Alice > m=Bob), Charlie for two pairs ...
// Expected rows: Alice, Charlie, Charlie.

Alice's cartesian result is one row, Charlie's is two (Charlie > Alice, Charlie > Bob). The comment says "Alice appears once" which is correct, but the assertThat(names).containsExactly("Alice", "Charlie", "Charlie") assertion has Alice first - the comment should clarify the ORDER BY guarantees that ordering (Alice < Charlie alphabetically), not that Alice appears fewer times. Minor wording fix.

4. Pre-existing issue surfaced by new test: edge properties are silently dropped

setLabelPreservesEdgesAcrossRowFanout verifies the edge exists but not that edge properties survive. The edge copy loop in applyLabels:

for (final Edge edge : vertex.getEdges(Vertex.DIRECTION.OUT))
    newVertex.newEdge(edge.getTypeName(), edge.getVertex(Vertex.DIRECTION.IN));

...copies topology only - any properties on edges are lost. This is pre-existing (out of scope here), but worth a follow-up issue.


Positives

  • Three well-targeted tests covering the exact reproducer, the multi-label variant, and edge preservation.
  • The writtenDocs.remove(originalRid) call is the right defensive move - prevents stale MutableDocument reads.
  • The idempotency short-circuit via vertex.getTypeName().equals(newTypeName) is clean and correct.
  • Old misleading comment ("Label SET is intentionally not covered by the writtenDocs pattern") is properly removed.

Summary

The fix is correct, well-tested, and follows existing patterns. The main actionable items before merge:

  1. Must fix: Correct the inaccurate javadoc comment in the first test.
  2. Nice to have: Extract the chain-traversal into a helper to avoid duplication.
  3. Optional: Open follow-up issues for (a) combined SET prop+label fanout ordering and (b) edge property loss on label replacement.

Verdict: Approve with minor suggestions.


Reviewed with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 63.63636% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.24%. Comparing base (9a5b0e1) to head (48d049a).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...adedb/query/opencypher/executor/steps/SetStep.java 63.63% 5 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4025      +/-   ##
==========================================
- Coverage   64.91%   64.24%   -0.67%     
==========================================
  Files        1597     1597              
  Lines      120132   120168      +36     
  Branches    25580    25587       +7     
==========================================
- Hits        77981    77204     -777     
- Misses      31377    32296     +919     
+ Partials    10774    10668     -106     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robfrank robfrank merged commit bdbf985 into main Apr 29, 2026
26 of 31 checks passed
tae898 pushed a commit to humemai/arcadedb-embedded-python that referenced this pull request May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Repeated SET label on the same node across row fanout may fail instead of acting idempotently

1 participant