Skip to content

fix(#4333): persist migratedFileIds in schema.json to survive restart#4342

Merged
robfrank merged 3 commits into
mainfrom
fix/4333-persist-migrated-file-ids
May 26, 2026
Merged

fix(#4333): persist migratedFileIds in schema.json to survive restart#4342
robfrank merged 3 commits into
mainfrom
fix/4333-persist-migrated-file-ids

Conversation

@robfrank

Copy link
Copy Markdown
Collaborator

Closes #4333

Summary

LocalSchema.migratedFileIds (the old->new fileId map written by LSM/vector index compaction) was a heap-only ConcurrentHashMap that was never included in schema.json. After a restart the map was always empty, so TransactionManager.applyChanges could not distinguish a safe compaction-migration skip from an unknown missing-file condition, silently dropping WAL pages at FINE level after every compaction+restart.

Changes:

  • LocalSchema.toJSON(): serialize migratedFileIds as a JSON object under the key "migratedFileIds" (only emitted when non-empty, so existing schema files without the key remain compatible)
  • LocalSchema loading: restore migratedFileIds from the JSON block on open, clearing the map first for idempotency
  • TransactionManager.applyChanges(): consult the migration map when a fileId is missing; log FINE for known compaction-migrations (safe skip, data already in the new file), WARNING for unknown missing files (deleted by schema change or incomplete migration map)

Test plan

  • MigratedFileIdsPersistenceTest#migratedFileIdsArePersisted - after compaction triggers setMigratedFileId, close and reopen the database; migration map must still be populated
  • MigratedFileIdsPersistenceTest#migratedFileIdsRoundTrip - round-trip the map through close+reopen and verify exact old->new entries
  • LSMTreeIndexCompactionTest - all existing compaction tests pass
  • LockFilesInOrderFileMigrationTest - existing file-migration regression test passes
  • ConcurrentCompactionTest - concurrent compaction tests pass

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses issue #4333 by persisting the migratedFileIds map in schema.json to prevent WAL recovery from silently skipping pages of missing files after a restart. It serializes and deserializes the map in LocalSchema and updates TransactionManager.applyChanges() to check this map, logging a warning for genuinely unknown missing files. Feedback points out that setMigratedFileId needs to call saveConfiguration() to ensure the updated map is actually written to disk. Additionally, checking for null values when deserializing migratedFileIds is recommended to avoid potential JSONExceptions, and the warning log level for missing files should be reconsidered to avoid noise during normal Raft replay or crash recovery.

Comment on lines +1925 to +1930
if (!migratedFileIds.isEmpty()) {
final JSONObject migratedJSON = new JSONObject();
for (final Map.Entry<Integer, Integer> entry : migratedFileIds.entrySet())
migratedJSON.put(String.valueOf(entry.getKey()), entry.getValue());
root.put("migratedFileIds", migratedJSON);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While migratedFileIds is now correctly serialized and deserialized, setMigratedFileId(int, int) does not call saveConfiguration() or mark the schema as dirty.

If a compaction occurs and updates the migration map, but no other schema changes happen before the database is closed, the schema will not be considered dirty (dirtyGeneration.get() > savedGeneration will be false) and close() will not persist the updated map to schema.json.

To ensure the migration map is always persisted, setMigratedFileId should call saveConfiguration() (which will either save immediately or postpone it by marking the schema dirty if a transaction is active):

  public void setMigratedFileId(final int oldFileId, final int newFileId) {
    LogManager.instance().log(this, Level.FINE, "Migrating file id %d to %d", null, oldFileId, newFileId);
    migratedFileIds.put(oldFileId, newFileId);
    saveConfiguration();
  }

Comment on lines +1831 to +1835
if (root.has("migratedFileIds")) {
final JSONObject migratedJSON = root.getJSONObject("migratedFileIds");
for (final String key : migratedJSON.keySet())
migratedFileIds.put(Integer.parseInt(key), migratedJSON.getInt(key));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If migratedFileIds is present in the JSON but has a null value (e.g., "migratedFileIds": null), root.getJSONObject("migratedFileIds") will throw a JSONException. Using !root.isNull(...) is safer and prevents potential crashes during schema loading.

Suggested change
if (root.has("migratedFileIds")) {
final JSONObject migratedJSON = root.getJSONObject("migratedFileIds");
for (final String key : migratedJSON.keySet())
migratedFileIds.put(Integer.parseInt(key), migratedJSON.getInt(key));
}
if (root.has("migratedFileIds") && !root.isNull("migratedFileIds")) {
final JSONObject migratedJSON = root.getJSONObject("migratedFileIds");
for (final String key : migratedJSON.keySet())
migratedFileIds.put(Integer.parseInt(key), migratedJSON.getInt(key));
}

Comment on lines +342 to +345
else
LogManager.instance().log(this, Level.WARNING,
"Skipping page for missing file %d during transaction apply, file was deleted or migration map is incomplete", null,
txPage.fileId);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging a WARNING for any missing file not present in the migration map might be too noisy. During normal Raft replay or crash recovery, files deleted intentionally by schema changes (e.g., DROP TYPE or DROP INDEX) will also trigger this warning, potentially flooding the logs with false positives. Consider if INFO or keeping FINE is more appropriate, or if there is a way to track intentionally deleted files.

@codacy-production

codacy-production Bot commented May 26, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 13 complexity

Metric Results
Complexity 13

View in Codacy

🟢 Coverage 64.71% diff coverage

Metric Results
Coverage variation Report missing for a68f6631
Diff coverage 64.71% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (a68f663) Report Missing Report Missing Report Missing
Head commit (2789ec0) 155481 103252 66.41%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4342) 17 11 64.71%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

1 Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

test comment please ignore

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review - fix(#4333): persist migratedFileIds in schema.json

The fix is correct in its core approach: serializing migratedFileIds to schema.json and restoring it on open so WAL recovery can distinguish safe compaction-migration skips from genuinely missing files. A few issues need attention before merge.


Bug: double saveConfiguration() per compaction

LSMTreeIndex.java and LSMVectorIndex.java both already call saveConfiguration() immediately after setMigratedFileId():

// LSMTreeIndex.java ~line 675
((LocalSchema) database.getSchema()).setMigratedFileId(fileId, newMutableIndex.getFileId());
database.getSchema().getEmbedded().saveConfiguration();  // still here
// LSMVectorIndex.java ~line 4618
((LocalSchema) getDatabase().getSchema()).setMigratedFileId(fileId, newMutableIndex.getFileId());
getDatabase().getSchema().getEmbedded().saveConfiguration();  // still here

With this PR, setMigratedFileId now internally calls saveConfiguration() too, resulting in two consecutive schema-file writes per compaction. Each write re-serializes the full schema JSON. This isn't a correctness bug but it is a needless performance hit on what is already an I/O-intensive path.

Fix: remove the explicit saveConfiguration() calls from LSMTreeIndex and LSMVectorIndex (they are now redundant), or remove it from setMigratedFileId and document that callers are responsible.


Missing assertion in migratedFileIdsRoundTrip

migratedFileIdsArePersisted correctly guards against compaction being a no-op:

final boolean compacted = bucketIndex.compact();
if (!compacted)
  error.set(new AssertionError("compact() returned false"));

migratedFileIdsRoundTrip omits this check - compact() return value is silently discarded. If compaction is skipped for any reason (not enough pages, concurrent activity), getMigratedFileId() returns null and the test fails with a misleading NullPointerException-style message rather than a clear "compaction did not run" message. Add the same guard used in the first test.


docs/4333-persist-migrated-file-ids.md should not be committed

This file is an AI review-cycle tracking document ("Review Cycles", "Cycle 1 - a47f16b", "gemini-code-assist reviewed", "State: timeout"). That kind of meta-tracking belongs in the PR description or issue, not in the repository. It will rot immediately and confuse future contributors. Please drop this file from the PR.


Design note: migratedFileIds grows without bound

Once an old->new file mapping is written to schema.json it is never removed - not even after the old file has been dropped and WAL segments predating the last checkpoint are irrelevant. For a long-lived database with frequent compactions this map will accumulate stale entries forever.

This is a pre-existing design question (the map already lived in memory indefinitely), but persisting it makes the problem permanent. A follow-up should add a cleanup pass - e.g., remove entries whose oldFileId no longer exists on disk after WAL recovery completes.


Minor: log message accuracy at INFO level

The INFO log reads:

"Skipping page for missing file %d during transaction apply, file was deleted or migration map is incomplete"

"migration map is incomplete" suggests a partial write/corruption, which is alarming. Given that the most common benign cause is a schema DROP, something like "file was deleted (schema change or DROP TYPE/INDEX)" reads more clearly for operators.


Summary

Severity Item
Medium Double saveConfiguration() call - redundant disk write per compaction
Medium migratedFileIdsRoundTrip - compaction success not verified, test may give misleading failures
Medium docs/4333-persist-migrated-file-ids.md - tracking document should not be in the repo
Low Map grows without bound (pre-existing, but persistence makes it permanent)
Nit INFO log wording slightly alarming

The core logic (serialize/deserialize migratedFileIds, consult map in applyChanges) is sound. The double-save and the docs file are the most important things to fix before merge.

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review

Summary: This PR correctly identifies and fixes a real bug - LocalSchema.migratedFileIds was heap-only and lost on restart, causing WAL recovery to silently drop compaction-related pages. The fix is straightforward and the approach is sound. A few concerns worth addressing before merge.


Bug: WARNING log for intentional DROP TYPE / DROP INDEX

The original comment in TransactionManager.applyChanges() said:

Referencing a deleted file is expected during Raft replay and crash recovery - schema changes may delete files before their prior TX entries are replayed.

That case (DROP TYPE / DROP INDEX) is still completely valid and now logs at WARNING since the dropped file won't be in migratedFileIds. This will produce operator noise on any cluster that does Raft replay or crash-recovers after a DROP. Example: a user drops an index, the server crashes, and on restart the WAL replay logs a WARNING for every page of that index.

The intent of the new WARNING is good (flag truly unexpected missing files), but currently it fires on both unexpected files and intentional schema-drop files. Consider whether a separate mechanism - such as recording intentionally dropped file IDs in the schema JSON as a droppedFileIds set - is needed to distinguish the two cases, or at minimum add a comment acknowledging the false-positive risk.


Unbounded map growth in schema.json

migratedFileIds has no eviction. Every compaction appends a new oldId -> newId entry. With persistence, the map now grows in schema.json across the lifetime of the database. Long-lived databases with many compactions will see this bloat accumulate indefinitely.

Consider pruning entries whose old file ID no longer appears in any live WAL file, or at least document this as a known limitation so it can be tracked as a follow-up.


Test: second test ignores compact() return value

In migratedFileIdsRoundTrip, the compaction thread does not check whether compact() returned true. If compaction silently bails out (e.g. not enough pages), getMigratedFileId() will return null and assertThat(newFidBefore).isNotNull() fails with a misleading message. The first test handles this correctly - replicate the error.set(new AssertionError("compact() returned false")) guard.

Also, assertThat(done.await(30, TimeUnit.SECONDS)).isTrue() and assertThat(error.get()).isNull() in the second test lack descriptive messages, unlike the first test. Small but inconsistent.


Test code duplication

The two test methods are nearly identical (same schema setup, insert 1000 docs, run compaction in a thread, close/reopen, assert). Consider extracting shared setup into a helper method that takes the type name and property name as parameters. This would make both test bodies much shorter and the intent clearer.


Docs file

docs/4333-persist-migrated-file-ids.md is an issue-specific design document. The PR description already covers this content well. Committing issue-specific analysis docs to the repo tends to become dead weight as the code evolves. Worth considering whether this belongs in the PR description / commit message instead.

Minor: the PR body references a test named migratedFileIdsAreLoadedOnReopen but the actual test method is migratedFileIdsRoundTrip.


What works well

  • Backward-compatible serialization (key omitted when empty, load handles missing key with no-op)
  • migratedFileIds.clear() before repopulation in readConfiguration() ensures idempotency on re-read
  • The serialization uses string keys (required by JSON) with Integer.parseInt deserialization - correct
  • readConfiguration() is synchronized so the clear+populate sequence is safe under concurrent access
  • The getMigratedFileId call in TransactionManager is consistent with existing usage in TransactionContext
  • Test coverage proves the key scenario (persistence survives close+reopen)

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review - fix(#4333): persist migratedFileIds in schema.json to survive restart

Overview

This PR fixes a real correctness bug: LocalSchema.migratedFileIds was heap-only, so after a restart the WAL recovery path in TransactionManager.applyChanges could not distinguish a safe compaction-skip from an unknown missing-file condition. The fix serializes the map to schema.json and restores it on open. Good find and clear root cause analysis.


Issues

1. Double saveConfiguration() after compaction (redundant write)

Both callers of setMigratedFileId already call saveConfiguration() immediately after:

  • LSMTreeIndex.java:677: database.getSchema().getEmbedded().saveConfiguration();
  • LSMVectorIndex.java:4619: getDatabase().getSchema().getEmbedded().saveConfiguration();

Now that setMigratedFileId itself calls saveConfiguration(), those two call sites each trigger two serializations of the schema in quick succession. This is idempotent and not a correctness problem, but it is wasteful since schema serialization is a synchronized operation. Consider removing the now-redundant explicit saveConfiguration() calls in the two index classes, or removing it from setMigratedFileId and documenting that callers are responsible for persisting.

2. migratedFileIds map grows without bounds

There is no eviction or pruning of entries from migratedFileIds. Every compaction permanently adds an entry that persists across restarts. For a long-lived database with frequent compactions, this map will grow indefinitely (unbounded memory and schema.json bloat). There should be a lifecycle event where stale entries can be removed - for example, when the old file is confirmed to no longer appear in any recoverable WAL segment, or when a subsequent compaction replaces oldId -> newId with newId -> newerNewId. At minimum, this should be noted as a known limitation with a follow-up issue.

3. docs/4333-persist-migrated-file-ids.md should not be committed

This file contains internal review-cycle notes ("gemini-code-assist reviewed (COMMENTED)", "State: timeout", links back to the PR being merged). It is meta-documentation about the review process, not documentation about the feature or schema format. It will become stale immediately after merge. PR descriptions and issue comments are the right place for this content.


Minor Points

4. Second test does not assert on compact() return value

In migratedFileIdsRoundTrip, the compaction thread calls bucketIndex.compact() without checking the return value. If compaction is skipped (returns false), the test will proceed and fail with a confusing "newFidBefore is null" message rather than a clear "compaction did not run" failure. The first test (migratedFileIdsArePersisted) handles this correctly - apply the same pattern to the second test.

5. The two tests are nearly identical

Both tests: create a schema, insert 1000 records, compact, close, reopen, and assert the migration entry survived. The only difference is the entity name. Consider merging into a single test or giving the second a distinct scenario (e.g., multiple sequential compactions, or a migration followed by a type drop).


What Works Well

  • The !root.isNull("migratedFileIds") null guard is correct and the backward-compatibility approach (omitting the key when empty) keeps existing schema files compatible.
  • Restoring with migratedFileIds.clear() before repopulating is correct for idempotent reloads.
  • Downgrading the applyChanges log from WARNING to INFO for the unknown-file case is reasonable - this path fires for every DROP TYPE/INDEX replay and WARNING would be noisy.
  • The setMigratedFileId -> saveConfiguration addition correctly addresses the race where a crash between compaction and the next schema change left the map unpersisted.

robfrank added 3 commits May 26, 2026 14:22
LocalSchema.migratedFileIds (old->new fileId map from LSM compaction) was
a heap-only ConcurrentHashMap omitted from toJSON()/schema.json. After a
restart the map was always empty, so TransactionManager.applyChanges could
not distinguish a safe compaction-skip from a genuine missing-file data loss
and silently dropped WAL pages at FINE level.

Changes:
- LocalSchema.toJSON(): serialize migratedFileIds as JSON object under
  "migratedFileIds" key (only emitted when non-empty)
- LocalSchema loading: restore migratedFileIds from the JSON block on open
- TransactionManager.applyChanges: consult the migration map when a fileId
  is missing; log FINE for known migrations, WARNING for unknown ones
…evel

- setMigratedFileId() now calls saveConfiguration() so the migration map
  is persisted even when no other schema change follows compaction
- loading: add !root.isNull("migratedFileIds") guard before getJSONObject()
  to avoid JSONException if the field is explicitly null
- TransactionManager: downgrade unknown-missing-file log from WARNING to INFO
  to avoid noise during intentional schema-drop file deletions in Raft replay
@robfrank robfrank force-pushed the fix/4333-persist-migrated-file-ids branch from 014a198 to 2789ec0 Compare May 26, 2026 12:22
@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 58.82353% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.96%. Comparing base (128872a) to head (2789ec0).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...n/java/com/arcadedb/engine/TransactionManager.java 0.00% 6 Missing ⚠️
...src/main/java/com/arcadedb/schema/LocalSchema.java 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4342      +/-   ##
==========================================
- Coverage   65.37%   64.96%   -0.42%     
==========================================
  Files        1610     1610              
  Lines      123789   123803      +14     
  Branches    26776    26781       +5     
==========================================
- Hits        80925    80426     -499     
- Misses      31429    32048     +619     
+ Partials    11435    11329     -106     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robfrank robfrank merged commit d1ee393 into main May 26, 2026
27 of 40 checks passed
@robfrank robfrank deleted the fix/4333-persist-migrated-file-ids branch May 26, 2026 16:15
@robfrank robfrank added this to the 26.6.1 milestone May 27, 2026
@robfrank robfrank self-assigned this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalSchema.migratedFileIds is not persisted — compaction crashes silently drop in-flight transactions

1 participant