Skip to content

Fix: ensure chunk-size warnings show original value and keep WAL off …#3134

Merged
lvca merged 1 commit intoArcadeData:mainfrom
humemai:fix/chunked-wal-persist
Jan 14, 2026
Merged

Fix: ensure chunk-size warnings show original value and keep WAL off …#3134
lvca merged 1 commit intoArcadeData:mainfrom
humemai:fix/chunked-wal-persist

Conversation

@tae898
Copy link
Contributor

@tae898 tae898 commented Jan 14, 2026

Summary

  • Ensure WAL is explicitly disabled for chunked graph persistence even when a caller-initiated transaction is already active.
  • Keep chunked graph persistence and chunked commit flow intact.
  • Improve warning to report the original misconfigured arcadedb.index.buildChunkSizeMB value before clamping to the 50MB fallback.

Testing

  • Now I don't get the PERSIST error.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @tae898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the robustness and diagnostic capabilities of the graph persistence mechanism within LSMVectorIndex. It ensures that Write-Ahead Logging (WAL) is consistently disabled during chunked persistence operations, regardless of the transaction's origin, to prevent potential issues. Additionally, it refines the user feedback for misconfigured chunk sizes by providing more informative warning messages and enriches error logging for persistence failures with crucial contextual data, making debugging more efficient.

Highlights

  • WAL Management: Ensured Write-Ahead Logging (WAL) is explicitly disabled for chunked graph persistence, even when a transaction is already active from a caller.
  • Chunk Size Warning Improvement: Enhanced the warning message for arcadedb.index.buildChunkSizeMB to report the original misconfigured value before it's clamped to the 50MB fallback.
  • Graph Persistence Flow: Maintained the integrity of chunked graph persistence and commit flow, introducing a callback for managing chunk commits and transaction restarts.
  • Error Logging Enhancement: Improved logging for persistence failures by including more contextual information such as node count, vector storage status, and transaction status, and conditionally printing stack traces.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify
Copy link
Contributor

mergify bot commented Jan 14, 2026

🧪 CI Insights

Here's what we observed from your CI run for b3f756e.

🟢 All jobs passed!

But CI Insights is watching 👀

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces several important improvements to the LSMVectorIndex, primarily focusing on robust transaction management during graph persistence and enhanced logging. The changes ensure that WAL (Write-Ahead Log) is consistently disabled for chunked graph persistence, preventing potential data inconsistencies or performance overhead. The new logic for validating arcadedb.index.buildChunkSizeMB and providing informative warnings when misconfigured values are encountered is a welcome addition. Furthermore, the improved error logging with more contextual information will greatly assist in debugging persistence failures. The explicit commit in the else branch of the startedTransaction block is a critical fix for ensuring data durability when transactions are managed by the caller.

Comment on lines +1265 to +1267
getDatabase().getTransaction().setUseWAL(false);
} else {
getDatabase().getTransaction().setUseWAL(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This change correctly ensures that WAL is explicitly disabled for graph persistence, regardless of whether the transaction was initiated by the method or was already active. This is crucial for performance during bulk operations and for preventing potential deadlocks or inconsistencies if WAL was inadvertently left enabled.

getDatabase().commit();
LogManager.instance().log(this, Level.FINE, "Vector graph persisted and committed for index: %s", indexName);
} else {
getDatabase().commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Adding getDatabase().commit() in the else branch is a critical fix. Previously, if a transaction was already active (managed by the caller), the graph persistence changes would not be committed, leading to data loss or inconsistency. This ensures that the changes are persisted correctly in all scenarios.

Comment on lines +1254 to +1260
long chunkSizeMB = getDatabase().getConfiguration().getValueAsLong(GlobalConfiguration.INDEX_BUILD_CHUNK_SIZE_MB);
if (chunkSizeMB <= 0) {
final long configuredChunkSize = chunkSizeMB;
chunkSizeMB = 50;
LogManager.instance().log(this, Level.WARNING,
"arcadedb.index.buildChunkSizeMB was %dMB during graph persistence; forcing fallback to 50MB", configuredChunkSize);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The addition of validation for chunkSizeMB and the fallback to a default value of 50MB, along with a warning log, is a good defensive programming practice. It prevents potential issues arising from misconfigured or invalid chunk sizes, such as excessively large single commits that could lead to out-of-memory errors or very slow transactions.

Comment on lines +1318 to +1326
LogManager.instance().log(this, Level.SEVERE,
"PERSIST: Failed to persist graph for %s (nodes=%d, storeVectorsInGraph=%b, txStatus=%s): %s - %s",
indexName,
totalNodes,
metadata.storeVectorsInGraph,
getDatabase().getTransaction().getStatus(),
e.getClass().getSimpleName(),
e.getMessage(),
e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The enhanced error logging for persistence failures, including details like nodes, storeVectorsInGraph, and txStatus, is a significant improvement. This additional context will be invaluable for diagnosing and troubleshooting issues related to graph persistence.

Comment on lines +1327 to +1328
if (LogManager.instance().isDebugEnabled())
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Making e.printStackTrace() conditional on LogManager.instance().isDebugEnabled() is a good practice. It prevents excessive and potentially sensitive stack trace output in production environments while still providing full debug information when needed.

Comment on lines +3179 to +3185
long chunkSizeMB = db.getConfiguration().getValueAsLong(GlobalConfiguration.INDEX_BUILD_CHUNK_SIZE_MB);
if (chunkSizeMB <= 0) {
final long configuredChunkSize = chunkSizeMB;
chunkSizeMB = 50;
LogManager.instance().log(this, Level.WARNING,
"arcadedb.index.buildChunkSizeMB was %dMB; forcing fallback to 50MB for graph persistence chunking", configuredChunkSize);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This chunkSizeMB validation and warning logic is consistent with the changes in buildGraphFromScratchWithRetry and buildGraphWithChunking. It ensures that the chunk size is always a positive value, preventing potential issues with very large or invalid chunks during bulk data loading.

Comment on lines +3254 to +3260
long chunkSizeMB = db.getConfiguration().getValueAsLong(GlobalConfiguration.INDEX_BUILD_CHUNK_SIZE_MB);
if (chunkSizeMB <= 0) {
final long configuredChunkSize = chunkSizeMB;
chunkSizeMB = 50;
LogManager.instance().log(this, Level.WARNING,
"arcadedb.index.buildChunkSizeMB was %dMB during graph persistence; forcing fallback to 50MB", configuredChunkSize);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consistent application of chunkSizeMB validation and warning across all relevant methods (buildGraphFromScratchWithRetry, bulkLoadVectorData, buildGraphWithChunking) is good. This ensures that the chunking mechanism is robust against misconfiguration throughout the index building process.

@tae898
Copy link
Contributor Author

tae898 commented Jan 14, 2026

@lvca

this fix works.

@lvca lvca self-requested a review January 14, 2026 20:15
@lvca lvca added the bug label Jan 14, 2026
@lvca lvca added this to the 26.1.1 milestone Jan 14, 2026
Copy link
Contributor

@lvca lvca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great to me, thanks!

@lvca lvca merged commit 1177353 into ArcadeData:main Jan 14, 2026
11 of 13 checks passed
@tae898 tae898 deleted the fix/chunked-wal-persist branch January 16, 2026 14:57
robfrank pushed a commit that referenced this pull request Feb 11, 2026
…during chunked graph persistence (#3134)

(cherry picked from commit 1177353)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants