#2915 fix: ensure Jvector HNSW graph file is closed and flushed to disk on database close #2916

Merged
lvca merged 3 commits into main from fix/2915-vector-index-hnsw-persistence
Dec 15, 2025

@robfrank
Collaborator

This pull request addresses critical issues with the persistence and discovery of HNSW graph files in the LSMVectorIndex implementation, ensuring that graph data is properly flushed to disk and can be reliably recovered after database restarts. It also adds comprehensive tests to verify these behaviors and improve the robustness of the vector index subsystem.

Persistence and resource management improvements:

  • Ensured the HNSW graph file (graphFile) is properly closed and flushed to disk when the LSMVectorIndex is closed, preventing data loss and resource leaks. Error handling was added to log any exceptions during the close operation.
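
The close-and-flush behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `GraphFileOwner`, its `FileChannel` field, and the `demo()` helper are hypothetical names, and the real LSMVectorIndex may manage its graph file differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for an index that owns an on-disk graph file.
final class GraphFileOwner implements AutoCloseable {
  private static final Logger LOG = Logger.getLogger(GraphFileOwner.class.getName());

  private FileChannel graphFile; // set to null once closed

  GraphFileOwner(final Path path) throws IOException {
    graphFile = FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
        StandardOpenOption.APPEND);
  }

  void append(final byte[] data) throws IOException {
    final ByteBuffer buffer = ByteBuffer.wrap(data);
    while (buffer.hasRemaining())
      graphFile.write(buffer);
  }

  @Override
  public void close() {
    if (graphFile == null)
      return; // already closed, nothing to do
    // try-with-resources closes the channel even if force() fails
    try (FileChannel channel = graphFile) {
      channel.force(true); // push content and metadata to the storage device
    } catch (final IOException e) {
      // Pass the exception itself so the full stack trace is logged
      LOG.log(Level.WARNING, "Error closing graph file", e);
    } finally {
      graphFile = null;
    }
  }

  // Usage: write four bytes, close, and return the size found on disk.
  static long demo() {
    try {
      final Path path = Files.createTempFile("graph", ".bin");
      try (GraphFileOwner owner = new GraphFileOwner(path)) {
        owner.append(new byte[] { 1, 2, 3, 4 });
      }
      final long size = Files.size(path);
      Files.delete(path);
      return size;
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Making `close()` idempotent and logging instead of rethrowing is one possible design choice here; it keeps a failing graph file from blocking the rest of a shutdown sequence.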

Logging and debugging enhancements:

  • Added detailed debug-level logging to the graph file discovery process in discoverAndLoadGraphFile(), making it easier to trace file lookup issues and understand index initialization behavior. [1] [2]
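
As a rough illustration of debug-level logging around file discovery, the sketch below scans a directory and logs each step at `FINE`. It does not reproduce `discoverAndLoadGraphFile()`: the class `GraphFileDiscovery`, the `.graph` suffix, and the prefix-matching rule are assumptions made only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Stream;

// Hypothetical discovery helper; file naming is assumed, not taken from ArcadeDB.
final class GraphFileDiscovery {
  private static final Logger LOG = Logger.getLogger(GraphFileDiscovery.class.getName());
  static final String GRAPH_SUFFIX = ".graph"; // assumed naming convention

  static Optional<Path> discover(final Path directory, final String indexName) {
    LOG.log(Level.FINE, "Looking for graph file of index {0} in {1}", new Object[] { indexName, directory });
    if (!Files.isDirectory(directory)) {
      LOG.log(Level.FINE, "Directory {0} not found, nothing to load", directory);
      return Optional.empty();
    }
    try (Stream<Path> entries = Files.list(directory)) {
      final Optional<Path> match = entries
          .peek(p -> LOG.log(Level.FINE, "Inspecting candidate {0}", p.getFileName()))
          .filter(p -> isGraphFileOf(p, indexName))
          .sorted() // deterministic choice if several files match
          .findFirst();
      LOG.log(Level.FINE, "Discovery result: {0}", match.map(Path::toString).orElse("none"));
      return match;
    } catch (final IOException e) {
      LOG.log(Level.WARNING, "Error while listing " + directory, e);
      return Optional.empty();
    }
  }

  private static boolean isGraphFileOf(final Path file, final String indexName) {
    final String name = file.getFileName().toString();
    return name.startsWith(indexName) && name.endsWith(GRAPH_SUFFIX);
  }

  // Usage: one matching file is found, an unknown index yields an empty result.
  static boolean demo() {
    try {
      final Path dir = Files.createTempDirectory("discovery");
      final Path graph = Files.createFile(dir.resolve("myIndex_0.graph"));
      final Path other = Files.createFile(dir.resolve("other.bin"));
      final Optional<Path> found = discover(dir, "myIndex");
      final boolean missing = !discover(dir, "unknownIndex").isPresent();
      Files.delete(graph);
      Files.delete(other);
      Files.delete(dir);
      return found.isPresent() && found.get().equals(graph) && missing;
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```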

Testing and verification:

  • Introduced new tests in LSMVectorIndexTest.java to verify:
    • That the graph file is properly closed and flushed to disk, addressing the bug where graphFile.close() was not previously called.
    • That graph files can be discovered and loaded after a database reload, ensuring index recovery works as expected.
    • (Disabled) That graph persistence is maintained across multiple close/reopen cycles, confirming consistent query results if the graph is properly persisted.
  • Added a helper method for recursive directory cleanup to support test isolation.
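
The shape of a close/reopen persistence check can be sketched with a toy store. This is not the test from LSMVectorIndexTest.java (where the multi-cycle test is disabled): `ToyVectorStore` is a hypothetical brute-force stand-in that only shows how a query result can be compared across reopen cycles.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy store: persists vectors on close() and reloads them on open.
final class ToyVectorStore implements AutoCloseable {
  private final Path file;
  private final List<float[]> vectors = new ArrayList<>();

  ToyVectorStore(final Path file) throws IOException {
    this.file = file;
    if (Files.exists(file))
      load();
  }

  void add(final float[] vector) {
    vectors.add(vector.clone());
  }

  // Brute-force nearest neighbor by squared Euclidean distance.
  // Assumes every stored vector has at least query.length components.
  int nearest(final float[] query) {
    int best = -1;
    double bestDistance = Double.MAX_VALUE;
    for (int i = 0; i < vectors.size(); i++) {
      double distance = 0;
      for (int j = 0; j < query.length; j++) {
        final double diff = vectors.get(i)[j] - query[j];
        distance += diff * diff;
      }
      if (distance < bestDistance) {
        bestDistance = distance;
        best = i;
      }
    }
    return best;
  }

  @Override
  public void close() throws IOException {
    // The stream is flushed and closed by try-with-resources.
    try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
      out.writeInt(vectors.size());
      for (final float[] v : vectors) {
        out.writeInt(v.length);
        for (final float f : v)
          out.writeFloat(f);
      }
    }
  }

  private void load() throws IOException {
    try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
      final int count = in.readInt();
      for (int i = 0; i < count; i++) {
        final float[] v = new float[in.readInt()];
        for (int j = 0; j < v.length; j++)
          v[j] = in.readFloat();
        vectors.add(v);
      }
    }
  }

  // Returns true if the same query gives the same answer after every reopen.
  static boolean survivesReopenCycles(final int cycles) {
    try {
      final Path file = Files.createTempDirectory("toy-store").resolve("vectors.bin");
      final float[] query = { 0.9f, 0.1f };
      final int expected;
      try (ToyVectorStore store = new ToyVectorStore(file)) {
        store.add(new float[] { 0f, 0f });
        store.add(new float[] { 1f, 0f });
        store.add(new float[] { 0f, 1f });
        expected = store.nearest(query);
      }
      for (int i = 0; i < cycles; i++) {
        try (ToyVectorStore store = new ToyVectorStore(file)) {
          if (store.nearest(query) != expected)
            return false;
        }
      }
      return true;
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```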

@gemini-code-assist
Contributor

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves critical issues related to the persistence and discovery of HNSW graph files within the LSMVectorIndex implementation. By ensuring that graph data is correctly flushed to disk upon database closure and can be reliably reloaded, it significantly enhances the robustness and data integrity of the vector indexing subsystem. The changes also include improved logging for easier debugging of graph file discovery and comprehensive new tests to validate these fixes.

Highlights

  • Persistence Fix: The HNSW graph file (graphFile) is now properly closed and flushed to disk when LSMVectorIndex is closed, preventing data loss and ensuring data integrity.
  • Discovery Enhancement: Detailed debug logging has been added to the discoverAndLoadGraphFile() method, along with a null check for ComponentFile, improving traceability and robustness of graph file discovery.
  • Comprehensive Testing: New tests have been introduced in LSMVectorIndexTest.java to specifically verify the proper flushing of graph files on close and their successful discovery and loading after database restarts.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical issue by ensuring the JVector HNSW graph file is properly closed and flushed to disk, preventing data loss on database shutdown. The changes also improve the discovery of graph files and add comprehensive tests to verify the fixes. The implementation is solid, and the tests are thorough. I've provided a couple of suggestions to enhance logging and modernize a test utility method for better maintainability.

Comment on lines +2107 to +2121
  private void deleteDirectory(java.io.File directory) {
    if (directory.exists()) {
      final java.io.File[] files = directory.listFiles();
      if (files != null) {
        for (final java.io.File file : files) {
          if (file.isDirectory()) {
            deleteDirectory(file);
          } else {
            file.delete();
          }
        }
      }
      directory.delete();
    }
  }

medium

This helper method for recursively deleting a directory can be simplified and made more robust by using the modern java.nio.file.Files.walk API. This approach avoids manual recursion and is generally preferred for traversing file trees.

  private void deleteDirectory(java.io.File directory) {
    if (directory.exists()) {
      try (java.util.stream.Stream<java.nio.file.Path> walk = java.nio.file.Files.walk(directory.toPath())) {
        walk.sorted(java.util.Comparator.reverseOrder())
            .map(java.nio.file.Path::toFile)
            .forEach(java.io.File::delete);
      } catch (java.io.IOException e) {
        System.err.println("Error deleting directory " + directory.getAbsolutePath() + ": " + e.getMessage());
      }
    }
  }
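
One caveat with the suggestion above is that `java.io.File.delete()` returns `false` on failure, so leftover files go unnoticed. A possible variation, shown here only as a sketch, uses `Files.delete`, which throws when an entry cannot be removed. `DirectoryCleanup` and `demo()` are hypothetical names and are not part of the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test utility: recursive delete that surfaces failures.
final class DirectoryCleanup {
  static void deleteRecursively(final Path root) throws IOException {
    if (Files.notExists(root))
      return;
    final List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order lists children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (final Path path : paths)
      Files.delete(path); // throws if an entry cannot be removed
  }

  // Usage: build a small nested tree, delete it, and report whether it is gone.
  static boolean demo() {
    try {
      final Path root = Files.createTempDirectory("cleanup");
      final Path nested = Files.createDirectories(root.resolve("a").resolve("b"));
      Files.createFile(nested.resolve("data.bin"));
      deleteRecursively(root);
      return Files.notExists(root);
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```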

@mergify
Contributor

mergify bot commented Dec 12, 2025

🧪 CI Insights

Here's what we observed from your CI run for ff6ea24.

🟢 All jobs passed!

But CI Insights is watching 👀

@codacy-production

codacy-production bot commented Dec 12, 2025

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| Report missing for 8f9ac7f [1] | 62.50% |

Coverage variation details

| Commit | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (8f9ac7f) | Report Missing | Report Missing | Report Missing |
| Head commit (ff6ea24) | 75785 | 48287 | 63.72% |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| Scope | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#2916) | 8 | 5 | 62.50% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
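
The percentages in this report follow directly from the covered and coverable line counts. A small sketch of the arithmetic, using a hypothetical `CoverageMath` helper that is not part of Codacy or this PR:

```java
import java.util.Locale;

// Hypothetical helper reproducing the coverage percentages from the report.
final class CoverageMath {
  // covered / coverable * 100, formatted with two decimals
  static String percent(final long covered, final long coverable) {
    return String.format(Locale.ROOT, "%.2f%%", 100.0 * covered / coverable);
  }

  public static void main(final String[] args) {
    System.out.println(percent(5, 8));         // diff coverage: 62.50%
    System.out.println(percent(48287, 75785)); // head commit coverage: 63.72%
  }
}
```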

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

robfrank added a commit that referenced this pull request Dec 12, 2025
Improvements to code quality and maintainability:

* Enhanced exception logging: Include full stack trace when closing graph file
  for better debugging. Changed from logging only the error message to passing
  the exception object to LogManager for complete context.

* Refactored deleteDirectory() helper: Replaced manual recursive directory
  traversal with modern java.nio.file.Files.walk() API. This approach is more
  robust, efficient, and follows Java best practices for file tree operations.
  - Uses try-with-resources for proper resource management
  - Sorts in reverse order to delete files before directories
  - Provides better exception handling with IOException

All existing tests continue to pass (22/22).

Addresses review comments from PR #2916:
- #2916 (comment)
- #2916 (comment)
@robfrank force-pushed the fix/2915-vector-index-hnsw-persistence branch from ec05c5b to ff6ea24 on December 14, 2025 17:32
@robfrank requested a review from lvca on December 15, 2025 12:42
@lvca merged commit 432707d into main on Dec 15, 2025
34 of 37 checks passed
mergify bot added a commit to robfrank/linklift that referenced this pull request Jan 9, 2026
….1 [skip ci]

Bumps [com.arcadedb:arcadedb-network](https://github.com/ArcadeData/arcadedb) from 25.11.1 to 25.12.1.
Release notes

*Sourced from [com.arcadedb:arcadedb-network's releases](https://github.com/ArcadeData/arcadedb/releases).*

> 25.12.1
> -------
>
> ArcadeDB 25.12.1 Release Notes
> ==============================
>
> We're excited to announce the release of ArcadeDB v25.12.1! This release includes significant bug fixes, new features, performance improvements, and dependency updates.
>
> Highlights
> ----------
>
> ### Vector Search Enhancements
>
> * **Fixed critical vector quantization bug** ([#3052](https://redirect.github.com/ArcadeData/arcadedb/issues/3052), [#3053](https://redirect.github.com/ArcadeData/arcadedb/issues/3053)) - INT8 and BINARY vector quantization now works correctly across all dimensions
> * **New filtered vector search** ([#3071](https://redirect.github.com/ArcadeData/arcadedb/issues/3071), [#3072](https://redirect.github.com/ArcadeData/arcadedb/issues/3072)) - LSMVectorIndex now supports filtered searches for more precise queries
> * **Better vector type support** ([#3090](https://redirect.github.com/ArcadeData/arcadedb/issues/3090)) - Added support for `List<Float>` in vector indexes
> * **Improved compression** ([#2911](https://redirect.github.com/ArcadeData/arcadedb/issues/2911)) - Enhanced compression for LSM vector indexes
> * **Fixed HNSW graph persistence** ([#2916](https://redirect.github.com/ArcadeData/arcadedb/issues/2916)) - Ensures JVector HNSW graph file is properly closed and flushed to disk
>
> ### SQL and Query Improvements
>
> * **Fixed IF statement execution** ([#2775](https://redirect.github.com/ArcadeData/arcadedb/issues/2775)) - SQL scripts with IF statements now execute correctly from console
> * **Fixed index creation with IF NOT EXISTS** ([#1819](https://redirect.github.com/ArcadeData/arcadedb/issues/1819)) - Console no longer errors when creating existing indexes with IF NOT EXISTS clause
> * **Custom function parameter binding** ([#3046](https://redirect.github.com/ArcadeData/arcadedb/issues/3046), [#3049](https://redirect.github.com/ArcadeData/arcadedb/issues/3049)) - Fixed parameter binding for SQL and JavaScript custom functions
> * **SQL method consistency** ([#2964](https://redirect.github.com/ArcadeData/arcadedb/issues/2964), [#2967](https://redirect.github.com/ArcadeData/arcadedb/issues/2967)) - `values()` method now behaves consistently with `keys()` method
> * **CONTAINSANY index fix** ([#3051](https://redirect.github.com/ArcadeData/arcadedb/issues/3051)) - Fixed index usage for lists of embedded documents with CONTAINSANY
>
> ### Transaction Management
>
> * **Revised transaction logic** ([#3074](https://redirect.github.com/ArcadeData/arcadedb/issues/3074)) - Improved transaction handling and consistency
> * **Fixed edge index invalidation** ([#3091](https://redirect.github.com/ArcadeData/arcadedb/issues/3091)) - Edge indexes now remain valid in edge-case scenarios
>
> ### New Features
>
> * **Database size API** ([#3045](https://redirect.github.com/ArcadeData/arcadedb/issues/3045)) - Added new `database.getSize()` API method
> * **Version display enhancement** ([#2905](https://redirect.github.com/ArcadeData/arcadedb/issues/2905)) - Server log version number now displayed consistently
>
> What's Changed
> --------------
>
> ### Bug Fixes
>
> * Fix INT8 and BINARY vector quantization offset bug in LSMVectorIndex page loading by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3053](https://redirect.github.com/ArcadeData/arcadedb/pull/3053)
> * fix: revert SQL grammar changes and disable deep level JSON insert tests by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#2961](https://redirect.github.com/ArcadeData/arcadedb/pull/2961)
> * [#2915](https://redirect.github.com/ArcadeData/arcadedb/issues/2915) fix: ensure Jvector HNSW graph file is closed and flushed to disk on database close by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#2916](https://redirect.github.com/ArcadeData/arcadedb/pull/2916)
> * fix: make values method behave like keys method by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#2967](https://redirect.github.com/ArcadeData/arcadedb/pull/2967)
> * Fix custom function parameter binding for SQL and JavaScript functions by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3049](https://redirect.github.com/ArcadeData/arcadedb/pull/3049)
> * fix CONTAINSANY index use for lists of embedded documents by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#3051](https://redirect.github.com/ArcadeData/arcadedb/pull/3051)
> * fix: support List in vector index by [`@​szekelyszabi`](https://github.com/szekelyszabi) in [ArcadeData/arcadedb#3090](https://redirect.github.com/ArcadeData/arcadedb/pull/3090)
>
> ### Features
>
> * Show version number same as in server log by [`@​gramian`](https://github.com/gramian) in [ArcadeData/arcadedb#2905](https://redirect.github.com/ArcadeData/arcadedb/pull/2905)
> * feat: added new `database.getSize()` api by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#3045](https://redirect.github.com/ArcadeData/arcadedb/pull/3045)
> * Add filtered vector search support to LSMVectorIndex by [`@​Copilot`](https://github.com/Copilot) in [ArcadeData/arcadedb#3072](https://redirect.github.com/ArcadeData/arcadedb/pull/3072)
> * add stars chart by [`@​robfrank`](https://github.com/robfrank) in [ArcadeData/arcadedb#3084](https://redirect.github.com/ArcadeData/arcadedb/pull/3084)
>
> ### Performance Improvements
>
> * Lsm vector fix by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#2907](https://redirect.github.com/ArcadeData/arcadedb/pull/2907)
> * perf: improved compression with lsm vectors by [`@​lvca`](https://github.com/lvca) in [ArcadeData/arcadedb#2911](https://redirect.github.com/ArcadeData/arcadedb/pull/2911)

... (truncated)


Commits

* [`6290454`](ArcadeData/arcadedb@6290454) Set release version to 25.12.1
* [`5bdbdfa`](ArcadeData/arcadedb@5bdbdfa) chore: removed system.out
* [`5764b95`](ArcadeData/arcadedb@5764b95) fix: deletion of light edge after last fix
* [`a81163a`](ArcadeData/arcadedb@a81163a) fix: avoid reuse of deleted record in same tx
* [`a42ae5e`](ArcadeData/arcadedb@a42ae5e) perf: avoid conversion of float[] into List<Float> in SQL engine
* [`c8fb3e5`](ArcadeData/arcadedb@c8fb3e5) chore: refactoring conversion functions to float[] in a centralized method
* [`de9bfcf`](ArcadeData/arcadedb@de9bfcf) fix: support List<Float> in vector index ([#3090](https://redirect.github.com/ArcadeData/arcadedb/issues/3090))
* [`9e964ef`](ArcadeData/arcadedb@9e964ef) Merge branch 'main' of <https://github.com/ArcadeData/arcadedb>
* [`07c7d3e`](ArcadeData/arcadedb@07c7d3e) Fixed failing test using java
* [`51a058b`](ArcadeData/arcadedb@51a058b) fix CONTAINSANY index use for lists of embedded documents ([#3051](https://redirect.github.com/ArcadeData/arcadedb/issues/3051))
* Additional commits viewable in [compare view](ArcadeData/arcadedb@25.11.1...25.12.1)
  
robfrank added a commit that referenced this pull request Feb 11, 2026
…sk on database close (#2916)

* #2915 fix: ensure Jvector  HNSW graph file is closed and flushed to disk on database close

* fix: address PR #2916 review comments for HNSW graph persistence

Improvements to code quality and maintainability:

* Enhanced exception logging: Include full stack trace when closing graph file
  for better debugging. Changed from logging only the error message to passing
  the exception object to LogManager for complete context.

* Refactored deleteDirectory() helper: Replaced manual recursive directory
  traversal with modern java.nio.file.Files.walk() API. This approach is more
  robust, efficient, and follows Java best practices for file tree operations.
  - Uses try-with-resources for proper resource management
  - Sorts in reverse order to delete files before directories
  - Provides better exception handling with IOException

All existing tests continue to pass (22/22).

Addresses review comments from PR #2916:
- #2916 (comment)
- #2916 (comment)

* #2915 fix: ensure Jvector  HNSW graph file is closed and flushed to disk on database close

(cherry picked from commit 432707d)
mergify bot added a commit that referenced this pull request Feb 22, 2026
…in /studio in the security-critical group [skip ci]

Bumps the security-critical group in /studio with 1 update: [sweetalert2](https://github.com/sweetalert2/sweetalert2).
Updates `sweetalert2` from 11.26.18 to 11.26.20
Release notes

*Sourced from [sweetalert2's releases](https://github.com/sweetalert2/sweetalert2/releases).*

> v11.26.20
> ---------
>
> [11.26.20](sweetalert2/sweetalert2@v11.26.19...v11.26.20) (2026-02-20)
> -------------------------------------------------------------------------------------------------
>
> ### Bug Fixes
>
> * preserve focus across disableButtons/hideLoading cycle ([#2916](https://redirect.github.com/sweetalert2/sweetalert2/issues/2916)) ([8de9630](sweetalert2/sweetalert2@8de9630))
>
> v11.26.19
> ---------
>
> [11.26.19](sweetalert2/sweetalert2@v11.26.18...v11.26.19) (2026-02-19)
> -------------------------------------------------------------------------------------------------
>
> ### Bug Fixes
>
> * allowEnterKey should focus popup so Esc will work ([#2915](https://redirect.github.com/sweetalert2/sweetalert2/issues/2915)) ([5f7a514](sweetalert2/sweetalert2@5f7a514))


Commits

* [`ae4b4c8`](sweetalert2/sweetalert2@ae4b4c8) chore(release): 11.26.20 [skip ci]
* [`8de9630`](sweetalert2/sweetalert2@8de9630) fix: preserve focus across disableButtons/hideLoading cycle ([#2916](https://redirect.github.com/sweetalert2/sweetalert2/issues/2916))
* [`09fba87`](sweetalert2/sweetalert2@09fba87) chore: suggest preConfirm instead of deprecated allowEnterKey [#2914](https://redirect.github.com/sweetalert2/sweetalert2/issues/2914)
* [`7f43c49`](sweetalert2/sweetalert2@7f43c49) chore(release): 11.26.19 [skip ci]
* [`5f7a514`](sweetalert2/sweetalert2@5f7a514) fix: allowEnterKey should focus popup so Esc will work ([#2915](https://redirect.github.com/sweetalert2/sweetalert2/issues/2915))
* [`1c06b6a`](sweetalert2/sweetalert2@1c06b6a) chore: bump jquery to v4
* [`b45fedf`](sweetalert2/sweetalert2@b45fedf) chore: rm Venus Love Dolls from sponsors
* See full diff in [compare view](sweetalert2/sweetalert2@v11.26.18...v11.26.20)
  

2 participants