Add eager vector graph rebuild API to LSMVectorIndex #3146

Merged
lvca merged 1 commit into ArcadeData:main from humemai:feature/build-vector-graph-now
Jan 15, 2026

@tae898
Contributor

@tae898 tae898 commented Jan 15, 2026

Summary

Testing

  • Works fine. After the eager build, the "warmup" query is not necessary anymore.

@gemini-code-assist
Contributor

Summary of Changes

Hello @tae898, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a new public API to the LSMVectorIndex that enables an immediate and forced rebuild of its underlying vector graph. This functionality is particularly useful after bulk data operations, allowing users to ensure the graph is up-to-date without waiting for a lazy build trigger. The implementation includes safeguards against concurrent rebuilds and ensures the graph is reconstructed from a consistent state.

Highlights

  • New API for eager graph rebuild: Introduced buildVectorGraphNow API in LSMVectorIndex to allow immediate, on-demand rebuilding of the vector graph, bypassing lazy build mechanisms.
  • Concurrency control for rebuilds: Implemented a guard using an index availability check to prevent concurrent graph rebuilds, ensuring callers can retry cleanly if the index is busy.
  • Forced rebuild from on-disk state: The eager rebuild process now explicitly resets the mutation counter and rebuilds the graph from on-disk pages, effectively ignoring lazy build thresholds.
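
The difference between the lazy and the eager path described in the highlights can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `LazyGraphSketch`, its methods, and the threshold value are all hypothetical and are not taken from the ArcadeDB code. It assumes the lazy path rebuilds on a query once a mutation counter reaches a threshold, while the eager path rebuilds unconditionally and resets the counter.

```java
// Hypothetical toy model of lazy vs. eager graph rebuilds.
// None of these names come from ArcadeDB; they only illustrate the idea.
class LazyGraphSketch {
  private final int rebuildThreshold; // assumed lazy-build threshold
  private int pendingMutations = 0;   // mutations since the last rebuild
  private int rebuildCount = 0;       // how many rebuilds have run

  LazyGraphSketch(final int rebuildThreshold) {
    this.rebuildThreshold = rebuildThreshold;
  }

  synchronized void recordMutation() {
    pendingMutations++;
  }

  // Lazy path: a query triggers a rebuild only once enough mutations piled up.
  synchronized void search() {
    if (pendingMutations >= rebuildThreshold)
      rebuild();
  }

  // Eager path: rebuild immediately, ignoring the threshold.
  synchronized void rebuildNow() {
    rebuild();
  }

  private void rebuild() {
    pendingMutations = 0; // reset the mutation counter
    rebuildCount++;       // stand-in for rebuilding the graph from stored data
  }

  synchronized int rebuildCount() {
    return rebuildCount;
  }

  synchronized int pendingMutations() {
    return pendingMutations;
  }

  public static void main(final String[] args) {
    final LazyGraphSketch index = new LazyGraphSketch(100);
    for (int i = 0; i < 10; i++)
      index.recordMutation();
    index.search();     // below the threshold: no rebuild happens
    index.rebuildNow(); // forced rebuild regardless of the threshold
    System.out.println(index.rebuildCount() + " rebuild(s), " + index.pendingMutations() + " pending");
  }
}
```

In this model, a bulk load that stays below the threshold leaves the graph stale until some later query crosses it, which is the situation an eager rebuild call is meant to avoid.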

@mergify
Contributor

mergify bot commented Jan 15, 2026

🧪 CI Insights

Here's what we observed from your CI run for 0de08dc.

🟢 All jobs passed!

But CI Insights is watching 👀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new public API buildVectorGraphNow to the LSMVectorIndex for eagerly triggering a rebuild of the vector graph. The implementation is clean and robust, correctly handling concurrency by using an atomic compareAndSet on the index status to prevent simultaneous rebuilds. The use of a try...finally block ensures the index status is always restored to AVAILABLE, even in case of failures. The new API properly delegates to the existing buildGraphFromScratchWithRetry method to perform the rebuild, making it a well-integrated and valuable addition for users who need to ensure the graph is up-to-date after bulk operations.
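
The guard pattern the review describes (an atomic `compareAndSet` on the status plus a `try...finally` that restores it) might look roughly like the sketch below. It is a minimal, self-contained illustration and not the actual `LSMVectorIndex` code: the class `RebuildGuardSketch`, the `Status` enum, and the choice to return `false` when the index is busy are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a compareAndSet guard with try/finally.
// Names and the boolean return value are assumptions, not ArcadeDB's API.
class RebuildGuardSketch {
  enum Status { AVAILABLE, UNAVAILABLE }

  private final AtomicReference<Status> status = new AtomicReference<>(Status.AVAILABLE);

  // Runs the rebuild only if no other rebuild is in progress.
  // Returns false when the index is busy so the caller can retry later.
  boolean rebuildNow(final Runnable rebuild) {
    if (!status.compareAndSet(Status.AVAILABLE, Status.UNAVAILABLE))
      return false; // another rebuild holds the guard
    try {
      rebuild.run();
      return true;
    } finally {
      // Always restore the status, even if the rebuild throws.
      status.set(Status.AVAILABLE);
    }
  }

  Status status() {
    return status.get();
  }

  public static void main(final String[] args) {
    final RebuildGuardSketch guard = new RebuildGuardSketch();
    final boolean ran = guard.rebuildNow(() -> System.out.println("rebuilding..."));
    System.out.println("ran=" + ran + ", status=" + guard.status());
  }
}
```

Because the status flips atomically, at most one caller can enter the rebuild at a time, and the `finally` block keeps a failed rebuild from leaving the index permanently unavailable.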

@lvca
Member

lvca commented Jan 15, 2026

How about the existing .build() API?

@tae898
Contributor Author

tae898 commented Jan 15, 2026

@lvca

Right now LSMVectorIndex.build() doesn’t actually rebuild the vector graph. It only rewrites the pages unless the index is in the special LOADING state. In normal life the state is IMMUTABLE, so build() exits and the first queries still trigger a lazy graph build. To make it a real rebuild, the code has to flip the state back to LOADING (or otherwise force the graph rebuild) before/inside build(). And the SQL REBUILD INDEX path drops/recreates without restoring vector metadata (dimension, similarity, etc.), which is why rebuilt vector indexes come back broken (dimension 0).

@lvca
Member

lvca commented Jan 15, 2026

I see in the code this:

  public long build(final BuildIndexCallback callback, final GraphBuildCallback graphCallback) {
    if (status.compareAndSet(INDEX_STATUS.AVAILABLE, INDEX_STATUS.UNAVAILABLE)) {
      try {
        final DatabaseInternal db = getDatabase();

        // PHASE 1: Mark index as BUILDING and disable WAL
        persistBuildState(BUILD_STATE.BUILDING);
...

So it forces the state to BUILDING every time.

@lvca
Member

lvca commented Jan 15, 2026

Ok, I got it now. Merging it

@lvca lvca merged commit 91a86e3 into ArcadeData:main Jan 15, 2026
11 of 13 checks passed
@lvca lvca added the enhancement (New feature or request) and fixed / implemented labels Jan 15, 2026
@lvca lvca added this to the 26.1.1 milestone Jan 15, 2026
@tae898 tae898 deleted the feature/build-vector-graph-now branch January 16, 2026 14:57
robfrank pushed a commit that referenced this pull request Feb 11, 2026