Implemented durable vector databases (golem:vector WIT Interfaces)#108

Merged: vigoo merged 50 commits into golemcloud:main from harshtech123:vec on Nov 29, 2025

Conversation

@harshtech123
Contributor

This PR adds vector database support to golem-ai.
/claim #21
/closes #21

@iambenkay
Contributor

🔥🚀 Glad someone got around to it; I have been so busy. Will help review this week.

@harshtech123
Contributor Author

Thanks @iambenkay!

@harshtech123
Contributor Author

@afsalthaj could you please review this?

@afsalthaj

@harshtech123 I am taking a look

@harshtech123
Contributor Author

Thank you so much!

@harshtech123
Contributor Author

Update: CI is green now, thank you!

@harshtech123
Contributor Author

> @harshtech123 I am taking a look

Hi @afsalthaj, are you going to finish this anytime soon?

type FilterFunc = crate::golem::vector::types::FilterExpression;
}

/// When the durability feature flag is off, wrapping with `DurableVector` is just a passthrough

👍
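
The passthrough behaviour mentioned in the doc comment above can be illustrated with a minimal sketch. It assumes a trimmed-down `Provider` trait and an `InMemory` provider, both hypothetical and invented for this illustration; the real trait is generated from the golem:vector WIT interface and has many more methods. Only the `DurableVector` name and the `durability` feature flag come from the PR.

```rust
use std::marker::PhantomData;

/// Hypothetical, trimmed-down provider trait (assumption: one method only).
pub trait Provider {
    fn collection_exists(name: String) -> Result<bool, String>;
}

/// Wrapper type; `Impl` is the concrete provider being wrapped.
pub struct DurableVector<Impl> {
    _provider: PhantomData<Impl>,
}

/// Compiled only when the `durability` feature is off:
/// every method forwards straight to the wrapped provider.
#[cfg(not(feature = "durability"))]
impl<Impl: Provider> Provider for DurableVector<Impl> {
    fn collection_exists(name: String) -> Result<bool, String> {
        Impl::collection_exists(name)
    }
}

/// Hypothetical provider used only for this illustration.
pub struct InMemory;

impl Provider for InMemory {
    fn collection_exists(name: String) -> Result<bool, String> {
        Ok(name == "docs")
    }
}
```

With the feature disabled, `<DurableVector<InMemory> as Provider>::collection_exists(...)` returns exactly what `InMemory` returns, so callers can always use the wrapper type regardless of the flag.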

}

#[cfg(feature = "durability")]
mod durable_impl {

Mostly the PR looks good. But let's get @vigoo's review on this module.

Collaborator

@vigoo left a comment

I'm not going to keep marking every place in the durability wrapper that needs to be made durable; I marked the first few.
I think the primary misunderstanding is that the PR only persists "writes" and not "reads" from the database. However, we need to persist both, because that's how Golem restores an application's state: it just "replays" every past action without actually communicating with the vector database (in this case).
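
To make the point about reads concrete, here is a minimal sketch of record-and-replay. It is a model under stated assumptions, not the Golem API: `Journal`, `FakeDb`, and `demo` are hypothetical names invented for this illustration, and the journal stores only `Result<bool, String>` values to keep the example short.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the oplog: an ordered queue of persisted results.
#[derive(Default)]
pub struct Journal {
    entries: VecDeque<Result<bool, String>>,
    replaying: bool,
}

impl Journal {
    /// Switch to replay mode, as happens when state is being restored.
    pub fn start_replay(&mut self) {
        self.replaying = true;
    }

    /// Live: run `side_effect` and record its result.
    /// Replay: return the recorded result and never run `side_effect`.
    pub fn durable(
        &mut self,
        side_effect: impl FnOnce() -> Result<bool, String>,
    ) -> Result<bool, String> {
        if self.replaying {
            self.entries
                .pop_front()
                .expect("journal has no entry for this call")
        } else {
            let result = side_effect();
            self.entries.push_back(result.clone());
            result
        }
    }
}

/// Hypothetical remote database; `calls` counts real round trips.
pub struct FakeDb {
    pub calls: u32,
}

impl FakeDb {
    pub fn collection_exists(&mut self, name: &str) -> Result<bool, String> {
        self.calls += 1;
        Ok(name == "docs")
    }
}

/// Runs the same read once live and once during replay.
/// Returns (live result, replayed result, number of real database calls).
pub fn demo() -> (Result<bool, String>, Result<bool, String>, u32) {
    let mut db = FakeDb { calls: 0 };
    let mut journal = Journal::default();
    let live = journal.durable(|| db.collection_exists("docs"));
    journal.start_replay();
    let replayed = journal.durable(|| db.collection_exists("docs"));
    (live, replayed, db.calls)
}
```

In `demo`, the replayed read yields the same value as the live one while the database is contacted only once. If the read had not been journaled, the replay would have needed a second round trip, and its answer could differ from what the application originally saw.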

result
} else {
let _unit: Unit = durability.replay::<Unit, VectorError>()?;
Impl::connect_internal(&endpoint, &credentials, &timeout_ms, &options)
Collaborator

This is incorrect. If an operation is persisted, it means that during replay we use the persisted result instead of performing the side effect again. So there are two choices here, depending on what connect_internal does (which I did not check):

  • Either its result cannot be persisted. In this case it always has to be called, even during replay (essentially this should just forward the call to the implementation).
  • Or it can be persisted. In that case, during replay (the else branch) the result has to be read back with replay and used, instead of calling the internal method.

Collaborator

That said, it must be persistent, otherwise we cannot persist any of the session. (It would mean that to restore the state of our component we need to communicate with the vector database.)
So:

  • make sure not to call connect_internal in the else branch, and instead create the result based on replay's result
  • make sure to wrap the live call to connect_internal with with_persistence_level(PersistenceLevel::PersistNothing, ...), otherwise replay will not skip the internal side effects
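
The two points above can be sketched as follows. This is a self-contained model, not the real golem-rust API: `Runtime`, `Entry`, `http_request`, and `demo` are hypothetical, and the assumption that the host journals low-level side effects on its own unless persistence is switched off is a simplification made for this illustration. Only the names `connect_internal`, `with_persistence_level`, and `PersistenceLevel::PersistNothing` come from the review.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum PersistenceLevel {
    Smart,
    PersistNothing,
}

/// Hypothetical journal entry kinds.
#[derive(Clone, Debug, PartialEq)]
pub enum Entry {
    Http(String),
    Connect(Result<String, String>),
}

/// Hypothetical model of the host runtime.
pub struct Runtime {
    pub level: PersistenceLevel,
    pub oplog: Vec<Entry>,
    pub replaying: bool,
    pub network_calls: u32,
    cursor: usize,
}

impl Runtime {
    pub fn new() -> Self {
        Runtime {
            level: PersistenceLevel::Smart,
            oplog: Vec::new(),
            replaying: false,
            network_calls: 0,
            cursor: 0,
        }
    }

    /// Runs `f` at `level`, then restores the previous level.
    pub fn with_persistence_level<R>(
        &mut self,
        level: PersistenceLevel,
        f: impl FnOnce(&mut Runtime) -> R,
    ) -> R {
        let previous = std::mem::replace(&mut self.level, level);
        let result = f(&mut *self);
        self.level = previous;
        result
    }

    /// Low-level side effect; journaled on its own unless persistence is off.
    pub fn http_request(&mut self, url: &str) -> String {
        self.network_calls += 1;
        if self.level != PersistenceLevel::PersistNothing {
            self.oplog.push(Entry::Http(url.to_string()));
        }
        format!("session-for-{url}")
    }
}

/// Provider-specific connect: performs the real network round trip.
fn connect_internal(rt: &mut Runtime, endpoint: &str) -> Result<String, String> {
    Ok(rt.http_request(endpoint))
}

/// Durable connect following the two review points.
pub fn connect(rt: &mut Runtime, endpoint: &str) -> Result<String, String> {
    if !rt.replaying {
        // Live: silence journaling of inner side effects, then persist one entry.
        let result = rt.with_persistence_level(PersistenceLevel::PersistNothing, |rt| {
            connect_internal(rt, endpoint)
        });
        rt.oplog.push(Entry::Connect(result.clone()));
        result
    } else {
        // Replay: rebuild the result from the journal; connect_internal is not called.
        match rt.oplog.get(rt.cursor) {
            Some(Entry::Connect(persisted)) => {
                let persisted = persisted.clone();
                rt.cursor += 1;
                persisted
            }
            _ => Err("oplog mismatch".to_string()),
        }
    }
}

/// Returns (live result, replayed result, network calls, oplog length).
pub fn demo() -> (Result<String, String>, Result<String, String>, u32, usize) {
    let mut rt = Runtime::new();
    let live = connect(&mut rt, "db.example");
    rt.replaying = true;
    let replayed = connect(&mut rt, "db.example");
    (live, replayed, rt.network_calls, rt.oplog.len())
}
```

In this model the journal ends up with a single `Connect` entry: the inner HTTP request leaves no entry of its own, and the replayed `connect` returns the persisted session without touching the network.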

Contributor Author

Thanks for this context. It now replays the persisted result instead of calling the function, and I also wrapped the live call with with_persistence_level(PersistenceLevel::PersistNothing, ...). Thank you!

options: Option<crate::golem::vector::types::Metadata>,
) -> Result<bool, VectorError> {
init_logging();
Impl::test_connection(endpoint, credentials, timeout_ms, options)
Collaborator

This too

name: String,
) -> Result<crate::golem::vector::collections::CollectionInfo, VectorError> {
init_logging();
Impl::get_collection(name)
Collaborator

This too


fn collection_exists(name: String) -> Result<bool, VectorError> {
init_logging();
Impl::collection_exists(name)
Collaborator

This too

@harshtech123
Contributor Author

@vigoo I made the read operations durable as well. Thanks for the clarification!

@vigoo merged commit 807f4b3 into golemcloud:main on Nov 29, 2025
8 checks passed
@harshtech123 deleted the vec branch on November 29, 2025 09:28

Successfully merging this pull request may close these issues.

Implement Durable Vector Database Provider Components for golem:vector WIT Interface

4 participants