Implemented durable vector databases (golem:vector WIT Interfaces)#108
Implemented durable vector databases (golem:vector WIT Interfaces)#108vigoo merged 50 commits intogolemcloud:mainfrom
Conversation
|
🔥🚀 glad someone got around to it, I have been so busy. Will help review this week |
|
thanks @iambenkay ! |
|
@afsalthaj could you please review this ! |
|
@harshtech123 I am taking a look |
|
thank you so much ! |
|
update : CI is green now , thank you ! |
Hii @afsalthaj are you going to finish this anytime soon ? |
| type FilterFunc = crate::golem::vector::types::FilterExpression; | ||
| } | ||
|
|
||
| /// When the durability feature flag is off, wrapping with `DurableVector` is just a passthrough |
| } | ||
|
|
||
| #[cfg(feature = "durability")] | ||
| mod durable_impl { |
There was a problem hiding this comment.
Mostly the PR looks good. But let's get @vigoo's review on this module.
vigoo
left a comment
There was a problem hiding this comment.
I'm not continuing with marking all the places in the durability wrapper where it needs to be made durable, marked the first few.
I think the primary misunderstanding is that the PR only persists "writes" and not "reads" from the database. However we need to persist both, because that's how the Golem application restores an application's state - it just "replays" every past actions without actually making any real communication to the vector database (in this case)
vector/vector/src/durability.rs
Outdated
| result | ||
| } else { | ||
| let _unit: Unit = durability.replay::<Unit, VectorError>()?; | ||
| Impl::connect_internal(&endpoint, &credentials, &timeout_ms, &options) |
There was a problem hiding this comment.
This is incorrect. If an operation is persisted it means that during replay we use the persisted result instead of performing the side-effect again. So there are two choices here, depending on what connect_internal does (which I did not check):
- Either it's result cannot be persisted. In this case it always has to be just called during replay (essentially this should just forward the call to the implementation)
- If it can be persisted, it case of replay (the else branch) it has to be read back with
replayand the result has to be used, instead of calling the internal method
There was a problem hiding this comment.
That said, it must be persistent otherwise we cannot persist any of the session. (It would mean that to restore the state of our component we need to communicate with the vector database).
So
- make sure to not call
connect_internalin the else branch and instead create the result based on replay's result - make sure to wrap the live call to
connect_internalwithwith_persistence_level(PersistenceLevel::PersistNothingotherwise replay will not skip the internal side effects
There was a problem hiding this comment.
thanks for this context , made it to replay the persisted result instead of the function and also wrapped the live call with with_persistence_level(PersistenceLevel::PersistNothing ) , thank you !
vector/vector/src/durability.rs
Outdated
| options: Option<crate::golem::vector::types::Metadata>, | ||
| ) -> Result<bool, VectorError> { | ||
| init_logging(); | ||
| Impl::test_connection(endpoint, credentials, timeout_ms, options) |
vector/vector/src/durability.rs
Outdated
| name: String, | ||
| ) -> Result<crate::golem::vector::collections::CollectionInfo, VectorError> { | ||
| init_logging(); | ||
| Impl::get_collection(name) |
vector/vector/src/durability.rs
Outdated
|
|
||
| fn collection_exists(name: String) -> Result<bool, VectorError> { | ||
| init_logging(); | ||
| Impl::collection_exists(name) |
|
@vigoo made the read operations durable as well , thanks for this clarification! |
This pr is indeed for adding vector-databases in our golem-ai !
/claim #21
/closes #21