
[MEVD] Have batching UpsertAsync return IReadOnlyList instead of IAsyncEnumerable (for database-generated IDs) #10692

@roji

As an aside, returning IAsyncEnumerable from UpsertBatchAsync may be something we'd want to reexamine. First, it seems that only a small minority of databases actually support database-generated keys/IDs (basically relational databases like PG). I'm not sure we've fully mapped out the support, but if that's true, I'm not sure the abstraction should have an API feature that only works for a minority of providers.

In addition, I'm not sure if database-generated keys correspond to the way that vector stores are generally used - this would probably be why no native vector database AFAIK actually supports this feature (please correct me if I'm wrong). If that's true, then we're supporting them in the vector database abstraction because relational databases happen to have them for other, non-vector scenarios.

Note that I wouldn't raise this if returning IAsyncEnumerable didn't create API usability issues. Currently, inserting requires the following code:

await foreach (var _ in this.Collection.UpsertBatchAsync(this.TestData))
{
}

This is because one cannot simply await an IAsyncEnumerable. I want to make sure we're not forcing users to do this every time they want to add some records, only because some databases support database key generation, especially since typical vector store usage wouldn't require fetching the key back at the point of insertion.
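To make the usability difference concrete, here is a minimal sketch contrasting the two return shapes. Everything in it is hypothetical and for illustration only: `InMemoryCollection`, `UpsertBatchStreamingAsync`, `UpsertBatchListAsync`, and `Demo` are stand-ins, not the actual MEVD types or signatures.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a vector store collection (not the real MEVD API).
public sealed class InMemoryCollection
{
    private readonly Dictionary<int, string> _records = new Dictionary<int, string>();

    public int Count => _records.Count;

    // Current shape: keys are streamed back one at a time.
    // In this sketch the method is an async iterator, so no record is
    // written until the caller actually enumerates the result.
    public async IAsyncEnumerable<int> UpsertBatchStreamingAsync(
        IEnumerable<(int Key, string Value)> records)
    {
        foreach (var (key, value) in records)
        {
            await Task.Yield(); // stands in for database I/O
            _records[key] = value;
            yield return key;
        }
    }

    // Proposed shape: a single awaitable call that returns all keys together.
    public async Task<IReadOnlyList<int>> UpsertBatchListAsync(
        IEnumerable<(int Key, string Value)> records)
    {
        var keys = new List<int>();
        foreach (var (key, value) in records)
        {
            await Task.Yield(); // stands in for database I/O
            _records[key] = value;
            keys.Add(key);
        }
        return keys;
    }
}

public static class Demo
{
    public static async Task<(int AfterStreaming, int AfterList)> RunAsync()
    {
        var data = new (int Key, string Value)[] { (1, "a"), (2, "b") };

        var streaming = new InMemoryCollection();
        // The stream has to be drained even though the keys are not needed.
        await foreach (var _ in streaming.UpsertBatchStreamingAsync(data))
        {
        }

        var listed = new InMemoryCollection();
        // A plain await is enough; the returned keys can simply be ignored.
        await listed.UpsertBatchListAsync(data);

        return (streaming.Count, listed.Count);
    }
}
```

With the list-returning shape, callers who don't care about the keys write a single `await`, while callers who do care still get every key back from the same call.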

I'm very open to being convinced otherwise here, but the fact that dedicated vector stores don't support database-generated keys feels like a strong indication that they don't belong on the API surface of the abstraction.

/cc @dmytrostruk @westey-m @adamsitnik

Labels: .NET, Build, memory, memory connector, msft.ext.vectordata
