.Net: [MEVD] Stop parallelizing batching operations where native database batching isn't supported #11854

@roji

Description

For our batching operation variants (get, upsert, delete), we try to use native database batching wherever possible, so that a batch costs a single roundtrip to the database. However, where no such native support exists, we fall back to calling the single-record variant concurrently. For example, the Weaviate implementation of the batched GetAsync:

```csharp
// Inside the batched GetAsync, an async iterator (IAsyncEnumerable<TRecord>):
// one Task per key, all started at once.
var tasks = keys.Select(key => this.GetAsync(key, options, cancellationToken));

var records = await Task.WhenAll(tasks).ConfigureAwait(false);

foreach (var record in records)
{
    if (record is not null)
    {
        yield return record;
    }
}
```

In API review, it was pointed out that we try to avoid introducing concurrency under the hood in this way, but prefer for users to do that where it makes sense.

Concretely, this is very unlikely to scale well: calling this for 10000 records spawns 10000 Tasks, which is very likely to overload the database and be counter-productive. Some limited form of concurrency (with an upper limit) may be useful, but we think this should be an explicit opt-in by the user. For now, we'll remove this and simply do a roundtrip per record; we can always introduce an opt-in for concurrency later.
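If bounded concurrency is later exposed as an opt-in, one possible shape is `Parallel.ForEachAsync` with a user-supplied cap. This is only a sketch, not a planned API; `maxConcurrency` is a hypothetical option, and `this.GetAsync` stands in for the single-record variant:

```csharp
// Sketch only: bounded concurrency via Parallel.ForEachAsync (.NET 6+).
// maxConcurrency is a hypothetical, explicitly user-supplied limit.
using System.Collections.Concurrent;

var results = new ConcurrentBag<TRecord>();

await Parallel.ForEachAsync(
    keys,
    new ParallelOptions
    {
        MaxDegreeOfParallelism = maxConcurrency, // explicit opt-in, never a hidden default
        CancellationToken = cancellationToken
    },
    async (key, ct) =>
    {
        var record = await this.GetAsync(key, options, ct).ConfigureAwait(false);
        if (record is not null)
        {
            results.Add(record);
        }
    }).ConfigureAwait(false);
```

The point of the cap is that at most `maxConcurrency` roundtrips are in flight at any moment, instead of one Task per key.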

Do this after the switch to abstract base classes (#11775). At that point, we can have simple default implementations (which simply loop over the single-record variant, without concurrency), and these get overridden only where native batching support exists.
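The default implementation described above could look roughly like the following. This is a sketch under the assumption of an abstract base class with a `virtual` batched overload; the exact signatures and option types are illustrative:

```csharp
// Hypothetical default in the abstract base class: loop over the
// single-record variant, one roundtrip per key, no hidden concurrency.
public virtual async IAsyncEnumerable<TRecord> GetAsync(
    IEnumerable<TKey> keys,
    GetRecordOptions? options = null,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    foreach (var key in keys)
    {
        var record = await this.GetAsync(key, options, cancellationToken).ConfigureAwait(false);
        if (record is not null)
        {
            yield return record;
        }
    }
}
```

Providers with native batching support override this method; everyone else inherits the simple sequential loop.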

Metadata

Assignees

Labels

.NET (Issue or Pull requests regarding .NET code)
Build (Features planned for next Build conference)
msft.ext.vectordata (Related to Microsoft.Extensions.VectorData)

Projects

Status

Sprint: Done

Milestone

No milestone
