memory reindex aborts on transient embedding transport errors instead of retrying or splitting the batch

## Summary

`MemoryManagerEmbeddingOps.embedBatchWithRetry()` currently retries rate-limit-style failures, but it does not treat transient transport failures as retryable.

In practice, longer remote memory reindex runs can fail with errors like:

- `TypeError: fetch failed`
- `ECONNRESET`
- `socket hang up`
- `terminated`
- `other side closed`

When that happens, the whole memory sync aborts even though retrying the same batch often succeeds.
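To make "transient transport error" concrete, here is a minimal sketch of a classifier built from the messages listed above. The helper name `isTransientTransportError` and the pattern list are hypothetical and not taken from the codebase. The walk over `cause` assumes the underlying socket error may be attached to the outer error, as is common for `fetch failed` errors in Node.

```typescript
// Hypothetical helper, not part of the existing code.
// The patterns mirror the error messages reported in this issue.
const TRANSIENT_TRANSPORT_PATTERNS: RegExp[] = [
  /fetch failed/i,
  /ECONNRESET/i,
  /socket hang up/i,
  /terminated/i, // broad on purpose; a real fix may want a stricter match
  /other side closed/i,
];

function matchesTransientPattern(value: unknown): boolean {
  return (
    typeof value === "string" &&
    TRANSIENT_TRANSPORT_PATTERNS.some((pattern) => pattern.test(value))
  );
}

function isTransientTransportError(err: unknown): boolean {
  // Inspect the error and a few levels of `cause`, since the socket-level
  // error may be wrapped by a generic outer error.
  let current: unknown = err;
  for (let depth = 0; depth < 5 && current != null; depth++) {
    const message = current instanceof Error ? current.message : String(current);
    const code = (current as { code?: unknown }).code;
    if (matchesTransientPattern(message) || matchesTransientPattern(code)) {
      return true;
    }
    current = (current as { cause?: unknown }).cause;
  }
  return false;
}
```

Matching on message text is fragile, so this is only a starting point; matching on error codes where they exist would be more robust.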

## Why this matters

This shows up during larger remote embedding runs, especially when indexing many documents over network-backed providers.

The current failure mode is costly:

- the whole reindex fails
- already-processed chunks are wasted
- rerunning often succeeds without any input change

So the system is already resilient to rate limits, but still brittle against transient transport failures.

## Expected behavior

For transient transport errors during batch embedding:

1. Retry a few times with the existing backoff behavior.
2. If retries are exhausted and the batch has multiple items, split the batch and continue recursively.
3. Fail only in two cases: immediately for non-retryable errors, or for single-item batches that still fail after retries.
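The three steps above can be sketched as a standalone function. This is an illustration under assumptions, not the actual `embedBatchWithRetry()` implementation: the name `embedWithRetryAndSplit`, the `RetryOptions` shape, and the exponential `sleep` are all hypothetical stand-ins, and the real fix would reuse the existing backoff logic.

```typescript
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Hypothetical options; the real code would reuse its existing retry settings.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  isRetryable: (err: unknown) => boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function embedWithRetryAndSplit(
  texts: string[],
  embedBatch: EmbedBatch,
  opts: RetryOptions,
): Promise<number[][]> {
  if (texts.length === 0) return [];

  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await embedBatch(texts);
    } catch (err) {
      // Step 3 (first case): non-retryable errors fail immediately.
      if (!opts.isRetryable(err)) throw err;
      lastError = err;
      // Step 1: back off before the next attempt (stand-in for existing backoff).
      if (attempt < opts.maxAttempts) {
        await sleep(opts.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }

  // Step 3 (second case): a single item that still fails cannot be split.
  if (texts.length === 1) throw lastError;

  // Step 2: split the batch and recurse, preserving input order.
  const mid = Math.ceil(texts.length / 2);
  const left = await embedWithRetryAndSplit(texts.slice(0, mid), embedBatch, opts);
  const right = await embedWithRetryAndSplit(texts.slice(mid), embedBatch, opts);
  return left.concat(right);
}
```

The halves are processed sequentially here to keep load on a struggling connection low. Whether to run them concurrently is a design choice this sketch does not settle.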

## Reproduction

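A focused unit-test repro is straightforward. Two cases cover the behavior:

- mock `embedBatch()` to fail once with `TypeError("fetch failed")`, then succeed; the call should succeed on retry, but today the error propagates
- mock `embedBatch()` to keep failing with `fetch failed` for `texts.length > 1`, but succeed for single-item batches; the call should succeed by splitting, but today the sync aborts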
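The two mocks described above could look roughly like this. The issue does not name the project's test framework, so this sketch uses plain factory functions rather than a specific mocking API. The names `makeFailOnceMock`, `makeFailOnMultiItemMock`, and `fakeVectors` are hypothetical, as is the assumed `embedBatch(texts)` signature returning one vector per input.

```typescript
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Deterministic fake embedding: one single-number vector per input text.
const fakeVectors = (texts: string[]): number[][] => texts.map((t) => [t.length]);

// Case 1: the first call fails with TypeError("fetch failed"), later calls succeed.
function makeFailOnceMock(): { embedBatch: EmbedBatch; callCount: () => number } {
  let calls = 0;
  const embedBatch: EmbedBatch = async (texts) => {
    calls += 1;
    if (calls === 1) throw new TypeError("fetch failed");
    return fakeVectors(texts);
  };
  return { embedBatch, callCount: () => calls };
}

// Case 2: any multi-item batch fails, single-item batches succeed.
function makeFailOnMultiItemMock(): { embedBatch: EmbedBatch; batchSizes: number[] } {
  const batchSizes: number[] = []; // records the size of every attempted batch
  const embedBatch: EmbedBatch = async (texts) => {
    batchSizes.push(texts.length);
    if (texts.length > 1) throw new TypeError("fetch failed");
    return fakeVectors(texts);
  };
  return { embedBatch, batchSizes };
}
```

With the fix in place, a test for case 1 would expect exactly two calls and a successful result. A test for case 2 would expect the recorded batch sizes to end in single-item calls, with vectors returned in input order.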

## Scope of a safe fix

This can stay intentionally narrow:

- only `embedBatchWithRetry()` needs to change
- no provider-specific branching
- no config/schema changes
- no timeout constant changes

A small targeted retry + split fallback should make remote memory reindex much more resilient without changing the normal success path.

memory reindex aborts on transient embedding transport errors instead of retrying or splitting the batch #44166