Skip to content

fix(datastore): add retries to emulator#14591

Merged
bhshkh merged 1 commit into
googleapis:mainfrom
bhshkh:fix/ds-keep-alive
May 15, 2026
Merged

fix(datastore): add retries to emulator#14591
bhshkh merged 1 commit into
googleapis:mainfrom
bhshkh:fix/ds-keep-alive

Conversation

@bhshkh

@bhshkh bhshkh commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary
This PR refines the Datastore client's resilience logic to fix regressions identified during TAP testing, specifically regarding emulator startup races, Unix domain sockets, and IPv6 literals. It ensures that the resilience benefits for bug b/473841984 are applied consistently to both production and emulator environments without breaking existing tests.

Why these changes are needed

  1. Handling Emulator Startup Races: Many tests (e.g., exchange:datastore_test) experience a "connection refused" error if an RPC is attempted immediately after starting the emulator process. By adding "waitForReady": true to the gRPC Service Config, the client will now block and wait for the emulator to finish binding its port instead of failing the RPC immediately.
  2. Preventing Infinite Hangs: With waitForReady enabled, an unreachable host could cause a test to hang indefinitely. This PR adds a canonical "timeout": "60s" to the generic methodConfig to ensure RPCs eventually fail with a clear error.
  3. Robust Scheme Handling (Unix Sockets & IPv6): Internal test environments like tin often use Unix domain sockets (unix:///) or IPv6 literals ([::1]) for emulator addresses. Previous logic would mangle these into invalid targets (e.g., passthrough:///unix:///...), leading to "too many colons" errors or connection hangs. The updated logic defensively checks for existing valid schemes before applying the passthrough:/// optimization.
  4. Idempotent Retries for Commit: A catch-all configuration block is added to the Service Config to ensure that UNAVAILABLE errors are retried for all methods, including Commit (used by PutMulti), which was previously omitted.

Why PR #14411 was insufficient
While PR #14411 introduced Keepalives and expanded retries for idempotent calls, it had several gaps:

  • It only applied the new configuration to the production code path, leaving the emulator path without resilience logic or standardized retries.
  • By removing the manual 100ms retry loop without adding waitForReady: true to the Service Config, it removed the "wait" period needed for emulators to become ready.
  • The aggressive prepending of passthrough:/// did not account for unix:/// schemes or bracketed IPv6 literals, causing dialer errors in specialized test environments. (Even though the passthrough was not added in PR fix(datastore): enable gRPC keepalives and expand retries  #14411, the current google3 imported code did not have this logic. While trying to import cl/906545069 latest code with psssthrough, this issue was discovered. )

Related Bug
b/473841984

@product-auto-label product-auto-label Bot added the api: datastore Issues related to the Datastore API. label May 14, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Datastore client's retry policy and emulator connection logic. The retryPolicy now includes waitForReady: true and provides a default configuration for all service methods. The NewClientWithDatabase function was updated to better handle various gRPC target schemes for the emulator host and to apply consistent keepalive and retry configurations across both standard and emulator clients. I have no feedback to provide as there were no review comments.

@bhshkh bhshkh marked this pull request as ready for review May 14, 2026 23:23
@bhshkh bhshkh requested review from a team as code owners May 14, 2026 23:24
@bhshkh bhshkh merged commit d944830 into googleapis:main May 15, 2026
11 checks passed
@bhshkh bhshkh deleted the fix/ds-keep-alive branch May 15, 2026 01:22
bhshkh added a commit that referenced this pull request May 19, 2026
PR created by the Librarian CLI to initialize a release. Merging this PR
will auto trigger a release.

Librarian Version: v0.13.2-0.20260518181009-04b8e642ea4c
Language Image:
us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/librarian-go@sha256:b04b076f5eedbb5546bd6fc1404969dd3698c8b19c0f34ae815a84ae735a606a
<details><summary>datastore: v1.24.0</summary>

##
[v1.24.0](datastore/v1.23.0...datastore/v1.24.0)
(2026-05-19)

### Bug Fixes

* detach rollback context from transaction cancellation (#14602)
([0a8d10d](0a8d10d1))

* fix context leak in Iterator and Transaction spans (#14478)
([78de20d](78de20da))

* add retries to emulator (#14591)
([d944830](d9448309))

</details>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: datastore Issues related to the Datastore API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants