
spanner: propagate x-goog-spanner-request-id header on every call and increment it appropriately per retry #11073

@odeke-em

Description


The Google Cloud Spanner internal engineering teams have requested this feature. To enable consistent, non-sporadic debugging without sampling, the client libraries should send a request header, "x-goog-spanner-request-id", on every call.
Using the OpenTelemetry TraceID is a non-starter. First, traces are usually sampled, and always-on tracing is expensive: tracing generates spans with many more fields, and exporting those to a backend consumes bandwidth and can incur high costs for customers. Second, retrieving a TraceID retroactively is very difficult, if not impossible. By contrast, the Cloud Spanner internal engineering teams are ready to accept x-goog-spanner-request-id, which incurs no extra cost for customers, directly allows correlation of RPCs, and helps debug issues quickly.

Structure

<version>.<process_id>.<client_id>.<channel_id>.<request_number>.<rpc_number>
where

  • version: the version of the specification
  • process_id: a randomly generated 32-bit unsigned integer, created once per process at startup and shared across all Spanner clients
  • client_id: a monotonically increasing (within the process) counter of the clients created in that process
  • channel_id: the gRPC channel ID
  • request_number: the nth request made by a given grpc_spanner_client
  • rpc_number: the attempt number within the retries of a single request; it increases monotonically while request_number stays the same, and starts from 1 for every new, independent request
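A minimal Go sketch of how the six components could be generated and rendered. It assumes a version value of 1, decimal rendering of process_id, and a uint64 request counter; the names requestID, client, newClient and newRequestID are hypothetical and are not the API from PR #11048. The client_id and request_number counters use sync/atomic so that allocating an ID needs no mutex.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync/atomic"
)

// requestIDVersion is the specification version; 1 is an assumed value.
const requestIDVersion = 1

// processID is generated once per process and shared by every client.
var processID = func() uint32 {
	var buf [4]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	return binary.BigEndian.Uint32(buf[:])
}()

// clientCounter hands out monotonically increasing client IDs within the process.
var clientCounter atomic.Uint32

// requestID holds the six dot-separated components of the header value.
type requestID struct {
	Version       uint32
	ProcessID     uint32
	ClientID      uint32
	ChannelID     uint32
	RequestNumber uint64
	RPCNumber     uint32
}

// String renders <version>.<process_id>.<client_id>.<channel_id>.<request_number>.<rpc_number>.
func (r requestID) String() string {
	return fmt.Sprintf("%d.%d.%d.%d.%d.%d",
		r.Version, r.ProcessID, r.ClientID, r.ChannelID, r.RequestNumber, r.RPCNumber)
}

// client is a hypothetical stand-in for a Spanner client: it owns one
// client_id and a lock-free request counter.
type client struct {
	id         uint32
	requestCtr atomic.Uint64
}

func newClient() *client {
	return &client{id: clientCounter.Add(1)}
}

// newRequestID allocates the next request_number and starts rpc_number at 1.
// channelID is passed in here; a real client would take it from its channel pool.
func (c *client) newRequestID(channelID uint32) requestID {
	return requestID{
		Version:       requestIDVersion,
		ProcessID:     processID,
		ClientID:      c.id,
		ChannelID:     channelID,
		RequestNumber: c.requestCtr.Add(1),
		RPCNumber:     1,
	}
}

func main() {
	c := newClient()
	fmt.Println(c.newRequestID(1)) // 1.<random process_id>.1.1.1.1
	fmt.Println(c.newRequestID(1)) // 1.<random process_id>.1.1.2.1
}
```

Since every counter is a single atomic add, concurrent goroutines sharing one client still receive distinct request_number values.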

Task expectations

  • Generate x-goog-spanner-request-id according to the structure above
  • Propagate x-goog-spanner-request-id on every call and every retry, with tests
  • On any error returned from an RPC, augment that error with the value of x-goog-spanner-request-id so that customers can report it to Google customer support
  • Benchmark the impact on the client libraries, i.e. the latency and RAM cost of generating the ID and plumbing it through these libraries
  • For performance, use the most efficient lock-free structures and mechanisms available

I have already begun this task, with a working sample and a set of tests, in PR #11048.

Kindly /cc-ing @tharoldD @willpoint @olavloite

Metadata

Labels: api: spanner (Issues related to the Spanner API), triage me (I really want to be triaged.)