spanner: propagate x-goog-spanner-request-id header on every call and increment it appropriately per retry #11073
Description
The Google Cloud Spanner internal engineering teams have requested this feature. To support consistent debugging of every request, without relying on sampling, the client should send a request header "x-goog-spanner-request-id" on every call.
Using the OpenTelemetry trace ID is a non-starter. Traces are usually sampled, and keeping tracing always on is expensive: spans carry more fields, and exporting them to a backend consumes bandwidth and can incur high costs for customers. Retrieving a trace ID retroactively is also very difficult, if not impossible. The Cloud Spanner internal engineering teams can instead accept x-goog-spanner-request-id, which costs customers nothing extra and directly allows RPCs to be correlated for quick debugging.
Structure
<version>.<process_id>.<client_id>.<channel_id>.<request_number>.<rpc_number>
where
- version: the version of the specification
- process_id: a randomly generated 32-bit unsigned integer, created once per process at startup and shared across all Spanner clients
- client_id: a monotonically increasing (within the process) counter of the clients created in that process
- channel_id: the gRPC channel ID
- request_number: the nth request made by a grpc_spanner_client
- rpc_number: the attempt number within a request's retries; it increments monotonically while request_number stays the same, and starts from 1 for every new independent call
Task expectations
- Generating x-goog-spanner-request-id according to the structure above
- Propagating x-goog-spanner-request-id on every call and every retry, with tests
- On any error returned from an RPC, augmenting that error with the value of x-goog-spanner-request-id so that customers can include it when reporting issues to Google
- Benchmarking the cost of this header in the client libraries, i.e. the latency and RAM impact of generating the ID and plumbing it through the libraries
- For performance, using the most efficient lock-free structures and mechanisms available
I have already begun this task, with a working sample and a set of tests, in PR #11048.
/cc @tharoldD @willpoint @olavloite