tracers: fix Stop/GetResult race in native tracers#21123
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a data race between Stop (timeout watchdog) and GetResult (RPC handler) in Erigon’s native tracers by making the interruption reason concurrently readable/writable via an atomic pointer, and adds a regression test intended to cover the concurrent Stop/GetResult path.
Changes:
- Replace plain
errorinterruption reason fields withatomic.Pointer[error]incallTracer,fourByteTracer, andprestateTracer. - Update
GetResultimplementations (includingflatCallTracer) to read the reason viaLoad()and handle the unset (nil) case. - Add
TestTracerStopRaceto exercise concurrentStopandGetResultacross multiple native tracers.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| execution/tracing/tracers/native/tracer_test.go | Adds a concurrency regression test for Stop vs GetResult on several native tracers. |
| execution/tracing/tracers/native/prestate.go | Makes Stop/GetResult interruption reason access atomic in prestateTracer. |
| execution/tracing/tracers/native/call.go | Makes Stop/GetResult interruption reason access atomic in callTracer. |
| execution/tracing/tracers/native/call_flat.go | Reads shared callTracer interruption reason atomically in flatCallTracer.GetResult. |
| execution/tracing/tracers/native/4byte.go | Makes Stop/GetResult interruption reason access atomic in fourByteTracer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // callTracer and flatCallTracer's GetResult short-circuits on | ||
| // an empty callstack ("incorrect number of top-level calls") before loading | ||
| // the reason. For those tracers the test pushes a single top-level call frame | ||
| // via OnEnter so GetResult reaches the reason.Load() path where the race can | ||
| // be observed under -race. |
There was a problem hiding this comment.
The important bit here is that both tracers return before reaching the reason.Load() path when the callstack is empty, but you’re right that the quoted error only matches callTracer; flatCallTracer returns "invalid number of calls" instead.
This ports the relevant native tracer race fix to Erigon.
Native tracers that support
Stopstore an interruption reason from the trace timeout path, whileGetResultcan read that reason concurrently from the RPC handler path. The reason was stored as a plain error interface, so concurrent reads and writes were unsynchronized.Because
erroris an interface value, racing access can observe an inconsistent interface state. This replaces the plain reason field withatomic.Pointer[error]in Erigon’s native tracers:execution/tracing/tracers/native/call.goexecution/tracing/tracers/native/4byte.goexecution/tracing/tracers/native/prestate.goflatCallTracershares the underlyingcallTracer.reason, so itsGetResultnow reads that pointer atomically as well.Also adds
TestTracerStopRace, covering concurrentStopandGetResultfor:callTracerflatCallTracer4byteTracerprestateTracer