Commit 6fd665f

[NETPATH-656] Downgrade network ID fetch failure log from Error to Warn (#47347)
https://datadoghq.atlassian.net/browse/NETPATH-656

# Plan: Reduce Log Level for Network ID Fetch Failure

## Problem

In `runner.go`, when `retryGetNetworkID()` exhausts all retries, the failure is logged at `Error` level:

```go
// runner.go:96
log.Errorf("failed to get network ID: %s", err.Error())
```

This is misleading to customers because:

- The network ID is **enrichment-only metadata**: it populates `Source.NetworkID` in the traceroute payload but does not affect the correctness or completeness of the traceroute itself.
- The underlying function (`retryGetNetworkID`) already retries 4 times with backoff and logs each attempt at `Debug`, so there is already adequate observability for debugging.
- The docstring on `retryGetNetworkID` acknowledges that the cloud provider metadata endpoint "is sometimes unavailable during host startup", indicating this is a known transient condition rather than a hard failure.

## Code Path

```
New() (runner.go:80)
└─ MemoizeNoError closure (runner.go:93)
   └─ retryGetNetworkID() (runner.go:265)
      └─ cloudprovidersnetwork.GetNetworkID() (retried 4x with backoff)
      └─ on failure: returns ("", error)
   └─ log.Errorf(...) ← misleading (runner.go:96)
   └─ returns ""
└─ networkID stored as memoized func on Runner

Run() (runner.go:114)
└─ r.networkID() called once, result set on payload.NetworkPath.Source.NetworkID
```

## Proposed Fix

Change line 96 from `Errorf` to `Warnf`:

```go
// Before
log.Errorf("failed to get network ID: %s", err.Error())

// After
log.Warnf("failed to get network ID: %s", err.Error())
```

## Rationale for `Warn` over `Info`

- `Warn` reflects a genuinely degraded state: network path data will be missing `NetworkID`, which can affect enrichment and correlation in the Datadog backend.
- `Info` would be too quiet: a customer debugging missing network ID enrichment would likely overlook it.
- `Warn` is consistent with the adjacent log at line 86, which uses `Warnf` for a similarly non-fatal `gatewayLookup` issue.

## Affected File

- `pkg/networkpath/traceroute/runner/runner.go`: line 96, one-line change

Co-authored-by: ken.schneider <ken.schneider@datadoghq.com>
1 parent 92a2823 commit 6fd665f

1 file changed

Lines changed: 1 addition & 1 deletion

pkg/networkpath/traceroute/runner/runner.go

@@ -93,7 +93,7 @@ func New(telemetryComp telemetryComponent.Component) (*Runner, error) {
 		networkID: funcs.MemoizeNoError(func() string {
 			nid, err := retryGetNetworkID()
 			if err != nil {
-				log.Errorf("failed to get network ID: %s", err.Error())
+				log.Warnf("failed to get network ID: %s", err.Error())
 			}
 			return nid
 		}),
