Skip to content

Human readable logging on timeout #15752

@marioloko

Description

@marioloko

Describe the feature

When queue proxy timeouts it shows the following trace:

{
  "severity": "ERROR",
  "timestamp": "2025-02-04T18:01:55.621921326Z",
  "logger": "queueproxy",
  "caller": "network/error_handler.go:33",
  "message": "error reverse proxying request; sockstat: sockets: used 9\nTCP: inuse 1 orphan 1 tw 14 alloc 444 mem 2299\nUDP: inuse 0 mem 4352\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n",
  "commit": "6a27004",
  "knative.dev/key": "vista/helloworld-go-00004",
  "knative.dev/pod": "helloworld-go-00004-deployment-5578b648b-kldq5",
  "error": "context canceled",
  "stacktrace": "knative.dev/pkg/network.ErrorHandler.func1\n\tknative.dev/pkg@v0.0.0-20241021183759-9b9d535af5ad/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\tnet/http/httputil/reverseproxy.go:472\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\n\tknative.dev/serving/pkg/queue/request_metric.go:199\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3.2\n\tknative.dev/serving/pkg/queue/handler.go:65\nknative.dev/serving/pkg/queue.(*Breaker).Maybe\n\tknative.dev/serving/pkg/queue/breaker.go:155\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\n\tknative.dev/serving/pkg/queue/handler.go:63\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\n\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2171\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\tknative.dev/serving/pkg/http/handler/timeout.go:118"
}

The error that it shows is:

error reverse proxying request; sockstat: sockets: used 9
TCP: inuse 1 orphan 1 tw 14 alloc 444 mem 2299
UDP: inuse 0 mem 4352
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

I guess that for most people, it is hard to correlate this error with a timeout due to timeoutSeconds. My proposal is to log or trace the timeout when it occurs in the following functions:

  1. tryTimeoutAndWriteError
  2. tryResponseStartTimeoutAndWriteError
  3. tryIdleTimeoutAndWriteError

In each of the function indicating that a timeout was triggered and which timeout was triggered. I think that it is a simple fix, that should not impact in performance as it should be quite infrequent and it can help people quite a log figuring out what is happening in their systems.

Metadata

Metadata

Assignees

Labels

kind/featureWell-understood/specified features, ready for coding.triage/acceptedIssues which should be fixed (post-triage)

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions