Skip to content

fix: Configure idle timeout when timeout is set on HTTPRoute#2708

Merged
arkodg merged 3 commits intoenvoyproxy:mainfrom
davidalger:algerdev/idle-timeout
Feb 27, 2024
Merged

fix: Configure idle timeout when timeout is set on HTTPRoute#2708
arkodg merged 3 commits intoenvoyproxy:mainfrom
davidalger:algerdev/idle-timeout

Conversation

@davidalger
Copy link
Copy Markdown
Contributor

@davidalger davidalger commented Feb 26, 2024

What type of PR is this?

Fixes bug where default stream_idle_timeout in connection manager interferes with request timeouts set on HTTPRoute when set greater than 5 minutes preventing high request timeouts from being used without patching connection manager settings.

What this PR does:

If an HTTP request timeout is configured on the route and is non-zero, the idle_timeout will be set to the greater of 1hr or the configured request timeout. If the timeout on the HTTPRoute is set to 0 the idle_timeout is also set to 0 disabling both timeouts on the route.

How to reproduce:

  1. Deploy demo into kind cluster and set a fault injection delay of 500s:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: fault-injection-delay
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: backend
  faultInjection:
    delay:
      fixedDelay: 500s
  1. Configure timeout on HTTPRoute:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
spec:
  hostnames:
  - www.example.com
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
  rules:
  - backendRefs:
    - kind: Service
      name: backend
      port: 3000
    matches:
    - path:
        type: PathPrefix
        value: /
    timeouts:
      request: 600s
  1. Issue request via curl noting the 504 response code at the 5 minute mark
$ time curl -sv -H "Host: www.example.com" http://172.18.255.200/
*   Trying 172.18.255.200:80...
* Connected to 172.18.255.200 (172.18.255.200) port 80
> GET / HTTP/1.1
> Host: www.example.com
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 504 Gateway Timeout
< content-length: 14
< content-type: text/plain
< date: Mon, 26 Feb 2024 18:20:32 GMT
< 
* Connection #0 to host 172.18.255.200 left intact
stream timeout
real	5m0.072s
user	0m0.017s
sys	0m0.023s
  1. Note the reason for the early timeout in the request log:
"response_code":"504","response_flags":"DI,SI","response_code_details":"stream_idle_timeout"

Signed-off-by: David Alger <davidmalger@gmail.com>
@davidalger davidalger requested a review from a team as a code owner February 26, 2024 18:54
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.03%. Comparing base (72fadb7) to head (c5dec65).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2708      +/-   ##
==========================================
+ Coverage   63.00%   63.03%   +0.02%     
==========================================
  Files         122      122              
  Lines       19770    19786      +16     
==========================================
+ Hits        12457    12473      +16     
  Misses       6506     6506              
  Partials      807      807              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

}
}

func idleTimeout(httpRoute *ir.HTTPRoute) *durationpb.Duration {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for below logic until we expose this as an API field

idleTimeout = 1hr
if idleTimout < requestTimeout {
    idleTimout = requestTimeout
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this, with the caveat that it should still be set to zero so the request timeout will function per GEP-1742:

A zero-valued timeout ("0s") MUST be interpreted as disabling the timeout
ref: https://gateway-api.sigs.k8s.io/geps/gep-1742/#timeout-values

This should work per envoy docs:

value of 0 will completely disable the route’s idle timeout, even if a connection manager stream idle timeout is configured

ref: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-field-config-route-v3-routeaction-idle-timeout

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah agree with that

Signed-off-by: David Alger <davidmalger@gmail.com>
name: first-route
route:
cluster: first-route-dest
idleTimeout: 3600s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be ideal if this wasn't set

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two choices:

a) 1hr default set on the route only when the request timeout is passed on the route (as currently implemented)

b) Set stream_idle_timeout in the connection manager to 1hr (vs implicit default of 300s), only configuring per-route idle timeout if >1hr or 0

thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer b

Copy link
Copy Markdown
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks !

@zirain
Copy link
Copy Markdown
Member

zirain commented Feb 27, 2024

/retest

@zirain
Copy link
Copy Markdown
Member

zirain commented Feb 27, 2024

/retest

@arkodg arkodg merged commit 698e6f1 into envoyproxy:main Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants