Description
Recently we ran into a problem with delayed_close_timeout (the timeout was added to Envoy fairly recently and defaults to 1 second: https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/network/http_connection_manager/v2/http_connection_manager.proto.html?highlight=delayed_close_timeout).
Because of a badly implemented client, we had to lower the timeout (the client is a Varnish healthcheck; it never closes the connection itself and waits for the other side to close it).
We lowered it to 0.02s.
Some time later we noticed that large responses proxied by Envoy were incomplete. Envoy simply cuts off the last part of the response. For example, a 1MB response could be truncated by Envoy to only the first 700KB.
We discovered that Envoy with delayed close processing enabled can close the connection before the write buffer is flushed, because the timer is armed before the flush completes. This is even documented here: https://github.com/envoyproxy/envoy/blob/master/source/common/network/connection_impl.cc#L133. With the timeout set to 0, Envoy closes the connection only after the buffer is flushed.
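To illustrate the diagnosis above, here is a toy model in Python. It is not Envoy's code, only a sketch that assumes the timer is armed at the moment the close is requested and that the socket drains at a constant rate. The function name and the drain rate are hypothetical; the rate was picked so that the numbers match the 1MB / 700KB example.

```python
def bytes_delivered(response_bytes, drain_rate_bytes_per_ms, delayed_close_timeout_ms):
    """Toy model of the behavior described in this report (not Envoy's code).

    Assumptions: the delayed-close timer starts when the close is requested,
    i.e. before the write buffer is flushed, and the socket drains at a
    constant rate (hypothetical).
    """
    if delayed_close_timeout_ms == 0:
        # Delayed close disabled: the connection is closed only after the flush.
        return response_bytes
    # The timer fires after delayed_close_timeout_ms regardless of how much
    # data is still sitting in the write buffer.
    drained_before_timer = drain_rate_bytes_per_ms * delayed_close_timeout_ms
    return min(response_bytes, drained_before_timer)


# 1 MB response, 20 ms timeout, client draining 35,000 bytes per millisecond.
print(bytes_delivered(1_000_000, 35_000, 20))  # 700000
# Same response with delayed close disabled.
print(bytes_delivered(1_000_000, 35_000, 0))   # 1000000
```

In this model any timeout shorter than the time needed to flush the response truncates it, which matches what we observed with 0.02s.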
Disabling delayed close (setting the timeout to 0) fixes the problem, but it is not a perfect solution. I assume that delayed close processing is enabled by default for a good reason. Ideally we would like to stay protected, at least to some degree, against the risks it mitigates.
Assuming that my diagnosis is correct, I have a couple of thoughts/questions:
- IMHO it is a bit misleading that a timeout with "delayed" in its name can actually cause the connection to be closed earlier than it would be without the timeout. What do you think?
- Is it necessary to activate the timer before the flush is complete?
- If it is necessary, maybe it would be a good idea to provide separate timeouts:
  - one for waiting for the flush to complete
  - one for waiting for the remote peer to close the connection
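To make the last suggestion more concrete, here is a rough Python sketch of how two separate timeouts could behave. It is only an illustration of the idea, not a proposal for the actual implementation. All names (flush_timeout_ms, peer_close_timeout_ms, peer_closes_after_ms) are hypothetical, and the constant drain rate is the same simplifying assumption as in the example numbers above.

```python
def close_with_separate_timeouts(response_bytes, drain_rate_bytes_per_ms,
                                 flush_timeout_ms, peer_close_timeout_ms,
                                 peer_closes_after_ms=None):
    """Sketch of the two-timeout idea (hypothetical, not Envoy's code).

    Returns (bytes_delivered, close_time_ms), with time measured from the
    moment the close is requested.
    """
    # Phase 1: wait for the write buffer to flush, bounded by flush_timeout_ms.
    flush_ms = -(-response_bytes // drain_rate_bytes_per_ms)  # ceiling division
    if flush_ms > flush_timeout_ms:
        # The flush did not finish in time: give up and close with data lost.
        return drain_rate_bytes_per_ms * flush_timeout_ms, flush_timeout_ms
    # Phase 2: the peer-close timer starts only after the flush has completed.
    if peer_closes_after_ms is None:
        wait_ms = peer_close_timeout_ms  # peer never closes (e.g. our healthcheck)
    else:
        wait_ms = min(peer_closes_after_ms, peer_close_timeout_ms)
    return response_bytes, flush_ms + wait_ms


# Short peer-close timeout for the misbehaving client, generous flush timeout.
print(close_with_separate_timeouts(1_000_000, 35_000, 5_000, 20))  # (1000000, 49)
```

With this split, the peer-close timeout could be lowered to 0.02s for clients that never close the connection, without the risk of truncating large responses.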