R4R: Add timeouts to http servers#2780
Conversation
… with ones with timeouts to prevent file descriptor exhaustion
rpc/lib/server/http_server.go
Outdated
| logger.Info("RPC HTTP server stopped", "err", err) | ||
| s := &http.Server{ | ||
| Handler: RecoverAndLogHandler(maxBytesHandler{h: handler, n: maxBodyBytes}, logger), | ||
| ReadTimeout: 10 * time.Second, |
There was a problem hiding this comment.
Is 10 seconds too long?
There was a problem hiding this comment.
I'd lower both to 3s
| ReadTimeout: 10 * time.Second, | |
| ReadTimeout: 3 * time.Second, |
|
rpc/lib/server/http_server.go
Outdated
| logger.Info("RPC HTTP server stopped", "err", err) | ||
| s := &http.Server{ | ||
| Handler: RecoverAndLogHandler(maxBytesHandler{h: handler, n: maxBodyBytes}, logger), | ||
| ReadTimeout: 10 * time.Second, |
There was a problem hiding this comment.
I'd lower both to 3s
| ReadTimeout: 10 * time.Second, | |
| ReadTimeout: 3 * time.Second, |
|
This has magic constants in several places, and I'm not sure how it relates to other timeout parameters that we already have for peer connections. Needs a bit of review to parameterize & make consistent w/ the rest of the codebase. |
What does this have to do with peer connections? Are you saying we should make the http and p2p connection timeouts somehow related?
What is this referring to ? |
Ah no, we shouldn't do that. Lets just use constants instead of 10, and otherwise good to go. |
|
Oh - this breaks Not sure how to reconcile ... Does anyone know if/what the issue was here in the first place? Is there any endpoint that can hang for a while that we'd need this for? |
Right. tendermint/rpc/core/mempool.go Line 197 in d8ab850 |
| maxBodyBytes = int64(1000000) // 1MB | ||
|
|
||
| // same as the net/http default | ||
| maxHeaderBytes = 1 << 20 |
There was a problem hiding this comment.
Go will use a default in the absence of such. Is there a reason to duplicate on our side?
We need to make one a function of the other so this doesn't become some kind of implicit dependency between constants. I also still want to understand why we need these timeouts in the first place - under what circumstance are they actually important to us ? |
|
Ok, I read https://blog.cloudflare.com/exposing-go-on-the-internet/ and https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/ It's clear we need to do this to handle hostile clients, but I still wonder if the original issue that prompted this might lie deeper than this fix, assuming our clients were functioning properly. I suspect there are other issues with file descriptors in Tendermint, but anyways. |
The `broadcast_tx_commit` endpoint has it's own timeout. If this is longer than the http server's WriteTimeout, the user will receive an error. Here, we export the WriteTimeout and set the broadcast_tx_commit timeout to be less than it. In the future, we should use a config struct for the timeouts to avoid the need to export. The broadcast_tx_commit timeout may also become configurable, but we must check that it's less than the server's WriteTimeout.
Codecov Report
@@ Coverage Diff @@
## develop #2780 +/- ##
===========================================
+ Coverage 62.09% 62.13% +0.03%
===========================================
Files 212 212
Lines 17279 17283 +4
===========================================
+ Hits 10730 10738 +8
+ Misses 5642 5639 -3
+ Partials 907 906 -1
|
|
I think the other file descriptor issues might a result of leveldb settings |
…the JSON (backport tendermint#2774) (tendermint#2780) This change fixes a bug in which BitArray.UnmarshalJSON hadn't accounted for the fact that invoking NewBitArray(<=0) returns nil and hence when dereferenced would crash with a runtime nil pointer dereference. This bug was found by my security analysis and fuzzing too. Author: @odeke-em Fixes cometbft/cometbft#2658 --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code comments~~ - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec <hr>This is an automatic backport of pull request tendermint#2774 done by [Mergify](https://mergify.com). --------- Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com> Co-authored-by: Andy Nogueira <me@andynogueira.dev>
The current http server implementation does not enforce anytime outs on connections. This facilitates file descriptor exhaustion.
This introduces timeouts into the http server implementation.