Skip to content

limit number of open connections#1781

Merged
xla merged 3 commits intodevelopfrom
1740-node-crashes-when-too-many-rpc-connections
Jun 22, 2018
Merged

limit number of open connections#1781
xla merged 3 commits intodevelopfrom
1740-node-crashes-when-too-many-rpc-connections

Conversation

@melekes
Copy link
Contributor

@melekes melekes commented Jun 20, 2018

Refs #1740

also, expose limit option for number concurrent streams for gRPC
(unlimited by default)

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG.md
  • add metrics: rpc.ws_connections will be done in a separate PR
  • test creating many gRPC connections

@melekes melekes requested review from ebuchman and xla as code owners June 20, 2018 16:21
@melekes melekes force-pushed the 1740-node-crashes-when-too-many-rpc-connections branch 2 times, most recently from 3f65600 to 0a8f47b Compare June 20, 2018 17:29
@codecov-io
Copy link

codecov-io commented Jun 20, 2018

Codecov Report

Merging #1781 into develop will increase coverage by 0.27%.
The diff coverage is 48.93%.

@@             Coverage Diff             @@
##           develop    #1781      +/-   ##
===========================================
+ Coverage    62.37%   62.64%   +0.27%     
===========================================
  Files          140      140              
  Lines        11887    11910      +23     
===========================================
+ Hits          7414     7461      +47     
+ Misses        3785     3757      -28     
- Partials       688      692       +4
Impacted Files Coverage Δ
config/toml.go 53.19% <ø> (ø) ⬆️
rpc/grpc/client_server.go 71.42% <100%> (+3%) ⬆️
node/node.go 64.32% <100%> (+0.99%) ⬆️
config/config.go 83.58% <100%> (+0.17%) ⬆️
rpc/lib/server/http_server.go 38.83% <11.11%> (+32.81%) ⬆️
blockchain/pool.go 69.58% <0%> (-0.7%) ⬇️
p2p/peer.go 60.81% <0%> (ø) ⬆️
cmd/tendermint/commands/run_node.go 0% <0%> (ø) ⬆️

@melekes
Copy link
Contributor Author

melekes commented Jun 21, 2018

Just crashed Tendermint by attacking gRPC server (even though it's not running by default, it's good to test and fix this).

package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"sync"
	"time"

	grpc "github.com/tendermint/tendermint/rpc/grpc"
)

var (
	serverAddr = flag.String("server_addr", "127.0.0.1:36658", "The server address in the format of host:port")
)

func main() {
	flag.Parse()
	var wg sync.WaitGroup
	wg.Add(1024)
	for i := 0; i < 100000; i++ {
		go func(world int) {
			defer wg.Done()

			client := grpc.StartGRPCClient(*serverAddr)

			ctx := context.Background()
			resp, err := client.BroadcastTx(ctx, &grpc.RequestBroadcastTx{Tx: []byte(fmt.Sprintf("Hello world %d", world))})
			if err != nil {
				log.Println(err)
			} else {
				fmt.Println("Got response", resp)
			}
			time.Sleep(10 * time.Second)
		}(i)
	}
	wg.Wait()
}
panic: open /tmp/data/cs.wal: too many open files in system

goroutine 82 [running]:
github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile.(*Group).readGroupInfo(0xc420209540, 0x0, 0x0, 0x0, 0x0)
        /home/vagrant/go/src/github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile/group.go:487 +0x500
github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile.(*Group).checkTotalSizeLimit(0xc420209540)
        /home/vagrant/go/src/github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile/group.go:238 +0x64
github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile.(*Group).processTicks(0xc420209540)
        /home/vagrant/go/src/github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile/group.go:208 +0x3b
created by github.com/tendermint/tendermint/vendor/github.com/tendermint/tmlibs/autofile.(*Group).OnStart

@melekes melekes force-pushed the 1740-node-crashes-when-too-many-rpc-connections branch 3 times, most recently from 0549f37 to 4913bbe Compare June 22, 2018 14:52
@melekes melekes changed the title WIP: limit number of open connections limit number of open connections Jun 22, 2018
@melekes melekes mentioned this pull request Jun 22, 2018
Refs #1740

also, expose limit option for number concurrent streams for gRPC
(unlimited by default)
@melekes melekes force-pushed the 1740-node-crashes-when-too-many-rpc-connections branch from 4913bbe to 936a655 Compare June 22, 2018 15:26
config/config.go Outdated

Unsafe: false,
// should be < ({ulimit -Sn} - {MaxNumPeers} - {N of wal, db and other open files}) / 2
// divided by 2 because 1 fd for ipv4, 1 fd - ipv6
Copy link
Contributor Author

@melekes melekes Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anybody knows why websocket connection results in 2 fd: ipv4 and ipv6?

grpc open only 1 ipv6 connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't replicate it anymore... hm

@melekes
Copy link
Contributor Author

melekes commented Jun 22, 2018

package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"sync"

	"github.com/gorilla/websocket"
	rpctypes "github.com/tendermint/tendermint/rpc/lib/types"
)

var (
	serverAddr = flag.String("server_addr", "127.0.0.1:26657", "The server address in the format of host:port")
)

func main() {
	flag.Parse()

	const numConns = 1024

	var wg sync.WaitGroup
	wg.Add(numConns)
	for i := 0; i < numConns; i++ {
		go func(connIndex int) {
			defer wg.Done()

			c, _, err := connect(*serverAddr)
			if err != nil {
				fmt.Fprintf(os.Stderr, "conn#%d failed to connect: %v\n", connIndex, err)
				return
			}
			subscribeForBlockHeaders(c, connIndex)
			go receiveLoop(c)
		}(i)
	}
	wg.Wait()
}

func subscribeForBlockHeaders(c *websocket.Conn, connIndex int) {
	paramsJSON, err := json.Marshal(map[string]interface{}{"query": "tm.event='NewBlockHeader'"})
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to encode params: %v\n", err)
		os.Exit(1)
	}
	rawParamsJSON := json.RawMessage(paramsJSON)
	err = c.WriteJSON(rpctypes.RPCRequest{
		JSONRPC: "2.0",
		ID:      fmt.Sprintf("conn#%d", connIndex),
		Method:  "subscribe",
		Params:  rawParamsJSON,
	})
	if err != nil {
		fmt.Fprintf(os.Stderr, "conn#%d Failed to subscribe: %v", connIndex, err)
	}
}

func receiveLoop(c *websocket.Conn) {
	for {
		_, _, err := c.ReadMessage()
		if err != nil {
			if !websocket.IsCloseError(err, websocket.CloseNormalClosure) {
				fmt.Fprintf(os.Stderr, "failed to read response: %v", err)
			}
			return
		}
	}
}

func connect(host string) (*websocket.Conn, *http.Response, error) {
	u := url.URL{Scheme: "ws", Host: host, Path: "/websocket"}
	return websocket.DefaultDialer.Dial(u.String(), nil)
}

- [main] added metrics (served under `/metrics` using a Prometheus client;
disabled by default). See the new `instrumentation` section in the config and
[metrics](https://tendermint.readthedocs.io/projects/tools/en/v0.21.0/metrics.html)
guide.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of an older PR, right? Did you rebase recently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. For some reason, Bucky did not include it in the 0.21.0 section. But this feature was released in 0.21.0


// StartHTTPServer starts an HTTP server on listenAddr with the given handler.
// It wraps handler with RecoverAndLogHandler.
func StartHTTPServer(listenAddr string, handler http.Handler, logger log.Logger, config Config) (listener net.Listener, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is quite long, maybe we break it apart

// StartHTTPAndTLSServer starts an HTTPS server on listenAddr with the given
// handler.
// It wraps handler with RecoverAndLogHandler.
func StartHTTPAndTLSServer(listenAddr string, handler http.Handler, certFile, keyFile string, logger log.Logger, config Config) (listener net.Listener, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another one of those long lines.

return nil, errors.Errorf("Failed to listen on %v: %v", listenAddr, err)
}
if config.MaxOpenConnections > 0 {
listener = netutil.LimitListener(listener, config.MaxOpenConnections)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! What is the behaviour of LimitListener, will it just block on the socket until a connection gets freed up? Is there an attack vector for resource exhaustion other than ports/fds when we use this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it just block on the socket until a connection gets freed up

yes

Is there an attack vector for resource exhaustion other than ports/fds when we use this?

there is still no rate limiting. so one client can occupy all available slots

Copy link
Contributor

@xla xla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :octocat: :shipit: 🍡

Really like the solution, we should wrap all our listeners with this.

@xla xla merged commit f62d665 into develop Jun 22, 2018
@xla xla deleted the 1740-node-crashes-when-too-many-rpc-connections branch June 22, 2018 23:15
firelizzard18 pushed a commit to AccumulateNetwork/tendermint that referenced this pull request Feb 1, 2024
* docs: remove "Run" section from install

The "Quick Start" guide does a much better job explaining what is
happening, and readers should follow that guide instead.

* docs: move "CometBFT Quality Assurance" down

No reason to have “CometBFT Quality Assurance” as a second item. The
target audience for that document is probably security researchers - not
the primary audience.

* docs: remove cmdKVStore func in favor of link

I can’t see a single reason why users need to see the source code of
that function in order to progress with the abci-cli tool.

Also the link should point to kvstore app, not abci-cli tool!

* docs: swap abci-cli and getting-started items

https://docs.cometbft.com/v0.38/app-dev/getting-started#first-cometbft-app
should go before https://docs.cometbft.com/v0.38/app-dev/abci-cli
because the latter is used to test ABCI applications. And by this point,
the reader doesn’t have an app to test.

* docs: remove "Committing a Block" from validators page

not clear what’s the reason of explaining consensus details in the
validators section. If a validator wants to get familiar with consensus,
shouldn’t it go to consensus spec / page?

* docs: make it clear who curates the validator set

“Validators are expected to be online, and the set of validators is
permissioned/curated by some external process.” This sentence is
confusing to me. The set is curated by the ABCI application.

* docs: replace EndBlock with FinalizeBlock

* docs: add Bash script to compile persistent peers string

Getting an ID from every machine is tedious and can be streamlined with
a script, which, given IPs, collects IDs and outputs the command to run
CometBFT.
iKapitonau pushed a commit to scrtlabs/tendermint that referenced this pull request Jul 10, 2024
…t#1803)

* chore(docs): small improvements (tendermint#1781)

* docs: remove "Run" section from install

The "Quick Start" guide does a much better job explaining what is
happening, and readers should follow that guide instead.

* docs: move "CometBFT Quality Assurance" down

No reason to have “CometBFT Quality Assurance” as a second item. The
target audience for that document is probably security researchers - not
the primary audience.

* docs: remove cmdKVStore func in favor of link

I can’t see a single reason why users need to see the source code of
that function in order to progress with the abci-cli tool.

Also the link should point to kvstore app, not abci-cli tool!

* docs: swap abci-cli and getting-started items

https://docs.cometbft.com/v0.38/app-dev/getting-started#first-cometbft-app
should go before https://docs.cometbft.com/v0.38/app-dev/abci-cli
because the latter is used to test ABCI applications. And by this point,
the reader doesn’t have an app to test.

* docs: remove "Committing a Block" from validators page

not clear what’s the reason of explaining consensus details in the
validators section. If a validator wants to get familiar with consensus,
shouldn’t it go to consensus spec / page?

* docs: make it clear who curates the validator set

“Validators are expected to be online, and the set of validators is
permissioned/curated by some external process.” This sentence is
confusing to me. The set is curated by the ABCI application.

* docs: replace EndBlock with FinalizeBlock

* docs: add Bash script to compile persistent peers string

Getting an ID from every machine is tedious and can be streamlined with
a script, which, given IPs, collects IDs and outputs the command to run
CometBFT.

(cherry picked from commit 9020ce2)

# Conflicts:
#	docs/app-dev/abci-cli.md
#	docs/core/validators.md
#	docs/data-companion/README.md

* fix conflicts

* correct number for qa item

---------

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants