Incompatible peers should be marked bad #3081

@mdyring

ABCI app (name for built-in, URL for self-written if it's publicly available):
gaia v.0.29.1

What happened:
Log entries on a freshly spun-up genki-4002 node:

Jan 04 07:33:28 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:33:28.439] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_3, expected genki-4002" attempts=0
Jan 04 07:33:56 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:33:56.319] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_3, expected genki-4002" attempts=1
Jan 04 07:34:24 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:34:24.931] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="duplicate CONN<51.15.121.63:26656>" attempts=2
Jan 04 07:34:57 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:34:57.927] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="auth failure: secrect conn failed: read tcp 10.42.42.185:59088->51.15.121.63:26656: i/o timeout" attempts=3
Jan 04 07:35:27 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:35:27.930] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="auth failure: secrect conn failed: read tcp 10.42.42.185:59472->51.15.121.63:26656: i/o timeout" attempts=4
Jan 04 07:36:24 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:36:24.932] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="duplicate CONN<51.15.121.63:26656>" attempts=5
Jan 04 07:37:54 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:37:54.960] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_3, expected genki-4002" attempts=6
Jan 04 07:40:24 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:40:24.929] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="duplicate CONN<51.15.121.63:26656>" attempts=7
Jan 04 07:44:54 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:44:54.959] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_3, expected genki-4002" attempts=8
Jan 04 07:53:54 i-0dbf25d1b1adc068c gaiad[15884]: E[4016-01-04|07:53:54.961] Dialing failed                               module=pex addr=1e825baf672f3193f7457ff4ce9c3865dc21057c@51.15.121.63:26656 err="incompatible: Peer is on a different network. Got game_of_stakes_3, expected genki-4002" attempts=9

Relevant part of addrbook.json

                {
                        "addr": {
                                "id": "1e825baf672f3193f7457ff4ce9c3865dc21057c",
                                "ip": "51.15.121.63",
                                "port": 26656
                        },
                        "src": {
                                "id": "ed6b6d5019563b40e81ae29c80c712fce7ae68f0",
                                "ip": "5.83.160.83",
                                "port": 26656
                        },
                        "attempts": 24,
                        "last_attempt": "2019-01-04T07:44:54.96000829Z",
                        "last_success": "0001-01-01T00:00:00Z",
                        "bucket_type": 1,
                        "buckets": [
                                18,
                                125,
                                130,
                                199
                        ]
                },

What you expected to happen:
Incompatible peers should be marked bad and "never" attempted again.

A couple of things to note:

  • MarkBad is currently used only for failed connection attempts. Incompatible peers should also be marked bad.
  • MarkBad in AddrBook currently just removes the peer, so it will be retried as soon as it is added again (via RPC/PEX).
  • Bad peers should ideally be tracked using a flag and/or expiry date ("marked bad until..."), after which they can be removed from AddrBook. A peer can then be retried after a sensible time, if it is added again.

It might also be worth considering excluding bad peers from PEX, though there are pros and cons.


Labels: C:p2p (Component: P2P pkg), stale (for use by stalebot)
