0.17.0-rc1: ResourceMgr defaults clash with AcceleratedDHTClient #9405

@lidel

Version

0.17.0-rc1

Config

$ ipfs config --json Swarm.ConnMgr # these are defaults from 0.17.0-rc1
{
  "GracePeriod": "20s",
  "HighWater": 900,
  "LowWater": 600,
  "Type": "basic"
}

$ ipfs config --json Experimental.AcceleratedDHTClient
true

Description

Enabling Experimental.AcceleratedDHTClient does not work with the default ResourceMgr limits in 0.17.0-rc1: the user gets a vague ERROR message, which is then drowned out by resourcemanager errors:

2022-11-14T17:17:03.911Z	ERROR	fullrtdht	fullrt/dht.go:309	Accelerated DHT client was unable to fully refresh its routing table due to Resource Manager limits, which may degrade content routing. Consider increasing resource limits. See debug logs for the "dht-crawler" subsystem for details.
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 168 times with error "transient: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 177 times with error "transient: cannot reserve outbound connection: resource limit exceeded".
2022-11-14T17:17:13.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-14T17:17:23.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 42 times with error "system: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:23.135Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-14T17:17:33.136Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 80627 times with error "system: cannot reserve connection: resource limit exceeded".
2022-11-14T17:17:33.136Z	ERROR	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
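
To see which scope is hit hardest, the summary lines above can be tallied per scope and error. This is only a rough sketch for triage: the regex is derived from the log lines quoted in this issue, not from the Kubo source, and `tally_exceeded` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern derived from the rcmgr_logging.go summary lines quoted in this issue.
# Group 1: count, group 2: scope (e.g. "system"), group 3: rest of the error.
PATTERN = re.compile(
    r'Resource limits were exceeded (\d+) times with error "([^":]+): ([^"]+)"'
)


def tally_exceeded(log_text):
    """Sum the 'exceeded N times' counts per (scope, message) pair."""
    totals = Counter()
    for match in PATTERN.finditer(log_text):
        count, scope, message = match.groups()
        totals[(scope, message)] += int(count)
    return totals


# Summary lines copied from the log excerpt in this issue (prefixes shortened).
SAMPLE = """\
ERROR resourcemanager Resource limits were exceeded 168 times with error "transient: cannot reserve connection: resource limit exceeded".
ERROR resourcemanager Resource limits were exceeded 177 times with error "transient: cannot reserve outbound connection: resource limit exceeded".
ERROR resourcemanager Resource limits were exceeded 42 times with error "system: cannot reserve connection: resource limit exceeded".
ERROR resourcemanager Resource limits were exceeded 80627 times with error "system: cannot reserve connection: resource limit exceeded".
"""

if __name__ == "__main__":
    for (scope, message), total in tally_exceeded(SAMPLE).most_common():
        print(f"{total:>8}  {scope}: {message}")
```

Run against the excerpt, the system-scope connection limit accounts for nearly all of the failures, which suggests that is the limit to look at first.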

Given that we suggest enabling this setting to everyone running a Server, it feels like we should do more here, either in the docs or in the general UX.

Potential fix?

  • Proper fix: make the accelerated DHT client adjust its work based on the limits and, if a limit is lower than X, print a one-time WARNING recommending that the user increase the connection limits for best performance.
  • Short term, or if the proper fix is not feasible: update https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#accelerated-dht-client with an example of how to raise the relevant limits, and link to it from the error message:
    2022-11-14T17:17:03.911Z	ERROR	fullrtdht	fullrt/dht.go:309	Accelerated DHT client was unable to fully refresh its routing table due to ResourceMgr limits, which may degrade content routing. Consider increasing resource limits. See debug logs for the "dht-crawler" subsystem for details, and 
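
To make the "proper fix" idea concrete, here is a minimal sketch of the intended behavior, written in Python for brevity rather than in Kubo's Go. Everything in it is hypothetical: `plan_crawl_connections`, its parameters, and the logger name are illustration only, and the threshold X is left for the caller to supply because this issue does not fix a value.

```python
import logging

log = logging.getLogger("fullrtdht-sketch")  # hypothetical logger name
_warned = False  # ensures the warning is emitted only once per process


def plan_crawl_connections(desired, conn_limit, warn_below):
    """Return how many concurrent connections a crawl should use.

    desired:    connections the crawler would like to open (hypothetical input)
    conn_limit: connection limit imposed by the resource manager
    warn_below: the threshold "X" from the proposal; not specified in the
                issue, so the caller has to pick it
    """
    global _warned
    if conn_limit < warn_below and not _warned:
        log.warning(
            "Connection limit %d is below %d; consider raising it "
            "for best Accelerated DHT client performance.",
            conn_limit,
            warn_below,
        )
        _warned = True
    # Never plan more work than the limit allows.
    return min(desired, conn_limit)
```

The point of the sketch is the ordering: the client caps its own work at the limit instead of failing against it, and the user sees a single actionable warning rather than a stream of resourcemanager errors.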
    

Labels

  • kind/bug: A bug in existing code (including security flaws)
  • need/triage: Needs initial labeling and prioritization
  • topic/resource-manager: Issues related to Swarm.ResourceMgr (resource manager)