Improve robustness of monitoring APIs #55550

@imotov

We have observed cases (#50241, for example) where a slowly responding data node causes ResponseContexts to accumulate for indices:monitor/recovery[n], indices:monitor/stats[n], cluster:monitor/stats[n] and cluster:monitor/xpack/ml/job/stats/get[n], which correspond to _xpack/usage and _nodes/stats calls.

We would like to make the stats and usage calls more robust when a data node responds slowly by

  1. introducing a timeout on the stats and usage APIs, and/or
  2. making the stats and usage API tasks cancellable, and cancelling them if the REST client disconnects
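
The two options above can be illustrated with a small, language-neutral sketch. This is not Elasticsearch code: `node_stats`, `collect_stats`, `handle_request`, the `delays` mapping, and the `client_disconnected` event are all hypothetical names, and Python's asyncio stands in for the transport layer. It only shows the general shape: each per-node request is bounded by a timeout, and the whole fan-out is cancelled once the client goes away, so no pending work is left waiting on a slow node.

```python
import asyncio


async def node_stats(node, delay):
    """Hypothetical stand-in for a per-node stats request taking `delay` seconds."""
    await asyncio.sleep(delay)
    return {"node": node, "ok": True}


async def collect_stats(delays, timeout):
    """Option 1: fan out to every node, bounding each request with a timeout."""

    async def one(node, delay):
        try:
            return await asyncio.wait_for(node_stats(node, delay), timeout)
        except asyncio.TimeoutError:
            # Report the slow node as a failure instead of waiting indefinitely.
            return {"node": node, "ok": False, "reason": "timeout"}

    results = await asyncio.gather(*(one(n, d) for n, d in delays.items()))
    return {r["node"]: r for r in results}


async def handle_request(delays, timeout, client_disconnected):
    """Option 2: cancel the fan-out if the client disconnects first.

    `client_disconnected` is an asyncio.Event that is assumed to be set by
    the (hypothetical) HTTP layer when the connection closes.
    """
    work = asyncio.ensure_future(collect_stats(delays, timeout))
    gone = asyncio.ensure_future(client_disconnected.wait())
    done, _ = await asyncio.wait({work, gone}, return_when=asyncio.FIRST_COMPLETED)

    if work in done:
        gone.cancel()
        return work.result()

    # Client left: cancelling the parent task also cancels the per-node requests.
    work.cancel()
    try:
        await work
    except asyncio.CancelledError:
        pass
    return None
```

In this sketch a timed-out node yields a partial result rather than failing the whole call; whether the real APIs should return partial results or an error is a design question the proposal leaves open.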
