Skip to content

Chunked Encoding for NodeStatsResponse#90097

Merged
original-brownbear merged 1 commit intoelastic:mainfrom
original-brownbear:chunked-nodes-stats
Sep 20, 2022
Merged

Chunked Encoding for NodeStatsResponse#90097
original-brownbear merged 1 commit intoelastic:mainfrom
original-brownbear:chunked-nodes-stats

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

Turn this into a chunked response to some degree.
Only chunks per node for now, since deeper chunking needs larger changes downstream that don't fit in well with the current API.
The problem is that the "level" parameter that controls whether or not we return the very large indices or shard level responses is an x-content param so we don't have it when creating the iterator. I'd address this in a follow-up that changes the API a little.

As a result, I did not add a test here that validates the chunk count since I'd like to do more work on this anyway. I think it's a valuable change in its current form already and introduces a parent class that allows for turning other APIs into chunked encoding also.
For example, the indices level response for node stats in a 25k indices cluster across 6 data nodes is currently ~120M (and that is without pretty or human!). Without this change, each hit of the indices stats API will cause the coordinating node to allocate 120M for the response. With this change, we will only allocate ~20M for sending the same response. Serializing those 20M on the transport thread should be a non-issue from some quick benchmarking as even serializing the full 120M seems to well under one second.

relates #89838

Turn this into a chunked response to some degree.
Only chunks per node for now, since deeper chunking needs
larger changes downstream that don't fit in well with the
current API.
@original-brownbear original-brownbear added >non-issue :Distributed/Network Http and internode communication implementations v8.5.0 labels Sep 15, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Sep 15, 2022
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks David!

@Dongzhenpu
Copy link
Copy Markdown

Hi there, I found that after 8.5(the version that this commit was merged into), the nodes/stats api with level=abc query params will raise exception instead of return the error message in versions before 8.5

# version after 8.5
Error: socket hang up

# version before 8.5
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "level parameter must be one of [cluster] or [indices] or [shards] but was [abc]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "level parameter must be one of [cluster] or [indices] or [shards] but was [abc]",
    "suppressed" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "Failed to close the XContentBuilder",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Unclosed object or array found"
        }
      }
    ]
  },
  "status" : 400
}

I see Armin Braun said that

The problem is that the "level" parameter that controls whether or not we return the very large indices or shard level responses is an x-content param so we don't have it when creating the iterator. I'd address this in a follow-up that changes the API a little.

Is this an unexpected exception to this commit or is it a bug?

@DaveCTurner
Copy link
Copy Markdown
Member

I think this is a bug, although it's just one instance of a much more general problem. I opened #93981.

@original-brownbear original-brownbear restored the chunked-nodes-stats branch April 18, 2023 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Network Http and internode communication implementations >non-issue Team:Distributed Meta label for distributed team. v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants