Skip to content

mgr/rest: Trim requests array and limit size#54984

Merged
NitzanMordhai merged 3 commits intoceph:mainfrom
NitzanMordhai:wip-nitzan-restful-un-boundary-keep-requests
Aug 20, 2024
Merged

mgr/rest: Trim requests array and limit size#54984
NitzanMordhai merged 3 commits intoceph:mainfrom
NitzanMordhai:wip-nitzan-restful-un-boundary-keep-requests

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Dec 21, 2023

Presently, the requests array in the REST module has the potential to grow indefinitely, leading to excessive memory consumption, particularly when dealing with lengthy and intricate request results.

To address this issue, a limit will be imposed on the requests array within the REST module.
This limitation will be governed by the mgr/restful/x/max_requests configuration parameter specific to the REST module.

I ran a few thousand pg dump commands with and without the trimming of the requests array, there is a big improvement with the trimming when the cluster returns a large output from the pg dump command.

with the trimming:
image

without the trimming:
image

Fixes: https://tracker.ceph.com/issues/59580

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@NitzanMordhai
Copy link
Contributor Author

@tchaikov @athanatos this is a split PR from #54634 like you suggested.
the 500 max_requests default size was set because of the large outputs that the users mentioned in the tracker that "pg dump" caused the mgr to OOM.
with trimmed requests, we are limiting the users getting older request results but on the other hand we not letting it grow un boundary ( We can also change that to limit by capacity instead of size, that way we can keep large amount of small requests)

@tchaikov
Copy link
Contributor

tchaikov commented Dec 25, 2023

i think there is a defect in the design of the restful mgr module. which allows client to send requests to the cluster using RESTful API, and client can track the requests' status using their ids. and the mgr module deletes the finished requests upon request of client, or delete the specified one upon request of client.

but if client fails to do so, the requests are accumulated in the mgr module, no matter if they are finished or not.

@NitzanMordhai i'd suggest

  1. mark the restful mgr module deprecated, and encourage user switch to ceph_api. previous Ernesto sent a mail for this purpose 3 years ago. see https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LBKLNXH7UQL7TLFU5G52Y2SYVME4RS6P/ . in order to deprecate this module, we can

    • add corresponding document in doc/mgr/restful.rst.
    • move this module out of mgr-modules-core into a dedicated package. (i can help on this)
    • note in the PendingReleaseNotes regarding the change on packaging (i can help on this also)
    • reply on the tracker ticket and mail, to explain that this leakage is due to a deprecated mgr module.
  2. to address this very issue while preserving the existing behavior of tracking requests sent to cluster via restful mgr module to a certain degree

    • add the max_requests option as implemented in this change. and note it in PendingReleaseNotes.
    • when the number of requests exceeds the number of max_requets, trim the requests with the configured max_requets. and print a logging message with the info level if only the finished ones are trimmed., if unfinished requests are trimmed, print an error message in the logging message.
    • please also note the rationales in the commit messages in details.

i think the title of "Trim results array and limit size" is a little bit misleading, it's a "request" list, not a "results" array.

and the option should be documented, but you checked "No doc update is appropriate". i doubt this.

@NitzanMordhai
Copy link
Contributor Author

@tchaikov thanks for the review, I'm going to fix some of the things, but since we haven't got any respond yet on the user-list (3 years past and yet, we have some users who use that) maybe it will be good idea to resent that message and announce on the deprecation

@tchaikov
Copy link
Contributor

tchaikov commented Jan 8, 2024

@NitzanMordhai yeah, IMHO, 3 years is long enough to make a breaking change. and the first step is just to move the module into its own module instead of removing it. this should help users to understand the status of this module.

@tchaikov tchaikov removed their request for review February 5, 2024 02:20
@tchaikov
Copy link
Contributor

tchaikov commented Feb 5, 2024

please ping me if this change is ready for another review.

@NitzanMordhai
Copy link
Contributor Author

please ping me if this change is ready for another review.

@tchaikov thank you very much for your reviews, can we proceed with that fix and deprecate the rest module in another PR? Currently, the deprecation is not announced in changelog, I'll create another PR for the deprecation process - will you help me with that?

@tchaikov
Copy link
Contributor

please ping me if this change is ready for another review.

@tchaikov thank you very much for your reviews, can we proceed with that fix and deprecate the rest module in another PR? Currently, the deprecation is not announced in changelog, I'll create another PR for the deprecation process - will you help me with that?

@NitzanMordhai Nitzan, sure. we can do this in parallel. but this PR also needs more work, AFAICT. as it does address the "unbounded requests queue" issue. what exactly do you want from my end?

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from 01344a6 to d8a58ec Compare February 21, 2024 10:02
@NitzanMordhai NitzanMordhai requested a review from a team as a code owner February 21, 2024 10:02
@NitzanMordhai
Copy link
Contributor Author

@NitzanMordhai Nitzan, sure. we can do this in parallel. but this PR also needs more work, AFAICT. as it does address the "unbounded requests queue" issue. what exactly do you want from my end?

@tchaikov i just updated the PR with your suggestions. can you please review it?

@NitzanMordhai NitzanMordhai changed the title mgr/rest: Trim results array and limit size mgr/rest: Trim requests array and limit size Feb 22, 2024
Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the latency. left a comment.

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from d8a58ec to 738f485 Compare March 12, 2024 13:19
@NitzanMordhai NitzanMordhai requested a review from tchaikov March 12, 2024 13:20
@NitzanMordhai
Copy link
Contributor Author

jenkins test api

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could improve the document a little bit.

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from 738f485 to d6e99c0 Compare March 18, 2024 08:54
@NitzanMordhai NitzanMordhai requested a review from tchaikov March 18, 2024 08:54
Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the document does not look right.

@tchaikov
Copy link
Contributor

@NitzanMordhai ping?

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from d6e99c0 to e12ba7d Compare March 31, 2024 06:12
@tchaikov
Copy link
Contributor

jenkins test make check

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 21, 2024
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from 208aed8 to 7ba09b0 Compare August 11, 2024 06:02
@NitzanMordhai
Copy link
Contributor Author

@yuriw can we add it to one of the tests?

@github-actions github-actions bot removed the stale label Aug 11, 2024
@rsacherer
Copy link

Hello, would be good to see movement on this as I see people using the restful plugin (most of them using it for zabbix).

Thank you.

default json format produces a rather massive output in large clusters and
isn't scalable. So we have removed the 'network_ping_times' section from
the output. Details in the tracker: https://tracker.ceph.com/issues/57460
* mgr/rest: The rest manager module will trim requests based on the 'max_requests' option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REST

* mgr/rest: The rest manager module will trim requests based on the 'max_requests' option.
Without this feature, and in the absence of manual deletion of old requests,
the accumulation of requests in the array can lead to Out Of Memory (OOM) issues,
resulting in the termination of the manager.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resulting in the Manager crashing.

If the address it not configured, the *restful* will bind to ``::``,
which corresponds to all available IPv4 and IPv6 addresses.

Configuring the max_request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the//

---------------------------

The maximum request size can be configured via the configuration key
facility::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

via a central configuration option::

@NitzanMordhai
Copy link
Contributor Author

@yuriw
Copy link
Contributor

yuriw commented Aug 19, 2024

jenkins test api

@yuriw
Copy link
Contributor

yuriw commented Aug 19, 2024

@NitzanMordhai pls merge when all checks passed
ref: https://tracker.ceph.com/issues/67523

Presently, the requests array in the REST module has the potential to grow
indefinitely, leading to excessive memory consumption, particularly when
dealing with lengthy and intricate request results.

To address this issue, a limit will be imposed on the requests array within
the REST module.
This limitation will be governed by the `mgr/restful/x/max_requests` configuration
parameter specific to the REST module.
when submit_request called we will check request array if exceed max_request option
if it does we will check if the future trimmed request finished and log error
message in case we are trimming un-finished requests.

Fixes: https://tracker.ceph.com/issues/59580
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
…max_request option

Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-restful-un-boundary-keep-requests branch from 7ba09b0 to dd7e8bb Compare August 20, 2024 10:35
@NitzanMordhai
Copy link
Contributor Author

jenkins test make check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants