Skip to content

rgw/dedup: add throttling mechanism#64730

Merged
yuvalif merged 1 commit intoceph:mainfrom
benhanokh:dedup_throttle
Oct 19, 2025
Merged

rgw/dedup: add throttling mechanism#64730
yuvalif merged 1 commit intoceph:mainfrom
benhanokh:dedup_throttle

Conversation

@benhanokh
Copy link
Contributor

@benhanokh benhanokh commented Jul 29, 2025

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@benhanokh benhanokh self-assigned this Jul 29, 2025
@benhanokh benhanokh requested a review from a team as a code owner July 29, 2025 07:20
@benhanokh benhanokh marked this pull request as draft July 29, 2025 07:21
@github-actions github-actions bot added the rgw label Jul 29, 2025
@github-actions github-actions bot added the tests label Aug 6, 2025
@benhanokh benhanokh requested a review from yuvalif August 6, 2025 12:42
@benhanokh
Copy link
Contributor Author

jenkins test make check

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test make check

@benhanokh
Copy link
Contributor Author

jenkins test make check

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test api

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test make check

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test make check

@benhanokh benhanokh added the needs-tentacle-backport PRs that need tentacle back-port label Aug 17, 2025
@benhanokh
Copy link
Contributor Author

jenkins test make check

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test make check

@benhanokh
Copy link
Contributor Author

@benhanokh
Copy link
Contributor Author

53 tests passed and 11 failed:

3 failures were caused by failing to load linux code:
'sudo DEBIAN_FRONTEND=noninteractive apt-get -y install linux-image-generic'

5 more errors reported by valgrind on Objector (which is not affected by the PR)
valgrind error: Leak_PossiblyLost operator new(unsigned long) Objecter::start_tick() Objecter::start(OSDMap const*)

2 errors for rgw multisite on multi-object delete, but dedup code was inactive so should not be related
ERROR: rgw_multi.tests.test_multi_object_delete

1 error coming from test_delete_objects_version_if_match_size() failing to delete objects by version.
Unrelated to my code which doesn't touch version code and which was not activate on this system

restarted the failing test: https://pulpito.ceph.com/benhanokh-2025-09-07_09:40:40-rgw-wip_dedup_throttle_V1-distro-default-smithi/

@benhanokh
Copy link
Contributor Author

benhanokh commented Sep 7, 2025

Rerun solved the load-linux issues (including in rgw-dedup test) and the multi-site issue.
We still have 5 failures from valgrind error: Leak_PossiblyLost operator new(unsigned long) Objecter::start_tick() Objecter::start(OSDMap const*).
This is unrelated to my PR

@yuvalif
Copy link
Contributor

yuvalif commented Sep 8, 2025

Rerun solved the load-linux issues (including in rgw-dedup test) and the multi-site issue. We still have 5 failures from valgrind error: Leak_PossiblyLost operator new(unsigned long) Objecter::start_tick() Objecter::start(OSDMap const*). This is unrelated to my PR

all failures are known issue

Aborts an active dedup session and release all resources used by it.
- ``radosgw-admin dedup estimate``:
Starts a new dedup estimate session (aborting first existing session if exists).
- ``radosgw-admin dedup throttle --max-bucket-index_ops=<count>``:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ``radosgw-admin dedup throttle --max-bucket-index_ops=<count>``:
- ``radosgw-admin dedup throttle --max-bucket-index-ops=<count>``:

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@benhanokh
Copy link
Contributor Author

@cbodley
Copy link
Contributor

cbodley commented Oct 2, 2025

https://pulpito.ceph.com/benhanokh-2025-09-30_11:36:26-rgw-dedup_throttle-distro-default-smithi/

lots of s3test failures because we've merged new test cases since then. can you please rebase/rerun?

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

2 similar comments
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh benhanokh requested a review from Copilot October 5, 2025 09:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a throttling mechanism to the RGW deduplication system to control the rate of operations during dedup execution. The throttling system allows administrators to limit bucket-index operations and metadata access operations per second to manage resource consumption.

  • Add new throttling classes and commands to control dedup operation rates
  • Implement throttling for bucket-index and metadata operations with configurable limits
  • Update command interface from dedup restart to dedup exec with backward compatibility

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/test/rgw/dedup/test_dedup.py Updates test commands and adds throttling configuration with variable naming corrections
src/test/cli/radosgw-admin/help.t Updates help text to reflect new dedup exec command and throttle options
src/rgw/radosgw-admin/radosgw-admin.cc Adds throttle command parsing and implements new dedup exec command interface
src/rgw/driver/rados/rgw_dedup_utils.h Defines Throttle class and throttling-related data structures
src/rgw/driver/rados/rgw_dedup_utils.cc Implements throttling logic and message encoding/decoding
src/rgw/driver/rados/rgw_dedup_store.cc Adds failure tracking for slab write operations
src/rgw/driver/rados/rgw_dedup_cluster.h Adds new function signature for throttle control with buffer list
src/rgw/driver/rados/rgw_dedup_cluster.cc Implements throttle control logic and reporting functionality
src/rgw/driver/rados/rgw_dedup.h Integrates throttle objects into control structure
src/rgw/driver/rados/rgw_dedup.cc Applies throttling to metadata and bucket index operations throughout dedup process
doc/radosgw/s3_objects_dedup.rst Documents new throttling commands and updated dedup exec interface
Comments suppressed due to low confidence (1)

src/rgw/driver/rados/rgw_dedup_utils.cc:1

  • Corrected spelling of 'disbaled' to 'disabled'.
// -*- mode:C++; tab-width:8; c-basic-offset:2;

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

}
//ldpp_dout(dpp, 0) << __func__ << "::sleep for 2 seconds\n" << dendl;
//std::this_thread::sleep_for(std::chrono::seconds(2));
//std::this_thread::sleep_forstd::chrono::microseconds(usec_timeout);
Copy link

Copilot AI Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a missing space in the commented code. Should be 'sleep_for std::chrono::microseconds'.

Suggested change
//std::this_thread::sleep_forstd::chrono::microseconds(usec_timeout);
//std::this_thread::sleep_for(std::chrono::microseconds(usec_timeout));

Copilot uses AI. Check for mistakes.
@anrao19
Copy link
Contributor

anrao19 commented Oct 8, 2025

jenkins test make check arm64

@anrao19
Copy link
Contributor

anrao19 commented Oct 8, 2025

pr testing completed and approved by @ivancich. Please find the details : https://tracker.ceph.com/issues/73370
@benhanokh , if there is not further testing needed this can be merged. Please let me know

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

rgw/dedup: Change throttle code to work lock free and remove the atomic
from the timestamp

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

@benhanokh
Copy link
Contributor Author

8553077, 8553079:
s3:get_bucket_location init_permissions on :datacachebucket[]) failed, ret=-2002

8553067, 8553087, 8553092, 8553096,8553106, 8553116, 8553118, 8553120, 8553122
valgrind issue in ceph-client

8553082, 8553094,
ulimits issue

8553066,
rgw main: Lua ERROR: failed to find luarocks

@benhanokh
Copy link
Contributor Author

I don't see how can this PR cause any issue as it doesn't affect RGW code and dedup is only activated from command line
Can we please merge the PR?

@yuvalif yuvalif merged commit 04280e1 into ceph:main Oct 19, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants