Skip to content

rgw/dedup: full object dedup#62179

Merged
yuvalif merged 2 commits intoceph:mainfrom
benhanokh:s3_full_object_dedup
May 26, 2025
Merged

rgw/dedup: full object dedup#62179
yuvalif merged 2 commits intoceph:mainfrom
benhanokh:s3_full_object_dedup

Conversation

@benhanokh
Copy link
Contributor

@benhanokh benhanokh commented Mar 9, 2025

Design Document:
https://docs.google.com/document/d/152VyCTR2NlZ6ongbe6-CJfP4qxr1_zH83FB_WukWD7c

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@benhanokh
Copy link
Contributor Author

jenkins test api

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test api

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test api

@yuvalif
Copy link
Contributor

yuvalif commented Mar 10, 2025

nit: please retab the code. mix of tabs and spaces makes it hard to review

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 2970b51 to 509376b Compare March 11, 2025 07:36
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please title your commits according to the conventions of this project. This is required for your PR to be merged.

See this article on GitHub on how to amend commits and update your pull request.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design doc should not be in google docs. Please move it to doc/dev.

Empty commit messages only including a commit title is also not acceptable.

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 0dd968a to b1c8d63 Compare March 12, 2025 08:36
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

1 similar comment
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test make check

@benhanokh
Copy link
Contributor Author

jenkins test api

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch 3 times, most recently from 56f7f7c to 7fbbdf1 Compare March 25, 2025 15:06
@benhanokh
Copy link
Contributor Author

jenkins test api

@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 4e61a08 to 8fc2fc2 Compare May 18, 2025 13:23
@yuvalif
Copy link
Contributor

yuvalif commented May 20, 2025

regression: https://pulpito.ceph.com/yuvalif-2025-05-19_09:57:15-rgw-wip_s3_full_object_dedup_merged-distro-default-smithi/
rerun still has 18 failures: https://pulpito.ceph.com/yuvalif-2025-05-19_16:10:52-rgw-wip_s3_full_object_dedup_merged-distro-default-smithi/
known issues:

not sure about: test_librgw_file.sh'

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 8fc2fc2 to 560a080 Compare May 20, 2025 08:29
@benhanokh
Copy link
Contributor Author

jenkins test make check arm64

@benhanokh
Copy link
Contributor Author

jenkins test windows

@yuvalif
Copy link
Contributor

yuvalif commented May 20, 2025

regression: https://pulpito.ceph.com/yuvalif-2025-05-19_09:57:15-rgw-wip_s3_full_object_dedup_merged-distro-default-smithi/ rerun still has 18 failures: https://pulpito.ceph.com/yuvalif-2025-05-19_16:10:52-rgw-wip_s3_full_object_dedup_merged-distro-default-smithi/ known issues:

not sure about: test_librgw_file.sh'

librgw is failing due to a crash on shutdown. can reproduce locally:

[ RUN      ] LibRGW.SHUTDOWN
2025-05-20T14:18:25.946+0000 7ffff2f4bb40 -1 shutting down
[Thread 0x7ffec8ee66c0 (LWP 3365973) exited]
[Thread 0x7ffff09dd6c0 (LWP 3365925) exited]
[Thread 0x7ffec7ee46c0 (LWP 3366659) exited]

Thread 603 "rgw_frontend" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7ffec9ee86c0 (LWP 3365971)]
0x00007ffff38ba08e in __futex_abstimed_wait_common () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install cyrus-sasl-lib-2.1.27-21.el9.x86_64 expat-2.5.0-4.el9.x86_64 glibc-2.39-17.el10.x86_64 gperftools-libs-2.9.1-3.el9.x86_64 keyutils-libs-1.6.3-1.el9.x86_64 krb5-libs-1.21.1-6.el9.x86_64
 libblkid-2.37.4-21.el9.x86_64 libbrotli-1.0.9-7.el9.x86_64 libcap-2.48-9.el9.x86_64 libcom_err-1.46.5-7.el9.x86_64 libcurl-7.76.1-31.el9.x86_64 libevent-2.1.12-8.el9.x86_64 libgcc-14.1.1-5.el10.x86_64 libidn2-2.3.0-7.el9.x86_64 libnghttp2-
1.43.0-6.el9.x86_64 liboath-2.6.12-1.el9.x86_64 libpsl-0.21.1-5.el9.x86_64 librabbitmq-0.11.0-7.el9.x86_64 librdkafka-1.6.1-102.el9.x86_64 libssh-0.10.4-13.el9.x86_64 libstdc++-14.1.1-5.el10.x86_64 libunistring-0.9.10-15.el9.x86_64 libunwin
d-1.6.2-1.el9.x86_64 libxcrypt-4.4.18-3.el9.x86_64 lmdb-libs-0.9.29-3.el9.x86_64 lua-libs-5.4.4-4.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 openldap-2.6.8-4.el9.x86_64 openssl-libs-3.2.2-6.el9.x86_64 systemd-libs-252-51.el9.x86_64 thrift-0.15.
0-4.el9.x86_64 zlib-1.2.11-41.el9.x86_64
(gdb) bt
#0  0x00007ffff38ba08e in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007ffff38bd02c in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007ffff61879bf in std::__condvar::wait_until (this=0x555556f679b8, __m=..., __clock=1, __abs_time=...) at /usr/include/c++/14/bits/std_mutex.h:137
#3  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., this=0x555556f679b8, __atime=...) at /usr/include/c++/14/condition_variable:203
#4  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x555556f679b8, __lock=..., __atime=...) at /usr/include/c++/14/condition_variable:113
#5  std::condition_variable::wait_for<long, std::ratio<1l, 1l> > (__rtime=..., this=0x555556f679b8, __lock=...) at /usr/include/c++/14/condition_variable:165
#6  rgw::RGWLibProcess::run (this=<optimized out>) at /home/yuvalif/ceph5/src/rgw/rgw_lib.cc:109
#7  0x00007ffff60aa39e in RGWProcessControlThread::entry (this=<optimized out>) at /home/yuvalif/ceph5/src/rgw/rgw_process.h:121
#8  0x00007ffff38bd812 in start_thread () from /lib64/libc.so.6
#9  0x00007ffff392d27c in clone3 () from /lib64/libc.so.6

@cbodley
Copy link
Contributor

cbodley commented May 20, 2025

objecter valgrind issue: no tracker

@yuvalif seeing that in rgw/notifications jobs for other runs too, so opened https://tracker.ceph.com/issues/71390. seems something in the notification path is preventing clean shutdown. could you take a look?

@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 5649610 to d10bf97 Compare May 21, 2025 06:48
@anthonyeleven
Copy link
Contributor

Please correct the spelling in the PR title.

@benhanokh benhanokh changed the title rgw/dedup: full object dedupd rgw/dedup: full object dedup May 21, 2025
@yuvalif
Copy link
Contributor

yuvalif commented May 22, 2025

teuthology: https://pulpito.ceph.com/yuvalif-2025-05-21_09:48:40-rgw-wip_s3_full_object_dedup_shutdown-distro-default-smithi/
rerun of failed tests: https://pulpito.ceph.com/yuvalif-2025-05-21_13:42:30-rgw-wip_s3_full_object_dedup_shutdown-distro-default-smithi/

@yuvalif yuvalif force-pushed the s3_full_object_dedup branch 2 times, most recently from 8802cca to 76fabe6 Compare May 22, 2025 09:04
@yuvalif
Copy link
Contributor

yuvalif commented May 22, 2025

@yuvalif
Copy link
Contributor

yuvalif commented May 22, 2025

jenkins test make check

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Design Document:
https://docs.google.com/document/d/152VyCTR2NlZ6ongbe6-CJfP4qxr1_zH83FB_WukWD7c

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
@benhanokh benhanokh force-pushed the s3_full_object_dedup branch from 76fabe6 to 081bda7 Compare May 22, 2025 15:01
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
@yuvalif yuvalif force-pushed the s3_full_object_dedup branch from 58a5c47 to 895121f Compare May 23, 2025 13:26
@yuvalif yuvalif merged commit b1005d3 into ceph:main May 26, 2025
12 checks passed
#include <string>

namespace rgw::dedup {
static constexpr const char* DEDUP_POOL_NAME = "rgw_dedup_pool";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pool name needs to be include the zone id. you can put multiple zones on the same cluster, and their pools should not conflict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhanokh append somethin like:

static_cast<rgw::sal::RadosStore*>(driver)->svc()->zone->zone_id();

to the pool name.

  • need to make sure that the zone id has the zone-group and realm in it, o/w it may not be unique in the cluster.
  • note that the zone id may change after a reload, so the reload should probably terminate the dedup process if we reload. also need to do that cleanup in this case

Comment on lines +2286 to +2287
d_runner = std::thread(&Background::run, this);
const auto rc = ceph_pthread_setname("dedup_bg");
Copy link
Contributor

@cbodley cbodley Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is calling ceph_pthread_setname() from the main thread, not the background thread

the consequence is that it changes the process name from radosgw to dedup_bg, meaning that killall radosgw no longer works

edit: tracking in https://tracker.ceph.com/issues/71543

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it in my other PR - #63560

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benhanokh can you please fix it in a small dedicated PR?
(this is causing lots of inconviences in the dev process)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants