crimson: add initial scrub support#54214
Conversation
f52a36b to
99be158
Compare
99be158 to
237002f
Compare
src/crimson/osd/pg_backend.cc
Outdated
| } | ||
|
|
||
| delta_stats.num_objects--; | ||
|
|
There was a problem hiding this comment.
If the object should be marked as whiteout (whiteout == true), num_objects won't be decremented because of the return beforehand. IIUC, we shouldn't decrement num_objects on whiteout because of the created object.
There was a problem hiding this comment.
Ugh, I misread some code in classic and concluded that whiteouts don't count in num_objects. I'll fix this and the corresponding code in scrub.
src/crimson/osd/pg_backend.cc
Outdated
| delta_stats.num_bytes -= os.oi.size; | ||
|
|
||
| // todo: update watchers | ||
| if (os.oi.is_whiteout()) { |
There was a problem hiding this comment.
Can you explain why the previous check location was wrong?
It differs from PrimaryLogPG::_delete_oid
There was a problem hiding this comment.
Oh, I missed the if (whiteout && os.oi.is_whiteout()) { above -- this change doesn't actually have any effect. I'll remove it.
|
make check seems to be failing because of unused captures. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
8dfb958 to
a180eaf
Compare
|
Will fix the lambda captures tomorrow. |
a180eaf to
edfb060
Compare
edfb060 to
1928ce3
Compare
Solves an FTBFS when using with_interruption from pg_commands.cc:
In file included from ../src/crimson/admin/pg_commands.cc:14:
../src/crimson/common/interruptible_future.h:135:46: error: redefinition of ‘bool __tls_guard’
135 | thread_local interrupt_cond_t<InterruptCond> interrupt_cond;
| ^~~~~~~~~~~~~~
In file included from ../src/common/TrackedOp.h:18,
from ../src/os/ObjectStore.h:25,
from ../src/osd/PGLog.h:23,
from ../src/osd/PeeringState.h:19,
from ../src/crimson/osd/shard_services.h:17,
from ../src/crimson/osd/pg_shard_manager.h:10,
from ../src/crimson/osd/osd.h:23,
from ../src/crimson/admin/pg_commands.cc:16:
../src/common/StackStringStream.h:188:36: note: ‘bool __tls_guard’ previously declared here
188 | inline static thread_local Cache cache;
This matches what we already do with
interrupt_cond_t<crimson::os::seastore::TransactionConflictCondition>.
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
- Switch to DEBUGDPPU to add pg prefix to each line - Pass this_instance_id through for uniformity - Clarify log lines particularly entering/leaving stages Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
It's common during cluster setup for there to be periods with degraded/recovering PGs. Ignore those errors. Signed-off-by: Samuel Just <sjust@redhat.com>
…_snaps for crimson Signed-off-by: Samuel Just <sjust@redhat.com>
Scrub wants to be able to pass numeric_limits::max() as limit since we've already selected bounds based on the number of objects. Signed-off-by: Samuel Just <sjust@redhat.com>
seek_to_first() if start is nullopt, upper_bound() otherwise. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
This is the more modern variant. Crimson doesn't currently support the pg <pgid> deep_scrub variant, so let's just use this one generally. Signed-off-by: Samuel Just <sjust@redhat.com>
Scrub doesn't want to split snapshot sets across chunks. The range, however, is exclusive on the right. get_max_object_boundary() gives us a way to always include head and all snapshots in the same chunk as it will sort greater than any legal object with the same head. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
fmt_print_ctx avoids creating a string. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
…as mutable, don't move action The lambda passed to seastar::repeat will be invoked multiple times, don't pass an rvalue ref to call_with_interruption. Also, declare lambda as mutable in case action needs to be mutable. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
…ruct There will ultimately be many Operation types that need to use the PGPeeringPipeline, let's not enumerate them as friends. Signed-off-by: Samuel Just <sjust@redhat.com>
We don't want scrub and snaptrimming running at the same time, use the same lock. Signed-off-by: Samuel Just <sjust@redhat.com>
…ests scrub_validator.h/cc contain the logic from translating a set of ScrubMaps into per-chunk stats and errors. scrub_machine.h/cc contain the logic for the overall scrub lifecycle and reservation management. ScrubContext (scrub_machine.h) is the interface through which the above logic performs reservations and scans. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Crimson routes the scrub request directly to the PGScrubber rather than going through PeeringState. Signed-off-by: Samuel Just <sjust@redhat.com>
1ac65d8 to
12e5f22
Compare
|
My rebase above broke something https://pulpito.ceph.com/sjust-2023-12-11_07:14:30-crimson-rados-wip-sjust-crimson-pr-testing-2023-12-10-distro-default-smithi/ (ceph/main test looks fine, it's an issue with this branch https://pulpito.ceph.com/sjust-2023-12-11_20:41:06-crimson-rados-main-distro-default-smithi/) Investigating. |
|
Failure above now fixed in main, run looks good https://pulpito.ceph.com/sjust-2024-01-03_01:34:31-crimson-rados-wip-sjust-crimson-scrub-2024-01-02-distro-default-smithi/ |
|
jenkins test make check |
|
@athanatos I see a new teuthology failure that looks related to this: https://tracker.ceph.com/issues/63967 |
This PR adds initial support for scrubs initiated via the ceph tell (scrub|deep_scrub) commands. Scheduling, preemption, repair, other features will be future work, but this suffices at least to scrub after teuthology jobs.
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows