Conversation
430d867 to
57df0f1
Compare
batrick
left a comment
There was a problem hiding this comment.
My main concern with this PR is that it doesn't actually address:
"These really ought to be ref-counted in some way to prevent early expiry."
from https://tracker.ceph.com/issues/4744
It's not well laid out what the concern is but I suspect it's something like to prevent tickets like:
https://tracker.ceph.com/issues/59119
You could potentially resolve that ticket with this PR by checking that the LogSegment reference count is 1 before trimming* the segment.
- I believe Greg actually meant "early trimming" as that's the real concern: that the log segment will be removed from the physical journal.
| class MDSRank; | ||
| class LogSegment; | ||
| class EMetaBlob; | ||
| using AutoSharedLogSegment = auto_shared_ptr<LogSegment>; |
There was a problem hiding this comment.
So I'm mystified by the utility of this class. Why couldn't it be:
using foo = std::shared_ptr<LogSegment>
?
There was a problem hiding this comment.
As I mentioned in the commit message, the utility of this class is to automatically convert a pointer to an object that inherits from enable_shared_from_this into a shared pointer and also adds an implicit conversion from the shared pointer back to a plain pointer. This helps minimize refactoring.
I'll elaborate on how I believe this helps address the original ticket in a separate comment.
There was a problem hiding this comment.
Normally we use boost::intrusive_ptr if other accessors have a raw pointer. That's somewhat inappropriate in a code base we fully control, the mds.
Your autoshared class is interesting though. I'll have to think about this more.
|
|
||
| [[nodiscard]] static AutoSharedLogSegment create(uint64_t _seq, loff_t off = -1) | ||
| { | ||
| return std::shared_ptr<LogSegment>(new LogSegment(_seq, off)); |
There was a problem hiding this comment.
There was a problem hiding this comment.
make_shared is inaccessible here because the constructor is private. See class Best() in the example from the documentation on shared_ptr
Additionally, make_shared comes at the cost of pinning the memory of the whole object when only weak pointers are left since the same memory block is used to hold the shared pointer control and the class data. This is mentioned in the second note on the documentation page you've referenced.
To clarify, I'm not sure this will be a blocker for us, and maybe we could even benefit from it, but I think it's an important point to keep in mind
There was a problem hiding this comment.
make_sharedis inaccessible here because the constructor is private. See classBest()in the example from the documentation on shared_ptr
ok
Additionally,
make_sharedcomes at the cost of pinning the memory of the whole object when only weak pointers are left since the same memory block is used to hold the shared pointer control and the class data. This is mentioned in the second note on the documentation page you've referenced. To clarify, I'm not sure this will be a blocker for us, and maybe we could even benefit from it, but I think it's an important point to keep in mind
I don't really think that will be significant but I also wonder if a clever implementation is permitted to realloc the memory block when T is destroyed. What an odd caveat I had the opposite understanding about.
@batrick I do think that this PR addresses the original ticket. It does so by changing all containers to hold a shared pointer to the LogSegment instead of a plain pointer. It wouldn't be sufficient to only update the storage semantics, we also needed to correctly create shared pointers to the same original object. To achieve this I decided to utilize the To further reduce the scope of refactoring I've introduced a helper class
This point needs discussion, as I don't think this requirement is still applicable - though I'll be happy to learn if I'm wrong about this. At the time the original ticket was created, mds trimming logic was different - and simpler. Back then trimming was done synchronously, but now trimming is staged through I haven't yet fully understood the logic, but it looks like there's already an asynchronous process that only trims segments that were reported as unused. |
You've refcounted Lines 842 to 851 in ce9771b to not trim (i.e. delete) a segment unless its refcount is 1. |
Well, I don't need to do anything for the log segment to be deleted only when the last strong reference to it is gone, that's the primary function of a shared pointer. In other words, if the delete operation is what we'd like to delay - that's already the case with the current implementation. |
57df0f1 to
f50344f
Compare
f50344f to
03faff6
Compare
03faff6 to
47028d4
Compare
|
@batrick @vshankar I ended up changing quite a bit. Unfortunately, there is no unittest infrastructure for MDLog, but it's worth investing in it. Most if not all of my changes can be relatively quickly verified with unit tests, and MDLog does not have that many external dependencies compared to other core classes. I've changed this PR to a draft so that I can upload the current version which compiles, and have you go over it and start giving overall comments. If we want to proceed with this, I will have another commit where I make MDLog constructible in a unit test environment and start adding unit tests to verify the invariants that I have most probably violated For details about the changes please see the commit message |
47028d4 to
dc9d658
Compare
|
Specific questions to look at here:
|
src/mds/MDLog.cc
Outdated
| logger->set(l_mdl_wrpos, ls->end); | ||
| // acquire the mds lock until the end of the scope | ||
| // to update the LS offsets and major offsets | ||
| std::lock_guard l(mds->mds_lock); |
There was a problem hiding this comment.
I still need to go through these changes in more detail but this will not fly. The mds_lock has too much thread contention already. Also, the submit thread is intended to operate completely outside the mds_lock. I do not think we should ever change that.
I believe there may be another way to protect the LogSegment members behind the mds_lock: add the LogSegment to C_MDL_Flushed and update the segment's end/offset in the completion (::finish).
There was a problem hiding this comment.
C_MDL_Flushed will take mds_lock, so not much difference in terms of contention. But I agree that we could benefit from performing the update on the finisher thread instead of the submit thread.
There was a problem hiding this comment.
As you said, the journaler thread already regularly acquires the mds_lock so it's advantageous to use it for this purpose instead of adding a new thread to the contention.
There was a problem hiding this comment.
Please see a1c9d1a, I removed the need for locking.
allowing move construction and assignment. Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
This will allow generic code easily identify whether a given MDSContext instance will take the mds lock or it expects to be called with the lock taken. Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
damaged_unlocked is another method in the same class that has the same semantics but a better name, so now hande_write_error_unlocked will take the mds lock before executing handle_write_error. Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
* Make sure that any structure referring to a LogSegment is using a shared pointer to do that * Limit the API changes by enhancing the LogSegment class with an `enable_shared_from_this` base - Make LogSegment::LogSegment private and expose a `create` method to correctly initialize the base class - Introduce AutoSharedLogSegment helper class that recovers a shared pointer from a LogSegment* Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/4744
The main goal of this change is to keep expired segments as weak references,
allowing to spot "zombie" segments which should've been deleted by the time
we decide to trim them.
Since it's not a fatal error to have a dangling log segment given that it was
properly expired, the system will ignore zombies unless a dedicated new
debug flag is enabled:
- name: mds_debug_zombie_log_segments
type: bool
level: advanced
default: false
services:
- mds
fmt_desc: MDS will abort if an expired log segment is still
strongly referenced (for developers only).
To achieve this it was necessary to change how segments are kept in the log.
Specifically, it was necessary to break up the `segments` map into two:
unexpired_segments and expired_segments, where the latter holds structs of
ExpiredSegmentInfo with a weak ref to the original log segment,
as well as some importany by-value copies for accounting and syncronization.
Since the change already required significant refactoring, it was used to deliver
a number of additional improvements:
* expiring segments don't need a separate container, they are now kept in `unexpired_segments`
until they complete expiration. The expiration boundary is kept in a separate
variable `last_expiring_segment_seq` which is sufficient.
* submit_mutex is now used exclusively to protect the `pending_events` when sending them
to the submit thread. All other internal state is already protected by the mds_lock
* expiration waiters are now added as a single context associated with a segment sequence,
which will be finished when no more unexpired segments exists with the sequence or below.
* expiration and trimming logic were optimized to make as much use of the O(log N) lookup
in the std containers as possible.
* methods and fields were renamed to reduce ambiguity
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
a26b130 to
5dfe9b5
Compare
|
https://pulpito.ceph.com/leonidus-2024-06-25_18:06:42-fs-wip-lusov-mdlog-distro-default-smithi/ |
There are 2 more failures. xfstests-dev, which I am re-running now to see if this is easily reproduced, and a crash in the trimming. That one I'll address today |
Correction:
And there are xfs failures here: |
|
I'm running 10 more instances of the xfs tests here: https://pulpito.ceph.com/leonidus-2024-06-26_05:12:19-fs-wip-lusov-mdlog-distro-default-smithi/ |
|
@vshankar known issues at this point:
I haven't scheduled a full fs suite on this change because it is not even started with priority 100. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
enable_shared_from_thisbasecreatemethod to correctly initialize the base classFixes: https://tracker.ceph.com/issues/4744
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows