mds: add minor segment boundaries #48732
@vshankar @gregsfortytwo here's a draft prototype for avoiding an ESubtreeMap for every LogSegment. There is still some polishing, testing, and documentation to do.
5af8ce9 to 30b130f
30b130f to b154cfb
b154cfb to c6b74a3
change+rebase: fixed a debug message
static void generate_test_instances(std::list<EMetaBlob*>& ls);
// soft state
uint64_t last_subtree_map;
uint64_t event_seq;
This bit looks like a nice cleanup (+ enhancement). I wonder why EMetaBlob tracked event_seq in its own structure.
It was a hack to see if a dir inode was already added to the EMetaBlob.
LGTM
c6b74a3 to 4bf786b
rebase plus minor changes:
Added tracker ticket for potential backport: https://tracker.ceph.com/issues/58154
4bf786b to 0bd63cd
Sorry - wrong branch/link. This is still under test.
92f0fe0 to 28d1a99
@vshankar see Will work on adding https://tracker.ceph.com/issues/58550 to this PR next.
28d1a99 to 5fee39d
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The major problem here is that the MDLog::_start_entry method puts the current event sequence number in the EMetaBlob of the event (if present). Because of this, no other event can be submitted, since that would invalidate the event sequence. Instead, fix up the event sequence during submission and simplify the related logic that uses it during EMetaBlob construction. Secondarily, for the purposes of this commit series, _start_entry introduced recursive locks when generating the ESubtreeMap within MDLog::_segment_upkeep. So, this commit is a necessary cleanup. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
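The idea of fixing up the sequence number at submission time can be illustrated with a minimal sketch. All names below (`MetaBlobSketch`, `LogEventSketch`, `LogSketch`) are hypothetical stand-ins, not the real MDLog or EMetaBlob classes, and locking is left out entirely.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the metadata blob carried by some events.
struct MetaBlobSketch {
  uint64_t event_seq = 0;  // left at 0 until the event is submitted
};

// Hypothetical stand-in for a journal event.
struct LogEventSketch {
  uint64_t seq = 0;
  std::optional<MetaBlobSketch> metablob;  // only some events carry one
};

// Hypothetical stand-in for the log: sequence numbers are handed out at
// submission, so constructing an event early no longer reserves a number.
class LogSketch {
public:
  uint64_t submit(LogEventSketch ev) {
    ev.seq = ++last_seq_;
    if (ev.metablob) {
      // Fix up the blob's copy of the sequence number here, at submission.
      ev.metablob->event_seq = ev.seq;
    }
    events_.push_back(ev);
    return ev.seq;
  }

  const std::vector<LogEventSketch>& events() const { return events_; }

private:
  uint64_t last_seq_ = 0;
  std::vector<LogEventSketch> events_;
};
```

In this sketch an event that was constructed first can still be submitted after another event (for example, a segment boundary) without ending up with a stale sequence number.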
This commit adds a new ESegment event type which can delineate LogSegments. This event can be used as an alternative to the heavyweight ESubtreeMap, which can be very expensive to generate when the MDS has a large subtree map. Fixes: https://tracker.ceph.com/issues/58154 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
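As a rough sketch of how a log might alternate between the two boundary kinds: the policy below (one major boundary followed by a fixed number of minor ones) and the names `BoundarySketch`, `SegmentPlannerSketch`, and `minors_per_major` are assumptions for illustration, not taken from the commit.

```cpp
#include <cstdint>

// Hypothetical labels: Major stands for an ESubtreeMap boundary,
// Minor for the lightweight ESegment boundary.
enum class BoundarySketch { Major, Minor };

// Hypothetical planner that decides which boundary starts the next segment.
class SegmentPlannerSketch {
public:
  explicit SegmentPlannerSketch(uint64_t minors_per_major)
    : minors_per_major_(minors_per_major) {}

  BoundarySketch next() {
    // The first segment, and every segment after the allowed run of
    // minor segments, starts with a major boundary.
    if (!have_major_ || minors_since_major_ >= minors_per_major_) {
      have_major_ = true;
      minors_since_major_ = 0;
      return BoundarySketch::Major;
    }
    ++minors_since_major_;
    return BoundarySketch::Minor;
  }

private:
  uint64_t minors_per_major_;  // assumed tunable, for illustration only
  uint64_t minors_since_major_ = 0;
  bool have_major_ = false;
};
```

With `minors_per_major` set to 2, the sketch yields the sequence major, minor, minor, major, so the expensive subtree map is generated for only one segment in three.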
Fixes: https://tracker.ceph.com/issues/58550 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When the ESubtreeMap is very large (~5k+ subtrees), the MDS will end up logging only a few events (as bad as 1) per segment as the subtree map dominates the segment size. This test simply creates an artificially large subtree and confirms that other file system activity completes in a timely manner. This is now taking advantage of the minor segments, which allow for a normal set of events per log segment (and fewer subtree maps). The test fails on the current main HEAD.

Historical note: when I first observed this aberrant behavior, the vstart cluster was actually using mds_debug_subtrees = True (the default for every vstart cluster). This caused the MDS to write out the subtree map (for debugging reasons) with every event. When testing the MDS with large subtrees (distributed ephemeral pinning), this caused the MDS to slow to a trickle of operations per second. Despite this unintentional misconfiguration, the problem still exists, but the number of auth subtrees must be large for a particular rank to replicate the behavior.

On main HEAD, the creation of 10k files (workload stage) takes ~110 seconds. On this branch, it takes ~30 seconds. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When the MDS journal is wiped, EResetJournal is a major segment boundary as it necessarily begins the journal. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Prior to this set of commits, the MDS would write the ESubtreeMap to the journal, trim everything up to that segment, then finally force the trimming of that last segment (`MDLog::trim(0)`). This is awkward in the new code, which preserves a major segment boundary at the beginning of the journal during trimming. Instead of writing a special case for this situation, it seems more natural to just use a new "lid" or "cap" event to mark the beginning of the journal when no subtree map can yet be written but we need sequence numbers to tie in other MDS tables.

Like ESegment, ELid doesn't actually contain any state. It's just a marker for the beginning of the log after rank deactivation or rank creation. It can appear in the middle of the log if the shutdown sequence is interrupted while writing the event, but the MDS will skip it during replay in that case. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
To prevent an old MDS from joining a file system that uses the new log events. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
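A generic way to express this kind of gate is a required-feature bitmask check. The sketch below is an assumption about the general technique only; the bit value and the names `FEATURE_MINOR_SEGMENTS_SKETCH` and `can_join` are hypothetical and do not reflect the actual compat flag added by the commit.

```cpp
#include <cstdint>

// Hypothetical feature bit marking "journal may contain the new events".
constexpr uint64_t FEATURE_MINOR_SEGMENTS_SKETCH = 1ull << 0;

// A daemon may join only if it supports every feature the file system
// requires; any required bit it lacks blocks the join.
inline bool can_join(uint64_t fs_required, uint64_t mds_supported) {
  return (fs_required & ~mds_supported) == 0;
}
```

Under this scheme, an older daemon that does not advertise the bit is rejected once the file system requires it, while a file system that does not require the bit accepts both old and new daemons.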
For killpoint testing. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This change causes the program to exit gracefully when stdin is closed rather than with a Python exception. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This relies on the new stdin-killer [1] teuthology helper that allows interacting with the command's stdin. [1] ceph/teuthology#1846 Fixes: 8bb77ed Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This makes sourcing this for e.g. vstart_runner.py actually useful. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
There's no technical reason to disallow this. The original intent was to avoid deadlocks but this possibility is already present when interacting with a teuthology RemoteProcess. Avoiding it only for local processes does not make sense. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
With [1], these tools are now installed in the teuthology virtualenv. Update the path in the command arguments so these tools can be run via sudo. [1] ceph/teuthology#1846 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
These tools are now available in the $PATH so it's no longer necessary to remove them. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Now that the teuthology tools can be run in vstart_runner, there's no reason to override this method. Importantly, this enables the use of the new stdin-killer tool [1]. [1] ceph/teuthology#1846 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
So stdin-killer and other utilities are installed in the bin directory. vstart_runner.py now relies on their presence. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Rebased to fix conflict created by merging #51995.
jenkins test make check
jenkins test make check
jenkins test dashboard cephadm
@vshankar tests pass now
Nice. Merging this soon. |
https://tracker.ceph.com/issues/58154