mds: batch backtrace updates by pool-id when expiring a log segment #55421
Conversation
@vshankar I think the
batrick
left a comment
It's not clear to me from reading the patch how "thereby causing the MDS to go read-only when the error (-ENOENT) is trickled up for a backtrace update for the metadata pool or an undeleted data pool." is avoided by this change. Could you explain in a comment?
Sure. Will update the change explaining the fix.

Dropped from testing till #55421 (comment) gets fixed.

jenkins test api
batrick
left a comment
I think a test should be somewhat trivial to synthesize, no?

- create dir with `ceph.dir.layout.pool == some-new-pool`
- create empty file in that dir
- flush mds journal
- set `ceph.file.layout.pool == some-new-pool2` on the file
- restore default layout for dir
- delete `some-new-pool`
- flush the mds log (would fail before but now does not)?

I don't think a failover is required?
You can use ceph-dencoder of course to verify the backtrace updates.
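For reference, the steps above could be sketched as a manual reproduction against a live test cluster. This is only an illustrative sketch, not a tested script: it assumes a filesystem named `cephfs` mounted at `/mnt/cephfs`, an MDS named `a`, and `mon_allow_pool_delete` enabled; the pool names mirror the hypothetical ones used above.

```shell
# create the first pool and make it a data pool of the fs
ceph osd pool create some-new-pool
ceph fs add_data_pool cephfs some-new-pool

# dir with a layout pointing at the new pool, plus an empty file
mkdir /mnt/cephfs/dir
setfattr -n ceph.dir.layout.pool -v some-new-pool /mnt/cephfs/dir
touch /mnt/cephfs/dir/file
ceph tell mds.a flush journal        # flush mds journal

# move the (still empty) file to a second pool, restore the dir layout
ceph osd pool create some-new-pool2
ceph fs add_data_pool cephfs some-new-pool2
setfattr -n ceph.file.layout.pool -v some-new-pool2 /mnt/cephfs/dir/file
setfattr -x ceph.dir.layout /mnt/cephfs/dir

# delete the first pool and expire the log segment
ceph fs rm_data_pool cephfs some-new-pool
ceph osd pool rm some-new-pool some-new-pool --yes-i-really-really-mean-it
ceph tell mds.a flush journal        # would fail before; now should not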
src/mds/journal.cc (outdated):

```cpp
// dispatch separate ops for backtrace updates for old pools
in->store_backtrace(ops_vec_map[pool_id].back(), op_prio, true);
for (auto p : in->get_inode()->old_pools) {
  in->store_backtrace(ops_vec_map[p].back(), op_prio, true);
```
Suggested change:

```diff
-  in->store_backtrace(ops_vec_map[p].back(), op_prio, true);
+  ops_vec_map[p].push_back(CInodeCommitOperations());
+  in->store_backtrace(ops_vec_map[p].back(), op_prio, true);
```

?
There was a problem hiding this comment.
I see what you mean to suggest. It's done for the current pool - can be the same for old_pools.
* refs/pull/55421/head: mds: batch backtrace updates by pool-id when expiring a log segment Reviewed-by: Xiubo Li <xiubli@redhat.com>
Makes sense. I didn't bother adding a test since this issue was pretty much getting reproduced on fs suite test branches. I'll add a test and update.
Correct.
This change is ready for re-review. I changed the implementation a bit since the old implementation was buggy - backtrace updates to old pools were not dispatched. So, now, journal.cc prepares a separate CInodeCommitOperations() instance for each old pool of an inode and uses the backtrace generated for the default data pool. Also, I added a check for STATE_DIRTYPOOL (as done in CInode::_store_backtrace) since layout and backtraces to old pools should not be updated under certain circumstances (parent changing, etc.). Note: the idea of dispatching per-pool backtrace updates still remains the same.

@lxbsz fixed and updated.

jenkins test make check

jenkins test make check arm64
batrick
left a comment
base for this branch is weird, it's not on main?
otherwise lgtm
It should be. I fixed and refreshed the change.

This PR is under test in https://tracker.ceph.com/issues/66521.
This change seems to be a bit buggy and causing failures like https://pulpito.ceph.com/vshankar-2024-06-30_16:43:29-fs-wip-vshankar-testing-20240628.170835-debug-testing-default-smithi/7779994/

I take this back. The issue is in another change. See: #54725 (comment) (back into test branch)

I see one failed test job with a similar effect - mds going read-only. Deferring merge till it's investigated. /a/vshankar-2024-07-08_07:21:13-fs-wip-vshankar-testing-20240705.150505-debug-testing-default-smithi/7791798

Some osd_ops are returning -2 (ENOENT) for commit operations sent by the MDS.

This looks like a new bug, unrelated to this change. I don't see any backtrace-related error during segment expiry.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
LogSegment::try_to_expire() batches backtrace updates for inodes in the dirty_parent_inodes list. If a backtrace update operation fails for one inode due to a missing (removed) data pool, the failure is specially handled by treating the operation as a success; however, the errno (-ENOENT) is stored by the gather context and passed on as the return value to subsequent operations (even for successful backtrace update operations in the same gather context).

Fixes: http://tracker.ceph.com/issues/63259
Signed-off-by: Venky Shankar <vshankar@redhat.com>
…a pool Signed-off-by: Venky Shankar <vshankar@redhat.com>
This PR is under test in https://tracker.ceph.com/issues/67711.
Otherwise, a backtrace update failure due to a removed data pool would cause the entire batch to be considered a failed backtrace update (depending on when the first failure happens), thereby causing the MDS to go read-only when the error (-ENOENT) is trickled up for a backtrace update for the metadata pool or an undeleted data pool.
Fixes: http://tracker.ceph.com/issues/63259
NOTE - this hasn't been tested yet - will get to that in a while.
@lxbsz - this change does away with the vector resize bits; I think we can bring that back in. Question: was that (vector resize) done since you expected the MDS to spend much time in that (holding the mds_lock), so preallocating space would lower the time spent?