osd/OSD: rewrite track_pools_and_pg_num_changes logic#54988
Conversation
9004e95 to
a320579
Compare
|
I think it would be helpful to add a comment to pg_num_history listing how it's used. |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
a320579 to
04380c6
Compare
04380c6 to
e36a60f
Compare
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
e36a60f to
d0baf4f
Compare
athanatos
left a comment
There was a problem hiding this comment.
I'm not sure it's actually ok for pg_num_history to have inaccurate epochs associated with the pg creation, removal, and pg num change events. A comment explaining why this is safe would help.
Refactor, no change in behavior Signed-off-by: Matan Breizman <mbreizma@redhat.com>
f033797 to
ddb1657
Compare
ddb1657 to
8ebf51e
Compare
|
@athanatos, probably worth mentioning that I suspect the mapgap issue can cause random errors much later on such as https://tracker.ceph.com/issues/57628, where the OSD would start and load the pgs as exited before the gap (even if those no longer exist). I can't verify the specific tracker since the logs are out of date. |
8ebf51e to
20ae911
Compare
* track_pools_and_pg_num_changes call is moved before the superblock updates to know if we have any OSDMaps *before* trimming them (if possible). This is used to identify map gaps. * track_pools_and_pg_num_changes inner loop is moved into _track_pools_and_pg_num_changes. * lastmap now starts with the newest_map we have, even on mapgap. Previously, on mapgaps, we would (falsly) assume that this is the first start of this OSD and would ignore the state before the gap. This case is fully described in the tracker below: Fixes: https://tracker.ceph.com/issues/63881 Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
20ae911 to
f283458
Compare
Avoid p,q usage Signed-off-by: Matan Breizman <mbreizma@redhat.com>
f283458 to
78d7668
Compare
|
jenkins test api |
|
This PR is under test in https://tracker.ceph.com/issues/65796. |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
osd/OSD: rewrite track_pools_and_pg_num_changes logic Reviewed-by: Samuel Just <sjust@redhat.com>
track_pools_and_pg_num_changes()is moved into a separate methodtrack_pools_and_pg_num_changes()inner logic is moved into a separate methodMap gap case is now properly handled,
last_mapshould start with the newest map we have.Fixes: https://tracker.ceph.com/issues/63881
Example fix usage after letting osd.2 hit a map gap while creating 2 pools and changing pg_num in the meanwhile:
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e