
reef: PGMap: remove pool max_avail scale factor #61320

Merged
SrinivasaBharath merged 1 commit into ceph:reef from linuxkidd:backport-max-avail-pr-57003-to-reef
May 22, 2025

Conversation

@linuxkidd
Contributor

@linuxkidd linuxkidd commented Jan 10, 2025

Fixes: https://tracker.ceph.com/issues/67906
(cherry picked from commit 4de57e9)

Scaling max_avail by the ratio of non-degraded to total object count results in the reported max_avail increasing in proportion to the number of OSDs marked down but not out. This is counterintuitive, since OSDs going down should never result in more space being available.

Removing the scale factor allows max_avail to remain unchanged until the OSDs are marked out.
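For illustration, the arithmetic described above can be sketched as follows. This is a minimal Python sketch based on the PR description, not the actual C++ code in PGMap.cc; the function name and parameters are hypothetical:

```python
def pool_max_avail(raw_avail, raw_used_rate, copies, degraded, scale=True):
    """Illustrative sketch (hypothetical, not Ceph's implementation).

    The removed behavior scaled the raw-used rate by the non-degraded
    fraction of object copies; since max_avail is raw space divided by
    that rate, degraded objects (OSDs down but not out) inflated the
    reported max_avail.
    """
    rate = raw_used_rate
    if scale and copies > 0:
        # Removed scale factor: shrink the rate by the non-degraded fraction.
        rate *= (copies - degraded) / copies
    return raw_avail / rate
```

With 300 units of raw space, 3x replication, and a third of object copies degraded, the old calculation reports 150 units available while the fixed calculation stays at 100, matching the behavior the PR removes.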

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@linuxkidd linuxkidd requested a review from a team as a code owner January 10, 2025 13:51
@github-actions github-actions bot added this to the reef milestone Jan 10, 2025
@linuxkidd
Contributor Author

jenkins test make check

@neha-ojha
Member

https://jenkins.ceph.com/job/ceph-pull-requests/149382/consoleFull#15343907506733401c-e9d0-4737-9832-6594c5da0afa

diskprediction_local/module.py:17: note: In module imported here,
diskprediction_local/__init__.py:2: note: ... from here:
diskprediction_local/predictor.py: note: In member "__preprocess" of class "RHDiskFailurePredictor":
diskprediction_local/predictor.py:160: error: _ArrayT? has no attribute "reshape"
diskprediction_local/predictor.py:161: error: Unsupported left operand type for + (_ShapeT_co?)
Found 2 errors in 1 file (checked 32 source files)

I think this is the same issue being discussed in #56894 (comment)
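The mypy failure above is unrelated to this PR's change. As a hypothetical minimal reproduction of the failure class (the real errors come from numpy's type stubs, but the shape is the same): when a helper is annotated as returning an Optional or union type, mypy rejects attribute access on the result until the type is narrowed. The names below are illustrative, not the diskprediction_local code:

```python
from typing import Optional

def load_row(values: list) -> Optional[list]:
    # Annotated as Optional, so callers see "list | None".
    return values if values else None

def preprocess(values: list) -> list:
    row = load_row(values)
    # Without this narrowing step, mypy reports an error similar to
    # '"_ArrayT?" has no attribute ...' on the next line.
    assert row is not None
    return row.copy()  # attribute access is now well-typed
```

The usual fixes are an assert/isinstance narrowing as above, or pinning the value to a concrete type (e.g. np.asarray for numpy arrays) before calling methods on it.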

@neha-ojha
Member

https://tracker.ceph.com/issues/69471

@linuxkidd
Contributor Author

jenkins retest this please


The scaling of max_avail by the ratio of non-degraded to total objects
count results in the reported max_avail increasing proportionally to the
number of OSDs marked `down` but not `out`.  This is counter intuitive
since OSDs going `down` should never result in more space being
available.

Removing the scale factor allows max_avail to remain unchanged until the
OSDs are marked `out`.

Signed-off-by: Michael J. Kidd <linuxkidd@gmail.com>
(cherry picked from commit 4de57e9)
@linuxkidd linuxkidd force-pushed the backport-max-avail-pr-57003-to-reef branch from 5ae09ae to 1ee12b4 on April 1, 2025 21:23
@linuxkidd
Contributor Author

Force-pushed to correct the missing -x during cherry-pick.

@linuxkidd
Contributor Author

jenkins retest this please

@Naveenaidu
Contributor

RADOS approved: https://tracker.ceph.com/issues/71030#note-9

@SrinivasaBharath SrinivasaBharath merged commit bbd1e99 into ceph:reef May 22, 2025
8 checks passed

5 participants