qa: Add benign cluster warning from ec-inconsistent-hinfo test to ignorelist #55764

Merged
yuriw merged 1 commit into ceph:main from sseshasa:wip-fix-ec-inconsisten-hinfo-wrn
Mar 12, 2024

sseshasa commented Feb 26, 2024

The changes introduced in PR #53524 made the randomized values of osd_op_queue and osd_op_queue_cut_off consistent across all OSD shards.

As a result, the ec-inconsistent-hinfo test can fail with the following benign cluster warning, depending on which scheduler type is randomly selected:

"cluster [WRN] Error(s) ignored for 2:ad551702:::test:head enough copies available"
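
To illustrate what an ignorelist entry does with this warning, here is a minimal Python sketch. It assumes a simple regex-based filter over cluster log lines; it is not the QA framework's actual implementation, and the pattern, the function name `unexpected_warnings`, and the second and third sample log lines are hypothetical.

```python
import re

# Hypothetical ignorelist pattern; the exact entry added by this PR is not
# shown in the description, so this regex is an assumption.
IGNORELIST = [r"Error\(s\) ignored for .* enough copies available"]


def unexpected_warnings(log_lines, ignorelist=IGNORELIST):
    """Return the [WRN]/[ERR] lines that match no ignorelist pattern."""
    patterns = [re.compile(p) for p in ignorelist]
    flagged = []
    for line in log_lines:
        # Only warnings and errors can fail the run in this sketch.
        if "[WRN]" not in line and "[ERR]" not in line:
            continue
        # A line matching any ignorelist pattern is treated as benign.
        if any(p.search(line) for p in patterns):
            continue
        flagged.append(line)
    return flagged


log = [
    "cluster [WRN] Error(s) ignored for 2:ad551702:::test:head enough copies available",
    "cluster [WRN] some other warning",  # hypothetical line, for contrast
    "cluster [INF] informational line",  # hypothetical line, for contrast
]
print(unexpected_warnings(log))
```

With the pattern in place, only the unrelated warning is reported; with an empty ignorelist, the benign warning would be reported as well.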

NOTE:
The warning does not currently show up on the "main" branch because of #47830.
Once #55455 and/or #49730 are merged, the warning might start showing up, so this PR is preemptive.

In summary, the warning results from the difference in PG deletion rates between the WPQ and mClock schedulers, so it shows up in cases where mClock is the op queue scheduler randomly chosen for the test. PG deletion is faster with the mClock scheduler than with the WPQ scheduler because mClock does not sleep between delete transactions. Instead, it relies on the cost of the deletion, which is proportional to the average size of the objects in the PG.
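
The pacing difference described above can be sketched with a toy model. This is only an illustration under stated assumptions: the function names, the linear cost formula, and all numbers are hypothetical and are not taken from the Ceph code or from the tracker analysis.

```python
def wpq_deletion_seconds(num_txns, txn_seconds, sleep_seconds):
    """WPQ-style pacing (toy model): a fixed sleep follows every delete transaction."""
    return num_txns * (txn_seconds + sleep_seconds)


def mclock_deletion_seconds(num_txns, avg_object_bytes, bytes_per_second):
    """mClock-style pacing (toy model): no sleeps; each delete is charged a
    cost proportional to the average object size in the PG."""
    cost_seconds = avg_object_bytes / bytes_per_second
    return num_txns * cost_seconds


# Hypothetical numbers chosen only to show the shape of the effect.
wpq = wpq_deletion_seconds(num_txns=100, txn_seconds=0.01, sleep_seconds=1.0)
mclock = mclock_deletion_seconds(
    num_txns=100, avg_object_bytes=4096, bytes_per_second=4096 * 100
)
print(f"WPQ: {wpq:.1f}s, mClock: {mclock:.1f}s")
```

In this model the sleep term dominates the WPQ total, while the mClock total shrinks with the average object size, which is consistent with deletion finishing sooner under mClock when objects are small.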

For a more detailed analysis, see the associated tracker.

Fixes: https://tracker.ceph.com/issues/64573
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

…orelist

The changes introduced in PR: ceph#53524
made the randomized values of osd_op_queue and osd_op_queue_cut_off
consistent across all OSD shards.

Due to the above, ec-inconsistent-hinfo test could fail with the following
cluster warning (benign) depending on the randomly selected scheduler type.

"cluster [WRN] Error(s) ignored for 2:ad551702:::test:head enough copies
available"

In summary, the warning is generated due to the difference in the PG
deletion rates between WPQ and mClock schedulers. Therefore, the warning
shows up in cases where the mClock scheduler is the op queue scheduler
chosen randomly for the test. The PG deletion rate with mClock scheduler
is quicker compared to the WPQ scheduler since it doesn't use sleeps
between each delete transaction and relies on the cost of the deletion
which in turn is proportional to the average size of the objects in the PG.

For a more detailed analysis, see the associated tracker.

Fixes: https://tracker.ceph.com/issues/64573
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
sseshasa requested a review from a team as a code owner February 26, 2024 17:03
github-actions bot added the core label Feb 26, 2024

4 participants