
quincy: revert backport of #45529 #46605

Merged
yuriw merged 2 commits into ceph:quincy from rzarzynski:wip-55981-quincy
Jun 10, 2022


@rzarzynski
Contributor

@rzarzynski rzarzynski commented Jun 9, 2022

Technically this isn't a backport but a full revert of an already merged backport.

"Backport" ticket for the sake of tracking: https://tracker.ceph.com/issues/55981

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@rzarzynski rzarzynski added this to the quincy milestone Jun 9, 2022
@rzarzynski rzarzynski requested a review from a team as a code owner June 9, 2022 19:44
This reverts commit 3ff0df6,
which is the in-OSD part of the fix for the accumulation of `dup`
entries in a PG Log. Discussing it raised questions about
the OSD's behaviour during an upgrade when the log holds a very
large number of dups. Before bringing it back, we must double-check
that the deletions are chunked properly so they do not
cause OOMs or stalls in, for example, RocksDB.

The backport ticket is: https://tracker.ceph.com/issues/55981

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
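
To illustrate the chunking concern raised in the commit message above, here is a minimal sketch of deleting a large set of keys in bounded batches. It is not the Ceph or RocksDB code: `ToyStore`, `submit_deletes`, `trim_in_chunks`, the `dup_` key prefix and the chunk size are all hypothetical names and values chosen for illustration. The only point is that no single batch exceeds a fixed size, however many `dup` entries have accumulated.

```python
from itertools import islice
from typing import Dict, List


class ToyStore:
    """Hypothetical in-memory key-value store; NOT the Ceph or RocksDB API."""

    def __init__(self) -> None:
        self.kv: Dict[str, bytes] = {}
        self.batch_sizes: List[int] = []  # size of every submitted delete batch

    def submit_deletes(self, keys: List[str]) -> None:
        """Apply one bounded batch of deletions as a single 'transaction'."""
        for key in keys:
            self.kv.pop(key, None)
        self.batch_sizes.append(len(keys))


def trim_in_chunks(store: ToyStore, prefix: str, chunk: int) -> int:
    """Delete every key starting with `prefix`, at most `chunk` keys per batch."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    removed = 0
    while True:
        # Collect at most `chunk` matching keys, so memory per batch is bounded.
        # (The toy rescans from the start each round; a real store would
        # continue from an iterator position instead.)
        matching = (k for k in store.kv if k.startswith(prefix))
        batch = list(islice(matching, chunk))
        if not batch:
            break
        store.submit_deletes(batch)
        removed += len(batch)
    return removed


store = ToyStore()
for i in range(10):
    store.kv[f"dup_{i:04d}"] = b""   # assumed key naming, for illustration only
store.kv["log_0001"] = b"keep"       # unrelated key that must survive

removed = trim_in_chunks(store, "dup_", chunk=4)
print(removed, store.batch_sizes)    # prints: 10 [4, 4, 2]
```

With an unbounded batch, the same ten deletions would land in one transaction; scaled up to millions of entries, that single transaction is the kind of load the commit message worries about.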
This reverts commit 5245fb3.

Although the chunking in off-line `dups` trimming (via COT) seems
fine, the `ceph-objectstore-tool` is a client of `trim()` of
`PGLog::IndexedLog`, which means that a partial revert is not
possible without extensive changes.

The backport ticket is: https://tracker.ceph.com/issues/55981

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@rzarzynski rzarzynski changed the title from "quincy: don't trim excessive PGLog::IndexedLog::dups entries on-line" to "quincy: revert backport of #45529" Jun 9, 2022
@ljflores
Member

http://pulpito.front.sepia.ceph.com/?branch=wip-yuri4-testing-2022-06-09-1510-quincy

Failures, unrelated:
1. https://tracker.ceph.com/issues/52321
2. https://tracker.ceph.com/issues/52124
3. https://tracker.ceph.com/issues/45721
4. https://tracker.ceph.com/issues/55741
5. https://tracker.ceph.com/issues/55001
6. https://tracker.ceph.com/issues/55986

Details:
1. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
2. Invalid read of size 8 in handle_recovery_delete() - Ceph - RADOS
3. CommandFailedError: Command failed (workunit test rados/test_python.sh) FAIL: test_rados.TestWatchNotify.test - Ceph - RADOS
4. cephadm/test_dashboard_e2e.sh: Unable to find element cd-modal .custom-control-label when testing on orchestrator/01-hosts.e2e-spec.ts - Ceph - Mgr - Dashboard
5. rados/test.sh: Early exit right after LibRados global tests complete - Ceph - RADOS
6. cephadm: Test failure: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) - Ceph - Orchestrator

@yuriw yuriw merged commit a60ba40 into ceph:quincy Jun 10, 2022
@LittleFox94
Contributor

Is there any info on why this was reverted? Sadly, I cannot find anything on the ticket, in this PR, or on the mailing lists.

@rzarzynski
Contributor Author

Hello @LittleFox94. Yes, I've put some notes in the commit descriptions, but a follow-up is expected. Basically, this PR results from the worry that the in-OSD code doesn't properly chunk the removal of keys, which can be problematic when somebody upgrades a cluster that has those wrong dups buried in it. Commit 5245fb3 got reverted only because it depends on the in-OSD changes.

Unfortunately, over the weekend it turned out the COT part is problematic too (backfill enforcing. pg_info). The new version is #46630.

@LittleFox94
Contributor

Thank you @rzarzynski - for some reason I didn't look in the commit messages


4 participants