pacific: qa/suites: added more whitelisting + fix typo #55717

Merged: ljflores merged 1 commit into ceph:pacific from kamoltat:wip-ksirivad-pacific-release-whitelist on Feb 26, 2024

@kamoltat kamoltat commented Feb 22, 2024

Problem:

  1. Not enough whitelisting for certain cephadm failures
  2. A previous PR that landed has a typo that
    causes https://tracker.ceph.com/issues/64452

Solution:

  1. Add more whitelisting
  2. Fix the typo that causes https://tracker.ceph.com/issues/64452

Fixes: https://tracker.ceph.com/issues/64452

Signed-off-by: Kamoltat <ksirivad@redhat.com>

@github-actions github-actions bot added the tests label Feb 22, 2024
@github-actions github-actions bot added this to the pacific milestone Feb 22, 2024
@kamoltat kamoltat force-pushed the wip-ksirivad-pacific-release-whitelist branch 3 times, most recently from 391ea41 to 4f758cd Compare February 22, 2024 21:33
@kamoltat kamoltat requested a review from a team as a code owner February 22, 2024 21:33
@github-actions github-actions bot added the core label Feb 22, 2024
@kamoltat kamoltat force-pushed the wip-ksirivad-pacific-release-whitelist branch from 4f758cd to 489c6ba Compare February 26, 2024 16:32
@kamoltat kamoltat changed the title [DNM] qa/suites: added more whitelisting pacific qa/suites: added more whitelisting Feb 26, 2024
@kamoltat kamoltat changed the title pacific qa/suites: added more whitelisting pacific: qa/suites: added more whitelisting + fix typo Feb 26, 2024
@ljflores ljflores merged commit 1d20752 into ceph:pacific Feb 26, 2024
@kamoltat
Member Author

Rados Approved:

https://pulpito.ceph.com/yuriw-2024-02-19_19:25:49-rados-pacific-release-distro-default-smithi

  1. https://tracker.ceph.com/issues/64455 task/test_orch_cli: "Health check failed: cephadm background work is paused (CEPHADM_PAUSED)" in cluster log (whitelist)
  2. https://tracker.ceph.com/issues/64454 rados/cephadm/mgr-nfs-upgrade: "Health check failed: 1 stray daemon(s) not managed by cephadm (CEPHADM_STRAY_DAEMON)" in cluster log (whitelist)
  3. https://tracker.ceph.com/issues/63887 Starting alertmanager fails from missing container (happens in Pacific)
  4. Failed to reconnect to smithi155 [7566763]
  5. https://tracker.ceph.com/issues/64278 Unable to update caps for client.iscsi.iscsi.a (known failure)
  6. https://tracker.ceph.com/issues/64452 Teuthology runs into "TypeError: expected string or bytes-like object" during log scraping (teuthology failure)
  7. https://tracker.ceph.com/issues/64343 Expected warnings that need to be whitelisted cause rados/cephadm tests to fail; for 7566717 we need to add (ERR|WRN|SEC)
  8. https://tracker.ceph.com/issues/58145 orch/cephadm: nfs tests failing to mount exports ('mount -t nfs 10.0.31.120:/fake /mnt/foo' fails) 7566724 (resolved issue re-opened)
  9. https://tracker.ceph.com/issues/63577 cephadm: docker.io/library/haproxy: toomanyrequests: You have reached your pull rate limit.
  10. https://tracker.ceph.com/issues/54071 rados/cephadm/osds: Invalid command: missing required parameter hostname() 756674

Note:

  1. Although 7566762 looks like a different failure from what is displayed in pulpito, the teuthology log shows it failed because of https://tracker.ceph.com/issues/64278.
  2. rados/cephadm/thrash/ … failed a lot because of https://tracker.ceph.com/issues/64452
  3. 7566717 failed at "tasks.cephadm:Checking cluster log for badness..." because we didn't whitelist (ERR|WRN|SEC).
  4. 7566724: https://tracker.ceph.com/issues/58145 (ganesha) seemed resolved a year ago but popped up again, so I re-opened the tracker and pinged Adam King (resolved).
  5. 7566777, 7566781, 7566796 are due to https://tracker.ceph.com/issues/63577
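
For readers unfamiliar with the error text in https://tracker.ceph.com/issues/64452: the sketch below only shows how Python's `re` module produces that message when the value being searched is not a string. It is not teuthology's scraping code and it does not claim to reproduce the actual typo; `safe_search` is a hypothetical helper used for illustration.

```python
import re

def safe_search(pattern, value):
    """Hypothetical guard: skip values that are not strings instead of raising."""
    if not isinstance(value, str):
        return None
    return re.search(pattern, value)

# Searching a non-string value (here None) raises the TypeError named in the tracker.
try:
    re.search(r"\[ERR\]", None)
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)
```

With the guard in place, `safe_search(r"\[ERR\]", None)` returns `None` rather than raising, while ordinary string input is searched as usual.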



Whitelisted and re-ran using this PR: https://github.com/ceph/ceph/pull/55717

https://pulpito.ceph.com/yuriw-2024-02-22_21:39:39-rados-pacific-release-distro-default-smithi/

Results, listed by description:

rados/cephadm/mds_upgrade_sequence/ —> failed to shut down mon (known failure discussed with A.King)
rados/cephadm/mgr-nfs-upgrade —> failed to shut down mon (known failure discussed with A.King)
rados/cephadm/osds —> zap disk error (known failure) https://tracker.ceph.com/issues/54071
rados/cephadm/smoke-roleless —> toomanyrequests: You have reached your pull rate limit. https://www.docker.com/increase-rate-limit. (known failures) https://tracker.ceph.com/issues/63577
rados/cephadm/thrash —> Just needs to whitelist (CACHE_POOL_NEAR_FULL) (known failures)
rados/cephadm/upgrade —> CEPHADM_FAILED_DAEMON (WRN) node-exporter (known failure discussed with A.King, cannot white list FAILED DAEMON)
rados/cephadm/workunits —> known failure: https://tracker.ceph.com/issues/63887
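
The whitelisting discussed above can be pictured roughly as follows. This is a minimal sketch, not the cephadm task's implementation: it assumes the cluster-log check flags any line tagged `[ERR]`, `[WRN]` or `[SEC]` unless the line matches one of the whitelisted regular expressions. The function name `first_bad_line` and the `cluster` prefix in the sample lines are made up for illustration; the health-check message is the one quoted in https://tracker.ceph.com/issues/64455.

```python
import re

# Severity tags that count as "badness" in the cluster log (assumed from the discussion above).
SEVERITY = re.compile(r"\[(ERR|WRN|SEC)\]")

def first_bad_line(log_lines, ignorelist):
    """Return the first ERR/WRN/SEC line not matched by any whitelist pattern, else None."""
    ignore = [re.compile(p) for p in ignorelist]
    for line in log_lines:
        if not SEVERITY.search(line):
            continue  # not a warning, error, or security line
        if any(p.search(line) for p in ignore):
            continue  # expected warning, whitelisted
        return line
    return None

sample_log = [
    "cluster [INF] overall HEALTH_OK",
    "cluster [WRN] Health check failed: cephadm background work is paused (CEPHADM_PAUSED)",
]

# Without a whitelist entry the WRN line fails the check; with one it passes.
print(first_bad_line(sample_log, []))
print(first_bad_line(sample_log, [r"\(CEPHADM_PAUSED\)"]))
```

Under this model, a test such as rados/cephadm/thrash keeps failing until a pattern for its expected warning, for example `\(CACHE_POOL_NEAR_FULL\)`, is added to the list.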
