Bug #57864

closed

qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing

Added by Ilya Dryomov over 3 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
qa
Target version:
-
% Done:

0%

Source:
Backport:
reef,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-6909-g34b3782032
Released In:
v20.2.0~1342
Upkeep Timestamp:
2025-11-01T01:34:39+00:00

Description

Discovered in https://github.com/ceph/ceph/pull/48288#discussion_r993883997:


It appears there's a case where the whitelist check fails silently

All tests reported as "pass"
http://pulpito.front.sepia.ceph.com/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/

As seen in http://qa-proxy.ceph.com/teuthology/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/7058075/teuthology.log, RemoveFullTry runs as expected, but the badness check does not:

2022-10-07T18:49:19.467 INFO:tasks.workunit.client.0.smithi110.stdout:[ RUN      ] TestLibRBD.RemoveFullTry
2022-10-07T18:49:41.562 INFO:tasks.workunit.client.0.smithi110.stdout:[       OK ] TestLibRBD.RemoveFullTry (22095 ms)
...
2022-10-07T19:10:54.268 INFO:tasks.cephadm:Checking cluster log for badness...
2022-10-07T19:10:54.269 DEBUG:teuthology.orchestra.run.smithi110:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2022-10-07T19:10:54.296 INFO:teuthology.orchestra.run.smithi110.stderr:grep: /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log: No such file or directory

When egrep '\[ERR\]|\[WRN\]|\[SEC\]' is run on a nonexistent file, it prints "No such file or directory" to stderr, writes nothing to stdout, and exits with status 2. Because the pipeline ends in head, the shell reports the exit status of head, which is 0 on empty input. The sh/run method therefore sees success and the check fails silently.

For example:

$ egrep "SOMETHING" /does/not/exist 
grep: /does/not/exist: No such file or directory
$ echo $?
2
$ egrep "SOMETHING" /does/not/exist | head -n 1
grep: /does/not/exist: No such file or directory
$ echo $?
0
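
One way to avoid depending on the pipeline's exit status is to treat a missing log file as an error before scanning it. The following is a minimal sketch of that idea in Python, not the change that was actually merged for this issue; the function name `first_bad_line`, the `ignorelist` parameter, and the sample log lines are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path

# Same severity markers that the egrep command in the log above looks for.
BADNESS = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def first_bad_line(log_path, ignorelist=()):
    """Return the first ERR/WRN/SEC line not matched by any ignorelist regex.

    Returns None if the log contains no such line.
    Raises FileNotFoundError if the log file is missing, so that a missing
    log can never be mistaken for a clean one.
    (Hypothetical helper, for illustration only.)
    """
    path = Path(log_path)
    if not path.is_file():
        raise FileNotFoundError(f"cluster log not found: {path}")

    ignored = [re.compile(pattern) for pattern in ignorelist]
    with path.open(errors="replace") as log:
        for line in log:
            if BADNESS.search(line) and not any(p.search(line) for p in ignored):
                return line.rstrip("\n")
    return None
```

With this shape, a caller distinguishes three outcomes: an exception (log missing), a returned line (badness found), and None (log present and clean). The shell pipeline above collapses the first and the last into the same exit status.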


Just to expand on the commit history a bit:

- this is coming from cephadm task (`qa/tasks/cephadm.py`) and was added in https://github.com/ceph/ceph/commit/65b402563547f8caf5e57b5f75324077df9c24d9 -- cut-and-paste from the ceph task
- ceph task (`qa/tasks/ceph.py`) has the same issue and that goes all the way back, through https://github.com/ceph/ceph/commit/bcded7f163570dd6563523957bb7240cefd534fd and https://github.com/ceph/ceph/commit/1cad309d6542697eb774ab5eed985270118631db, to https://github.com/ceph/ceph/commit/42318c57cbfd29c0654bf9701dd1093bd6e93154
- rook task (`qa/tasks/rook.py`) has the same issue, again inherited from the ceph task

        r = mon0_remote.run(args=[
                'if', run.Raw('!'),
                'egrep', '-q', '\[ERR\]|\[WRN\]|\[SEC\]',
                '/tmp/cephtest/data/%s/log' % firstmon,
                run.Raw(';'), 'then', 'echo', 'OK', run.Raw(';'),
                'fi',
                ],
                stdout=StringIO(),
                )

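Inverting the `egrep -q` exit code with `!` treats every nonzero status as "no match". For a nonexistent file egrep exits with 2, so the condition is true and OK is echoed even though nothing was checked...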
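
The difference between the inverted check and a check that separates "no match" from "error" can be sketched as follows. grep exits with 0 when a line matches, 1 when nothing matches, and 2 (or higher) on an error such as an unreadable file. The function names here are hypothetical and do not come from the qa tasks.

```python
def grep_outcome(returncode):
    """Classify a grep exit status: 0 = match, 1 = no match, otherwise error."""
    if returncode == 0:
        return "match"
    if returncode == 1:
        return "no match"
    return "error"


def legacy_check_says_ok(returncode):
    """Mimic `if ! egrep -q ...; then echo OK; fi`: any nonzero status is OK."""
    return returncode != 0


def strict_check_says_ok(returncode):
    """Report OK only when grep ran successfully and found nothing."""
    outcome = grep_outcome(returncode)
    if outcome == "error":
        raise RuntimeError(f"grep failed with exit status {returncode}")
    return outcome == "no match"


# A nonexistent log file makes grep exit with 2.
print(legacy_check_says_ok(2))  # True: the missing file passes as clean
```

Calling `strict_check_says_ok(2)` raises instead of returning True, which is the behavior the issue title asks for: a missing cluster log should fail the check.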

Related issues 2 (0 open, 2 closed)

Copied to Ceph - Backport #69572: squid: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing (Resolved, Ilya Dryomov)
Copied to Ceph - Backport #69573: reef: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing (Resolved, Ilya Dryomov)
Actions #1

Updated by Ilya Dryomov over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Christopher Hoffman
Actions #2

Updated by Ilya Dryomov over 2 years ago

  • Priority changed from Urgent to Normal
  • Pull request ID set to 48539
Actions #3

Updated by Ilya Dryomov about 1 year ago

  • Status changed from In Progress to Pending Backport
  • Assignee changed from Christopher Hoffman to Ilya Dryomov
  • Backport set to reef,squid
Actions #4

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #69572: squid: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing added
Actions #5

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #69573: reef: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing added
Actions #6

Updated by Upkeep Bot about 1 year ago

  • Tags (freeform) set to backport_processed
Actions #7

Updated by Upkeep Bot about 1 year ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #8

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 34b378203288f3da2cdaff19e7fae2b05d58634f
  • Fixed In set to v19.3.0-6909-g34b37820328
  • Upkeep Timestamp set to 2025-07-12T23:39:10+00:00
Actions #9

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-6909-g34b37820328 to v19.3.0-6909-g34b3782032
  • Upkeep Timestamp changed from 2025-07-12T23:39:10+00:00 to 2025-07-15T00:37:13+00:00
Actions #10

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~1342
  • Upkeep Timestamp changed from 2025-07-15T00:37:13+00:00 to 2025-11-01T01:34:39+00:00