Bug #57864: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing - Ceph - Ceph

Bug #57864

closed

Added by Ilya Dryomov over 3 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee: Ilya Dryomov
Category:
Target version:
% Done:
Source:
Backport: reef,squid
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 48539
Crash signature (v1):
Crash signature (v2):
Tags (freeform): backport_processed
Merge Commit: 34b378203288f3da2cdaff19e7fae2b05d58634f
Fixed In: v19.3.0-6909-g34b3782032
Released In: v20.2.0~1342
Upkeep Timestamp: 2025-11-01T01:34:39+00:00

Description

Discovered in https://github.com/ceph/ceph/pull/48288#discussion_r993883997:

It appears there's a case where the whitelist check fails silently

All tests reported as "pass"
http://pulpito.front.sepia.ceph.com/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/

As seen in http://qa-proxy.ceph.com/teuthology/teuthology-2022-10-07_14:23:03-upgrade:pacific-x-quincy-distro-default-smithi/7058075/teuthology.log, RemoveFullTry runs as expected, but the badness check has issues:

2022-10-07T18:49:19.467 INFO:tasks.workunit.client.0.smithi110.stdout:[ RUN      ] TestLibRBD.RemoveFullTry
2022-10-07T18:49:41.562 INFO:tasks.workunit.client.0.smithi110.stdout:[       OK ] TestLibRBD.RemoveFullTry (22095 ms)
...
2022-10-07T19:10:54.268 INFO:tasks.cephadm:Checking cluster log for badness...
2022-10-07T19:10:54.269 DEBUG:teuthology.orchestra.run.smithi110:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2022-10-07T19:10:54.296 INFO:teuthology.orchestra.run.smithi110.stderr:grep: /var/log/ceph/24ceeee2-466a-11ed-8436-001a4aab830c/ceph.log: No such file or directory

When egrep '\[ERR\]|\[WRN\]|\[SEC\]' is run on a nonexistent file, "No such file or directory" is written to stderr and stdout is empty. That empty stdout is piped through to the final head command, which succeeds. Because a pipeline's exit status is that of its last command, the sh/run method sees 0 and the check fails silently.

For example:

$ egrep "SOMETHING" /does/not/exist 
grep: /does/not/exist: No such file or directory
$ echo $?
2
$ egrep "SOMETHING" /does/not/exist | head -n 1
grep: /does/not/exist: No such file or directory
$ echo $?
0
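
The same masking can be reproduced from Python, which is closer to how the qa tasks invoke the check. This is only a minimal sketch: `exit_status` is a hypothetical helper written for this illustration, not part of teuthology, and it assumes `grep` and `head` are available through `/bin/sh`.

```python
import subprocess

MISSING = "/does/not/exist"  # same placeholder path as in the shell example


def exit_status(command: str) -> int:
    """Run a command line through /bin/sh and return its exit status."""
    return subprocess.run(
        command, shell=True, capture_output=True, text=True
    ).returncode


# grep alone reports the missing file through a non-zero exit status
# (2 with GNU grep).
direct = exit_status(f"grep -E SOMETHING {MISSING}")

# In a pipeline the shell reports only the status of the last command,
# so head succeeding on empty input hides grep's failure.
piped = exit_status(f"grep -E SOMETHING {MISSING} | head -n 1")

print("grep alone:", direct)
print("grep | head:", piped)
```

Any caller that looks only at the pipeline's return code cannot tell "no bad lines" apart from "no log file".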

Just to expand on the commit history a bit:

- this is coming from cephadm task (`qa/tasks/cephadm.py`) and was added in https://github.com/ceph/ceph/commit/65b402563547f8caf5e57b5f75324077df9c24d9 -- cut-and-paste from the ceph task
- ceph task (`qa/tasks/ceph.py`) has the same issue and that goes all the way back, through https://github.com/ceph/ceph/commit/bcded7f163570dd6563523957bb7240cefd534fd and https://github.com/ceph/ceph/commit/1cad309d6542697eb774ab5eed985270118631db, to https://github.com/ceph/ceph/commit/42318c57cbfd29c0654bf9701dd1093bd6e93154
- rook task (`qa/tasks/rook.py`) has the same issue, again inherited from the ceph task

        r = mon0_remote.run(args=[
                'if', run.Raw('!'),
                'egrep', '-q', '\[ERR\]|\[WRN\]|\[SEC\]',
                '/tmp/cephtest/data/%s/log' % firstmon,
                run.Raw(';'), 'then', 'echo', 'OK', run.Raw(';'),
                'fi',
                ],
                stdout=StringIO(),
                )

Inverting the `egrep -q` exit code (which is 2 for a nonexistent file) results in echoing OK, so a missing log file is indistinguishable from a clean one.
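
One way to avoid conflating the two cases is to treat the three grep exit statuses separately instead of negating them. The sketch below is an illustration under that assumption, not the change made in pull request 48539: `log_has_badness` is a hypothetical name, and the code runs `grep` locally rather than through a teuthology remote.

```python
import subprocess
import tempfile

BADNESS = r"\[ERR\]|\[WRN\]|\[SEC\]"  # pattern quoted in this report


def log_has_badness(path: str, pattern: str = BADNESS) -> bool:
    """Return True if the log contains a matching line, False if it does not.

    Raises RuntimeError when grep itself fails (for example, when the
    log file is missing) instead of treating that as a clean log.
    """
    proc = subprocess.run(
        ["grep", "-E", "-q", pattern, path],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:  # at least one line matched
        return True
    if proc.returncode == 1:  # file was read, nothing matched
        return False
    # Any other status means grep could not do its job.
    raise RuntimeError(f"grep failed on {path}: {proc.stderr.strip()}")


# Small demonstration with a temporary log file.
with tempfile.NamedTemporaryFile("w", suffix=".log") as log:
    log.write("cluster [WRN] something happened\n")
    log.flush()
    found = log_has_badness(log.name)

print("badness found:", found)
```

With this shape, a missing cluster log surfaces as an exception that fails the job rather than as a silent pass.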

Related issues 2 (0 open — 2 closed)

Updated by Ilya Dryomov over 3 years ago

Status changed from New to In Progress
Assignee set to Christopher Hoffman

Updated by Ilya Dryomov over 2 years ago

Priority changed from Urgent to Normal
Pull request ID set to 48539

Updated by Ilya Dryomov about 1 year ago

Status changed from In Progress to Pending Backport
Assignee changed from Christopher Hoffman to Ilya Dryomov
Backport set to reef,squid

Updated by Upkeep Bot about 1 year ago

Copied to Backport #69572: squid: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing added

Updated by Upkeep Bot about 1 year ago

Copied to Backport #69573: reef: qa: fail "Checking cluster log for badness" check (and therefore the job) if the cluster log file is missing added

Updated by Upkeep Bot about 1 year ago

Tags (freeform) set to backport_processed

Updated by Upkeep Bot about 1 year ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Updated by Upkeep Bot 8 months ago

Merge Commit set to 34b378203288f3da2cdaff19e7fae2b05d58634f
Fixed In set to v19.3.0-6909-g34b37820328
Upkeep Timestamp set to 2025-07-12T23:39:10+00:00

Updated by Upkeep Bot 8 months ago

Fixed In changed from v19.3.0-6909-g34b37820328 to v19.3.0-6909-g34b3782032
Upkeep Timestamp changed from 2025-07-12T23:39:10+00:00 to 2025-07-15T00:37:13+00:00

Updated by Upkeep Bot 5 months ago

Released In set to v20.2.0~1342
Upkeep Timestamp changed from 2025-07-15T00:37:13+00:00 to 2025-11-01T01:34:39+00:00
