QA Run #65797
wip-yuri5-testing-2024-05-15-0804 (closed)
Description
--- done. these PRs were included:
https://github.com/ceph/ceph/pull/56389 - osd/scrub: remove detection & handling of reservation timeouts from the code
https://github.com/ceph/ceph/pull/56428 - crush: use std::vector instead of variable length arrays
https://github.com/ceph/ceph/pull/56531 - qa: cephtool/test.sh overrides ec profile with --yes_i_really_mean_it
https://github.com/ceph/ceph/pull/56980 - osd: class:device-class config database mask does not work for osd_compact_on_start
https://github.com/ceph/ceph/pull/57015 - bluefs: bluefs alloc unit should only be shrink
Updated by Yuri Weinstein almost 2 years ago
- Subject changed from wip-yuriw-testing-20240503.213344-main to wip-yuriw-testing-20240503.213344-main (wip-yuri5-testing)
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Testing to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Laura Flores almost 2 years ago
- Assignee changed from Laura Flores to Ronen Friedman
@Ronen Friedman can you review the rados run?
Updated by Laura Flores almost 2 years ago
- Assignee changed from Ronen Friedman to Yuri Weinstein
@Yuri Weinstein looks like there are no runs scheduled on the link. Can you check if the link is correct and/or schedule runs?
Updated by Laura Flores almost 2 years ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
Updated by Yuri Weinstein almost 2 years ago
- Shaman Build changed from wip-yuriw-testing-20240503.213344-main to wip-yuri5-testing-2024-05-15-0804
- QA Runs deleted (wip-yuriw-testing-20240503.213344-main)
- Git Branch changed from ceph/ceph-ci/commits/testing/wip-yuriw-testing-20240503.213344-main to ceph/ceph-ci/commits/testing/wip-yuri5-testing-2024-05-15-0804
Updated by Yuri Weinstein almost 2 years ago
- Git Branch changed from ceph/ceph-ci/commits/testing/wip-yuri5-testing-2024-05-15-0804 to yuriw/ceph/commits/wip-yuri5-testing-2024-05-15-0804
Updated by Yuri Weinstein almost 2 years ago
Not sure what happened, sorry
rebasing
@Laura Flores
Updated by Yuri Weinstein almost 2 years ago
- Subject changed from wip-yuriw-testing-20240503.213344-main (wip-yuri5-testing) to wip-yuri5-testing-2024-05-15-0804
Updated by Yuri Weinstein almost 2 years ago
- QA Runs set to wip-yuri5-testing-2024-05-15-0804
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Laura Flores
Updated by Laura Flores almost 2 years ago
- Assignee changed from Laura Flores to Matan Breizman
@Matan Breizman can you review this when it's ready?
Updated by Ronen Friedman almost 2 years ago
@Matan Breizman - note that the failure in 7707685 (scrub-repair) is unrelated.
Updated by Matan Breizman almost 2 years ago
@Laura Flores, there are almost 50 red jobs. Should we schedule a re-run?
Ronen Friedman wrote in #note-14:
@Matan Breizman - note that the failure in 7707685 (scrub-repair) is unrelated.
Thanks.
Updated by Laura Flores almost 2 years ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Matan Breizman to Yuri Weinstein
Matan Breizman wrote in #note-15:
@Laura Flores, there are almost 50 red jobs. Should we schedule a re-run?
Ronen Friedman wrote in #note-14:
@Matan Breizman - note that the failure in 7707685 (scrub-repair) is unrelated.
Thanks.
@Yuri Weinstein can you schedule a rerun?
Updated by Yuri Weinstein almost 2 years ago
Laura Flores wrote in #note-16:
@Yuri Weinstein can you schedule a rerun?
For some reason I could schedule a rerun but a full run is being scheduled
@Matan Breizman @Laura Flores pls review when ready
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Matan Breizman
Updated by Matan Breizman almost 2 years ago · Edited
- Related failures:
- https://github.com/ceph/ceph/pull/57015 - bluefs: bluefs alloc unit should only be shrink (7707982)
- https://github.com/ceph/ceph/pull/56531 shouldn't be merged yet as the issue persists. Left a comment in the PR. No need for a rerun since it's only a qa change. (7707707, 7707787)
- We still have around 60 failed jobs on each rados run. It looks like most of them are not related. However, it may be tricky to ignore 60 jobs on each batch (@Laura Flores).
- There are new failures which look suspicious and may be related to the failures above(?). I'll investigate on rerun.
"reached maximum tries (51) after waiting for 300 seconds" - 7707938, 7707827.
- https://tracker.ceph.com/issues/65728 - Had 7 instances: 7707712, 7707734, 7707751, 7707848, 7707867, 7707905, 7707972
- https://tracker.ceph.com/issues/65824 - 4 instances: 7707711, 7707754, 7707919, 7707960
New, not related:
https://tracker.ceph.com/issues/66209
Updated by Matan Breizman almost 2 years ago
- Status changed from QA Needs Approval to QA Needs Rerun/Rebuilt
- Assignee changed from Matan Breizman to Yuri Weinstein
Related failures described above.
Updated by Yuri Weinstein almost 2 years ago
@Laura Flores I can't schedule a rerun
Can you pls try so we can see if this is my setup, this run, or the actual problem?
In the output I see it looping through the log:
2024-05-23 15:02:41,096.096 DEBUG:teuthology.suite.merge:postmerge script running:
2024-05-23 15:02:41,098.098 DEBUG:teuthology.suite.merge:skipping config rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-1} backoff/normal ceph clusters/{fixed-4 openstack} crc-failures/default d-balancer/upmap-read mon_election/classic msgr-failures/osd-delay msgr/async-v1only objectstore/bluestore-comp-zlib rados supported-random-distro$/{centos_latest} thrashers/pggrow thrashosds-health workloads/cache-snaps-balanced} due to postmerge filter
2024-05-23 15:02:41,098.098 DEBUG:teuthology.suite.merge:merging config thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-1} backoff/peering ceph clusters/{fixed-4 openstack} crc-failures/bad_map_crc_failure d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v2only objectstore/bluestore-comp-zstd rados supported-random-distro$/{ubuntu_latest} thrashers/careful thrashosds-health workloads/cache-snaps}
Here are my lines:
SHA1=
CEPH_BRANCH=
CEPH_QA_MAIL=
CEPH_REPO=
SUITE_REPO=
LIMIT=
DISTRO=
PRIO=
TEUTH=
MACHINE_NAME=
echo $SHA1
echo $CEPH_BRANCH
CEPH_REPO=https://github.com/ceph/ceph-ci.git
SHA1=6cd0f8013e2dea00c3a29a0d8b10656b132d7c80
CEPH_BRANCH=wip-yuri5-testing-2024-05-15-0804
SUITE_REPO=$CEPH_REPO
SUITE_BRANCH=$CEPH_BRANCH
LIMIT=10000
DISTRO=distro
TEUTH=main
MACHINE_NAME=smithi
CEPH_QA_MAIL=yweinste@redhat.com
PRIO=99
echo $SHA1
echo $CEPH_BRANCH
echo $SUITE_BRANCH
RERUN=yuriw-2024-05-20_19:47:14-rados-wip-yuri5-testing-2024-05-15-0804-distro-default-smithi
teuthology-suite -v -c $CEPH_BRANCH -m $MACHINE_NAME -r $RERUN --suite-repo $SUITE_REPO --suite-branch $SUITE_BRANCH --ceph-repo $CEPH_REPO -p $PRIO -R fail,dead,running,waiting --force-priority -k $DISTRO -t $TEUTH -S $SHA1
Updated by Matan Breizman almost 2 years ago · Edited
Yuri Weinstein wrote in #note-21:
@Laura Flores I can't schedule a rerun
Can you pls try so we can see if this is my setup, this run, or the actual problem?
Hey Yuri,
This run will probably need a rebuild either way - without the 2 blocking PRs. Thanks!
Updated by Yuri Weinstein almost 2 years ago
Matan Breizman wrote in #note-22:
Yuri Weinstein wrote in #note-21:
@Laura Flores I can't schedule a rerun
Can you pls try so we can see if this is my setup, this run, or the actual problem?
Hey Yuri,
This run will probably need a rebuild either way - without the 2 blocking PRs. Thanks!
When can I rebase, pls LMK
The scheduling issue is a stand-alone one
Updated by Yuri Weinstein almost 2 years ago
rerun bug => https://github.com/ceph/teuthology/pull/1946
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Needs Rerun/Rebuilt to QA Needs Approval
- Assignee changed from Yuri Weinstein to Pere Díaz Bou
Updated by Matan Breizman almost 2 years ago · Edited
When can I rebase, pls LMK
I think that we can rebase:
- https://github.com/ceph/ceph/pull/57015 - new changes, fixed previous issue.
- https://github.com/ceph/ceph/pull/56531 - should be excluded from this batch.
Updated by Matan Breizman almost 2 years ago
- Assignee changed from Pere Díaz Bou to Matan Breizman
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Needs Approval to QA Testing
- Assignee changed from Matan Breizman to Yuri Weinstein
@Matan Breizman rebasing
pls assign it to me and/or change the status, otherwise I won't see that it needs attention in the future, thx
Updated by Yuri Weinstein almost 2 years ago
- QA Runs deleted (wip-yuri5-testing-2024-05-15-0804)
Updated by Yuri Weinstein almost 2 years ago
- QA Runs set to wip-yuri5-testing-2024-05-15-0804
Updated by Yuri Weinstein almost 2 years ago
added a rerun for failed
@Matan Breizman fyi
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Testing to QA Needs Approval
- Assignee changed from Yuri Weinstein to Matan Breizman
Updated by Matan Breizman almost 2 years ago
The following comment was probably missed:
- https://github.com/ceph/ceph/pull/56531 - should be excluded from this batch.
However, the PR is only a `qa` change so no rerun is needed.
Updated by Laura Flores almost 2 years ago
- Assignee changed from Matan Breizman to Yuri Weinstein
Matan Breizman wrote in #note-33:
The following comment was probably missed:
- https://github.com/ceph/ceph/pull/56531 - should be excluded from this batch.
However, the PR is only a `qa` change so no rerun is needed.
@Matan Breizman is any action needed from Yuri here?
Updated by Matan Breizman almost 2 years ago
- Status changed from QA Needs Approval to QA Approved
https://github.com/ceph/ceph/pull/56531 - should not be merged.
Updated by Yuri Weinstein almost 2 years ago
@Matan Breizman merged all but https://github.com/ceph/ceph/pull/56531
I think you forgot to add your comment/link to all PRs
Updated by Yuri Weinstein almost 2 years ago
- Status changed from QA Approved to QA Closed
Updated by Matan Breizman almost 2 years ago
I think you forgot to add your comment/link to all PRs
@Yuri Weinstein, I was about to but you were quicker than me :)