Skip to content

qa/workunits/rados/test.sh: add timeout mechanism and more info to workloads with parallel tests#64219

Merged
SrinivasaBharath merged 5 commits intoceph:mainfrom
kamoltat:wip-ksirivad-fix-qa-rados-test
Jul 24, 2025
Merged

qa/workunits/rados/test.sh: add timeout mechanism and more info to workloads with parallel tests#64219
SrinivasaBharath merged 5 commits intoceph:mainfrom
kamoltat:wip-ksirivad-fix-qa-rados-test

Conversation

@kamoltat
Copy link
Member

@kamoltat kamoltat commented Jun 26, 2025

This commit adds a 90-minute timeout (configurable) for each test in the RADOS test suite
when running in parallel mode. Previously, if a test hung, the entire test
script would hang indefinitely,
causing teutholgoy jobs to time out without useful diagnostic information.

The implementation:

Adds a timeout check in the process monitoring loop
Forcibly terminates tests that run longer than 90 minutes (or whatever
is configured)
Ensures wait is only called after a process has ended or been killed
Improves error reporting to identify which specific test caused the issue
This change prevents indefinite hanging of the test script and provides
better visibility into which test is problematic when timeouts occur.

Also, we added XML output support for the neorados test suite,
ensuring test results are properly captured in XML format. This change
aligns the neorados tests with the existing behavior of the standard
RADOS tests, which were already generating XML output files and utilize
teuthology scanner to capture failures if any.

Fixes: https://tracker.ceph.com/issues/70772

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@kamoltat kamoltat requested a review from a team as a code owner June 26, 2025 20:01
@kamoltat kamoltat self-assigned this Jun 26, 2025
@kamoltat kamoltat changed the title qa/rados/test.sh: add timeout mechanism to RADOS parallel tests [WIP] qa/rados/test.sh: add timeout mechanism to RADOS parallel tests Jun 26, 2025
@kamoltat kamoltat changed the title [WIP] qa/rados/test.sh: add timeout mechanism to RADOS parallel tests [WIP] qa/rados/test.sh: add timeout mechanism and more info to RADOS parallel tests Jun 26, 2025
@kamoltat kamoltat changed the title [WIP] qa/rados/test.sh: add timeout mechanism and more info to RADOS parallel tests [WIP] qa/rados/test.sh: add timeout mechanism and more info to rados_api_tests parallel tests Jun 26, 2025
Copy link
Member

@ljflores ljflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for a WIP! Let me know when it's ready to be fully reviewed and I can help with that.

Do you know a good way to verify the script locally? I would imagine it involves something like:
ninja ceph_test_rados
../qa/workunits/rados/test.sh

@kamoltat
Copy link
Member Author

@kamoltat
Copy link
Member Author

Looks good for a WIP! Let me know when it's ready to be fully reviewed and I can help with that.

Do you know a good way to verify the script locally? I would imagine it involves something like: ninja ceph_test_rados ../qa/workunits/rados/test.sh

@ljflores good question, didn't think about that, I scheduled some teuthology jobs, I guess we'll see if the script works first, but yeah doing it locally first would have been better ... whoops

@kamoltat
Copy link
Member Author

kamoltat commented Jun 27, 2025

With the new test change we are able to reproduce 3 failures and failed them within 25mins and pin point which pid/test causes the failure:

ksirivad-2025-06-26_20:28:15-rados:monthrash-wip-ksirivad-fix-msg-v2-distro-default-smithi/8352030/teuthology.log

2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: unknown file: Failure
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: C++ exception with description "(110) Connection timed out [osd:110]" thrown in the test body.
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify:
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [  FAILED  ] NeoRadosWatchNotifyPoll.WatchNotify (50468 ms)
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WatchNotifyTimeout
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WatchNotifyTimeout (3691 ms)
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WrongWatchType
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WrongWatchType (3035 ms)
2025-06-26T21:13:25.179 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WatchNotifyCancel
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WatchNotifyCancel (8172 ms)
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [----------] 4 tests from NeoRadosWatchNotifyPoll (65368 ms total)
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify:
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [----------] Global test environment tear-down
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [==========] 6 tests from 2 test suites ran. (76902 ms total)
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [  PASSED  ] 5 tests.
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [  FAILED  ] 1 test, listed below:
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify: [  FAILED  ] NeoRadosWatchNotifyPoll.WatchNotify
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify:
2025-06-26T21:13:25.180 INFO:tasks.workunit.client.0.smithi103.stdout:             watch_notify:  1 FAILED TEST
.
.
>
2025-06-26T21:29:58.681 INFO:tasks.workunit.client.0.smithi103.stderr:+ echo 'Waiting for test watch_notify (PID 16474)...'
2025-06-26T21:29:58.681 INFO:tasks.workunit.client.0.smithi103.stdout:Waiting for test watch_notify (PID 16474)...
2025-06-26T21:29:58.682 INFO:tasks.workunit.client.0.smithi103.stderr:+ kill -0 16474
2025-06-26T21:29:58.682 INFO:tasks.workunit.client.0.smithi103.stderr:+ wait 16474
2025-06-26T21:29:58.682 INFO:tasks.workunit.client.0.smithi103.stdout:Test watch_notify (PID 16474) failed with non-zero exit status
2025-06-26T21:29:58.683 INFO:tasks.workunit.client.0.smithi103.stderr:+ echo 'Test watch_notify (PID 16474) failed with non-zero exit status'
2025-06-26T21:29:58.683 INFO:tasks.workunit.client.0.smithi103.stderr:+ ret=1

ksirivad-2025-06-26_20:28:15-rados:monthrash-wip-ksirivad-fix-msg-v2-distro-default-smithi/8352012/teuthology.log

2025-06-26T20:49:55.694 INFO:tasks.workunit.client.0.smithi089.stdout:                   api_io: [  SKIPPED ] LibRadosIoEC.RmXattr (0 ms)
2025-06-26T20:49:55.695 INFO:tasks.workunit.client.0.smithi089.stdout:                   api_io: [ RUN      ] LibRadosIoEC.XattrIter
2025-06-26T20:49:55.695 INFO:tasks.workunit.client.0.smithi089.stdout:                   api_io: ./src/test/librados/io.cc:422: Skipped
2025-06-26T20:49:55.695 INFO:tasks.workunit.client.0.smithi089.stdout:                   api_io:
2025-06-26T20:49:55.695 INFO:tasks.workunit.client.0.smithi089.stdout:                   api_io: [  SKIPPED ] LibRadosIoEC.XattrIter (0 ms)
2025-06-26T20:49:55.724 INFO:tasks.workunit.client.0.smithi089.stderr:bash: line 1: 14730 Segmentation fault      (core dumped) ceph_test_rados_api_io --gtest_output=xml:/home/ubuntu/cephtest/archive/unit_test_xml_report/api_io.xml 2>&1
2025-06-26T20:49:55.724 INFO:tasks.workunit.client.0.smithi089.stderr:     14731 Done                    | tee ceph_test_rados_api_io.log
2025-06-26T20:49:55.724 INFO:tasks.workunit.client.0.smithi089.stderr:     14732 Done                    | sed "s/^/                   api_io: /"
.
.
.
2025-06-26T21:07:09.754 INFO:tasks.workunit.client.0.smithi089.stdout:Waiting for test api_io (PID 14725)...
2025-06-26T21:07:09.754 INFO:tasks.workunit.client.0.smithi089.stderr:+ echo 'Waiting for test api_io (PID 14725)...'
2025-06-26T21:07:09.754 INFO:tasks.workunit.client.0.smithi089.stderr:+ kill -0 14725
2025-06-26T21:07:09.754 INFO:tasks.workunit.client.0.smithi089.stderr:+ wait 14725
2025-06-26T21:07:09.754 INFO:tasks.workunit.client.0.smithi089.stdout:Test api_io (PID 14725) failed with non-zero exit status
2025-06-26T21:07:09.755 INFO:tasks.workunit.client.0.smithi089.stderr:+ echo 'Test api_io (PID 14725) failed with non-zero exit status'
2025-06-26T21:07:09.755 INFO:tasks.workunit.client.0.smithi089.stderr:+ ret=1

ksirivad-2025-06-26_20:28:15-rados:monthrash-wip-ksirivad-fix-msg-v2-distro-default-smithi/8352036/teuthology.log

2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [       OK ] NeoRadosWatchNotify.WatchNotifyTimeout (7384 ms)
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [----------] 2 tests from NeoRadosWatchNotify (10809 ms total)
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify:
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [----------] 4 tests from NeoRadosWatchNotifyPoll
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WatchNotify
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: unknown file: Failure
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: C++ exception with description "(110) Connection timed out [osd:110]" thrown in the test body.
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify:
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [  FAILED  ] NeoRadosWatchNotifyPoll.WatchNotify (50285 ms)
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WatchNotifyTimeout
2025-06-26T21:15:43.355 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WatchNotifyTimeout (9241 ms)
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WrongWatchType
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WrongWatchType (3948 ms)
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [ RUN      ] NeoRadosWatchNotifyPoll.WatchNotifyCancel
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [       OK ] NeoRadosWatchNotifyPoll.WatchNotifyCancel (3013 ms)
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [----------] 4 tests from NeoRadosWatchNotifyPoll (66488 ms total)
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify:
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [----------] Global test environment tear-down
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [==========] 6 tests from 2 test suites ran. (77298 ms total)
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [  PASSED  ] 5 tests.
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [  FAILED  ] 1 test, listed below:
2025-06-26T21:15:43.356 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify: [  FAILED  ] NeoRadosWatchNotifyPoll.WatchNotify
2025-06-26T21:15:43.357 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify:
2025-06-26T21:15:43.357 INFO:tasks.workunit.client.0.smithi181.stdout:             watch_notify:  1 FAILED TEST

.
.

2025-06-26T21:31:36.342 INFO:tasks.workunit.client.0.smithi181.stdout:Waiting for test watch_notify (PID 16785)...
2025-06-26T21:31:36.342 INFO:tasks.workunit.client.0.smithi181.stderr:+ pid=16785
2025-06-26T21:31:36.342 INFO:tasks.workunit.client.0.smithi181.stderr:+ echo 'Waiting for test watch_notify (PID 16785)...'
2025-06-26T21:31:36.343 INFO:tasks.workunit.client.0.smithi181.stderr:+ kill -0 16785
2025-06-26T21:31:36.343 INFO:tasks.workunit.client.0.smithi181.stderr:+ wait 16785
2025-06-26T21:31:36.343 INFO:tasks.workunit.client.0.smithi181.stdout:Test watch_notify (PID 16785) failed with non-zero exit status
2025-06-26T21:31:36.343 INFO:tasks.workunit.client.0.smithi181.stderr:+ echo 'Test watch_notify (PID 16785) failed with non-zero exit status'
2025-06-26T21:31:36.344 INFO:tasks.workunit.client.0.smithi181.stderr:+ ret=1

@kamoltat
Copy link
Member Author

Currently we do not perform xml output for neo-rados suite, the commit 3a20d0b adds this.

@kamoltat
Copy link
Member Author

By enabling xml output for neo-rados suite, we are able to get teuthology to fail descriptively: https://pulpito.ceph.com/ksirivad-2025-06-27_18:50:34-rados:monthrash-main-distro-default-smithi/8353691/

Unit test failed (workunit test rados/test.sh) on smithi120 with status 1: '(total 1 failed) FAILURE: Test `WatchNotify` of `NeoRadosWatchNotifyPoll`. Reason: unknown file C++ exception with description "(110) Connection timed out [osd:110]" thrown in the test body..'

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-qa-rados-test branch from 3a20d0b to 94c9f1c Compare June 27, 2025 20:49
@kamoltat kamoltat requested a review from Matan-B June 27, 2025 20:49
@kamoltat kamoltat changed the title [WIP] qa/rados/test.sh: add timeout mechanism and more info to rados_api_tests parallel tests qa/workunits/rados/test.sh: add timeout mechanism and more info to rados_api_tests parallel tests Jun 27, 2025
@kamoltat kamoltat added needs-squid-backport PR needs a squid backport needs-tentacle-backport PRs that need tentacle back-port labels Jun 27, 2025
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-qa-rados-test branch from 627ac95 to 1af98a2 Compare June 28, 2025 23:56
@kamoltat
Copy link
Member Author

Failure due to timeout plus unit-tests are more descriptive now:
https://pulpito.ceph.com/ksirivad-2025-06-28_23:57:38-rados:monthrash-main-distro-default-smithi/

image

@kamoltat
Copy link
Member Author

kamoltat commented Jun 29, 2025

@ljflores think this PR is ready to be reviewed now!

Regarding the vstart compatibility work that you did:

e432e35

Should I try to cherry-pick and resolve the conflict? Or is there a way for you to do it, in-case you want to do it yourself since you authored the commit.

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-qa-rados-test branch from 1af98a2 to 87e49e3 Compare June 29, 2025 04:38
Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!
Beside comment below regarding timeout per-test / total timeout - it looks like "serial + timeout" case might be missed here.
The added timeout logic bit is only run under if [ $parallel -eq 1 ] meaning test.sh --serial --timeout 60 is not supported.

My suggestion would be to define --timeout as total timeout (for all tests), then we could also apply it to "serial" by decreasing the waited time from total time each time a test passes.

Comment on lines +121 to +124
echo "ERROR: Test $t ($pid) - TIMED OUT after $max_wait seconds"
kill -9 $pid 2>/dev/null || true
ret=1
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

check_interval=10
pid=${pids[$t]}
echo "Checking Test $t (PID $pid)..."
# Check in a loop with timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, after each test passes, timeout should be updated to
timeout = timeout - waited

Otherwise according to timeout=1800 # 30 minutes default value we would get 30 minutes for each test. A timeout for each test is also probably ok but should be noted in the comment above that this is the intent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, well now after Bill's suggestion, we increased the timeout to 5400 (90 mins) for each test.

Will add more descriptive comments at the top so user is well aware that it is for each test

@ljflores
Copy link
Member

@ljflores think this PR is ready to be reviewed now!

Regarding the vstart compatibility work that you did:

e432e35

Should I try to cherry-pick and resolve the conflict? Or is there a way for you to do it, in-case you want to do it yourself since you authored the commit.

I can cherry-pick it and resolve the conflict. I'll review the rest as well

@kamoltat kamoltat changed the title qa/workunits/rados/test.sh: add timeout mechanism and more info to rados_api_tests parallel tests qa/workunits/rados/test.sh: add timeout mechanism and more info to workloads with parallel tests Jun 30, 2025
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-qa-rados-test branch from f0c4d83 to 7c2e90d Compare July 2, 2025 20:36
kamoltat added 3 commits July 2, 2025 20:58
…ests

This commit adds a 90-minute timeout (configurable) for each test in the RADOS test suite
when running in parallel mode. Previously, if a test hung, the entire test
script would hang indefinitely,
causing teutholgoy jobs to time out without useful diagnostic information.

The implementation:

Adds a timeout check in the process monitoring loop
Forcibly terminates tests that run longer than 90 minutes (or whatever
is configured)
Ensures wait is only called after a process has ended or been killed
Improves error reporting to identify which specific test caused the issue
This change prevents indefinite hanging of the test script and provides
better visibility into which test is problematic when timeouts occur.

Fixes: https://tracker.ceph.com/issues/70772

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
This commit adds XML output support for the neorados test suite,
ensuring test results are properly captured in XML format. This change
aligns the neorados tests with the existing behavior of the standard
RADOS tests, which were already generating XML output files and utilize
teuthology scanner to capture failures if any.

Fixes: https://tracker.ceph.com/issues/70772

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Currently the failure description in teuthology doesn't
give us much detail rather than COMMAND_FAILURE when
the test timedout, this is because when we timeout and
kill the process there are no XML files generated, therefore,
this commit will create an XML file report when we decide
to timeout and kill the long running test process.

Fixes: https://tracker.ceph.com/issues/70772

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Laura Flores and others added 2 commits July 2, 2025 20:59
This commit addes a `--vstart` flag that developers can pass
to run this script locally. The user must be in the `ceph/build`
directory, which is explained as a comment in the script.

The script prepares the test to be run locally by doing the following:
1. Compiling all the necessry targets
2. Setting up a test cluster
3. Prefixing `./bin/` to all executable commands to make them
   vstart-friendly
4. Tearing down the test cluster at the end.

Fixes: https://tracker.ceph.com/issues/70772

Signed-off-by: Laura Flores <lflores@ibm.com>
- For --serial we use timeout function and --timeout
value to to reutrn exit 1 if test exceeds --timeout.
- Update documentation for command-line arguments
such that user understands the order of argument matters.
- Use test arrays for better organization of test definitions
- Add test type tracking to properly handle both rados
and neorados tests when falling back on XML
- Change timeout default value to 90 minutes

Fixes: https://tracker.ceph.com/issues/70772

Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-qa-rados-test branch from 7c2e90d to bfa6edb Compare July 2, 2025 21:02
@kamoltat
Copy link
Member Author

kamoltat commented Jul 2, 2025

@ljflores @Matan-B should be ready for another review now, thanks!

@kamoltat
Copy link
Member Author

kamoltat commented Jul 2, 2025

Latest test running at: https://pulpito.ceph.com/ksirivad-2025-07-02_21:05:49-rados:monthrash-main-distro-default-smithi/

(Edit: All failures seems to be descriptive as intended, involving the correct timeout)

@kamoltat
Copy link
Member Author

kamoltat commented Jul 3, 2025

Also surprisingly, I could not find a single test in suites/rados that uses rados/test.sh --serial. I've seen rados/test.sh --crimson but not rados/test.sh --serial

@ljflores
Copy link
Member

ljflores commented Jul 3, 2025

jenkins test api

@ljflores
Copy link
Member

ljflores commented Jul 3, 2025

jenkins test make check arm64

@ljflores
Copy link
Member

ljflores commented Jul 3, 2025

@kamoltat this is on my to do list to review!

Copy link
Contributor

@Matan-B Matan-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Also surprisingly, I could not find a single test in suites/rados that uses rados/test.sh --serial. I've seen rados/test.sh --crimson but not rados/test.sh --serial

I've introduced that specifically for Crimson (if --crimson is used parallel=0). This was useful since this test was still flaky from time to time with Crimson and it was hard to spot which test failed since other kept running.

I'll run this through the crimson suite as well once this is ready so we could test the --serial case!

# ./test.sh --timeout 3600 # Parallel mode, 60-min timeout
# ./test.sh --serial --timeout 60 # Serial mode, 1-min timeout
# ./test.sh --crimson --timeout 0 # Crimson mode, no timeout
# ../qa/workunits/rados/test.sh --vstart # Run tests locally from `ceph/build` dir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Member

@ljflores ljflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@shraddhaag
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants