src/test: add libcephfs tests for async(nonblocking) calls#54435
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
@vshankar is it good to go?
I remember
Pretty much, yes. I will bundle this up with other PRs this week.
jenkins retest this please
@vshankar any update on this?
jenkins test api
@vshankar any update?
@vshankar was this run through the suite? Asking because of https://tracker.ceph.com/issues/73037, and this PR has an fsync-enabled test case. Curious if it passed.
I think it was run at least once, but I will have to check. But I think the reported issue also exists in squid, so it's an existing bug in the async preadv_pwritev call.
In my testing it passed, since the test case uses non-zero-length buffers while the issue is reported with zero-length buffers. Maybe it exists only in the zero-length case.
@vshankar any ETA on this getting a merge?
I can run it through tests, yes. But reading the update here: https://tracker.ceph.com/issues/73037#note-7, do you think we can reproduce this with a test case?
This PR is under test in https://tracker.ceph.com/issues/73647.
TEST_F(TestClient, LlreadvLlwritevOverlimit) {
This test (and some others) are failing here: https://pulpito.ceph.com/vshankar-2025-10-28_13:07:05-fs-wip-vshankar-testing-20251028.073424-debug-for-linus-default-smithi/8572921/
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] 6 tests, listed below:
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevOverlimit
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevNonContiguous
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevWriteOnlyFile
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevFsync
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevBufferOverflow
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[ FAILED ] TestClient.LlreadvLlwritevQuotaFull
Mind checking @dparmar18 ?
All of these fail for the same reason: the async write returns EBADF. Some code change must have triggered this. Digging deeper.
@vshankar tests are running fine locally before and after rebase.
Could be related to the way we run tests w/ teuthology.
Can you elaborate on "Could be related to the way we run tests w/ teuthology"? It's kind of strange to see such behaviour.
@vshankar the problem is that the check
if (fh == NULL || !_ll_fh_exists(fh)) {
  ldout(cct, 3) << "(fh)" << fh << " is invalid" << dendl;
  retval = -EBADF;
}
is being hit, as the log shows:
2025-11-06T05:53:58.339 INFO:tasks.workunit.client.0.smithi077.stderr:2025-11-06T05:53:58.253+0000 7f2b4f1d9540 3 client.4749 (fh)0 is invalid
2025-11-06T05:53:58.340 INFO:tasks.workunit.client.0.smithi077.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3820-g3c987ae7/src/test/client/nonblocking.cc:1151: Failure
and the function is
bool _ll_fh_exists(Fh *f) {
  return ll_unclosed_fh_set.count(f);
}
This means the fh 0 isn't in ll_unclosed_fh_set 0_o. I'm surprised.
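To make the shape of that check concrete, here is a minimal standalone sketch. It is not the Ceph Client code: MiniClient, its empty Fh struct, open(), close() and check_fh() are hypothetical stand-ins, and the only thing borrowed from the snippet quoted here is the idea of a set of unclosed handles plus a null test that yields -EBADF.

```cpp
#include <cerrno>
#include <set>

// Hypothetical stand-in for a file handle type; not Ceph's Fh.
struct Fh {};

// Minimal model of a client that tracks its unclosed low-level handles.
class MiniClient {
public:
  // Create a handle and remember it as unclosed.
  Fh* open() {
    Fh* fh = new Fh();
    unclosed_.insert(fh);
    return fh;
  }

  // Forget and free a handle; unknown handles are ignored.
  void close(Fh* fh) {
    if (unclosed_.erase(fh) > 0) {
      delete fh;
    }
  }

  // Same shape as the quoted check: a null handle, or one that is not
  // in the unclosed set, is rejected with -EBADF.
  int check_fh(Fh* fh) const {
    if (fh == nullptr || unclosed_.count(fh) == 0) {
      return -EBADF;
    }
    return 0;
  }

  ~MiniClient() {
    for (Fh* fh : unclosed_) {
      delete fh;
    }
  }

private:
  std::set<Fh*> unclosed_;
};
```

In this model a null pointer never reaches the set lookup, which matches the "(fh)0 is invalid" log line: a printed handle of 0 suggests the caller passed a null handle, rather than the set having lost a valid one.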
Oh, hold on, it's an fh, not an fd, so it shouldn't be 0.
The problem here was that the pointers for fh and inode are released after returning from ll_createx, so they come back as 0 (null), which is evident in the logs. I've removed the conflicting helper function. Test results below.
#include "test/client/TestClient.h"
TEST_F(TestClient, LlreadvLlwritevDataPoolFull) {
Not sure why these added tests are failing in this run. Last run was way better than this.
2025-10-28T13:49:51.955 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/nonblocking_full.cc:75: Failure
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Value of: client->wait_for_osdmap_epoch_update(osd_epoch)
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout: Actual: false
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Expected: true
Hmm, this is strange..
Same as #54435 (comment): the reason it resulted in false is that no write was done:
seq 2) from mds.0
2025-10-28T13:49:40.474 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/TestClient.h:166: Failure
2025-10-28T13:49:40.474 INFO:tasks.workunit.client.0.smithi146.stdout:Expected equality of these values:
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout: bytes_written
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout: Which is: -9
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout: bytes_expected
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout: Which is: 276824064
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:40.472+0000 7f47f5d21a00 3 client.4653 (fh)0 is invalid
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47e0ff9640 11 objectcacher flusher 0 / 209715200: 0 tx, 0 rx, 0 clean, 0 dirty (8388608 target, 104857600 max)
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 tick
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 collect_and_send_metrics
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 collect_and_send_global_metrics
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.0 aggregate
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.0 aggregate res size 0
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 1 -- 172.21.15.146:0/3210938130 --> [v2:172.21.15.162:6834/3017020185,v1:172.21.15.162:6835/3017020185] -- client_metrics [client_metric_type: READ_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: WRITE_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: METADATA_LATENCY latency: 2050-04-11T16:55:54.147995+0000, avg_latency: 2050-04-11T16:55:54.147995+0000, sq_sum: 0, count=1][client_metric_type: CAP_INFO cap_hits: 0 cap_misses: 0 num_caps: 0][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 0 num_dentries: 0][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 1][client_metric_type: PINNED_ICAPS pinned_icaps: 1 total_inodes: 1][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 1][client_metric_type: READ_IO_SIZES total_ops: 0 total_size: 0][client_metric_type: WRITE_IO_SIZES total_ops: 0 total_size: 0] -- 0x7f47cc002a50 con 0x55c4a0698990
2025-10-28T13:49:41.274 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 trim_cache size 0 max 16384
2025-10-28T13:49:41.274 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 upkeep thread waiting interval 1.000000000s
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/TestClient.h:166: Failure
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:Expected equality of these values:
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout: bytes_written
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout: Which is: -9
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout: bytes_expected
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout: Which is: 304506470
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout:
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.812+0000 7f47f5d21a00 3 client.4653 (fh)0 is invalid
2025-10-28T13:49:41.955 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.952+0000 7f47f5d21a00 10 client.4653.objecter _maybe_request_map subscribing (onetime) to next osd map
and thus no epoch update:
2025-10-28T13:49:51.275 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:51.272+0000 7f47d7fff640 20 client.4653 upkeep thread waiting interval 1.000000000s
2025-10-28T13:49:51.955 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/nonblocking_full.cc:75: Failure
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Value of: client->wait_for_osdmap_epoch_update(osd_epoch)
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout: Actual: false
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Expected: true
Results in the below link (green)
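For readers skimming the log: the "-9" reported for bytes_written is a negative errno value, and 9 is EBADF on Linux, which ties this failure back to the invalid fh. A small sketch of decoding such a return value follows; describe_retval is a hypothetical helper written for illustration and is not part of the test suite.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: render a "bytes written or negative errno" return
// value as readable text.
std::string describe_retval(long long rv) {
  if (rv >= 0) {
    return std::to_string(rv) + " bytes";
  }
  const int err = static_cast<int>(-rv);
  return "error " + std::to_string(err) + " (" + std::strerror(err) + ")";
}
```

Calling describe_retval(-9) on Linux would report error 9 together with the platform's message for EBADF, while a successful write such as describe_retval(4096) reports the byte count.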
Fixes: https://tracker.ceph.com/issues/63104 Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Fixes: https://tracker.ceph.com/issues/63104 Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
This requires a new suite and cannot be run with the other async I/O test cases. Therefore, apart from adding the test case, add a binary, a shell script to run it, and a YAML file to pick it up in teuthology.
Fixes: https://tracker.ceph.com/issues/63104
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Had to do another run post-rebase to rule out any side effects: https://pulpito.ceph.com/dparmar-2025-11-14_08:26:03-fs:libcephfs-libcephfs-nonblocking-io-testcases-distro-default-smithi/ (all green).
jenkins retest this please
This PR is under test in https://tracker.ceph.com/issues/73938.
add async i/o test cases
Fixes: https://tracker.ceph.com/issues/63104
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>