src/test: add libcephfs tests for async(nonblocking) calls #54435

Merged
vshankar merged 3 commits into ceph:main from dparmar18:libcephfs-nonblocking-io-testcases on Dec 10, 2025

Conversation

@dparmar18 commented Nov 9, 2023

add async i/o test cases

Fixes: https://tracker.ceph.com/issues/63104
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>

@github-actions bot added the tests label on Nov 9, 2023
@dparmar18 force-pushed the libcephfs-nonblocking-io-testcases branch 4 times, most recently from c0ef42b to 2e7f65f, on November 10, 2023 21:01
@dparmar18 force-pushed the libcephfs-nonblocking-io-testcases branch 15 times, most recently from cb22b10 to cc60032, on November 27, 2023 10:36
@dparmar18 force-pushed the libcephfs-nonblocking-io-testcases branch 3 times, most recently from b1aac0e to be7ab35, on December 18, 2023 13:02
@dparmar18 force-pushed the libcephfs-nonblocking-io-testcases branch 2 times, most recently from 3e2898a to d429557, on January 11, 2024 13:26
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18 force-pushed the libcephfs-nonblocking-io-testcases branch from d429557 to ba7868c on February 5, 2024 11:58

@vshankar left a comment

Otherwise LGTM.

@dparmar18

@vshankar is it good to go?

@dparmar18 commented Jun 23, 2025

As I recall, the CEPHFS_* errnos are no longer in use. The last push fixed one test case that still used them (https://github.com/ceph/ceph/compare/a4c298a3e63f9f6e5afd550926ea862ebae9e11b..6437afc20c2de45416627dbbf8bd42a8c5da2f4b); nothing else changed.

@vshankar

> @vshankar is it good to go?

Pretty much, yes. I will bundle this up with other PRs this week.

@vshankar commented Jul 1, 2025

jenkins retest this please

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18

@vshankar any update on this?

@dparmar18

jenkins test api

@dparmar18

@vshankar any update?

@dparmar18

@vshankar was this run through the suite? Asking because of https://tracker.ceph.com/issues/73037, and this PR has an fsync-enabled test case. Curious whether it passed.

@vshankar

> @vshankar was this run through the suite? Asking because of https://tracker.ceph.com/issues/73037, and this PR has an fsync-enabled test case. Curious whether it passed.

I think it was run at least once, but I will have to check. That said, I think the reported issue also exists with squid, so it's an existing bug in the async preadv_pwritev call.

@dparmar18

> > @vshankar was this run through the suite? Asking because of https://tracker.ceph.com/issues/73037, and this PR has an fsync-enabled test case. Curious whether it passed.
>
> I think it was run at least once, but I will have to check. That said, I think the reported issue also exists with squid, so it's an existing bug in the async preadv_pwritev call.

In my testing it passed, since the test case uses non-zero-length buffers while the issue is reported with zero-length buffers. Maybe it only occurs in the zero-length case.
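
To make the distinction concrete, here is a minimal sketch of zero-length versus non-zero-length vectored buffers. It relies only on the POSIX `iovec` struct; `make_iov` and `total_iov_len` are hypothetical helper names chosen for illustration, and nothing here calls into libcephfs or reproduces the tracker issue.

```cpp
#include <sys/uio.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: wrap each buffer in an iovec entry.
// An empty buffer yields an entry with iov_len == 0.
inline std::vector<iovec> make_iov(std::vector<std::vector<char>>& bufs) {
  std::vector<iovec> iov;
  for (auto& b : bufs) {
    iovec v{};
    v.iov_base = b.data();
    v.iov_len = b.size();
    iov.push_back(v);
  }
  return iov;
}

// Hypothetical helper: total number of bytes described by the vector.
inline std::size_t total_iov_len(const std::vector<iovec>& iov) {
  std::size_t total = 0;
  for (const auto& v : iov) {
    total += v.iov_len;
  }
  return total;
}
```

A vector built from two empty buffers still has two entries but describes zero bytes, which is the shape of input the tracker report is about; a test using 4-byte and 8-byte buffers would never exercise that path.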

@dparmar18

@vshankar any ETA on this getting a merge?

@vshankar

> @vshankar any ETA on this getting a merge?

I can run it through tests, yes. But, reading the update here: https://tracker.ceph.com/issues/73037#note-7

Do you think we can reproduce this with a test case?

@vshankar

This PR is under test in https://tracker.ceph.com/issues/73647.

TEST_F(TestClient, LlreadvLlwritevOverlimit) {

Contributor

This test (and some others) are failing here: https://pulpito.ceph.com/vshankar-2025-10-28_13:07:05-fs-wip-vshankar-testing-20251028.073424-debug-for-linus-default-smithi/8572921/

2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] 6 tests, listed below:
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevOverlimit
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevNonContiguous
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevWriteOnlyFile
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevFsync
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevBufferOverflow
2025-10-28T20:13:32.023 INFO:tasks.workunit.client.0.smithi069.stdout:[  FAILED  ] TestClient.LlreadvLlwritevQuotaFull

Mind checking @dparmar18 ?

Contributor Author

ACK

Contributor Author

All of these fail for the same reason: retval is EBADF during the async write. Some code change must have triggered this. Digging deeper.

Contributor Author

@vshankar tests are running fine locally before and after rebase.

Contributor

Could be related to the way we run tests w/ teuthology.

Contributor Author

> Could be related to the way we run tests w/ teuthology.

Can you elaborate on this? It's kind of strange to have such behaviour.

Contributor Author

@vshankar the problem is the check

if(fh == NULL || !_ll_fh_exists(fh)) {
  ldout(cct, 3) << "(fh)" << fh << " is invalid" << dendl;
  retval = -EBADF;
}

is returning

2025-11-06T05:53:58.339 INFO:tasks.workunit.client.0.smithi077.stderr:2025-11-06T05:53:58.253+0000 7f2b4f1d9540  3 client.4749 (fh)0 is invalid
2025-11-06T05:53:58.340 INFO:tasks.workunit.client.0.smithi077.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3820-g3c987ae7/src/test/client/nonblocking.cc:1151: Failure

and the func is

bool _ll_fh_exists(Fh *f) {
  return ll_unclosed_fh_set.count(f);
}

this means the fh 0 isn't in the ll_unclosed_fh_set 0_o. I'm surprised
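
The check quoted above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `Fh` here is an empty stand-in type, and `HandleRegistry`, `exists`, and `validate` are hypothetical names, not the client's actual classes. It shows why a null handle and a handle that was never registered both end in `-EBADF`.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Hypothetical stand-in for the client's file handle type.
struct Fh {};

// Hypothetical registry of handles that are open and not yet closed.
struct HandleRegistry {
  std::set<Fh*> unclosed;

  // True only if this exact pointer was registered.
  bool exists(Fh* f) const { return unclosed.count(f) > 0; }

  // Same shape as the quoted check: reject null or unknown handles.
  int validate(Fh* fh) const {
    if (fh == nullptr || !exists(fh)) {
      return -EBADF;
    }
    return 0;
  }
};
```

With this shape, a log line of the form `(fh)0 is invalid` corresponds to the null branch: the pointer handed to the call was already null, so the set lookup is never the deciding factor.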

Contributor Author

Oh, hold on: it's an fh, not an fd, so it shouldn't be 0.

Contributor Author

The problem here was that the pointers for fh and inode are released after returning from ll_createx, so they come back as 0 (null), which is evident in the logs. I've removed the conflicting helper function. Test results are below.

@vshankar left a comment


#include "test/client/TestClient.h"

TEST_F(TestClient, LlreadvLlwritevDataPoolFull) {

Contributor

https://pulpito.ceph.com/vshankar-2025-10-28_13:07:05-fs-wip-vshankar-testing-20251028.073424-debug-for-linus-default-smithi/8572722/

Not sure why these added tests are failing in this run. Last run was way better than this.

Contributor Author

ACK

Contributor Author

2025-10-28T13:49:51.955 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/nonblocking_full.cc:75: Failure
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Value of: client->wait_for_osdmap_epoch_update(osd_epoch)
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:  Actual: false
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Expected: true

Hmm, this is strange..

@dparmar18 commented Nov 13, 2025

Same as #54435 (comment): it returned false because no write was done:

seq 2) from mds.0
2025-10-28T13:49:40.474 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/TestClient.h:166: Failure
2025-10-28T13:49:40.474 INFO:tasks.workunit.client.0.smithi146.stdout:Expected equality of these values:
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:  bytes_written
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:    Which is: -9
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:  bytes_expected
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:    Which is: 276824064
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stdout:
2025-10-28T13:49:40.475 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:40.472+0000 7f47f5d21a00  3 client.4653 (fh)0 is invalid
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47e0ff9640 11 objectcacher flusher 0 / 209715200:  0 tx, 0 rx, 0 clean, 0 dirty (8388608 target, 104857600 max)
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 tick
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 collect_and_send_metrics
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 collect_and_send_global_metrics
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.0 aggregate
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.0 aggregate res size 0
2025-10-28T13:49:41.273 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640  1 -- 172.21.15.146:0/3210938130 --> [v2:172.21.15.162:6834/3017020185,v1:172.21.15.162:6835/3017020185] -- client_metrics [client_metric_type: READ_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: WRITE_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: METADATA_LATENCY latency: 2050-04-11T16:55:54.147995+0000, avg_latency: 2050-04-11T16:55:54.147995+0000, sq_sum: 0, count=1][client_metric_type: CAP_INFO cap_hits: 0 cap_misses: 0 num_caps: 0][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 0 num_dentries: 0][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 1][client_metric_type: PINNED_ICAPS pinned_icaps: 1 total_inodes: 1][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 1][client_metric_type: READ_IO_SIZES total_ops: 0 total_size: 0][client_metric_type: WRITE_IO_SIZES total_ops: 0 total_size: 0] -- 0x7f47cc002a50 con 0x55c4a0698990
2025-10-28T13:49:41.274 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 trim_cache size 0 max 16384
2025-10-28T13:49:41.274 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.270+0000 7f47d7fff640 20 client.4653 upkeep thread waiting interval 1.000000000s
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/TestClient.h:166: Failure
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:Expected equality of these values:
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:  bytes_written
2025-10-28T13:49:41.814 INFO:tasks.workunit.client.0.smithi146.stdout:    Which is: -9
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout:  bytes_expected
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout:    Which is: 304506470
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stdout:
2025-10-28T13:49:41.815 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.812+0000 7f47f5d21a00  3 client.4653 (fh)0 is invalid
2025-10-28T13:49:41.955 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:41.952+0000 7f47f5d21a00 10 client.4653.objecter _maybe_request_map subscribing (onetime) to next osd map

and thus no epoch update:

2025-10-28T13:49:51.275 INFO:tasks.workunit.client.0.smithi146.stderr:2025-10-28T13:49:51.272+0000 7f47d7fff640 20 client.4653 upkeep thread waiting interval 1.000000000s
2025-10-28T13:49:51.955 INFO:tasks.workunit.client.0.smithi146.stdout:/ceph/rpmbuild/BUILD/ceph-20.3.0-3819-gae7657f5/src/test/client/nonblocking_full.cc:75: Failure
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Value of: client->wait_for_osdmap_epoch_update(osd_epoch)
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:  Actual: false
2025-10-28T13:49:51.956 INFO:tasks.workunit.client.0.smithi146.stdout:Expected: true
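
For reading the `bytes_written` value in these logs: the `-9` is a negated errno, and on Linux errno 9 is `EBADF`, which matches the "(fh)0 is invalid" lines. A minimal sketch of that interpretation is below; `IoResult` and `check_write` are hypothetical names for illustration and are not part of the test code in this PR.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical result type describing how a write return value was interpreted.
struct IoResult {
  bool ok;
  std::string detail;
};

// Hypothetical helper: a negative return value carries a negated errno,
// a non-negative one is a byte count to compare against the expectation.
inline IoResult check_write(std::int64_t bytes_written,
                            std::int64_t bytes_expected) {
  if (bytes_written < 0) {
    return {false, std::strerror(static_cast<int>(-bytes_written))};
  }
  if (bytes_written != bytes_expected) {
    return {false, "short write"};
  }
  return {true, "ok"};
}
```

Calling `check_write(-9, 276824064)` reports a failure whose detail is the system's message for `EBADF`, so the write never reached the data pool and no OSD map epoch change would be expected afterwards.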

Results are in the link below (green).

@github-actions bot commented Nov 6, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@dparmar18

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Fixes: https://tracker.ceph.com/issues/63104
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
This requires a new suite because it cannot be run with the other
async I/O test cases. Therefore, apart from adding the test case, add a
binary, a shell script to run it, and a YAML file so that teuthology
picks it up.

Fixes: https://tracker.ceph.com/issues/63104
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
@dparmar18

had to do another run post rebase to eliminate any side effect https://pulpito.ceph.com/dparmar-2025-11-14_08:26:03-fs:libcephfs-libcephfs-nonblocking-io-testcases-distro-default-smithi/ -- all green

@vshankar

jenkins retest this please

@vshankar

This PR is under test in https://tracker.ceph.com/issues/73938.

@vshankar left a comment
