client: fix async/sync I/O stalling due to buffer list exceeding INT_MAX#58564
client: fix async/sync I/O stalling due to buffer list exceeding INT_MAX#58564
Conversation
|
this passes locally: |
|
I've tried compiling this on my local machine am I'm getting: I was able to work around the issue by adding |
this is added here. I'm expecting that to be merged first but I guess I should add a temp commit adding the dependency |
|
Hey @chrisphoffman, added a temp commit. Thanks for taking a look :) |
src/client/Client.cc
Outdated
| if (total_appended >= static_cast<ssize_t>(size)) { | ||
| bl.append(nullptr, 0); |
There was a problem hiding this comment.
shouldn't we just stop appending nullptr buffer pointers and break out of the for loop here ?
|
last push enhanced the code to handle cases where if:
|
57bed9a to
c4b591d
Compare
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
RFR @ceph/cephfs |
|
@dparmar18 this required a rebase please. |
5bdfa53 to
0216885
Compare
dd26301 to
4beb883
Compare
|
@vshankar ping |
Noted. In my review for this week. |
|
@vshankar do we have any update on this? |
I was catching up on backlog. Getting to this soon. |
|
jenkins retest this please |
|
jenkins test make check |
|
jenkins test make check |
|
@vshankar was this tested? |
No, not really. But I'll put this with the other client side fixes. |
|
jenkins test make check |
|
This PR is under test in https://tracker.ceph.com/issues/71837. |
* refs/pull/58564/head: client: clamp sizes to INT_MAX in sync i/o code paths client: restrict bufferlist to total write size src/test: test sync/async i/o code paths with huge (4GiB) buffers Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com> Reviewed-by: Christopher Hoffman <choffman@redhat.com>
Seems to be that the comments have been addressed.
In linux, systems calls like write() anyways don't allow writes > 2GiB,
the total write size passed to the Client::_write is clamped to INT_MAX
but bufferlist is not handled. This edge case is patched here.
bug that arises due to buffer list beyond INT_MAX stalls async i/o due to:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
which results in disconnected inode and cap leaks:
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
This commit changes the way Client::_write accepts data. So, now instead of accepting ptr,
iovec array and iovcnt, the helper now accepts a bufferlist which should be contructed by
the caller itself. The reason behind this change is simple - to declutter the API.
For more context checkout this conversation ceph/ceph#58564 (comment)
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
In linux, systems calls like write() anyways don't allow writes > 2GiB,
the total write size passed to the Client::_write is clamped to INT_MAX
but bufferlist is not handled. This edge case is patched here.
bug that arises due to buffer list beyond INT_MAX stalls async i/o due to:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
which results in disconnected inode and cap leaks:
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
This commit changes the way Client::_write accepts data. So, now instead of accepting ptr,
iovec array and iovcnt, the helper now accepts a bufferlist which should be contructed by
the caller itself. The reason behind this change is simple - to declutter the API.
For more context checkout this conversation ceph#58564 (comment)
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit f4f359e)
In linux, systems calls like write() anyways don't allow writes > 2GiB,
the total write size passed to the Client::_write is clamped to INT_MAX
but bufferlist is not handled. This edge case is patched here.
bug that arises due to buffer list beyond INT_MAX stalls async i/o due to:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
which results in disconnected inode and cap leaks:
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
This commit changes the way Client::_write accepts data. So, now instead of accepting ptr,
iovec array and iovcnt, the helper now accepts a bufferlist which should be contructed by
the caller itself. The reason behind this change is simple - to declutter the API.
For more context checkout this conversation ceph#58564 (comment)
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit f4f359e)
In linux, systems calls like write() anyways don't allow writes > 2GiB,
the total write size passed to the Client::_write is clamped to INT_MAX
but bufferlist is not handled. This edge case is patched here.
bug that arises due to buffer list beyond INT_MAX stalls async i/o due to:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
which results in disconnected inode and cap leaks:
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
This commit changes the way Client::_write accepts data. So, now instead of accepting ptr,
iovec array and iovcnt, the helper now accepts a bufferlist which should be contructed by
the caller itself. The reason behind this change is simple - to declutter the API.
For more context checkout this conversation ceph#58564 (comment)
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit f4f359e)
In linux, systems calls like write() anyways don't allow writes > 2GiB,
the total write size passed to the Client::_write is clamped to INT_MAX
but bufferlist is not handled. This edge case is patched here.
bug that arises due to buffer list beyond INT_MAX stalls async i/o due to:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
which results in disconnected inode and cap leaks:
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
This commit changes the way Client::_write accepts data. So, now instead of accepting ptr,
iovec array and iovcnt, the helper now accepts a bufferlist which should be contructed by
the caller itself. The reason behind this change is simple - to declutter the API.
For more context checkout this conversation ceph#58564 (comment)
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit f4f359e)
In linux, systems calls like
write()anyways don't allow writes > 2GiB, the total write size passed to theClient::_writeis clamped toINT_MAXbut bufferlist is not handled. This PR addresses this issue which stalls async i/o due to:which results in disconnected inode and cap leaks:
Fixes: https://tracker.ceph.com/issues/66245
Signed-off-by: Dhairya Parmar dparmar@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e