Bug #66245
client: async I/O stalls while writing buffers >=4GiB (int overflow)
Description
Here's a test case that tries to write 4 GiB via two buffers of 2 GiB each. The write should fail because of an int overflow, but instead it stalls the async I/O.
TEST_F(TestClient, LlreadvLlwritevLargeBuffers) {
  /* Test that async I/O code paths handle large buffers */
  int mypid = getpid();
  char filename[256];

  client->unmount();
  TearDown();
  SetUp();

  sprintf(filename, "test_llreadvllwritevlargebuffers%u", mypid);

  Inode *root, *file;
  root = client->get_root();
  ASSERT_NE(root, (Inode *)NULL);

  Fh *fh;
  struct ceph_statx stx;

  ASSERT_EQ(0, client->ll_createx(root, filename, 0666,
                                  O_RDWR | O_CREAT | O_TRUNC,
                                  &file, &fh, &stx, 0, 0, myperm));

  struct statvfs stbuf;
  int64_t rc;
  rc = client->ll_statfs(root, &stbuf, myperm);
  ASSERT_EQ(rc, 0);
  int64_t fs_available_space = stbuf.f_bfree * stbuf.f_bsize;
  ASSERT_GT(fs_available_space, 0);

  const int64_t BUFSIZE = 1024 * 1024 * 1024;
  int64_t bytes_written = 0;

  std::unique_ptr<C_SaferCond> writefinish = nullptr;
  writefinish.reset(new C_SaferCond("test-nonblocking-writefinish-large-buffers"));

  auto out_buf_0 = std::make_unique<char[]>(BUFSIZE * 2);
  memset(out_buf_0.get(), 0xDD, BUFSIZE * 2);
  auto out_buf_1 = std::make_unique<char[]>(BUFSIZE * 2);
  memset(out_buf_1.get(), 0xFF, BUFSIZE * 2);

  struct iovec iov_out[2] = {
    {out_buf_0.get(), BUFSIZE * 2},
    {out_buf_1.get(), BUFSIZE * 2}
  };

  bufferlist bl;
  rc = client->ll_preadv_pwritev(fh, iov_out, 2, 0, true, writefinish.get(),
                                 nullptr);
  ASSERT_EQ(rc, 0);
  bytes_written = writefinish->wait();
  ASSERT_EQ(bytes_written, -CEPHFS_ENOSPC);

  client->ll_release(fh);
  ASSERT_EQ(0, client->ll_unlink(root, filename, myperm));
}
One might wonder why `BUFSIZE` isn't set to 2 GiB outright instead of being set to 1 GiB and then multiplied. The reason is that the direct case is handled fine: the compiler warns about the int overflow and the test case ends gracefully. The interesting case is precisely this one, where the initial buffer size is fine but the subsequent arithmetic overflows.
We get this error, as expected:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmount
2024-05-28T16:17:06.854+0530 7f9a5d24c9c0 2 client.4311 unmounting
and then the stall:
2024-05-28T16:17:06.855+0530 7f9a5d24c9c0 10 client.4311 _put_inode on 0x10000000001.head(faked_ino=0 nref=11 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530) n = 8
2024-05-28T16:17:06.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
2024-05-28T16:17:07.672+0530 7f9a44ff96c0 20 client.4311 tick
2024-05-28T16:17:07.672+0530 7f9a44ff96c0 20 client.4311 collect_and_send_metrics
2024-05-28T16:17:07.672+0530 7f9a44ff96c0 20 client.4311 collect_and_send_global_metrics
2024-05-28T16:17:07.672+0530 7f9a44ff96c0 20 client.4311 trim_cache size 0 max 16384
2024-05-28T16:17:07.672+0530 7f9a44ff96c0 20 client.4311 upkeep thread waiting interval 1.000000000s
2024-05-28T16:17:08.673+0530 7f9a44ff96c0 20 client.4311 tick
2024-05-28T16:17:08.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_metrics
2024-05-28T16:17:08.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_global_metrics
2024-05-28T16:17:08.673+0530 7f9a44ff96c0 20 client.4311 trim_cache size 0 max 16384
2024-05-28T16:17:08.673+0530 7f9a44ff96c0 20 client.4311 upkeep thread waiting interval 1.000000000s
2024-05-28T16:17:09.673+0530 7f9a44ff96c0 20 client.4311 tick
2024-05-28T16:17:09.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_metrics
2024-05-28T16:17:09.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_global_metrics
2024-05-28T16:17:09.673+0530 7f9a44ff96c0 20 client.4311 trim_cache size 0 max 16384
2024-05-28T16:17:09.673+0530 7f9a44ff96c0 20 client.4311 upkeep thread waiting interval 1.000000000s
2024-05-28T16:17:10.673+0530 7f9a44ff96c0 20 client.4311 tick
2024-05-28T16:17:10.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_metrics
2024-05-28T16:17:10.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_global_metrics
2024-05-28T16:17:10.673+0530 7f9a44ff96c0 20 client.4311 trim_cache size 0 max 16384
2024-05-28T16:17:10.673+0530 7f9a44ff96c0 20 client.4311 upkeep thread waiting interval 1.000000000s
2024-05-28T16:17:11.673+0530 7f9a44ff96c0 20 client.4311 tick
2024-05-28T16:17:11.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_metrics
2024-05-28T16:17:11.673+0530 7f9a44ff96c0 20 client.4311 collect_and_send_global_metrics
2024-05-28T16:17:11.673+0530 7f9a44ff96c0 20 client.4311 trim_cache size 0 max 16384
2024-05-28T16:17:11.673+0530 7f9a44ff96c0 20 client.4311 upkeep thread waiting interval 1.000000000s
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_cache
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 1 client.4311 dump_inode: DISCONNECTED inode 0x10000000001 #0x10000000001 ref 3 0x10000000001.head(faked_ino=0 nref=3 ll_ref=0 cap_refs={4=0,1024=1,4096=1,8192=2} open={3=0} mode=100666 size=0/4294967296 nlink=1 btime=2024-05-28T16:17:03.387546+0530 mtime=2024-05-28T16:17:03.387546+0530 ctime=2024-05-28T16:17:03.387546+0530 change_attr=0 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000000001 ts 0/0 objects 1 dirty_or_tx 0] 0x7f9a2c009530)
2024-05-28T16:17:11.855+0530 7f9a5d24c9c0 2 client.4311 cache still has 0+1 items, waiting (for caps to release?)
Updated by Venky Shankar almost 2 years ago
- Status changed from New to Triaged
- Assignee set to Dhairya Parmar
- Severity changed from 3 - minor to 2 - major
This could be a generic issue with the error handling in pwritev-type calls. Please audit those and also check whether the cap refs are dropped.
Updated by Venky Shankar almost 2 years ago
@Dhairya Parmar This too looks to be similar to https://tracker.ceph.com/issues/64390, yes?
Updated by Dhairya Parmar almost 2 years ago
Venky Shankar wrote in #note-2:
@Dhairya Parmar This too looks to be similar to https://tracker.ceph.com/issues/64390, yes?
Not really. That issue was about unmount getting stuck because the full pools didn't allow the client to trim its cache; here, by contrast, there's a disconnected inode that still holds refs.
Updated by Dhairya Parmar almost 2 years ago
I read the buffer code, and the limitation is its use of `unsigned int`, specifically `unsigned length() const { return _len; }`. Unlike signed int, unsigned int overflow does not produce undefined behaviour; the value wraps around via modulo arithmetic, i.e. N mod (UNSIGNED_INT_MAX + 1), where N is the value supplied. So on the client side, `Client::_write` happily appends all the iov structs (each 1 GiB in the test case) to `bl`, but since the total size of `bl` is 4294967296 bytes (4 GiB), the 32-bit length wraps and `bl.length()` becomes `0`, hence the error:
unknown file: Failure
C++ exception with description "End of buffer [buffer:2]" thrown in the test body.
E.g.:
with:
auto out_buf_0 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_0.get(), 'a', BUFSIZE);
auto out_buf_1 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_1.get(), 'b', BUFSIZE);
auto out_buf_2 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_2.get(), 'c', BUFSIZE);
auto out_buf_3 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_3.get(), 'd', BUFSIZE);
struct iovec iov_out[4] = {
  {out_buf_0.get(), BUFSIZE},
  {out_buf_1.get(), BUFSIZE},
  {out_buf_2.get(), BUFSIZE},
  {out_buf_3.get(), BUFSIZE}
};
bufferlist:
2024-06-15T04:11:33.079+0530 7f5983879a00 10 client.4291 bl.length: 0
2024-06-15T04:11:33.079+0530 7f5983879a00 10 client.4291 bl:
buffer::list(len=0,
buffer::ptr(0~1073745832 0x7f57e7600010 in raw 0x7f57e7600010 len 1073745832 nref 1),
buffer::ptr(0~1073741736 0x7f57a7400010 in raw 0x7f57a7400010 len 1073741736 nref 1),
buffer::ptr(0~1073741736 0x7f5767200010 in raw 0x7f5767200010 len 1073741736 nref 1),
buffer::ptr(0~1073737992 0x7f5727000010 in raw 0x7f5727000010 len 1073741736 nref 1)
)
while with:
auto out_buf_0 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_0.get(), 'a', BUFSIZE);
auto out_buf_1 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_1.get(), 'b', BUFSIZE);
auto out_buf_2 = std::make_unique<char[]>(BUFSIZE);
memset(out_buf_2.get(), 'c', BUFSIZE);
struct iovec iov_out[3] = {
  {out_buf_0.get(), BUFSIZE},
  {out_buf_1.get(), BUFSIZE},
  {out_buf_2.get(), BUFSIZE}
};
bufferlist:
2024-06-15T04:13:13.398+0530 7f44898b2a00 10 client.4300 bl.length: 3221225472
2024-06-15T04:13:13.398+0530 7f44898b2a00 10 client.4300 bl:
buffer::list(len=3221225472,
buffer::ptr(0~1073745832 0x7f432b800010 in raw 0x7f432b800010 len 1073745832 nref 1),
buffer::ptr(0~1073741736 0x7f42eb600010 in raw 0x7f42eb600010 len 1073741736 nref 1),
buffer::ptr(0~1073737904 0x7f42ab400010 in raw 0x7f42ab400010 len 1073741736 nref 1)
)
So either we return an error from `Client::_write` when we see the bufferlist would be >= 4 GiB, or we enhance the buffer code to make it capable of handling more data.
Updated by Dhairya Parmar almost 2 years ago
- Subject changed from client: async I/O stalls while writing large buffers(int overflow) to client: async I/O stalls while writing buffers >=4GiB (int overflow)
Updated by Dhairya Parmar almost 2 years ago
Dhairya Parmar wrote in #note-4:
I read the buffer code and it is the usage of `unsigned int` that is the limitation. Specifically `unsigned length() const { return _len; }`. Since for `unsigned int` the overflow does not produce undefined behaviour like that it does for signed int but the value is wrapped around using modulo arithmatic i.e N - UNSIGNED_INT_MAX - 1 (where N is the value supplied) so what happens at the client side is that `Client::_write` happily appends all the iov structs(each of 1GiB used in test case) to the `bl` but now since the size of the `bl` is 4294967296(i.e. 4GiB), there is an int overflow and the `bl.length()` is now `0` and hence the error:
[...]E.g.:
with:
[...]bufferlist:
[...]while with:
[...]bufferlist:
[...]So either we return from `Client::_write` when we see the bufferlist is >= 4GiB or we enhance the buffer code to make it capable of handling more data
FYI this is a generic issue with I/O code paths, not async-specific.
Updated by Dhairya Parmar over 1 year ago
From my discussion with Venky during today's standup, it was decided to go with the solution of returning a relevant errno if the buffers go beyond 4 GiB.
Updated by Dhairya Parmar over 1 year ago
To my surprise, write calls can do at most 2 GiB. When I initiated an I/O of 3 GiB, the number of bytes returned to me was, shockingly, 2 GiB.
Updated by Dhairya Parmar over 1 year ago · Edited
Okay, so as per https://man7.org/linux/man-pages/man2/write.2.html, the maximum number of bytes that can be transferred through write() and similar system calls is 2147479552. Therefore this isn't actually a bug. (This was for https://tracker.ceph.com/issues/66245#note-8.)
Updated by Venky Shankar over 1 year ago
- Status changed from Triaged to Rejected
Updated by Dhairya Parmar over 1 year ago
@Venky Shankar this is still an issue, using buffers > 4GiB crashes the i/o call. This still needs to be taken care of.
Updated by Venky Shankar over 1 year ago
Dhairya Parmar wrote in #note-11:
@Venky Shankar this is still an issue, using buffers > 4GiB crashes the i/o call. This still needs to be taken care of.
It's against the standard though, so its probably fine to not worry about it. Or if we really want to, then this is low priority and an intern task :)
Updated by Dhairya Parmar over 1 year ago
Venky Shankar wrote in #note-12:
Dhairya Parmar wrote in #note-11:
@Venky Shankar this is still an issue, using buffers > 4GiB crashes the i/o call. This still needs to be taken care of.
It's against the standard though, so its probably fine to not worry about it. Or if we really want to, then this is low priority and an intern task :)
Would Linux calls also crash when supplied with such huge buffers?
Updated by Dhairya Parmar over 1 year ago · Edited
A snippet from `_write()`:
if (buf) {
  if (size > 0)
    bl.append(buf, size);
} else if (iov) {
  for (int i = 0; i < iovcnt; i++) {
    if (iov[i].iov_len > 0) {
      bl.append((const char *)iov[i].iov_base, iov[i].iov_len);
    }
  }
}
The `size` here is already clamped (its caller, `_preadv_pwritev_locked()`, does that), but the same isn't done while iterating over the iovec structs, which makes this an incomplete implementation. The for loop should also do what `clamp_to_int` does:
loff_t totallen = 0;
for (int i = 0; i < iovcnt; i++) {
  totallen += iov[i].iov_len;
}
if (clamp_to_int) {
  totallen = std::min(totallen, (loff_t)INT_MAX);
}
This means non-vectored I/O goes through, since it depends only on `size`, but vectored I/O (async or sync) suffers. Which translates to calls like this:
r = objectcacher->file_write(&in->oset, &in->layout,
                             in->snaprealm->get_snap_context(),
                             offset, size, bl, ceph::real_clock::now(),
                             0, iofinish.get(),
                             onfinish == nullptr
                               ? objectcacher->CFG_block_writes_upfront()
                               : false);
having a proper `size` but an improper `bl` length, which throws the C++ error mentioned in the description. This is a case of inconsistency: the for loop (or some other part) must clamp the size of the iov.
Updated by Dhairya Parmar over 1 year ago
So the final code would be something like this:
ssize_t total_appended = 0;
if (buf) {
  if (size > 0)
    bl.append(buf, size);
} else if (iov) {
  for (int i = 0; i < iovcnt; i++) {
    if (iov[i].iov_len > 0) {
      if (total_appended >= size) {
        bl.append(nullptr, 0);
      } else {
        bl.append((const char *)iov[i].iov_base, iov[i].iov_len);
        total_appended += iov[i].iov_len;
      }
    }
  }
}
Updated by Dhairya Parmar over 1 year ago
- Status changed from Rejected to In Progress
- Pull request ID set to 58564
Updated by Venky Shankar about 1 year ago
- Status changed from In Progress to Fix Under Review
- Backport set to reef,squid
Updated by Venky Shankar 8 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from reef,squid to tentacle,squid,reef
Updated by Upkeep Bot 8 months ago
- Merge Commit set to fa6e8849b73071aec7ab413d21d866649e9c012d
- Fixed In set to v20.3.0-1788-gfa6e8849b7
- Upkeep Timestamp set to 2025-07-21T04:11:01+00:00
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72196: reef: client: async I/O stalls while writing buffers >=4GiB (int overflow) added
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72197: tentacle: client: async I/O stalls while writing buffers >=4GiB (int overflow) added
Updated by Upkeep Bot 8 months ago
- Copied to Backport #72198: squid: client: async I/O stalls while writing buffers >=4GiB (int overflow) added