osd: Remove aios_size argument from submit_batch by RobinGeuze · Pull Request #44065 · ceph/ceph

RobinGeuze · 2021-11-23T14:35:38Z

Due to aios_size being a uint16 and the source value for the actual
call being an int there was a possible overflow. This was "fixed"
with an assert, however that still causes a crash.

This pull removes the need for aios_size completely by iterating
over the list and submitting it in max_iodepth batches.
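As an illustration only, the following Python sketch models the two ideas in this description: what happens when an int count is narrowed to a 16-bit unsigned value, and how a list can be submitted in chunks of at most max_iodepth without passing a size. It is not the Ceph C++ code. The names `as_uint16` and `submit_in_batches` and the `submit` callback are hypothetical stand-ins.

```python
import ctypes
from itertools import islice


def as_uint16(n):
    """Model narrowing an int to a 16-bit unsigned value (wraps modulo 65536)."""
    return ctypes.c_uint16(n).value


def submit_in_batches(aios, max_iodepth, submit):
    """Walk `aios` and pass it to `submit` in chunks of at most `max_iodepth`.

    `submit` is a hypothetical callback standing in for the real submission
    call. No element count is passed in, so nothing can be truncated.
    """
    if max_iodepth <= 0:
        raise ValueError("max_iodepth must be positive")
    it = iter(aios)
    total = 0
    while True:
        batch = list(islice(it, max_iodepth))
        if not batch:
            break
        submit(batch)
        total += len(batch)
    return total


pending = list(range(70000))          # more entries than a uint16 can count
print(as_uint16(len(pending)))        # the narrowed count no longer matches

batch_sizes = []
total = submit_in_batches(pending, 16, lambda b: batch_sizes.append(len(b)))
print(total, max(batch_sizes))
```

The point of the sketch is that the batching loop only ever needs the iterator and the batch limit, so the narrowing step disappears entirely.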

Due to aios_size being a uint16 and the source value for the actual call being an int there was a possible overflow. This was "fixed" with an assert, however that still causes a crash.

This commit removes the need for aios_size completely by iterating over the list and submitting it in max_iodepth batches.

Fixes: https://tracker.ceph.com/issues/46366
Signed-off-by: Robin Geuze <robin.geuze@nl.team.blue>

RobinGeuze · 2021-11-30T15:11:14Z

@tchaikov could you take a look at this? I think it's a proper fix for https://tracker.ceph.com/issues/46366

RobinGeuze · 2021-12-02T11:58:50Z

We managed to reproduce the crash described in the issue (and confirmed that it still occurs with master) and verified that this patch fixes it successfully.

github-actions · 2022-08-01T22:01:42Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

ljflores

Hey @RobinGeuze, can I ask what you did to verify the fix?

Also, per @sebastian-philipp, it looks like this fix may need a test (perhaps a unit test in test/objectstore/test_bdev.cc would work).

RobinGeuze · 2022-08-16T06:46:26Z

Hey @ljflores,

We tested this by first reproducing the issue. We set up a minimal Ceph cluster (e.g. 3 OSDs) and wrote some data to it to make sure there is something there.

Once that is done, we kill one of the OSDs. Then we create an object with a very large number of extents, for example using the following Python code:

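import rados

# Connect using the local cluster configuration.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# The pool 'test' must already exist.
ioctx = cluster.open_ioctx('test')

# Write a single byte at every other offset so the object ends up
# with a very large number of small extents.
for i in range(0, 1024 * 1024, 2):
    print(i)
    ioctx.write('beep', b'a', i)

print(ioctx.read('beep'))

ioctx.close()
cluster.shutdown()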

If you then bring the OSD back up it will start recovery and crash once it gets to that object. If this patch is applied it does not crash.
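For a sense of scale, the arithmetic behind the reproducer can be checked with a small sketch. It only counts the writes issued by the loop above. How many aio entries recovery actually builds for the object is an assumption here and depends on how the extents are laid out on disk.

```python
UINT16_MAX = 2**16 - 1  # largest value a uint16 can hold

# Same offsets as the reproducer: one 1-byte write at every other offset.
offsets = range(0, 1024 * 1024, 2)
num_writes = len(offsets)

print(num_writes)               # number of separate 1-byte writes
print(num_writes > UINT16_MAX)  # far beyond what a uint16 count can represent
print(num_writes % 2**16)       # what such a count would wrap to as a uint16
```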

I was trying to figure out how to test this properly, but I was unable to get teuthology to work locally. Writing a unit test could work though; I might be able to take a look at that later this week.

github-actions · 2022-12-28T12:02:09Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

github-actions · 2023-01-27T13:05:11Z

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

github-actions bot added the core label Nov 23, 2021

sebastian-philipp added the needs-test label Dec 2, 2021

djgalloway changed the base branch from master to main June 2, 2022 21:25

djgalloway requested a review from a team as a code owner June 2, 2022 21:25

github-actions bot added the stale label Aug 1, 2022

ljflores reviewed Aug 1, 2022

ljflores requested review from ifed01 and rzarzynski August 1, 2022 22:25

github-actions bot removed the stale label Aug 1, 2022

github-actions bot added the stale label Dec 28, 2022

github-actions bot closed this Jan 27, 2023

ifed01 mentioned this pull request Mar 20, 2024

blk/aio: fix long batch (64+K entries) submission. #56352

Merged

14 tasks

RobinGeuze wants to merge 1 commit into ceph:main from RobinGeuze:fixSubmitBatch

3 participants
