osd: allow FULL_TRY after failsafe by liupan1111 · Pull Request #17177 · ceph/ceph

liupan1111 · 2017-08-23T04:03:48Z

In #12627 and #14193, I added support for "rbd rm" when an OSD is full. But I've found that support is not enough: "rbd rm" only works when the full OSD is not the primary. In an experiment, I used vstart to create a single OSD, wrote to it until it was full, and then ran rm; it still hangs. The fix in this PR resolves that.

Signed-off-by: Pan Liu <wanjun.lp@alibaba-inc.com>

liupan1111 · 2017-08-23T09:40:14Z

retest this please

liewegas · 2017-08-23T22:35:41Z

Hmm, it doesn't seem like you should be hitting the failsafe threshold.

Oh, it's because vstart sets the thresholds too high:

        osd failsafe full ratio = .99
        mon osd nearfull ratio = .99
        mon osd backfillfull ratio = .99
        mon osd full ratio = .99

These should be .99, .96, .97 and .98 respectively, or similar. Update vstart.sh?

liupan1111 · 2017-08-23T23:58:35Z

@liewegas I got this test result by setting these options all to 15%, so that the OSD could be filled quickly. I don't think this issue is related to the option values... Could you suggest another way to resolve this issue if we don't make this change?

liewegas · 2017-08-24T01:02:31Z

The important thing is that full_ratio is less than the failsafe ratio, so that the cluster is marked full and clients stop writing before the failsafe is hit.

The failsafe is a last-ditch safety check to prevent the OSD from filling itself up. You shouldn't be allowed to override it with the force flag.

liupan1111 · 2017-08-24T01:27:45Z

@liewegas Yes, full_ratio is normally less than the failsafe ratio, but in my case the failsafe can be reached first. The OSD calls "statfs" periodically (every one or five seconds?) and sets the cur_stat of this OSD to full based on the failsafe ratio; that is then sent to the monitor, which checks it against full_ratio and updates the osdmap; the new map is sent to the client, and only then is client I/O paused. So there is a time interval in which the OSD keeps filling up.

In addition, I didn't override it with full_force, only with full_try.

I won't insist if you think we should tune failsafe and full_ratio to avoid this issue... but I think that tuning may be a little tricky.

liupan1111 · 2017-08-24T01:34:22Z

@liewegas And in this case, I set all these options to 15%, but by the time fio paused, both full_ratio and full_try were at 20%... I used a 1M block size for the writes. So in this case I don't think a 1 or 2 percent difference between full_try and fail_safe would resolve it.

liewegas

Ok, since this is just a FULL_TRY, it's probably harmless... we will only proceed if the transaction is a net reduction in usage. There is still some risk, though: it may be that the operation forces recovery of an object that then fills things up. The failsafe should block that from happening, though!

liewegas · 2017-08-25T18:54:09Z

Do you mind updating the commit description?

Signed-off-by: Pan Liu <wanjun.lp@alibaba-inc.com>

liupan1111 · 2017-08-26T02:14:23Z

@liewegas commit description has been updated, thanks.

> Ok, since this is just a FULL_TRY, it's probably harmless... we will only proceed if the transaction is a net reduction in usage. There is still some risk, though: it may be that the operation forces recovery of an object that then fills things up. The failsafe should block that from happening, though!

I searched the code and found no other places that set CEPH_OSD_FLAG_FULL_TRY... I think we can avoid this risk by strictly limiting which operations set the FULL_TRY/FULL_FORCE flags, so that the failsafe really is safe to block that.

tchaikov · 2017-08-29T01:13:37Z

liupan1111 requested a review from liewegas August 23, 2017 04:07

liupan1111 added bug-fix core labels Aug 23, 2017

liewegas approved these changes Aug 25, 2017


liewegas added the needs-qa label Aug 25, 2017

liewegas changed the title ~~osd: support "rbd rm" when osd is full~~ osd: allow FULL_TRY after failsafe Aug 25, 2017

osd: allow FULL_TRY after failsafe

40cf32b

Signed-off-by: Pan Liu <wanjun.lp@alibaba-inc.com>

tchaikov added the wip-kefu-testing label Aug 27, 2017

tchaikov removed the wip-kefu-testing label Aug 29, 2017

liewegas added the wip-sage-testing label Aug 29, 2017

liewegas merged commit 7abe19e into ceph:master Aug 29, 2017

liupan1111 deleted the wip-fix-rm branch August 30, 2017 00:31

xiexingguo mentioned this pull request Oct 10, 2018

tools/rados/rados.cc: fix rados rm --force-full blocking problem #24264

Merged

rzarzynski mentioned this pull request Feb 27, 2024

librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove #55348

Merged


liewegas merged 1 commit into ceph:master from liupan1111:wip-fix-rm