Skip to content

mds: Fix readdir when osd is full.#64663

Merged
vshankar merged 2 commits intoceph:mainfrom
kotreshhr:osdfull-subvolumels-issue
Sep 1, 2025
Merged

mds: Fix readdir when osd is full.#64663
vshankar merged 2 commits intoceph:mainfrom
kotreshhr:osdfull-subvolumels-issue

Conversation

@kotreshhr
Copy link
Contributor

@kotreshhr kotreshhr commented Jul 24, 2025

Problem:
The readdir wouldn't list all the entries in the directory when the osd is full with rstats enabled.

Cause:
The issue happens only in multi-mds cephfs cluster. If rstats is enabled, the readdir would request 'Fa' cap on every dentry, basically to fetch the size of the directories. Note that 'Fa' is CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is used by CEPH_STAT_RSTAT.

The request for the cap is a getattr call and it need not go to the auth mds. If rstats is enabled, the getattr would go with the mask CEPH_STAT_RSTAT which mandates the requirement for auth-mds in 'handle_client_getattr', so that the request gets forwarded to auth mds if it's not the auth. But if the osd is full, in the 'dispatch_client_request' before calling the handler function of respective op, the inode is fetched to check the FULL cap access for certain metadata write operations. If the inode doesn't exist, ESTALE is returned. This is wrong for the operations like getattr, where the inode might not be in memory on the non-auth mds and returning ESTALE is confusing and client wouldn't retry. This is introduced by the commit 6db81d8 which fixes subvolume deletion when osd is full.

Fix:
Fetch the inode required for the FULL cap access check for the relevant operations in osd full scenario. This makes sense because all the operations would mostly be preceded with lookup and load the inode in memory or they would handle ESTALE gracefully.

Fixes: https://tracker.ceph.com/issues/72260
Introduced-by: 6db81d8

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@github-actions github-actions bot added cephfs Ceph File System documentation labels Jul 24, 2025
@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch from 380d679 to 7429624 Compare July 24, 2025 09:37
@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch from 7429624 to 6464603 Compare July 24, 2025 17:34
@github-actions github-actions bot added the tests label Jul 24, 2025
@kotreshhr kotreshhr marked this pull request as ready for review July 24, 2025 17:42
@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch 4 times, most recently from 635b0ce to 9803260 Compare July 25, 2025 11:22
@kotreshhr
Copy link
Contributor Author

jenkins test make check

@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch from 2174ee4 to fff7d78 Compare July 28, 2025 11:18
@kotreshhr kotreshhr requested a review from a team July 28, 2025 12:37
@kotreshhr
Copy link
Contributor Author

jenkins test make check arm64

@kotreshhr kotreshhr requested a review from batrick July 29, 2025 05:05
@vshankar vshankar self-assigned this Aug 4, 2025
Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.

Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.

The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to auth mds if it's not the auth. But if the osd is full,
the indode is fetched in the 'dispatch_client_request' before
calling the  handler function of respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for the operations
like getattr, where the inode might not be in memory on the non-auth
mds and returning ESTALE is confusing and client wouldn't retry. This
is introduced by the commit 6db81d8 which fixes subvolume
deletion when osd is full.

Fix:
Fetch the inode required for the FULL cap access check for the
relevant operations in osd full scenario. This makes sense because
all the operations would mostly be preceded with lookup and load
the inode in memory or they would handle ESTALE gracefully.

Fixes: https://tracker.ceph.com/issues/72260
Introduced-by: 6db81d8
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch from fff7d78 to 41e8afa Compare August 7, 2025 09:29
@kotreshhr
Copy link
Contributor Author

The test is validating the fix. Please check the following teuthology run with and without fix.

With Fix:
https://pulpito.ceph.com/khiremat-2025-08-07_08:55:40-fs:full-wip-khiremat-64663-subvolumels-distro-default-smithi/

Without Fix, ran against main branch:
https://pulpito.ceph.com/khiremat-2025-08-07_09:33:28-fs:full-main-distro-default-smithi/

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work!

(Please also address the comment request in #64663 (comment) )

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditionally approved (pending: #64663 (review))

Fixes: https://tracker.ceph.com/issues/72260
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr kotreshhr force-pushed the osdfull-subvolumels-issue branch from 41e8afa to 8547e57 Compare August 8, 2025 08:25
@kotreshhr
Copy link
Contributor Author

Good work!

(Please also address the comment request in #64663 (comment) )

Thank you. Done.

# The suite (qa/suites/fs/full/tasks/mgr-osd-full.yaml) sets the 'bluestore block size'

@vshankar
Copy link
Contributor

This PR is under test in https://tracker.ceph.com/issues/72684.

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vshankar vshankar merged commit 100415d into ceph:main Sep 1, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephfs Ceph File System tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants