client: use path supplied in statfs#64864
Conversation
If path provided, use in statfs. Replumb internal statfs for internal only to allow for use in ll_statfs and statfs Fixes: https://tracker.ceph.com/issues/72355 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
76144d7 to
ddbeb2f
Compare
ffilz
left a comment
There was a problem hiding this comment.
A few comments, but overall this looks good. Nice that it ended up being a simple and concise change.
| stbuf->f_files = total_files_on_fs; | ||
| stbuf->f_ffree = -1; | ||
| stbuf->f_favail = -1; | ||
| stbuf->f_fsid = -1; // ?? |
There was a problem hiding this comment.
So outside a sub-volume f_fsid will be -1? I guess that will work.
I do hope using an inode number is sufficiently unique if multiple filesystems are involved.
There was a problem hiding this comment.
It's possible to run into collisions if multiple FSes are used.
There was a problem hiding this comment.
Yea, I'm not sure we have to deal with multiple FSes. Is there any way to make more unique?
src/client/Client.cc
Outdated
| @@ -12973,7 +12974,21 @@ int Client::ll_statfs(Inode *in, struct statvfs *stbuf, const UserPerm& perms) | |||
| /* Since the only thing this does is wrap a call to statfs, and | |||
| statfs takes a lock, it doesn't seem we have a need to split it | |||
| out. */ | |||
There was a problem hiding this comment.
And those look great, so nice addition of statfs method with path.
Although Ganesha won't use it, you might want to add a path based ll_statfs_path and a ceph_ll_statfs_path.
There was a problem hiding this comment.
I see what you mean.
I'm more of a minimalist for adding/changing apis. I'm not aware of a hard requirement for those since the other statfs calls were added in 2013. Perhaps if it's required we add them later?
src/client/Client.cc
Outdated
| stbuf->f_fsid = -1; // ?? | ||
| if (in->snaprealm) { | ||
| // this is subvolume root ino | ||
| stbuf->f_fsid = in->snaprealm->ino; |
There was a problem hiding this comment.
No, this would be an unexpected divergence in behavior for f_fsid. It should remain the same for any CephFS file system. (It also should definitely not be -1.)
I would suggest the proper fix here is to use the last 4 bytes of ceph_fsid (the Ceph cluster UUID) as the most significant 4 bytes of stbuf->f_fsid and the file system ID as the least significant 4 bytes of stbuf->f_fsid.
There was a problem hiding this comment.
@ffilz do you really need this to be unique for each subvolume? That's not typical for any quota enabled file system I'm aware of.
There was a problem hiding this comment.
On the Linux NFS client, when a user does df, it will do a statfs on each filesystem mounted. The Linux NFS client creates additional mounts whenever it sees a new filesystem (NFS fsid attribute changes). I'm pretty sure statfs will also translate into a statfs on the root inode of such filesystems.
So if we don't show the fsid changing, the Linux NFS client won't understand that there are different statfs attributes (the sub-volume quota for ceph).
There was a problem hiding this comment.
So, ideally, the fsid Ganesha gives to the NFS client should be whatever CephFS gives.
I'm concerned about making subvolumes a "file system" abstraction even though they masquerade as such in various ways (e.g. no external links). At the very least, this fsid needs to include parts of the cluster id and CephFS FSID. I'm not sure how to consolidate all of that into a 16 byte field that will work as you intend. The subvolume inode number itself can be 8 bytes and the CEphFS FSID is probably safely considered a 4 byte int.
There was a problem hiding this comment.
concatenate the bytes and hash (e.g., xxhash, or even a cryptographic hash)?
There was a problem hiding this comment.
The question is if we want the client to understand that a sub-volume is something different. The documented way to clue in the client is to change fsid which in NFSv4 is 2 uint64_t so 16 bytes. In NFSv3 it's 2 uint32_t so only 8 bytes.
Hmm, but it looks like Cephfs allows a quota to be set on any directory. Does that create a sub-volume? Or is quota root something different from sub-volumes (with the no external links and such)?
There was a problem hiding this comment.
The question is if we want the client to understand that a sub-volume is something different. The documented way to clue in the client is to change fsid which in NFSv4 is 2 uint64_t so 16 bytes. In NFSv3 it's 2 uint32_t so only 8 bytes.
So I think the better question is: can CephFS provide enough information to Ganesha using libcephfs that it's dealing with a subvolume root? Would NFS-Ganesha be able to communicate that to the NFS client in some standard way that doesn't require encoding it as a 16 byte (or 8 byte for NFSv3) identifier?
Hmm, but it looks like Cephfs allows a quota to be set on any directory. Does that create a sub-volume? Or is quota root something different from sub-volumes (with the no external links and such)?
Setting a quota does not create a subvolume, no. In practice, we expect that clients will not have the necessary CephFS ("mds") capabilities to create/set quotas for subvolumes. Those quotas would be set when the subvolume is created.
There was a problem hiding this comment.
So it really sounds like for Ganesha and given the Linux kernel behavior, we need to fake up a per-subvolume FSID to get any sensible behavior.
If Rook/Kubernetes had the same experience, would that break things? Since they're sharing a single mount, but have a bunch of sub volumes? Would diverging between these clients for this API call be a problem?
There was a problem hiding this comment.
So, I don't see f_fsid changes in the recent revision. Did we decide not to do that? If yes, can someone update why is that bit is not required?
There was a problem hiding this comment.
This was removed as we didn't want to hold up the changes with statfs using path/inode provided. Discussion on f_fsid wasn't finalized or fully agreed on. We broke out this requirement here: https://tracker.ceph.com/issues/72575
ddbeb2f to
3ffa92f
Compare
| stbuf->f_files = total_files_on_fs; | ||
| stbuf->f_ffree = -1; | ||
| stbuf->f_favail = -1; | ||
| stbuf->f_fsid = -1; // ?? |
There was a problem hiding this comment.
Yea, I'm not sure we have to deal with multiple FSes. Is there any way to make more unique?
| @@ -12529,24 +12529,25 @@ int Client::getcwd(string& dir, const UserPerm& perms) | |||
| int Client::_statfs(Inode *in, struct statvfs *stbuf, | |||
| const UserPerm& perms) | |||
| { | |||
There was a problem hiding this comment.
I don't quite understand the changes in this patch, perhaps more description in the commit message?
b64b164 to
699a1cf
Compare
In statfs, get quota_root for inode provided. Check if a quota is directly applied to inode. If not, reverse tree walk up and maybe find a quota set higher up the tree. Fixes: https://tracker.ceph.com/issues/72355 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
|
|
jenkins test make check |
|
jenkins test make check arm64 |
|
jenkins test windows |
|
@chrisphoffman please open a ticket for discussion on |
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
699a1cf to
dd7bdd3
Compare
|
I did one more minor repush. |
|
More general comprehensive test here: https://pulpito.ceph.com/choffman-2025-08-15_15:02:24-fs-wip-choffman-72355-distro-default-gibba/ |
During test_df_for_invalid_directory, path_walk is now called. Use a more general error message as more errnos can be returned and this will be a better catch all. Signed-off-by: Christopher Hoffman <choffman@redhat.com>
One error pertaining to the change:https://pulpito.ceph.com/choffman-2025-08-15_15:02:24-fs-wip-choffman-72355-distro-default-gibba/8445011/ |
If path provided, use in statfs. Replumb internal statfs
for internal only to allow for use in ll_statfs and statfs
In statfs, get quota_root for inode provided. This will allow for
quotas not set at root to be viewed.
Fixes: https://tracker.ceph.com/issues/72355
Signed-off-by: Christopher Hoffman choffman@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job Definition