Skip to content

mds/client: file blockdiff support#61937

Merged
kotreshhr merged 4 commits intoceph:mainfrom
vshankar:wip-file-block-diff
Mar 20, 2025
Merged

mds/client: file blockdiff support#61937
kotreshhr merged 4 commits intoceph:mainfrom
vshankar:wip-file-block-diff

Conversation

@vshankar
Copy link
Contributor

@vshankar vshankar commented Feb 20, 2025

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@vshankar vshankar added the cephfs Ceph File System label Feb 20, 2025
@vshankar vshankar requested a review from a team February 20, 2025 13:31
Copy link
Contributor

@mchangir mchangir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some early comments of a partial review

@vshankar
Copy link
Contributor Author

I will push in the mirror daemon changes to and then mark this PR as ready-for-review.

@vshankar vshankar force-pushed the wip-file-block-diff branch from da173b1 to 6b48bbd Compare February 28, 2025 05:00
@vshankar vshankar marked this pull request as ready for review February 28, 2025 05:02
@vshankar vshankar requested a review from a team as a code owner February 28, 2025 05:02
@vshankar
Copy link
Contributor Author

Marking this ready for review as the latest push has integration of the block diff API with the mirror daemon. The mirror daemon code needs a bit more polishing, however, I think its good to seek reviews before starting that effort.

@vshankar
Copy link
Contributor Author

@joscollin The latest push includes commits from PR #61539. I'll will probably integrate those changes into an existing commit and add your credits in the commit message.

@vshankar
Copy link
Contributor Author

vshankar commented Mar 3, 2025

I'm working on fixing a bug identified in testing the mirror daemon block diff integration.

@vshankar
Copy link
Contributor Author

vshankar commented Mar 3, 2025

I'm working on fixing a bug identified in testing the mirror daemon block diff integration.

The issue seems to be related to snapdiff not returning deleted dentry information for a file which is a symlink in a snapshot and a regular file in another snap. Discussing with @ifed01 on slack.

@vshankar vshankar force-pushed the wip-file-block-diff branch from 6b48bbd to 0e52abe Compare March 6, 2025 05:47
@vshankar
Copy link
Contributor Author

vshankar commented Mar 6, 2025

To be tested with #62126

@vshankar
Copy link
Contributor Author

vshankar commented Mar 6, 2025

This PR is under test in https://tracker.ceph.com/issues/70332.

@vshankar vshankar force-pushed the wip-file-block-diff branch from 0e52abe to 5accf36 Compare March 6, 2025 10:00
@vshankar
Copy link
Contributor Author

vshankar commented Mar 6, 2025

Also fixed up a minor test refactor -- nothing but, just rearranging the test case.

@vshankar
Copy link
Contributor Author

vshankar commented Mar 6, 2025

Question to @ceph/cephfs - does this new mds operation need a feature bit?

@vshankar
Copy link
Contributor Author

vshankar commented Mar 6, 2025

Question to @ceph/cephfs - does this new mds operation need a feature bit?

EDIT: The MDS would return -ENOTSUPP if it does not support the (block diff) op. The mirror daemon can handle this error and fallback to full file copy.

@vshankar
Copy link
Contributor Author

This PR is under test in https://tracker.ceph.com/issues/70332.

There is a failed mirror daemon test that seems related: https://pulpito.ceph.com/vshankar-2025-03-10_05:25:01-fs:mirror-wip-vshankar-testing-20250306.055054-debug-testing-default-smithi/8178206/

Working on that.

@vshankar
Copy link
Contributor Author

Full fs suite run: https://pulpito.ceph.com/vshankar-2025-03-10_04:27:16-fs-wip-vshankar-testing-20250306.055054-debug-testing-default-smithi/

Nothing stood out at first pass, but I'll go through the files in detail to see if something new broke.

@vshankar vshankar force-pushed the wip-file-block-diff branch from 91e1125 to bb3be44 Compare March 12, 2025 13:59
@vshankar
Copy link
Contributor Author

I refreshed this PR with just the mds and client changes for the new block diff operation. The mirror daemon changes will be a separate PR.

@vshankar
Copy link
Contributor Author

vshankar commented Mar 13, 2025

@kotreshhr @mchangir PTAL - this change is not now only the mds and client side changes for the new block diff call. This should not break anything since it's an entirely new MDS operation. The thing to focus a bit more during review would be the blockdiff call to older MDSs which does not understand this (new) operation. I remember some older MDSs would ceph_abort() on seeing an operation that it does not understand, but I think we fixed those (in quincy+) to derr log and return -ENOTSUP.

@vshankar
Copy link
Contributor Author

jenkins test make check

@vshankar
Copy link
Contributor Author

jenkins test make check arm64

@kotreshhr
Copy link
Contributor

jenkins test make check

@kotreshhr
Copy link
Contributor

kotreshhr commented Mar 13, 2025

Review Comments:

Overall the code looks good.

I particularly tested with mds (not having blockdiff api) with the libcephfs program compiled with blockdiff api support.
I didn't see any issues and got the "Operation Not supported" error as expected. Please find the mds log snippet below.

2025-03-13T18:16:22.506+0530 7f30df0ee6c0  1 -- [v2:192.168.1.114:6810/4235873411,v1:192.168.1.114:6811/4235873411] <== client.4163 192.168.1.114:0/1873501055 13 ==== client_request
(client.4163:11 ??? #0x10000000002//snap1/fileA #0x10000000002//snap2/fileA 2025-03-13T18:16:22.508074+0530 caller_uid=1000, caller_gid=1000{}) ==== 192+0+0 (crc 0 0 0) 0x564125f6bc
00 con 0x564125fa1000
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  4 mds.0.server handle_client_request client_request(client.4163:11 ??? #0x10000000002//snap1/fileA #0x10000000002//snap2/fileA 2025-03-13T
18:16:22.508074+0530 caller_uid=1000, caller_gid=1000{})
2025-03-13T18:16:22.506+0530 7f30df0ee6c0 20 mds.0.3 get_session have 0x564125f9c500 client.4163 192.168.1.114:0/1873501055 state open
2025-03-13T18:16:22.506+0530 7f30df0ee6c0 15 mds.0.server  oldest_client_tid=11
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  7 mds.0.cache request_start request(client.4163:11 nref=2 cr=0x564125f6bc00)
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  7 mds.0.server dispatch_client_request client_request(client.4163:11 ??? #0x10000000002//snap1/fileA #0x10000000002//snap2/fileA 2025-03-1
3T18:16:22.508074+0530 caller_uid=1000, caller_gid=1000{})
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  1 mds.0.server  unknown client op 5125
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  7 mds.0.server reply_client_request -95 ((95) Operation not supported) client_request(client.4163:11 ??? #0x10000000002//snap1/fileA #0x10000000002//snap2/fileA 2025-03-13T18:16:22.508074+0530 caller_uid=1000, caller_gid=1000{})
2025-03-13T18:16:22.506+0530 7f30df0ee6c0 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  1 mds.0.server : unknown client op
2025-03-13T18:16:22.506+0530 7f30df0ee6c0 20 mds.0.server lat 0.000027
2025-03-13T18:16:22.506+0530 7f30df0ee6c0 10 mds.0.3 send_message_client client.4163 192.168.1.114:0/1873501055 client_reply(???:11 = -95 (95) Operation not supported safe)
2025-03-13T18:16:22.506+0530 7f30df0ee6c0  1 -- [v2:192.168.1.114:6810/4235873411,v1:192.168.1.114:6811/4235873411] --> 192.168.1.114:0/1873501055 -- client_reply(???:11 = -95 (95) Operation not supported safe) -- 0x564125f27a00 con 0x564125fa1000

I am attaching the test program blockdiff.txt here for reference.

Following is the program output

kotresh:ceph$ g++ -o blockdiff ./blockdiff.cc -I./src/include/ -I./build/include -I./build -I./build/src/include -L./build/lib/ -lcephfs -lceph-common -D_FILE_OFFSET_BITS=64 -std=c++20 -Wl,-rpath,/home/kotresh/sandbox/upstream/kotresh-ceph2/ceph/build/lib

kotresh:ceph$ 
kotresh:ceph$ ./blockdiff 
CephFS mounted successfully!
 Failed to get next changed block, ret:-95

@vshankar
Copy link
Contributor Author

Tests running here: https://tracker.ceph.com/issues/70394

@vshankar
Copy link
Contributor Author

Review Comments:

Overall the code looks good.

I particularly tested with mds (not having blockdiff api) with the libcephfs program compiled with blockdiff api support. I didn't see any issues and got the "Operation Not supported" error as expected. Please find the mds log snippet below.

While that fine, I was talking to Greg today regarding this PR in general and it would benefit to add a feature bit (feature exchange b/w the client and the mds) that's used by the client to check if the mds supports block diff.

@vshankar
Copy link
Contributor Author

The fs suite run didn't report any failures related to this change. I plan to merge this as it is an follow up with feature bit check as mentioned in #61937 (comment).

@mchangir
Copy link
Contributor

The fs suite run didn't report any failures related to this change. I plan to merge this as it is an follow up with feature bit check as mentioned in #61937 (comment).

Any updates regarding the block size used by RADOS and the experiment that I had run to test fetching of rados block overlaps between two snaps?

@kotreshhr
Copy link
Contributor

@vshankar
Copy link
Contributor Author

The fs suite run didn't report any failures related to this change. I plan to merge this as it is an follow up with feature bit check as mentioned in #61937 (comment).

Any updates regarding the block size used by RADOS and the experiment that I had run to test fetching of rados block overlaps between two snaps?

Err. no. I need to spend time on the case you mention (currently occupied with other stuff).

@joscollin
Copy link
Member

@joscollin The latest push includes commits from PR #61539. I'll will probably integrate those changes into an existing commit and add your credits in the commit message.

@vshankar
#61539 commits are dropped from this PR?

@vshankar
Copy link
Contributor Author

@joscollin The latest push includes commits from PR #61539. I'll will probably integrate those changes into an existing commit and add your credits in the commit message.

@vshankar
#61539 commits are dropped from this PR?

Yes and included in the other pr for integrating blockdiff with mirror daemon.

@vshankar
Copy link
Contributor Author

@joscollin See #62250

Copy link
Contributor

@kotreshhr kotreshhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Went through test results. Didn't find any relevant or new failures.

@vshankar
Copy link
Contributor Author

Looks good to me. Went through test results. Didn't find any relevant or new failures.

Thanks! I will have a look at @mchangir query and if no-one has any other comments, then this change can be merged.

I will follow this up with the future check change.

@vshankar
Copy link
Contributor Author

For @mchangir query on slack

I tried running rados listsnaps and here's the output of a file with 4096 As in snap1 followed by an append of 4096 Bs in snap2 followed by an append of 4096 Cs in snap3
$ ./bin/rados listsnaps --pool=cephfs.a.data 10000000000.00000000
10000000000.00000000:
cloneid	snaps	size	overlap
2	2	4096	[]
3	3	8192	[0~8192]
head	-	12288
what is the block size considered as an overlap unit ?

I guess when the Bs were written, it was written to a separate file and copied over the existing file, yes? Doing it that way calls truncate(0) which essentially results in no overlaps in between snaps. And the next operation basically appened data to this file (after a snap is taken), and that's the reason there is a [0-8k] overlap b/w snap=id 3 and head.

@vshankar
Copy link
Contributor Author

@mchangir - I ran your test case and listsnaps shows the correct overlapping region.

   for ((i=0; i <4096; i++)); do echo -n 'A'; done > ./test-file
   stat test-file
   python -c 'print(hex(1099511631268))'
   mkdir .snap/snap_a
   dd ibs=4096 skip=1 if=~/local-test-file of=./test-file count=1 conv=notrunc oflag=append
   mkdir .snap/snap_b
   for ((i=0; i <$((4096*3)); i++)); do echo -n 'C'; done > ~/local-test-file2
   dd ibs=4096 skip=2 if=~/local-test-file2 of=./test-file count=1 conv=notrunc oflag=append

listsnaps:

./bin/rados -p cephfs.a.data listsnaps 10000000da4.00000000
10000000da4.00000000 (seq:220):
cloneid snaps   size    overlap
219     219     4096    [0~4096]
220     220     8192    [0~8192]
head    -       12288

@mchangir
Copy link
Contributor

@mchangir - I ran your test case and listsnaps shows the correct overlapping region.

   for ((i=0; i <4096; i++)); do echo -n 'A'; done > ./test-file
   stat test-file
   python -c 'print(hex(1099511631268))'
   mkdir .snap/snap_a
   dd ibs=4096 skip=1 if=~/local-test-file of=./test-file count=1 conv=notrunc oflag=append
   mkdir .snap/snap_b
   for ((i=0; i <$((4096*3)); i++)); do echo -n 'C'; done > ~/local-test-file2
   dd ibs=4096 skip=2 if=~/local-test-file2 of=./test-file count=1 conv=notrunc oflag=append

listsnaps:

./bin/rados -p cephfs.a.data listsnaps 10000000da4.00000000
10000000da4.00000000 (seq:220):
cloneid snaps   size    overlap
219     219     4096    [0~4096]
220     220     8192    [0~8192]
head    -       12288

that looks much better
has anything been fixed in the rados layer ?
I'll be running some tests to test holes.

@vshankar
Copy link
Contributor Author

has anything been fixed in the rados layer ?

Not that I know of.

I'll be running some tests to test holes.

Good line of thought. For holes, the object would be missing. blockdiff reports the whole extent if there is hole. See: https://github.com/ceph/ceph/pull/61937/files#diff-ea7cb1a6ba9fa08363b14dd00a86bc6b79e01673e93af84ffdfcdbd0d3f26b19R14639

This is fine, since there is no hole punch support in cephfs client (at least the user-space client).

@vshankar
Copy link
Contributor Author

@mchangir @kotreshhr If there is nothing pending for this, please merge.

@kotreshhr
Copy link
Contributor

@mchangir @kotreshhr If there is nothing pending for this, please merge.

Please find run wiki here.
https://tracker.ceph.com/projects/cephfs/wiki/QA_main_2025#wip-vshankar-testing-20250313072951-debug

@kotreshhr kotreshhr merged commit 3f67d1f into ceph:main Mar 20, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants