cephfs: implement snapdiff via fake .snap subfolder [RFC] #42517

Closed
ghost wants to merge 2 commits into ceph:master from croit:snap-diff

@ghost ghost commented Jul 28, 2021

This patch makes it possible to obtain the file delta between two snapshots
(aka Snap Diff) by reading fake 'snapdiff-query-formatted' subfolders under
the .snap directory. Snapdiff subfolders are not visible when listing the
.snap folder; one has to build and issue such a "query" manually.

The resulting output (a directory listing) contains only the entries that
have been altered (created/updated/removed) in the final snapshot since the
initial one. New/updated entries are listed under their regular names; names
of removed entries are prefixed with a tilde '~'.
E.g. to compare snapshots named snap1 and snap2 one can issue:

ls -l /mnt/mycephfs/dir0/.snap/.~diff=snap1.~diff=snap2

which would return something like this:
total 8
-rw-r--r-- 1 root root 3 Jul 19 16:40 b
-rw-r--r-- 1 root root 3 Jul 19 16:40 ~c
drwxr-xr-x 0 root root 0 Jul 19 16:40 ~C
-rw-r--r-- 1 root root 3 Jul 19 16:40 d
-rw-r--r-- 1 root root 3 Jul 19 16:40 f
-rw-r--r-- 1 root root 3 Jul 19 16:40 ~g
drwxr-xr-x 0 root root 0 Jul 19 16:40 ~G
drwxr-xr-x 0 root root 0 Jul 19 16:40 I
-rw-r--r-- 1 root root 3 Jul 19 16:40 k
drwxr-xr-x 0 root root 0 Jul 19 16:40 K
-rw-r--r-- 1 root root 3 Jul 19 16:40 l
drwxr-xr-x 4 root root 12 Jul 19 16:41 L
drwxr-xr-x 2 root root 6 Jul 19 16:40 S
drwxr-xr-x 2 root root 3 Jul 19 16:40 T

or

ls -l /mnt/mycephfs/dir0/.snap/.~diff=snap1.~diff=snap3
total 7.5K
-rw-r--r-- 1 root root 3 Jul 19 16:40 a
-rw-r--r-- 1 root root 3 Jul 19 16:40 b
-rw-r--r-- 1 root root 3 Jul 19 16:40 ~c
drwxr-xr-x 0 root root 0 Jul 19 16:40 ~C
-rw-r--r-- 1 root root 3 Jul 19 16:40 d
-rw-r--r-- 1 root root 3 Jul 19 16:40 ~f
-rw-r--r-- 1 root root 3 Jul 19 16:40 g
drwxr-xr-x 0 root root 0 Jul 19 16:40 ~G
drwxr-xr-x 0 root root 0 Jul 19 16:41 G
drwxr-xr-x 2 root root 3 Jul 19 16:40 H
drwxr-xr-x 0 root root 0 Jul 19 16:40 I
-rw-r--r-- 1 root root 3 Jul 19 16:40 l
drwxr-xr-x 4 root root 12 Jul 19 16:41 L
drwxr-xr-x 2 root root 6 Jul 19 16:40 S
drwxr-xr-x 2 root root 3 Jul 19 16:40 T

Diving deeper into a subfolder might then show:

ls -l /mnt/mycephfs/dir0/.snap/.~diff=snap1.~diff=snap2/~C
total 1
drwxr-xr-x 0 root root 0 Jul 19 16:40 ~C1
-rw-r--r-- 1 root root 3 Jul 19 16:40 ~cc1

and so on and so forth:

ls -l /mnt/mycephfs/dir0/.snap/.~diff=snap1.~diff=snap2/~C/~C1
total 1
-rw-r--r-- 1 root root 6 Jul 19 16:40 ~c2

Reading file content is also supported. It returns the full(!) file
content from the target snapshot for new/updated files and from the
initial snapshot for removed files.
E.g.

less /mnt/mycephfs/dir0/.snap/.~diff=snap1.~diff=snap2/~C/~C1/~c2
snap1

The order of snapshot names in a snapdiff "query" isn't important: they are
sorted according to their ids when processed.
Comparing a snapshot with live data isn't supported. Byte-level "deltas"
are not supported.

Signed-off-by: Denis Barahtanov <denis.barahtanov@croit.io>
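
For scripted consumers such as backup tools, the query syntax described above could be wrapped roughly as follows. This is a minimal sketch and not part of the patch: the helper names `snapdiff_query_path` and `split_diff_listing` are hypothetical, and the code assumes the `.~diff=<snap>.~diff=<snap>` naming and the `~` removal prefix exactly as described in this PR.

```python
import posixpath
from typing import Iterable, List, Tuple


def snapdiff_query_path(directory: str, snap_a: str, snap_b: str) -> str:
    """Build the fake snapdiff subfolder path for two snapshot names.

    Hypothetical helper; the name format follows the PR description.
    """
    query = f".~diff={snap_a}.~diff={snap_b}"
    return posixpath.join(directory, ".snap", query)


def split_diff_listing(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split entry names into (created_or_updated, removed).

    Removed entries carry a leading '~', which is stripped here.
    """
    changed: List[str] = []
    removed: List[str] = []
    for name in names:
        if name.startswith("~"):
            removed.append(name[1:])
        else:
            changed.append(name)
    return changed, removed
```

On a client mount that carries this patch, the names would presumably come from `os.listdir(snapdiff_query_path("/mnt/mycephfs/dir0", "snap1", "snap2"))`. Two caveats follow from the prefix convention itself: an entry whose real name starts with `~` cannot be told apart from a removed entry, and the same name can land in both lists (as with `~G` and `G` in the snap1/snap3 listing above).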

@github-actions github-actions bot added the cephfs Ceph File System label Jul 28, 2021
@ghost ghost changed the title cephfs: implement snapdiff via fake .snap subfolder cephfs: implement snapdiff via fake .snap subfolder [RFC] Jul 28, 2021
@batrick batrick requested review from mchangir and vshankar July 29, 2021 02:50
@vshankar
Contributor

@denisb-croit I'll start taking a look at this soon (starting sometime next week).

This would be extremely useful for cephfs-mirror. BTW, @mchangir is working on fixing recursive timestamps in CephFS for use with the mirror daemon.

@vshankar
Contributor

vshankar commented Aug 9, 2021

Leaving my initial thoughts since I'm still going through the changes.

@denisb-croit Did you consider having a readdir_diff() kind of interface or something like that? I.e., having an interface similar to readdir() and just returning the diff (new, updated, missing). IMO, this would be a much cleaner interface (and avoids relying on special chars (tilde) as remove markers).

@ghost
Author

ghost commented Aug 11, 2021

> @denisb-croit Did you consider having a readdir_diff() kind of interface or something like that? I.e., having an interface similar to readdir() and just returning the diff (new, updated, missing). IMO, this would be a much cleaner interface (and avoids relying on special chars (tilde) as remove markers).

I agree that the 'tilde' logic looks a bit ugly. But it provides a major benefit: the ability to use regular file management tools to access the diff.
Moreover, the proposed client-side changes are minor, and hopefully we will eventually come up with a design that doesn't need a client upgrade at all, maybe at the cost of reduced functionality for legacy clients or something...

@vshankar
Contributor

> I agree that the 'tilde' logic looks a bit ugly. But it provides a major benefit: the ability to use regular file management tools to access the diff.

The immediate use of snap-diff would be the cephfs mirror daemon, which currently has to walk the entire directory tree to figure out changes (based on [mc]time). This snap-diff would be immensely useful to cut down the walk by only traversing those directories which have some update underneath them. Hence my proposal for having something like readdir_diff.

Listing entries via fake subdir is handy for regular fs tools, however, having a nice clean interface via libcephfs would be immensely useful for applications.
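
The pruning idea can be sketched as follows. This is only an illustration under stated assumptions: `list_diff` stands in for whatever a readdir_diff-style call would eventually return, the `(name, is_dir, removed)` tuple shape is hypothetical, and the in-memory `FAKE_DIFF` table replaces a real file system.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical diff record: (entry name, is a directory, was removed).
DiffEntry = Tuple[str, bool, bool]


def walk_changed(list_diff: Callable[[str], List[DiffEntry]],
                 path: str = "") -> Iterator[Tuple[str, bool]]:
    """Yield (relative_path, removed) for every changed entry.

    Only directories reported by the diff are descended into, so
    unchanged subtrees are never visited. Removed directories are
    reported but not descended into (a design choice of this sketch).
    """
    for name, is_dir, removed in list_diff(path):
        child = f"{path}/{name}" if path else name
        yield child, removed
        if is_dir and not removed:
            yield from walk_changed(list_diff, child)


# In-memory stand-in for the diff between two snapshots.
FAKE_DIFF: Dict[str, List[DiffEntry]] = {
    "": [("b", False, False), ("C", True, True), ("L", True, False)],
    "L": [("x", False, False)],
}


def fake_list_diff(path: str) -> List[DiffEntry]:
    return FAKE_DIFF.get(path, [])
```

With this table, `list(walk_changed(fake_list_diff))` visits `b`, `C`, `L` and `L/x` only; any directory absent from the diff is skipped entirely.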

@ghost
Author

ghost commented Aug 13, 2021

> Listing entries via fake subdir is handy for regular fs tools, however, having a nice clean interface via libcephfs would be immensely useful for applications.

Right, in our case we would prefer to use regular file-system access to SnapDiff though.
The intention is to use that primarily for backup purposes. So it looks like both a dedicated readdir_diff API and regular file-system access make sense.
What do you think about exposing both?

The backend implementation would be refactored to prepare results in readdir_diff API format and expose that API.
Plus an additional proxy on top of it to support "tilde"-formatted access, i.e. handle relevant incoming readdir requests on "tilde URLs", invoke readdir_diff internally and translate its results to "tilde" format.
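
The translation step of such a proxy might look roughly like this. It is a sketch only: `DiffEntry`, its `state` values ("new", "updated", "missing", borrowed from the wording earlier in this thread) and `to_tilde_names` are hypothetical names, since the actual readdir_diff result format had not been defined at this point.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DiffEntry:
    """Hypothetical readdir_diff result record."""
    name: str
    state: str  # assumed to be one of "new", "updated", "missing"


def to_tilde_names(entries: Iterable[DiffEntry]) -> List[str]:
    """Translate readdir_diff-style records into tilde-formatted names."""
    names: List[str] = []
    for entry in entries:
        if entry.state == "missing":
            # Removed entries are exposed with a leading '~'.
            names.append("~" + entry.name)
        elif entry.state in ("new", "updated"):
            names.append(entry.name)
        else:
            raise ValueError(f"unknown diff state: {entry.state!r}")
    return names
```

Keeping the state explicit in the API and deriving the prefix only in the proxy means applications using the API never depend on the special character.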

@vshankar
Contributor

> > Listing entries via fake subdir is handy for regular fs tools, however, having a nice clean interface via libcephfs would be immensely useful for applications.
>
> Right, in our case we would prefer to use regular file-system access to SnapDiff though.
> The intention is to use that primarily for backup purposes. So it looks like both a dedicated readdir_diff API and regular file-system access make sense.
> What do you think about exposing both?

Yep -- that's where I am coming from.

> The backend implementation would be refactored to prepare results in readdir_diff API format and expose that API.
> Plus an additional proxy on top of it to support "tilde"-formatted access, i.e. handle relevant incoming readdir requests on "tilde URLs", invoke readdir_diff internally and translate its results to "tilde" format.

ACK.

@vshankar
Contributor

@denisb-croit Would be good to have tests too (whenever you plan to update next).

@vshankar
Contributor

> @denisb-croit Would be good to have tests too (whenever you plan to update next).

never mind -- didn't realize you already had a commit for the test.

Contributor

@vshankar vshankar left a comment

@denisb-croit I'll continue to play with this. Feel free to update the PR with our discussion (readdir_diff, etc.).

  string pname;
  inodeno_t pino;
- if (n.length() && n[0] == '_') {
+ char first_char = n.length() ? n[0] : 0;
Contributor

unrelated change?

}

-snapid_t SnapRealm::resolve_snapname(std::string_view n, inodeno_t atino, snapid_t first, snapid_t last)
+std::tuple<snapid_t, bool, snapid_t> SnapRealm::resolve_snapname(
Contributor

How about returning something like a struct SnapRealInfo?

Contributor

.. and BTW, you could parse the special diff keyword snapshot name and invoke ->resolve_snapname() with the respective snapshot name(s). That way, the actual parse logic moves out of SnapRealm to the caller (during path traversal and/or readdir_diff).

Author

Good point!
Refactored this part.

  int r = 0;
  C_SaferCond onfinish("Client::_read_async flock");
- r = objectcacher->file_read(&in->oset, &in->layout, in->snapid,
+ r = objectcacher->file_read(&in->oset, &in->layout, snapid,
Contributor

I'm not 100% clear about the purpose of reading file contents via these synthetic snap ids. What is the use case for this?

Author

One can read the content of the file's snapshot listed by a snapdiff request.
Hence, e.g. for new/updated files, backup software can easily get the new content without reading through the regular snapshot dir structure...
For deleted entries that's rather overkill, but it's provided for the sake of uniformity.

Contributor

Hmm... Although that's a crafty technique, I'm a bit wary about it. @batrick what do you think?

std::swap(s1, s2);
}
snapid_t res = (s1 & CEPH_SNAPDIFF_ID_MASK) << CEPH_SNAPDIFF_ID_BITS;
res = res | (s2 & CEPH_SNAPDIFF_ID_MASK);
Contributor

Would generating the synthetic snap-id this way be safe (for the mds and/or client)? Could we run into some sort of "id collision" between this (synthetic) and a real snap-id?

Author

Theoretically this can cause such a collision once 2^32 snapshots have been created.
In practice it would take ~136 years to make that many snapshots at a pace of 1 snapshot per second.
Hence this looks safe enough.
On the other hand, it looks like alternative approaches would require dramatic modifications to the code which are not worth the effort IMO.
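
The packing and the time estimate can be checked with a small sketch. It mirrors the C++ lines quoted above but is not the patch itself: `SNAPDIFF_ID_BITS = 32` is an assumption inferred from the 2^32 figure (the actual value of `CEPH_SNAPDIFF_ID_BITS` is not shown in this thread), and the function names are hypothetical.

```python
from typing import Tuple

SNAPDIFF_ID_BITS = 32  # assumption, inferred from the "2^32 snapshots" remark
SNAPDIFF_ID_MASK = (1 << SNAPDIFF_ID_BITS) - 1

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def make_snapdiff_id(s1: int, s2: int) -> int:
    """Pack two snapshot ids into one synthetic id, smaller id first."""
    if s1 > s2:
        s1, s2 = s2, s1
    res = (s1 & SNAPDIFF_ID_MASK) << SNAPDIFF_ID_BITS
    return res | (s2 & SNAPDIFF_ID_MASK)


def split_snapdiff_id(snapdiff_id: int) -> Tuple[int, int]:
    """Recover the (initial, final) snapshot ids from a synthetic id."""
    return (snapdiff_id >> SNAPDIFF_ID_BITS) & SNAPDIFF_ID_MASK, \
        snapdiff_id & SNAPDIFF_ID_MASK


def years_to_exhaust(bits: int = SNAPDIFF_ID_BITS,
                     snapshots_per_second: float = 1.0) -> float:
    """Years needed to create 2**bits snapshots at the given rate."""
    return (2 ** bits) / snapshots_per_second / SECONDS_PER_YEAR
```

Under these assumptions, any synthetic id built from snapshot ids of at least 1 is at least 2^32, so it can only coincide with a real snapshot id once real ids reach that range; `years_to_exhaust()` gives roughly 136 years at one snapshot per second.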

  int Client::fill_stat(Inode *in, struct stat *st, frag_info_t *dirstat, nest_info_t *rstat)
  {
- ldout(cct, 10) << __func__ << " on " << in->ino << " snap/dev" << in->snapid
+ ldout(cct, 10) << __func__ << " on " << in->ino << " snap/dev " << in->snapid
Contributor

unrelated change?

f->close_section();
}

void Server::_readdir_diff(
Contributor

This is a large routine and should be split into smaller callable parts.

Author

Will do that when implementing the explicit readdir_snapdiff API.

Contributor

Did you miss updating this part in the latest push?

Contributor

@vshankar vshankar left a comment

@denisb-croit ping?

@ghost ghost force-pushed the snap-diff branch 2 times, most recently from fc26056 to 5a8ec2c Compare September 3, 2021 10:18
@vshankar
Contributor

vshankar commented Sep 6, 2021

@denisb-croit Thanks for the update. I'll take a look sometime this week.

@ghost ghost force-pushed the snap-diff branch from 5a8ec2c to 3ea2ef5 Compare September 8, 2021 10:42
@vshankar
Contributor

> @denisb-croit Thanks for the update. I'll take a look sometime this week.

I could not complete the review this week. Sorry!

Will finish it up next week.

@vshankar
Contributor

> The backend implementation would be refactored to prepare results in readdir_diff API format and expose that API.
> Plus an additional proxy on top of it to support "tilde"-formatted access, i.e. handle relevant incoming readdir requests on "tilde URLs", invoke readdir_diff internally and translate its results to "tilde" format.

@denisb-croit Did you miss pushing this change as part of the update or do you plan to do this as a follow-up?

@ghost
Author

ghost commented Sep 28, 2021

> > The backend implementation would be refactored to prepare results in readdir_diff API format and expose that API.
> > Plus an additional proxy on top of it to support "tilde"-formatted access, i.e. handle relevant incoming readdir requests on "tilde URLs", invoke readdir_diff internally and translate its results to "tilde" format.
>
> @denisb-croit Did you miss pushing this change as part of the update or do you plan to do this as a follow-up?

The changes are provided in a separate PR, #43328,
which is cumulative (it includes both the initial snapdiff implementation and the new readdir-like API).

@vshankar
Contributor

> The changes are provided in a separate PR, #43328,
> which is cumulative (it includes both the initial snapdiff implementation and the new readdir-like API).

Nice. Will take a look and do some tests...

@ifed01 ifed01 mentioned this pull request Oct 14, 2021
3 tasks
@ghost
Author

ghost commented Oct 16, 2021

This has been superseded by #43546

@ghost ghost closed this Oct 16, 2021