fs: filehandle-based File #5160

Draft
MichaelEischer wants to merge 15 commits into restic:master from MichaelEischer:fs-fd-handles
@MichaelEischer MichaelEischer commented Nov 30, 2024

What does this PR change? What problem does it solve?

Currently the archiver collects metadata and the file content in multiple steps, during which a file could disappear or be renamed. The underlying reason is that files are accessed in multiple steps (stat, xattrs etc.) using their filepath. This PR introduces a new code path that instead opens a filehandle once and uses it to collect both metadata and file content. This ensures that the file cannot disappear in the middle of the operation and that the metadata actually belongs to the intended file.

The implementation turned out to be far more complicated than I expected. Unfortunately, there is no standardized way across operating systems to open filehandles for arbitrary file types (and reading all required metadata from a filehandle is also much more complex than it should be). In particular, symlinks turned out to be a major problem. Each filesystem API is broken or weird in its own way:

  • Linux: the only option is to open the symlink using the O_PATH|O_NOFOLLOW flags. However, this filehandle cannot be used to retrieve xattrs. Luckily there is a workaround: read the xattrs from /proc/self/fd/%d. Reading the file content requires atomically reopening the file, which is possible by opening the procfs filepath just mentioned. Linux before 3.6 does not support the required syscalls (fstat on a filehandle opened using O_PATH).
  • Windows: most file operations are already filehandle based, making the implementation rather straightforward. According to the Windows docs, a cloud file is usually not downloaded when opening a metadata-only filehandle. To read from this filehandle later on, it must be reopened, which seems to be possible using ReOpenFile. Symlinks are relatively straightforward to handle by specifying OPEN_REPARSE_POINT where necessary.
  • macOS: Symlinks can be opened using the O_SYMLINK flag. There is no concept of a metadata-only filehandle, so the file always has to be opened for reading. This could become a problem if restic is not allowed to do that. However, this case already resulted in an error in the past. xattrs can be read from this filehandle. In contrast, reading the symlink target from a filehandle is only possible since macOS 13 and is not exposed in Go, therefore requiring a fallback.
  • Other BSDs: This is a total mess. Not all BSDs can even open a filehandle to a symlink (looking at you OpenBSD) and those that do use various different solutions. Thus to keep my sanity, there's no support for a filehandle based archiver on those OSes.

This leads to a few requirements.

  • The filehandle-based and the path-based (the existing variant) implementations must be selectable at runtime.
  • A metadata-only filehandle must be openable for arbitrary file types.
  • A read filehandle must be supported for files and directories.
  • Atomically reopening a metadata-only filehandle must be possible for files and directories (the filehandle must refer to the same file before and afterwards).

For this purpose, the implementation introduces a new metadataHandle interface, which is used by nodeFromFileInfo to collect all metadata. The interface is implemented by fdMetadataHandle (the filehandle approach just described) and pathMetadataHandle. The active implementation can be selected at runtime.
fs.Local internally used a localFile type. It is now wrapped by either fdLocalFile or pathLocalFile. Those two structs implement the open and reopen lifecycle described in the requirements.

The PR copies a lot of code for freadlink from the Go standard library, as the standard library unfortunately only exposes a path-based interface, whereas this PR requires filehandles.

As a drive-by bugfix, the PR fixes xattr retrieval if a component of the backup source paths is a symlink. For example, assume /test is a symlink to /example, which contains /example/file. Then a backup of /test/file should retrieve all metadata for /test from /example. However, only basic metadata was collected from /example, whereas the xattrs were read from /test.

Remaining TODOs

  • Add runtime check for Linux <3.6
  • Fix test failures
  • Manually test whether ReopenHandle on Windows works for non-NTFS filesystems.

Was the change previously discussed in an issue or on the forum?

Part of #5021
Builds upon #5143

Fixes #3098
Fixes #2165

Checklist

  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I'm done! This pull request is ready for review.

@MichaelEischer MichaelEischer changed the title fs: filehandle-based archiver fs: filehandle-based File Nov 30, 2024
@MichaelEischer MichaelEischer force-pushed the fs-fd-handles branch 2 times, most recently from 1b3915a to a149c40 Compare November 30, 2024 18:10
Add the skeleton, but leave filling in the details to later commits.

The implementations are 90% copy & paste from the Go standard library, as
the existing code does not offer any way to read the symlink target
based on a filehandle.

Fall back to a standard readlink on platforms other than Linux and
Windows, as those either do not provide the necessary syscall or, in the
case of macOS, the syscall is not yet available in Go.

The xattrs are read from /proc/self/fd/%d, but error messages show the
original file path instead to be more useful.
@MichaelEischer MichaelEischer marked this pull request as draft February 5, 2025 19:45
Development

Successfully merging this pull request may close these issues.

Ignore disappeared source files lstat No Such file or directory