This proposal is an alternative to #42201.
When working with pathnames on filesystems, it is common to inspect the relation between two different paths. In prototypes, it is common to assume that pathnames are unique, and thus that two pathnames refer to the same file iff they are equal. Since many applications operate on trees of files, it is also common to check whether a file is part of a tree given its root; this can naively be done with strings.HasPrefix(path, root).
In the real world, filesystems offer a variety of link types, which makes things harder. On top of this, Windows and UNIX have different filesystems with different concepts and incompatible APIs. So finding common semantics (which is one of the goals of path/filepath) is complicated, and Go does not provide useful primitives for this.
In other languages such as C, it has become common under UNIX to use a function such as realpath(3) to obtain a so-called "canonical name" of a path. Using such a primitive, it is technically possible to implement equality and containment tests in the following way:
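To make the weakness of the lexical check concrete, here is a minimal sketch. The helper name naiveInTree and the sample paths are made up for illustration; no filesystem access is involved, only string comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveInTree is a hypothetical helper wrapping the purely lexical check
// described above. It never touches the filesystem.
func naiveInTree(path, root string) bool {
	return strings.HasPrefix(path, root)
}

func main() {
	// The intended case: the file really is below the root.
	fmt.Println(naiveInTree("/srv/data/report.txt", "/srv/data")) // true

	// A sibling directory whose name merely starts with the root's name.
	fmt.Println(naiveInTree("/srv/database/report.txt", "/srv/data")) // true

	// A path that leaves the tree through "..".
	fmt.Println(naiveInTree("/srv/data/../secret", "/srv/data")) // true
}
```

All three calls report true, although only the first path is inside the tree. Links make the situation worse, since they cannot be detected from the strings at all.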
- Equality:
realpath(path1) == realpath(path2)
- Containment:
realpath(leaf) starts with realpath(root)
(Note that there is a different kind of containment test, in which we don't want to follow a link at leaf itself, but this can be done with something like realpath(dir(leaf)) starts with realpath(root), so it's not really worth defining a separate operation.)
There are a few problems with canonicalization:
- It's not really clear what a canonical pathname is. There is no official definition for it. For instance, realpath does not follow bind mounts. On Windows, there is a syscall called GetFinalPathNameByHandle which provides 4 different canonicalization modes, and it's not immediately clear which one should be used (moreover, to make things more complicated, not all modes work on all kinds of file objects).
- In general, a user who provides a pathname expects the application to use that specific way of naming files or directories in prints, logs and error messages. An API to canonicalize filenames encourages an engineering approach in which applications store canonicalized pathnames, which produce surprising results when shown back to users. Again, this is probably less important on UNIX systems, but on Windows it is common for a sysadmin to map network drives such as K:, while users have absolutely no idea what \\server\share means.
My proposal is to avoid the temptation to define a canonicalization function, and instead provide helpers that implement the pathname tests applications actually need. Thus, I propose to add the following functions:
package filepath
// SameFile checks whether path1 and path2 refer to the same underlying file, after all links
// have been resolved. It is a thin convenience wrapper over os.SameFile.
// On UNIX, SameFile works across symlinks, hardlinks and bind mounts.
// On Windows, SameFile works across symlinks, hardlinks, junctions, drive mappings.
func SameFile(path1, path2 string) bool
// InTree checks whether the file or directory referenced by leaf is contained within the
// directory tree pointed to by root. All links (in either leaf or root) are resolved before
// checking for containment.
// One typical case in which this function is useful is to implement a jail in a virtual filesystem,
// to make sure that no I/O is performed outside of the specified root.
// On UNIX, InTree works across symlinks, hardlinks, and bind mounts.
// On Windows, InTree works across symlinks, hardlinks, junctions, drive mappings.
func InTree(leaf, root string) bool
It seems to me that these two functions should provide two primitives that can be used to solve canonicalization problems. I would be interested in hearing use cases which are not covered by these two functions but are solved by realpath or GetFinalPathNameByHandle.