Skip to content

fileutils: new function to check for path existence#1875

Merged
openshift-merge-bot[bot] merged 13 commits intocontainers:mainfrom
giuseppe:path-exists
Apr 9, 2024
Merged

fileutils: new function to check for path existence#1875
openshift-merge-bot[bot] merged 13 commits intocontainers:mainfrom
giuseppe:path-exists

Conversation

@giuseppe
Copy link
Copy Markdown
Member

@giuseppe giuseppe commented Apr 4, 2024

On unix it uses the access syscall which is faster than stat for simply checking the existence of a file.

with a file in the cache I see on my machine:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/containers/storage/pkg/fileutils
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkPathExists-12 1000000 1037 ns/op
BenchmarkStat-12 897534 1330 ns/op

I'd expect even better results if the file is not already in the cache.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

}

// PathExists checks whether a file or directory exists at the given path.
func PathExists(path string, followSymlinks bool) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the name of this function just return an err.

err:=fileutils.Exist(PATH);
err = nil -> True
err == ENOEXIST == False
err == (!nil && not ENOEXIST) == err.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps have a fileutils.Lexist(PATH) for not follow symlinks.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change the type of the returned value because it is easier to replace the existing calls to os.Stat(). Every existing call to _, err := os.Stat(...) simply becomes err := fileutils.PathExists(..., true) and the existing code for handling err is the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the difference, since my function will do the same, except there is two functions, Exists, and Lexists, which is similar to Stat and Lstat.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed a new version with Exist and Lexist

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Apr 5, 2024

LGTM, can you squash the commits.

@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented Apr 5, 2024

I think keeping them separate helps with the review and potentially the backport. That's what @mtrmac asked me to do last time I've sent a big patch

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Apr 5, 2024

Except that you are really only making a single change here. Adding two functions and then using them everywhere. I don't see any difference between the changes, so no need to split them out.

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Apr 5, 2024


// Exist checks whether a file or directory exists at the given path.
func Exist(path string) error {
_, err := os.Stat(path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

golang.org/x/sys/unix defines an Faccessat() on Darwin. What's the rationale for not using it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no way to test it (and if it is worth in terms of performance)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it works the same (the test just needs the OS check removed), and

BenchmarkExist-10    	  413298	      2890 ns/op
BenchmarkStat-10     	  333810	      3559 ns/op

… but also, the tests of that package are now failing on macOS (because /tmp is a symlink), we are clearly not running them.

So, at this moment I’d also lean towards keeping the simplest possible implementation. (Or, alternatively, we could start testing things in CI, but I can’t build a case for doing that based on this PR alone.)

Maybe add a todo here to help future maintainers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for testing it, I will add it to Darwin too

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Efficiency improvements are always nice
  • Looking at git grep -C5 '_, .*os.Lstat' and it grep -C5 '_, .*os.Stat', there seem to be more places where this could be used — e.g. drivers/aufs, and other files in pkg/archive. That’s not blocking this PR, of course.
  • (Aesthetically, the name feels a bit off to me. fileutils.{Exists,Lexists} would be a small improvement, but really I’d expect a function named like that to return a bool, not an error. OTOH the returned error is clearly useful… and I can’t think of a good name. EnsurePathExists is so ugly.)
  • Do we want this in a ~stable public API of the module? If it is expected to be beneficial in other repos like Podman, sure, let’s make it stable. OTOH making it private would make the naming concerns much less relevant. I don’t really have an opinion here, I just want to make sure this is an intentional decision.

@mtrmac
Copy link
Copy Markdown
Collaborator

mtrmac commented Apr 5, 2024

I think keeping them separate helps with the review and potentially the backport. That's what @mtrmac asked me to do last time I've sent a big patch

My personal view of this is that it is ideal if there is one idea per patch. Introduce one new feature. Fix one bug, or a homogenous class of bugs. Do one refactor. (Ideally the refactors are so trivial that in case of a merge conflict, they could just be thrown away and made again just based on the commit description.)

In that sense, the migration to the new API could well be a single commit. OTOH more commits never really hurt — and they make allocating time for reviews a tiny bit easier, because I know there are natural stopping points and I know before starting to look at that patch that I can make definite progress in 15 minutes, (as opposed to allocating an hour or two to stare at a huge do-12-things commit; that can happen only a few times a day).

Or course, everyone thinks and works a bit differently. Seeing small commits probably makes me more likely to take on the review, and to do it quickly — OTOH there is a definite cost to creating those commits in the first place.

@giuseppe giuseppe force-pushed the path-exists branch 3 times, most recently from 872d522 to b1b8f9a Compare April 8, 2024 10:05
@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented Apr 8, 2024

  • Looking at git grep -C5 '_, .*os.Lstat' and it grep -C5 '_, .*os.Stat', there seem to be more places where this could be used — e.g. drivers/aufs, and other files in pkg/archive. That’s not blocking this PR, of course.

I've fixed these callers too.

  • (Aesthetically, the name feels a bit off to me. fileutils.{Exists,Lexists} would be a small improvement, but really I’d expect a function named like that to return a bool, not an error. OTOH the returned error is clearly useful… and I can’t think of a good name. EnsurePathExists is so ugly.)

Renamed the functions to `fileutils.{Exists, Lexists}.

My first version of the API looked like func PathExists(path string) (bool, error) but it made more difficult and error-prone to fix the current callers (if we really wanted to use the found return value). With the current API instead we can mechanically replace the current code with _, err := os.Stat(...) => err := fileutils.Exists(...) and _, err := os.Lstat(...) => => err := fileutils.Lexists(...) without changing any of the conditions in place.

  • Do we want this in a ~stable public API of the module? If it is expected to be beneficial in other repos like Podman, sure, let’s make it stable. OTOH making it private would make the naming concerns much less relevant. I don’t really have an opinion here, I just want to make sure this is an intentional decision.

Yes, let's make it stable and fix the same pattern in other repos.

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; let’s maybe wait a few hours to confirm consensus on the macOS situation.

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than that one thing, feel free to merge without another review.

giuseppe added 9 commits April 8, 2024 16:07
On unix they use the access syscall which is faster than stat for
simply checking the existence of a file.

with a file in the cache I see on my machine:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/containers/storage/pkg/fileutils
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkExists-12    	  925832	      1245 ns/op
BenchmarkStat-12      	  745896	      1602 ns/op
PASS

I'd expect even better results if the file is not already in the cache.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added 4 commits April 8, 2024 16:07
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented Apr 8, 2024

@nalind PTAL

@rhatdan
Copy link
Copy Markdown
Member

rhatdan commented Apr 9, 2024

/approve
/lgtm

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@giuseppe
Copy link
Copy Markdown
Member Author

giuseppe commented Apr 9, 2024

fix for c/common: containers/common#1953

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants