Define lockable files, make read-only in working copy when not locked by sinbad · Pull Request #1822 · git-lfs/git-lfs

sinbad · 2017-01-02T17:09:55Z

This PR addresses the following locking user story points:

We want to avoid accidentally editing unmergeable files which have not been locked first
The best way to do that before any changes have been made is to make files read-only in the working copy, suggesting to the user that they shouldn't just edit them
Not all LFS files are unmergeable. Therefore we want to identify which file patterns this rule should apply to
Locking / unlocking a file should flip the read-only flag locally
We need to maintain that read-only flag across other git operations

To this end, this PR adds the following features:

Adds a --lockable option to git lfs track. This enables an additional attribute flag lockable which means files matching this pattern will be read-only when not locked.
Adds a post-checkout hook to make files read-only when checked out if needed
Adds a post-commit hook which makes added files read-only after they're committed. New files won't have been checked out but the locking rules should apply to them as soon as they enter git; post-commit is the event that needs to trigger that transition
Adds a post-merge hook which checks & fixes the read-only flag, which can be reset during merge (post-checkout unfortunately does not get called)

On this basis we can be reasonably confident that files marked as lockable will be kept read-only in the working copy unless the user locks them, or overrides this flag themselves.

There are a couple of caveats:

Some apps blatantly override read-only flags without asking. Some Apple tools do this. There's not a great deal we can do about this; this feature is mainly an advisory / reminder. Luckily it's not many tools.
git reset <file> can override the read-only flag and there's no hook to catch it. However, the only reason to use that is to undo edits you've already made, so it's reasonable to assume you'd only have changes to undo if you'd already locked the file. Other reset routes involve a ref checkout so are caught by the post-checkout hook.

The behaviour can be disabled by setting env var GIT_LFS_SET_LOCKABLE_READONLY to false, in case there are environments where you don't want this (e.g. automated scratch copies)

Hope this all makes sense!

NB: This PR targets a new integration branch, locking-master-2 since I can't fast-forward locking-master to master because of the review checks on it. I'd be fine with targeting locking-master as well if that could be brought forward to latest master, I just can't do that as I have no admin access.

Also add functions to fix writeable flag based on these patterns

…ehaviour

… too

Also requires dropping recursive option which is fine since always recursive

shana · 2017-01-05T15:25:49Z

This is great! The fact that it's opt-in and per file/file type and can be globally disabled is excellent, it means that if a certain tool/workflow just breaks because of the readonly flags, it can be turned off while still being useful for tools that can take advantage of it.

Can we have some sort of "reset" command for git-lfs that restores readonly flags to all files that should have them (given env variables and lockable flag attributes) but for some reason don't? This would be useful for the scenario where a reset happens and a tool that is lfs-locking-aware detects it and wants to set the correct readonly states back. A tool would need to verify the environment, lockable configuration and locked files list to figure out which files need the flag reset. Would a git checkout [lfs file] restore it?

sinbad · 2017-01-05T17:13:24Z

@shana glad you like it! I've based this on my own experience in game projects, hoping it pans out for others.

A git checkout will reset the readonly flags so I didn't think there was a need for a separate command (it's done by the post-checkout hook). If you checkout individual files then actually the whole repo has to be scanned, since git doesn't tell hooks what files were checked out (only when you checkout a ref can we optimise it to just files that changed between the refs, since we get that info). However the new repo file scanning tools are extremely fast so this shouldn't be an issue.

technoweenie

This looks pretty cool. I think the track command seriously needs to be rewritten, but not in this PR :)

I also second @shana's request for a command to check/reset the readonly settings. It may make sense to add this to git lfs checkout, though that could overload its purpose (which right now, is to replace any LFS pointer files with the content from .git/lfs/objects).

technoweenie · 2017-01-05T16:52:24Z

commands/command_post_checkout.go

+
+	// Skip this hook if no lockable patterns have been configured
+	if len(lockClient.GetLockablePatterns()) == 0 ||
+		!cfg.Os.Bool("GIT_LFS_SET_LOCKABLE_READONLY", true) {


I think it makes more sense if this check is moved higher, so it can skip initializing the lock client if readonly files are disabled.

technoweenie · 2017-01-05T16:55:10Z

docs/man/git-lfs-config.5.ronn

+  This environment variable controls whether files marked as 'lockable' in 
+  `git lfs track` are made read-only in the working copy when not locked by 
+  the current user. The default is `true`, you can disable this behaviour and
+  have all files writeable by setting this variable to 0, 'no' or 'false'.


Any thoughts on adding a git config value? That would let users configure this per repo, if they wanted.

Oh yes indeed, good point, having granularity at the repo level on whether this is enabled or not would be very welcome.

Good shout, adding that now and a wrapper Configuration method to combine the two.

technoweenie · 2017-01-05T16:58:01Z

git/attribs.go

+	return paths
+}
+
+func findAttributeFiles(workingDir, gitDir string) []string {


Would returning an unbound channel here let this start parsing the .gitattributes contents while tools.FastWalkGitRepo is doing its thing?

Unfortunately this is not possible; see the end of the function where the order of the files is reversed to ensure the correct precedence of attributes is maintained.

technoweenie · 2017-01-05T17:07:35Z

locking/lockable.go

+	// Always make non-nil even if empty
+	c.lockablePatterns = make([]string, 0, 10)
+
+	paths := git.GetAttributePaths(config.LocalWorkingDir, config.LocalGitDir)


This brings back the config import to the locking package. I'd prefer locking.Client get LocalWorkingDir and LocalGitDir properties, set in locking.NewClient(*config.Configuration).

The API refactor will move that to this commands helper.

OK; even after doing this it will still require the config package, until the per-package config comes in, but at least it puts it in one place ready for that I guess.

I'm with @technoweenie, let's keep this import light.

technoweenie · 2017-01-05T17:11:36Z

locking/locks.go

+
+	// Make non-writeable if required
+	if c.IsFileLockable(path) &&
+		config.Config.Os.Bool("GIT_LFS_SET_LOCKABLE_READONLY", true) {


I'd like this to be a property on locking.Client too, set with the default value in locking.NewClient().

technoweenie · 2017-01-05T19:03:22Z

One question: is it possible to add a lockable, but not LFS file? Should it be? I suppose the entry would just look something like: *.yml -text lockable

shana · 2017-01-05T19:53:06Z

One question: is it possible to add a lockable, but not LFS file? Should it be? I suppose the entry would just look something like: *.yml -text lockable

So this would be really interesting to have, for this: There are files that are technically hard to merge by git or by humans (due to unpredictable serialization behaviours requiring smart merge tools that can be brittle), but are not necessarily binary files (or they are binary files but very small ones). Putting these in LFS for the purposes of having a locking feature for them adds a lot of overhead that isn't really needed, and in the case of text files that are just hard to merge, is actual a detriment to the process (turning a text file into a binary blob unnecessarily).

sinbad · 2017-01-06T09:29:15Z

One question: is it possible to add a lockable, but not LFS file? Should it be?

It's not possible at the moment. I think you could probably find some cases where this would be useful but they'd be a minority; most of the time it's binary files that you'd want in LFS anyway I think. Adding this as an orthogonal concept would make things more complex (and may even suggest making lockable a separate tool to git lfs) and I don't think that's justified.

sinbad · 2017-01-06T09:31:31Z

I also second @shana's request for a command to check/reset the readonly settings. It may make sense to add this to git lfs checkout, though that could overload its purpose (which right now, is to replace any LFS pointer files with the content from .git/lfs/objects).

I mentioned you can trigger this via git checkout already but my comment may have overlapped with yours. Do you still think it needs a separate command?

…laces `locking` still depends on `config` but will be refactored out later

ttaylorr

Looking good. I left a few comments that I'd love your thoughts on, and then I'll give this another review.

ttaylorr · 2017-01-06T22:09:20Z

git/attribs.go

+				if reldir := filepath.Dir(relfile); len(reldir) > 0 {
+					pattern = filepath.Join(reldir, pattern)
+				}
+				lockable := strings.Contains(line, LockableAttrib)


This function looks good, but I'm thinking that this line would lock a file named "lockable" (or one that contains the substring "lockable") as lockable on disk, even if the user didn't want it to be. This is sort of edgecase-y, but I think it's still behavior worth checking. Maybe a more proper .giattributes parser would be worth investigating?

That's a good point, if the file entry in .gitattributes was specific enough that could be a problem - it's not likely but could be a nasty edge case in that 0.0001% and experience says that's worth addressing.

Turns out there's a cheap way to do it, we're already splitting on whitespace so just need to check fields[1:] instead. .gitattributes always splits on plain whitespace (you have to use [[:space:]] in file patterns) so this works.

ttaylorr · 2017-01-06T22:13:37Z

errors/errors.go

 }
+
+func Combine(errs []error) error {
+	if len(errs) > 0 {


Nit: I'd prefer a smaller if statement here:

if len(errs) == 0 { return nil } // ...

It'd be nice if we didn't need the if statement at all, but I think this is at least a small step forward 👍

👍 I submit PRs to projects with opposing code styles for this case and sometimes I forget 😄

ttaylorr · 2017-01-06T22:14:08Z

locking/lockable.go

+// marked as lockable
+func (c *Client) GetLockablePatterns() []string {
+	c.ensureLockablesLoaded()
+	return c.lockablePatterns


This read isn't guarded by a mutex-lock. I think using a sync.RWMutex would be appropriate here.

A read-only mutex here wouldn't do anything useful unless the caller also locked it.

ensureLockablesLoaded ensures that c.lockablePatterns is only updated by one thread, and refreshLockablePatterns deliberately replaces the entire map (rather than clearing and re-populating it) so that previous callers to GetLockablePatterns can continue to use the old map. Therefore a read-only mutex is not required provided the return from GetLockablePatterns is not modified. Unfortunately Go doesn't have a const marker for returns, I thought about copying the map each time but ultimately thought it wasn't worth it.

ttaylorr · 2017-01-06T22:16:42Z

locking/lockable.go

+	// Always make non-nil even if empty
+	c.lockablePatterns = make([]string, 0, 10)
+
+	paths := git.GetAttributePaths(config.LocalWorkingDir, config.LocalGitDir)


I'm with @technoweenie, let's keep this import light.

ttaylorr · 2017-01-06T22:18:30Z

locking/lockable.go

+// You must have locked the c.lockableMutex in the caller
+func (c *Client) refreshLockablePatterns() {
+	// Always make non-nil even if empty
+	c.lockablePatterns = make([]string, 0, 10)


This can be initialized after calling git.GetAttributePaths to avoid resizing:

paths := git.GetAttributePaths(...) c.lockablePatterns = make([]string, 0, len(paths)) for _, p := range paths { if p.Lockable { c.lockablePatterns = append(c.lockablePatterns, p.Path) } } // ...

ttaylorr · 2017-01-06T22:32:52Z

locking/locks.go


-	return lock, nil
+	// Ensure writeable on return
+	return lock, tools.SetFileWriteFlag(path, true)


I dislike the pattern of returning a single error via a method call in the return signature, because I think this hides too much. I prefer:

if err := tools.SetFileWriteFlag(path, true); err != nil { return Lock{}, err } return lock, nil

I'm sure I've had the opposite feedback before 😆 Fair enough.

ttaylorr · 2017-01-06T22:34:29Z

lfs/hook.go

+func NewStandardHook(theType string, upgradeables []string) *Hook {
+	return &Hook{
+		Type:         theType,
+		Contents:     strings.Replace(hookBaseContent, "__CMD__", theType, -1),


I'd prefer to use a text/template instead of string replacement, since the later is a common Go convention and is well understood by looking at the hookBaseContent variable alone.

That would be a lot of cruft for what will boil down to a global find/replace even with a template. If you like I can make the string look like a template and still use strings.Replace but I see no reason to clutter this with all the init/struct param/execute patterns of templates.

ttaylorr · 2017-01-06T22:36:16Z

commands/command_post_checkout.go

+//
+// This hook checks that files which are lockable and not locked are made read-only,
+// optimising that as best it can based on the available information.
+func postCheckoutCommand(cmd *cobra.Command, args []string) {


This command needs a manpage entry.

ttaylorr · 2017-01-06T22:36:53Z

commands/command_post_checkout.go

+		LoggedError(err, "Warning: post-checkout rev diff %v:%v failed: %v\nFalling back on full scan.", pre, post, err)
+		postCheckoutFileChange(client)
+	}
+	err = client.FixLockableFileWriteFlags(files)


Very nice 👍

ttaylorr · 2017-01-06T22:39:30Z

commands/command_post_merge.go

+//
+// This hook checks that files which are lockable and not locked are made read-only,
+// optimising that as best it can based on the available information.
+func postMergeCommand(cmd *cobra.Command, args []string) {


This command needs a manpage entry.

ttaylorr · 2017-01-10T17:57:30Z

That would be a lot of cruft for what will boil down to a global find/replace even with a template. If you like I can make the string look like a template and still use `strings.Replace` but I see no reason to clutter this with all the init/struct param/execute patterns of templates.

I think that would be a great compromise 👍 .

ttaylorr · 2017-01-10T17:58:50Z

Problem with that is that `grep -c` returns a failure code when there are no results (which we validly test for in many cases), which means you lose the gained clarity trying to ignore that. I think `wc -l` is clearer while still dealing with `0` result cases.

I've had success with stringify-ing it, but either works for me :-) ```sh [ "0" -eq "$(grep -c foo bar)" ] ```

ttaylorr · 2017-01-10T17:59:55Z

Yes, because I'm overwriting the same file. I can't progressively read and overwrite it at the same time 😄

My mistake, I totally missed this on my first read-through of the code.

I could write a new attributes file then delete & rename, but it's more complex and I prefer simpler. I recognise that this way is heavier on memory usage, but `.gitattributes` files are never going to be massive (even a 1MB attributes file would be unthinkable) so it's a not really an issue.

I think this is a great analysis, and OK to leave as-is.

ttaylorr · 2017-01-10T18:00:28Z

Fantastic!

ttaylorr · 2017-01-10T18:01:25Z

A read-only mutex here wouldn't do anything useful unless the caller also locked it. `ensureLockablesLoaded` ensures that `c.lockablePatterns` is only updated by one thread, and `refreshLockablePatterns` deliberately replaces the entire map (rather than clearing and re-populating it) so that previous callers to `GetLockablePatterns` can continue to use the old map. Therefore a read-only mutex is not required provided the return from `GetLockablePatterns` is not modified. Unfortunately Go doesn't have a `const` marker for returns, I thought about copying the map each time but ultimately thought it wasn't worth it.

Sounds good, and thanks for the detailed explanation :-).

Still using strings.Replace, this doesn’t justify using text/template

larsxschneider · 2018-05-09T17:39:23Z

docs/man/git-lfs-config.5.ronn

+  setting, control whether files marked as 'lockable' in `git lfs track` are
+  made read-only in the working copy when not locked by the current user.
+  The default is `true`; you can disable this behaviour and have all files
+  writeable by setting either variable to 0, 'no' or 'false'.


@ttaylorr If the lockable feature is on, then Git LFS cannot skip the post-checkout command. Consequently, Git LFS needs to search for all .gitattributes files to check them for lockable patterns.

My team is working with a large repo (>20k directories, >300k files) and just searching for the .gitattributes file in all directories takes ~6 sec on macOS with a fast SSD. Consider these timings:

$ export GIT_LFS_SET_LOCKABLE_READONLY=1; time git-lfs post-checkout a0c5174a19f2dece44c09bc76f91feb69799e66f a0c5174a19f2dece44c09bc76f91feb69799e66f 0 git-lfs post-checkout a0c5174a19f2dece44c09bc76f91feb69799e66f 0 35.09s user 6.41s system 730% cpu 5.679 total $ export GIT_LFS_SET_LOCKABLE_READONLY=0; time git-lfs post-checkout a0c5174a19f2dece44c09bc76f91feb69799e66f a0c5174a19f2dece44c09bc76f91feb69799e66f 0 git-lfs post-checkout a0c5174a19f2dece44c09bc76f91feb69799e66f 0 0.01s user 0.02s system 89% cpu 0.030 total

I plan to distribute the setting lfs.setlockablereadonly=false in my company gitconfig but it took me a while to figure out the culprit of the slowness. I don't know how widely the lockable feature is used, but I wonder if lfs.setlockablereadonly=falseis a more sensible default? Alternatively we could find ways to speed up the .gitattributes discovery.

I don't know how widely the lockable feature is used, but I wonder if lfs.setlockablereadonly=false is a more sensible default?

I think =true is an important default value to keep. Here are a few thoughts on how we can get the best of both worlds:

Speed up the time it takes to search for .gitattributes. This has been optimized by @sinbad in Fast directory walk #1616. I think that this work could be augmented by first stat()-ing directories to search, and then seeing if the stat is out-of-date with the last time that the directory was traversed, effectively memoizing the search. I presume that this would improve the speed a great deal.

Warn when the search is taking too long. We can set a timer (similar to what Microsoft does with GVFS when the --ahead-behind check takes too long) and warn if Git LFS is unable to determine the set of lockable files quickly.

What do you think? Perhaps we should have this discussion in a new issue, as well.

In commit d91980f of PR git-lfs#1822 the GetFilesChanged() function was updated to use "git diff-tree" rather than "git diff". However, the relevant error messages were not updated at the same time, so we do that now.

In PR git-lfs#1870, which was based on the work in PR git-lfs#1822, we introduced support for "lockable" files whose permissions would be managed by several new Git LFS commands such as "git lfs post-checkout" and "git lfs post-merge", both of which we register as Git hooks. In commit 585befa of PRs git-lfs#1822 and git-lfs#1870 we added manual pages for these two commands. (Although these PRs also introduced the "git lfs post-commit" command, they did not include a manual page for it, so we added one belatedly in commit ae5d5b0 of PR git-lfs#4052.) From the time it was written, our git-lfs-post-checkout(1) manual page has indicated that the command accepts three arguments. This accurately reflects the fact that Git will invoke our hook command with parameters specifying the SHAs from before and after the checkout operation, plus a status flag indicating whether a branch checkout was performed or not. However, our manual page does not explain what values these arguments are expected to contain, other than by dint of the names given to the arguments in the page's synopsis section. We therefore expand our description of the "git lfs post-checkout" command to detail the values Git will provide in each of the three arguments. We also take the opportunity to adjust the name of the second argument to clarify that we expect this parameter to contain a Git revision SHA and not the name of a Git reference such as a branch or tag. As well, we replace the underscores in the argument names with hyphens to match the style we use in all our other manual pages for the names of command parameters.

In PR git-lfs#1870, which was based on the work in PR git-lfs#1822, we introduced support for "lockable" files whose permissions would be managed by several new Git LFS commands such as "git lfs post-checkout" and "git lfs post-merge", both of which we register as Git hooks. In commit 585befa of PRs git-lfs#1822 and git-lfs#1870 we added manual pages for these two commands. (Although these PRs also introduced the "git lfs post-commit" command, they did not include a manual page for it, so we added one belatedly in commit ae5d5b0 of PR git-lfs#4052.) From the time it was written, our git-lfs-post-merge(1) manual page has indicated that the command accepts a single "<is_squash>" argument. This accurately reflects the fact that Git will invoke our hook command with a single flag parameter that indicates whether or not a squash merge was performed. However, our "git lfs post-merge" command disregards this parameter entirely, so we adjust our manual page now to clarify that this command requires no arguments and ignores the one provided by Git.

sinbad added 21 commits December 19, 2016 12:39

Add utility function to enable/disable write flag on files

fe3288a

Add gitattributes reading to git package & support lockable attrib

22b1b58

Add knowledge of cached lockable patterns to locking.Client

08e3e5b

Also add functions to fix writeable flag based on these patterns

Add the ability to change lockable attribute to git lfs track

9891945

Document GIT_LFS_SET_LOCKABLE_READONLY which will control read-only b…

1b229f4

…ehaviour

Make lockable files read-only on unlock & writeable on lock

2e0dcc4

Merge branch 'master' into locking-wip/lockable-read-only

b004f90

Add GetFilesChanged to list files changed between 2 commits

097a22a

Generalise hook setup so common parts can be re-used

f7e3ced

Add post-checkout hook to make lockable files read-only if not locked

7ab2492

Add post-checkout test

3c7dd3a

Add more tracing

be96882

Ignore missing files when making unlockable files writeable

0af7735

GetFilesChanged now supports getting files changed at a single commit…

d91980f

… too

Add post-commit hook to make newly added lockable files read-only

268e978

post-checkout should skip if GIT_LFS_SET_LOCKABLE_READONLY is false

ad489d7

Use nil instead of empty function

24dc9c8

Switch to FastWalkGitRepo for fixing up write flags

eeff764

Also requires dropping recursive option which is fine since always recursive

Explain the need for the post-commit hook (added files)

62c8118

Do full scan in post-checkout in clone case (all zeroes from rev)

88f17d1

Add post-merge hook to make files read-only again if not locked

e1068a2

sinbad requested review from technoweenie and ttaylorr January 2, 2017 17:09

technoweenie suggested changes Jan 5, 2017

View reviewed changes

sinbad added 4 commits January 6, 2017 09:37

Move readonly config check to before lock client instantiation

581e109

Support disabling read-only behaviour via gitconfig as well

43acc16

Store working/git dirs in lock client to avoid using config in many p…

6f3aa52

…laces `locking` still depends on `config` but will be refactored out later

Shadow SetLockableFilesReadOnly config setting in lock client

ea40bc9

ttaylorr suggested changes Jan 6, 2017

View reviewed changes

sinbad added 6 commits January 9, 2017 10:54

Deal with edge case of file pattern which includes the word “lockable”

1856693

Early return instead of nesting (code style)

c99d924

Better guide size for lockable patterns

b4d42ca

Create assert_attributes_count helper to tidy up lockable tests

3652681

Return style change per PR feedback

0b12dfd

Add man pages for post-checkout and post-merge commands

585befa

sinbad mentioned this pull request Jan 9, 2017

Proposal: locking next steps - serialising changes on lock #1844

Open

ttaylorr approved these changes Jan 10, 2017

View reviewed changes

technoweenie approved these changes Jan 10, 2017

View reviewed changes

Use template style replacements in hooks

d60368c

Still using strings.Replace, this doesn’t justify using text/template

sinbad merged commit ab839a4 into locking-master-2 Jan 16, 2017

sinbad deleted the locking-wip/lockable-read-only branch January 16, 2017 11:33

sinbad mentioned this pull request Jan 16, 2017

Define lockable files, make read-only in working copy #1870

Merged

larsxschneider reviewed May 9, 2018

View reviewed changes

larsxschneider mentioned this pull request Jun 6, 2018

git-lfs performance without objects #3042

Closed

chrisd8088 mentioned this pull request Apr 14, 2022

Update and correct several error message strings #4943

Merged

Conversation

sinbad commented Jan 2, 2017

Uh oh!

shana commented Jan 5, 2017

Uh oh!

sinbad commented Jan 5, 2017

Uh oh!

technoweenie left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

technoweenie commented Jan 5, 2017

Uh oh!

shana commented Jan 5, 2017

Uh oh!

sinbad commented Jan 6, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sinbad commented Jan 6, 2017

Uh oh!

ttaylorr left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as spam.

Uh oh!

ttaylorr commented Jan 10, 2017 via email

Uh oh!

ttaylorr commented Jan 10, 2017 via email

Uh oh!

sinbad commented Jan 6, 2017 •

edited

Loading