WIP (!) Locking read-only behaviour by sinbad · Pull Request #1590 · git-lfs/git-lfs

sinbad · 2016-10-17T16:54:56Z

OK this branch has been running for too long and it's still not ready yet 😞 Not ready for merging, this is just for visibility/discussion.

The general thrust of this work is to try to expand on the locking feature to assist with workflow, to avoid accidents that require work to be lost later. This part of the work is about setting files to be read-only when they're marked as needing locking, and are not locked by the current user, as a kind of 'rumble strip' to hopefully stop you from accidentally doing work on a file before you realise you should have checked the locking situation. This has required the introduction of a bunch of new concepts:

A custom attribute lockable, which can be enabled / disabled on a file pattern using git lfs track --lockable. Files flagged like this are assumed to need locking because they're unmergeable (some LFS files may actually be mergeable e.g. text data)
git lfs lock|unlock now changes the write flag of these files
post-checkout and post-commit hooks to make files read-only when they're marked as lockable and not currently locked by this user. post-merge is still TODO
A local cache of locked file state to make it feasible to perform the above without hitting the server (uses boltdb)

Open questions / risk factors right now:

Practical performance of keeping tabs on write flags in practice on large repos
Gaps in available hook behaviour that may make write flag state not 100% watertight; known cases discussed in proposals/locking.md
Edge cases / undefined behaviour with read-only working copy files

I think this work is going to be required to make locking workable in practice but it's not there yet. Feel free to comment but I'm not proposing we merge this yet.

Since this can now modify lines as well as append them, the way the file is updated has changed a bit.

Do this with an incoming list of patterns which have been made unlockable so that we can use this in `untrack` or `track —not-lockable` but not test every single non-lockable file every time

Install of hook still TODO

… too

technoweenie · 2016-10-19T18:03:03Z

Thanks for submitting this. I have a review in progress. Though my first question is: why bolt? That seems like a pretty heavy dependency for caching. It seems like we could get away with flat files:

# key is hashed path abcdef...
$ cat .git/lfs/lockcache/path/ab/cd/abcdef...
123

$ cat .git/lfs/lockcache/id/123
abcdef...

That seems simple, but probably very inefficient for large numbers of locks...

sinbad · 2016-10-20T08:24:18Z

Though my first question is: why bolt?

Yeah, I know what you mean. Mostly because if I was going to do it, it would be a single file loaded into memory once & written out once at the end too (separate files for each lock is overkill since all you need is a boolean). Given that bolt already does this and has handling in place for parallel processes trying to access that file already (parallel git processes), seemed like not reinventing was preferable, especially since to do a good job handling all the edge cases would have distracted me from what I was actually trying to do.

If we decide this whole PR is viable as an approach and that bolt is a bit heavyweight we can replace it for the final version.

technoweenie

This PR is way too big to review (beyond just chrome having issues rendering this page 😉). I'd really like to see it split out into individual components that we can discuss separately. Some components that I notice right away:

locking package - In general, 👍 on extracting that out. I really want to get away from static caches (like locking.cachedLockablePatterns) in all packages except commands.
lock cache (boltdb or whatever)
track command additions
post-commit hook
post-checkout hook

I think the general idea here is really solid though 👍 I'm really happy that you're digging into this aspect of locking. I really liked the doc updates too, with the known limitations that you've written down.

technoweenie · 2016-10-19T17:50:01Z

locking/cache.go

+		id2path, err := tx.CreateBucketIfNotExists(idToPathBucketName)
+		if err != nil {
+			return err
+		}


@ttaylorr suggests creating the two buckets in initDb() instead of here, every time a lock is cached. By doing this in the sync.Once, a git-lfs process with a locked db only has to check for bucket existence once.

Also: any reason this isn't using api.Lock instead of the filepath and id strings? Two consecutive arguments of the same type force callers to look up the documentation.

I think when I do this over I'm going to replace boltdb, but for reference, the reason for using CreateBucketIfNotExists is that you have to Get the bucket inside the transaction anyway because you're not allowed to store bucket references outside of transactions in Bolt. So there's no benefit to having a separate creation process done once at init because you can't hold on to the reference anyway. You might as well not have separate code to create and get when one call does either/or.

I didn't use api.Lock because I prefer to avoid introducing transitive dependencies in function signatures; if I did that anything calling it then depends on the api package and from experience that sort of thing leads to eventual circular deps. If you'd prefer a structure I'd introduce a new one in this package.

didn't use api.Lock because I prefer to avoid introducing transitive dependencies in function signatures

👍 I'm thinking we could even eventually get rid of api. Callers should just be using packages like locking, transfer, batch, etc.

technoweenie · 2016-10-19T17:54:03Z

locking/cache.go

+		path2id := tx.Bucket(pathToIdBucketName)
+		id2path := tx.Bucket(idToPathBucketName)
+		if path2id == nil || id2path == nil {
+			return nil


I think this should be an error if either bucket does not exist. Especially if the sync.Once is creating the buckets as suggested above.

technoweenie · 2016-10-24T14:20:30Z

commands/command_post_checkout.go

+}
+
+func init() {
+	RegisterCommand("post-checkout", postCheckoutCommand, func(cmd *cobra.Command) {


Ew, RegisterCommand() should probably just take a nil callback :)

sinbad · 2016-10-24T14:39:53Z

This PR is way too big to review (beyond just chrome having issues rendering this page 😉). I'd really like to see it split out into individual components that we can discuss separately

Yep, I totally agree 😞 Unfortunately I needed to hit a certain number of features before I could even demo where I was trying to go with it, which was necessary both for my own evaluation of the approach and to be able to justify it to anyone else. This PR wasn't for merging or proper review, more for "does going down this path look reasonable or am I insane". I'm glad you think that on principle it seems worth carrying on with.

I'm going to go a few steps further in this branch to try to dig to the bottom of a few of the concerns I still have (like performance and potential unknown edge cases with read-only files). Then when I'm happy it's practical I'll do the whole thing over as smaller PRs.

sinbad · 2016-10-24T14:43:39Z

I really want to get away from static caches (like locking.cachedLockablePatterns) in all packages except commands.

Can you explain why? It would be vastly less performant not to cache these patterns, and passing caches backwards and forwards to command breaks encapsulation. Unless you mean having some sort of encapsulated extensible 'state' object which command manages the lifecycle of but which is opaque to it so that other packages can keep data in it for performance reasons?

technoweenie · 2016-10-24T14:57:09Z

I think of packages like locking, scanner, etc, as libraries that other go devs could use for their own lfs apps. I'd prefer to have GetLockablePatterns() return some *locking.Lockables struct, with funcs like:

func (*Lockables) IsLockable(path string) bool

// replace locking.FixLockableFileWriteFlags()
func (*Lockables) FixWriteFlags(files []string) error

Would something like that work? I wonder if this and config.GetAttributePaths() could go into their own package? attributes.GetPaths(gitDir, workingDir string) *attributes.Paths

sinbad · 2016-10-24T15:02:39Z

Closing; now I'm relatively happy this is a sane direction it will return as several smaller PRs later.

sinbad · 2016-10-24T15:05:43Z

I'd prefer to have GetLockablePatterns() return some *locking.Lockables struct, with funcs

Hmm, I guess so, more of an OO approach then. It does imply either most of the package interface would become part of that struct (since a lot of it depends on the lockables list), or that it becomes a required function parameter.

technoweenie · 2016-10-25T22:50:40Z

Yes, I'd prefer a more OO approach. It will give people more flexibility if they want to build on the internal LFS packages, as well as making testing more straightforward.

Can you submit the doc changes as a small PR so we can retain that valuable info? If not, I'll just plagiarize it :)

sinbad · 2016-10-26T10:31:21Z

Sure, I'll do a separate PR in a sec.

sinbad added 30 commits October 17, 2016 17:32

WIP locking workflow improvements

f120a30

Generalise ChannelWrapper in tools package, including base impl

f28db88

Create a locking package to support functionality beyond just api calls

60b8452

First pass at implementing lock cache using boltdb

cd55c6e

Ensure cache db dir exists

b130c6f

Basic cache test

e883280

Must cleanup

c2c534b

Emulate lock server to test fetching locks to the cache

8bf5e45

Don’t need HttpLifecycle base for test lifecycle any more

3e208b3

Remove cfg arguments, they weren’t doing anything with the API as-is

51b9514

Implement query for locally cached locks & test

d74831e

Utility method to turn on/off write bit on files

ef3ff04

Define interface on lfs track for locking

de7c007

Add ‘lockable’ attribute to git lfs track

50942aa

Since this can now modify lines as well as append them, the way the file is updated has changed a bit.

Implement enabling / disabling lock in existing tracked files, add test

e74b3ae

Move reading of attributes files to config

5ee6653

Cache gitattributes patterns and determine if files are lockable

be77b5b

Don’t chmod if not necessary

ea242e9

Make lockable files writeable on lock, read-only on unlock

2dd54e0

Query func to check if a specific file is locked in the cache

d027208

Func to scan dirs & set write flag according to lockable & cached lock

5217286

Refactor wildcard matching into general utility function

c3bdcc7

Enable fixing of write flags on files where lockable attrib is removed

3d8684a

Do this with an incoming list of patterns which have been made unlockable so that we can use this in `untrack` or `track —not-lockable` but not test every single non-lockable file every time

Changing lockable flag in track now changes read-only for matching files

124feb3

Move StringChannelWrapper to tools for re-use

d058919

Add git utility to list changed files between 2 commits

0c4251e

Add post-checkout command which can fix lockable write flags

c9f7853

Install of hook still TODO

Generalise hook setup

83a12ed

Install post-checkout hook

8c9ba38

Early-out optimisations if lockable files aren’t in use

0a23e25

sinbad added 7 commits October 17, 2016 17:34

Add more tracing

62f87ac

Ignore missing files when asked to flip write flag

9a0155d

post-checkout tests

750949a

GetFilesChanged now supports getting files changed at a single commit…

aceef4f

… too

Add post-commit hook which sets lockable files read-only

132f0b4

Embed bolt in repo, forgot we did that even using glide

515af93

Fixed test that was skipped locally

174ed4c

Fix unit tests on Windows, chmod support on Windows is limited

304233e

technoweenie suggested changes Oct 24, 2016

View reviewed changes

Use nil instead of empty func for post-checkout/post-commit

23fc020

sinbad closed this Oct 24, 2016

ttaylorr mentioned this pull request Oct 24, 2016

Some design notes I wrote a while back when thinking about locking #666

Merged

sinbad mentioned this pull request Nov 7, 2016

Locking part 1: locking package #1625

Closed

Conversation

sinbad commented Oct 17, 2016

Uh oh!

technoweenie commented Oct 19, 2016

Uh oh!

sinbad commented Oct 20, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

technoweenie left a comment

Choose a reason for hiding this comment

Uh oh!

technoweenie Oct 19, 2016

Choose a reason for hiding this comment

Uh oh!

sinbad Oct 24, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

technoweenie Oct 24, 2016

Choose a reason for hiding this comment

Uh oh!

technoweenie Oct 19, 2016

Choose a reason for hiding this comment

Uh oh!

technoweenie Oct 24, 2016

Choose a reason for hiding this comment

Uh oh!

sinbad commented Oct 24, 2016

Uh oh!

sinbad commented Oct 24, 2016

Uh oh!

technoweenie commented Oct 24, 2016

Uh oh!

sinbad commented Oct 24, 2016

Uh oh!

sinbad commented Oct 24, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

technoweenie commented Oct 25, 2016

Uh oh!

sinbad commented Oct 26, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sinbad commented Oct 20, 2016 •

edited

Loading

sinbad Oct 24, 2016 •

edited

Loading

sinbad commented Oct 24, 2016 •

edited

Loading