Skip to content

WIP (!) Locking read-only behaviour#1590

Closed
sinbad wants to merge 39 commits intomasterfrom
locking-workflow
Closed

WIP (!) Locking read-only behaviour#1590
sinbad wants to merge 39 commits intomasterfrom
locking-workflow

Conversation

@sinbad
Copy link
Contributor

@sinbad sinbad commented Oct 17, 2016

OK this branch has been running for too long and it's still not ready yet 😞 Not ready for merging, this is just for visibility/discussion.

The general thrust of this work is to try to expand on the locking feature to assist with workflow, to avoid accidents that require work to be lost later. This part of the work is about setting files to be read-only when they're marked as needing locking, and are not locked by the current user, as a kind of 'rumble strip' to hopefully stop you from accidentally doing work on a file before you realise you should have checked the locking situation. This has required the introduction of a bunch of new concepts:

  • A custom attribute lockable, which can be enabled / disabled on a file pattern using git lfs track --lockable. Files flagged like this are assumed to need locking because they're unmergeable (some LFS files may actually be mergeable e.g. text data)
  • git lfs lock|unlock now changes the write flag of these files
  • post-checkout and post-commit hooks to make files read-only when they're marked as lockable and not currently locked by this user. post-merge is still TODO
  • A local cache of locked file state to make it feasible to perform the above without hitting the server (uses boltdb)

Open questions / risk factors right now:

  • Practical performance of keeping tabs on write flags in practice on large repos
  • Gaps in available hook behaviour that may make write flag state not 100% watertight; known cases discussed in proposals/locking.md
  • Edge cases / undefined behaviour with read-only working copy files

I think this work is going to be required to make locking workable in practice but it's not there yet. Feel free to comment but I'm not proposing we merge this yet.

sinbad added 30 commits October 17, 2016 17:32
Since this can now modify lines as well as append them, the way the file
is updated has changed a bit.
Do this with an incoming list of patterns which have been made unlockable
so that we can use this in `untrack` or `track —not-lockable` but not
test every single non-lockable file every time
@technoweenie
Copy link
Contributor

Thanks for submitting this. I have a review in progress. Though my first question is: why bolt? That seems like a pretty heavy dependency for caching. It seems like we could get away with flat files:

# key is hashed path abcdef...
$ cat .git/lfs/lockcache/path/ab/cd/abcdef...
123

$ cat .git/lfs/lockcache/id/123
abcdef...

That seems simple, but probably very inefficient for large numbers of locks...

@sinbad
Copy link
Contributor Author

sinbad commented Oct 20, 2016

Though my first question is: why bolt?

Yeah, I know what you mean. Mostly because if I was going to do it, it would be a single file loaded into memory once & written out once at the end too (separate files for each lock is overkill since all you need is a boolean). Given that bolt already does this and has handling in place for parallel processes trying to access that file already (parallel git processes), seemed like not reinventing was preferable, especially since to do a good job handling all the edge cases would have distracted me from what I was actually trying to do.

If we decide this whole PR is viable as an approach and that bolt is a bit heavyweight we can replace it for the final version.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is way too big to review (beyond just chrome having issues rendering this page 😉). I'd really like to see it split out into individual components that we can discuss separately. Some components that I notice right away:

  • locking package - In general, 👍 on extracting that out. I really want to get away from static caches (like locking.cachedLockablePatterns) in all packages except commands.
  • lock cache (boltdb or whatever)
  • track command additions
  • post-commit hook
  • post-checkout hook

I think the general idea here is really solid though 👍 I'm really happy that you're digging into this aspect of locking. I really liked the doc updates too, with the known limitations that you've written down.

id2path, err := tx.CreateBucketIfNotExists(idToPathBucketName)
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttaylorr suggests creating the two buckets in initDb() instead of here, every time a lock is cached. By doing this in the sync.Once, a git-lfs process with a locked db only has to check for bucket existence once.

Also: any reason this isn't using api.Lock instead of the filepath and id strings? Two consecutive arguments of the same type force callers to look up the documentation.

Copy link
Contributor Author

@sinbad sinbad Oct 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when I do this over I'm going to replace boltdb, but for reference, the reason for using CreateBucketIfNotExists is that you have to Get the bucket inside the transaction anyway because you're not allowed to store bucket references outside of transactions in Bolt. So there's no benefit to having a separate creation process done once at init because you can't hold on to the reference anyway. You might as well not have separate code to create and get when one call does either/or.

I didn't use api.Lock because I prefer to avoid introducing transitive dependencies in function signatures; if I did that anything calling it then depends on the api package and from experience that sort of thing leads to eventual circular deps. If you'd prefer a structure I'd introduce a new one in this package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't use api.Lock because I prefer to avoid introducing transitive dependencies in function signatures

👍 I'm thinking we could even eventually get rid of api. Callers should just be using packages like locking, transfer, batch, etc.

path2id := tx.Bucket(pathToIdBucketName)
id2path := tx.Bucket(idToPathBucketName)
if path2id == nil || id2path == nil {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be an error if either bucket does not exist. Especially if the sync.Once is creating the buckets as suggested above.

}

func init() {
RegisterCommand("post-checkout", postCheckoutCommand, func(cmd *cobra.Command) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew, RegisterCommand() should probably just take a nil callback :)

@sinbad
Copy link
Contributor Author

sinbad commented Oct 24, 2016

This PR is way too big to review (beyond just chrome having issues rendering this page 😉). I'd really like to see it split out into individual components that we can discuss separately

Yep, I totally agree 😞 Unfortunately I needed to hit a certain number of features before I could even demo where I was trying to go with it, which was necessary both for my own evaluation of the approach and to be able to justify it to anyone else. This PR wasn't for merging or proper review, more for "does going down this path look reasonable or am I insane". I'm glad you think that on principle it seems worth carrying on with.

I'm going to go a few steps further in this branch to try to dig to the bottom of a few of the concerns I still have (like performance and potential unknown edge cases with read-only files). Then when I'm happy it's practical I'll do the whole thing over as smaller PRs.

@sinbad
Copy link
Contributor Author

sinbad commented Oct 24, 2016

I really want to get away from static caches (like locking.cachedLockablePatterns) in all packages except commands.

Can you explain why? It would be vastly less performant not to cache these patterns, and passing caches backwards and forwards to command breaks encapsulation. Unless you mean having some sort of encapsulated extensible 'state' object which command manages the lifecycle of but which is opaque to it so that other packages can keep data in it for performance reasons?

@technoweenie
Copy link
Contributor

I think of packages like locking, scanner, etc, as libraries that other go devs could use for their own lfs apps. I'd prefer to have GetLockablePatterns() return some *locking.Lockables struct, with funcs like:

func (*Lockables) IsLockable(path string) bool

// replace locking.FixLockableFileWriteFlags()
func (*Lockables) FixWriteFlags(files []string) error

Would something like that work? I wonder if this and config.GetAttributePaths() could go into their own package? attributes.GetPaths(gitDir, workingDir string) *attributes.Paths

@sinbad
Copy link
Contributor Author

sinbad commented Oct 24, 2016

Closing; now I'm relatively happy this is a sane direction it will return as several smaller PRs later.

@sinbad sinbad closed this Oct 24, 2016
@sinbad
Copy link
Contributor Author

sinbad commented Oct 24, 2016

I'd prefer to have GetLockablePatterns() return some *locking.Lockables struct, with funcs

Hmm, I guess so, more of an OO approach then. It does imply either most of the package interface would become part of that struct (since a lot of it depends on the lockables list), or that it becomes a required function parameter.

@technoweenie
Copy link
Contributor

Yes, I'd prefer a more OO approach. It will give people more flexibility if they want to build on the internal LFS packages, as well as making testing more straightforward.

Can you submit the doc changes as a small PR so we can retain that valuable info? If not, I'll just plagiarize it :)

@sinbad
Copy link
Contributor Author

sinbad commented Oct 26, 2016

Sure, I'll do a separate PR in a sec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants