Conversation
Since this can now modify lines as well as append them, the way the file is updated has changed a bit.
Do this with an incoming list of patterns which have been made unlockable, so that we can use this in `untrack` or `track --not-lockable` without testing every single non-lockable file every time
Install of hook still TODO
Thanks for submitting this. I have a review in progress. Though my first question is: why bolt? That seems like a pretty heavy dependency for caching. It seems like we could get away with flat files:

```
# key is hashed path abcdef...
$ cat .git/lfs/lockcache/path/ab/cd/abcdef...
123
$ cat .git/lfs/lockcache/id/123
abcdef...
```

That seems simple, but probably very inefficient for large numbers of locks...
Yeah, I know what you mean. Mostly because if I was going to do it myself, it would be a single file loaded into memory once and written out once at the end (separate files for each lock are overkill, since all you need is a boolean). Given that bolt already does this, and already handles parallel processes trying to access that file (parallel git processes), not reinventing it seemed preferable, especially since handling all the edge cases well would have distracted me from what I was actually trying to do. If we decide this whole PR is viable as an approach and that bolt is a bit heavyweight, we can replace it for the final version.
technoweenie
left a comment
This PR is way too big to review (beyond just chrome having issues rendering this page 😉). I'd really like to see it split out into individual components that we can discuss separately. Some components that I notice right away:
- `locking` package - In general, 👍 on extracting that out. I really want to get away from static caches (like `locking.cachedLockablePatterns`) in all packages except `commands`.
- lock cache (boltdb or whatever)
- `track` command additions
- `post-commit` hook
- `post-checkout` hook
I think the general idea here is really solid though 👍 I'm really happy that you're digging into this aspect of locking. I really liked the doc updates too, with the known limitations that you've written down.
```go
id2path, err := tx.CreateBucketIfNotExists(idToPathBucketName)
if err != nil {
	return err
}
```
@ttaylorr suggests creating the two buckets in `initDb()` instead of here, where they're currently re-created every time a lock is cached. By doing this in the `sync.Once`, a git-lfs process with a locked db only has to check for bucket existence once.
Also: any reason this isn't using api.Lock instead of the filepath and id strings? Two consecutive arguments of the same type force callers to look up the documentation.
I think when I do this over I'm going to replace boltdb, but for reference, the reason for using CreateBucketIfNotExists is that you have to Get the bucket inside the transaction anyway because you're not allowed to store bucket references outside of transactions in Bolt. So there's no benefit to having a separate creation process done once at init because you can't hold on to the reference anyway. You might as well not have separate code to create and get when one call does either/or.
I didn't use `api.Lock` because I prefer to avoid introducing transitive dependencies in function signatures; if I did, anything calling this would then depend on the `api` package, and from experience that sort of thing leads to eventual circular deps. If you'd prefer a structure, I'd introduce a new one in this package.
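A package-local structure along those lines might look like the following. This is a hypothetical sketch only: `CachedLock` and `cacheLock` are illustrative names, not existing git-lfs identifiers.

```go
// Sketch of a type the locking package could expose instead of api.Lock,
// so function signatures don't pull in the api package.
package main

// CachedLock carries only what the lock cache needs. Named fields also
// fix the "two consecutive string arguments" confusion of cacheLock(path, id).
type CachedLock struct {
	Path string // repo-relative path of the locked file
	ID   string // server-assigned lock id
}

// cacheLock is a stub standing in for the real cache write.
func cacheLock(l CachedLock) error {
	// ... persist l.Path -> l.ID and l.ID -> l.Path in the cache db ...
	return nil
}
```

This keeps the `locking` package self-contained while still addressing the earlier review point about two same-typed string arguments.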
> didn't use api.Lock because I prefer to avoid introducing transitive dependencies in function signatures
👍 I'm thinking we could even eventually get rid of `api`. Callers should just be using packages like `locking`, `transfer`, `batch`, etc.
```go
path2id := tx.Bucket(pathToIdBucketName)
id2path := tx.Bucket(idToPathBucketName)
if path2id == nil || id2path == nil {
	return nil
```
I think this should be an error if either bucket does not exist. Especially if the sync.Once is creating the buckets as suggested above.
commands/command_post_checkout.go
```go
func init() {
	RegisterCommand("post-checkout", postCheckoutCommand, func(cmd *cobra.Command) {
```
Ew, RegisterCommand() should probably just take a nil callback :)
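A nil-tolerant registration could be sketched like this. Illustrative only: `Command`, the `commands` registry, and this `RegisterCommand` signature are hypothetical stand-ins, not git-lfs's real types.

```go
// Sketch: RegisterCommand treats a nil configure callback as
// "no flags to set up", so hook commands can just pass nil.
package main

type Command struct {
	Name string
	Run  func()
}

var commands = map[string]*Command{}

// RegisterCommand adds a subcommand; configure may be nil for
// commands (like hook commands) that take no flags.
func RegisterCommand(name string, run func(), configure func(*Command)) {
	cmd := &Command{Name: name, Run: run}
	if configure != nil {
		configure(cmd)
	}
	commands[name] = cmd
}
```

With this, `RegisterCommand("post-checkout", postCheckoutCommand, nil)` avoids the empty-callback boilerplate the review is objecting to.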
Yep, I totally agree 😞 Unfortunately I needed to hit a certain number of features before I could even demo where I was trying to go with it, which was necessary both for my own evaluation of the approach and to be able to justify it to anyone else. This PR wasn't for merging or proper review, more for "does going down this path look reasonable or am I insane". I'm glad you think that on principle it seems worth carrying on with. I'm going to go a few steps further in this branch to try to dig to the bottom of a few of the concerns I still have (like performance and potential unknown edge cases with read-only files). Then when I'm happy it's practical I'll do the whole thing over as smaller PRs.
Can you explain why? It would be vastly less performant not to cache these patterns, and passing caches backwards and forwards to …
I think of packages like … Would something like that work? I wonder if this and …
Closing; now that I'm relatively happy this is a sane direction, it will return as several smaller PRs later.
Hmm, I guess so; more of an OO approach, then. It does imply either that most of the package interface would become part of that struct (since a lot of it depends on the lockables list), or that it becomes a required function parameter.
Yes, I'd prefer a more OO approach. It will give people more flexibility if they want to build on the internal LFS packages, as well as making testing more straightforward. Can you submit the doc changes as a small PR so we can retain that valuable info? If not, I'll just plagiarize it :)
Sure, I'll do a separate PR in a sec. |
OK this branch has been running for too long and it's still not ready yet 😞 Not ready for merging, this is just for visibility/discussion.
The general thrust of this work is to try to expand on the locking feature to assist with workflow, to avoid accidents that require work to be lost later. This part of the work is about setting files to be read-only when they're marked as needing locking, and are not locked by the current user, as a kind of 'rumble strip' to hopefully stop you from accidentally doing work on a file before you realise you should have checked the locking situation. This has required the introduction of a bunch of new concepts:
- `lockable`, which can be enabled / disabled on a file pattern using `git lfs track --lockable`. Files flagged like this are assumed to need locking because they're unmergeable (some LFS files may actually be mergeable, e.g. text data).
- `git lfs lock|unlock` now changes the write flag of these files.
- `post-checkout` and `post-commit` hooks to make files read-only when they're marked as lockable and not currently locked by this user. `post-merge` is still TODO.

Open questions / risk factors right now include performance and potential unknown edge cases with read-only files.
I think this work is going to be required to make locking workable in practice but it's not there yet. Feel free to comment but I'm not proposing we merge this yet.
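For context on the `lockable` concept above: if I understand the git-lfs docs correctly, `git lfs track --lockable "*.psd"` records the flag as an extra `lockable` attribute on the pattern's `.gitattributes` line, roughly:

```
# .gitattributes after `git lfs track --lockable "*.psd"`
*.psd filter=lfs diff=lfs merge=lfs -text lockable
```

The hooks can then walk the attributes to decide which checked-out files should be made read-only.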