Key value store#1694
Conversation
|
Hold off a minute, I think there may be an issue with writing a DB that's truncated smaller, need to add a test for that. |
|
OK, handled that case now; actually still worked ( |
ttaylorr
left a comment
There was a problem hiding this comment.
Digging this. Left a few comments for you to 💭 about.
| } | ||
|
|
||
| // firstly peek at version; open read/write to keep lock between check & write | ||
| f, err := os.OpenFile(k.filename, os.O_RDWR|os.O_CREATE, 0664) |
There was a problem hiding this comment.
Thinking that we could do this eagerly and maintain a handle on the *os.File from the instance, instead of this method.
Since I assume that we Save() more than once, only having to do this somewhat expensive operation once would be a win for me.
There was a problem hiding this comment.
No, this is completely deliberate, and key to the optimistic locking technique. If you hold on to the handle, it will mean that parallel instances of git-lfs will block each other, which is limiting. The whole point of the optimistic locking system is to allow multiple instances of git-lfs to operate on shared key value stores and not end up locking each other out, while also resolving any parallel update issues.
Opening a file isn't that expensive and not unexpected on a Save() call. I doubt we'll be calling it very often; optimally you only call it once just before exit because it's an in-memory store. Conflicts are resolved at save time to avoid having to maintain stronger locks (this is a common DB trick).
There was a problem hiding this comment.
Got it, thanks for explaining.
|
|
||
| // Short-circuit if we have no changes | ||
| if len(k.log) == 0 { | ||
| return nil |
There was a problem hiding this comment.
Implicitly yes; it's a requirement that mu is always locked in the body of functions, either read-only or a full lock on mutating functions.
|
|
||
| k.version++ | ||
|
|
||
| enc := gob.NewEncoder(f) |
There was a problem hiding this comment.
Actually, we may want to just hold onto this instance instead of the file. Perhaps both, since we need to f.Seek and f.Truncate.
There was a problem hiding this comment.
No, see comments above. Being able to let go of the file handle is a deliberate feature :)
|
|
||
| enc := gob.NewEncoder(f) | ||
| err = enc.Encode(k.version) | ||
| if err != nil { |
There was a problem hiding this comment.
Nit: inline this:
if err = enc.Encode(k.Version); err != nil {
// ...
}| return fmt.Errorf("Error while writing version data to %v: %v", k.filename, err) | ||
| } | ||
| err = enc.Encode(k.db) | ||
| if err != nil { |
| db map[string]interface{} | ||
| } | ||
|
|
||
| type keyValueOperation int |
There was a problem hiding this comment.
Possible to use a byte, since it's smaller than an int?
There was a problem hiding this comment.
Yeah, although because of data alignment, in practice you don't actually save anything unless you have an array of them :)
| @@ -0,0 +1,201 @@ | |||
| package tools | |||
There was a problem hiding this comment.
I think this would be appropriate to put in a sub-package of tools: github.com/git-lfs/git-lfs/tools/kv, which would allow us to get rid of a lot of KeyValue* prefixes.
There was a problem hiding this comment.
OK, can I call it 'keyvalue' for clarity? I even considered the option of making it a separate top-level package since it's not even specific to git-lfs...
There was a problem hiding this comment.
Let's stick with tools/kv since kv is a common abbreviation.
There was a problem hiding this comment.
I'm 👍 on kv because it's a common abbreviation, and go packages tend to prefer that (such as io). I'm totally fine with leaving it if you have strong feelings about it though.
| func NewKeyValueStore(filepath string) (*KeyValueStore, error) { | ||
| kv := &KeyValueStore{filename: filepath, db: make(map[string]interface{})} | ||
| err := kv.loadAndMergeIfNeeded() | ||
| return kv, err |
There was a problem hiding this comment.
Nit: since you can return non-nil values for both, this could be simplified to:
return kv, kv.loadAndMergeIfNeeded()| defer k.mu.Unlock() | ||
|
|
||
| delete(k.db, key) | ||
| k.log = append(k.log, keyValueChange{keyValueRemoveOperation, key, nil}) |
There was a problem hiding this comment.
Looks like these transactional methods are pretty similar. What do you think about teaching the keyValueChange type how to perform these operations?
| if err == nil { | ||
| defer f.Close() | ||
| return k.loadAndMergeReaderIfNeeded(f) | ||
| } else if !os.IsNotExist(err) { |
There was a problem hiding this comment.
Is this possible given the above?
technoweenie
left a comment
There was a problem hiding this comment.
Made a few code style improvements, but nothing crucial. Thanks for doing this!
| if err != nil { | ||
| return err | ||
| } | ||
| stat, _ := os.Stat(k.filename) |
There was a problem hiding this comment.
Small nit, but you can call f.Stat() here.
| case setOperation: | ||
| baseDb[change.key] = change.value | ||
| case removeOperation: | ||
| delete(baseDb, change.key) |
There was a problem hiding this comment.
There's a bit of duplication here with Set() and Remove(). Though since this is just a wrapper around a map, it's probably fine for now. One way to solve it is to turn change into an interface, and calling Apply() on *setKey or *removeKey implementations in reapplyChanges() and logChange().
It looks like a mutex is needed here, but I think it's fine this code path is only called on Save() (which has a mutex) and NewStore() (which shouldn't need one).
There was a problem hiding this comment.
It's only one line of duplication in practice since at this point we're replaying the change and don't want to re-record it in the log; I think a function or interface for 1 line is probably overkill.
The pattern I've used for the mutex is that externally visible functions lock it and internal functions rely on it already being locked. This distinction is required because Go's mutexes aren't recursive, so locks have to be at the edge of the interface and not re-locked inside that.
|
Hmm.. I think we have a new intermittent failure: I've seen this one pop up a few times here and there, and only on Windows. Haven't done any investigation, so I'm not sure what's going on 😭 I'm going to re-run the tests for now, since your PR didn't introduce the transient failure. |
|
Gah, yeah I saw this a couple of weeks ago in an unrelated PR. Only seems to affect AppVeyor and I can't reproduce it on Windows :/ |
Could do this automatically using reflection? This is faster for now.
Some refactoring to make it cleaner when doing this
This actually didn’t cause any decoding issues to leave old data at the end, but we should be efficient with on-disk data.
79824bb to
a047f0a
Compare
|
Re-targetted this at |
This PR adds an in-memory, optimistically locked key-value store. It's basically a replacement for Boltdb for caching active locks locally as requested in the lock PR.
Submitted separately because it's generic and hopefully easy to review. Rewrite of lock PR is still ongoing but will use this.