Refactor cache metadata interface.#2316
Conversation
| const gitSnapshotIndex = keyGitSnapshot + "::" | ||
|
|
||
| const keyDeleted = "cache.deleted" | ||
| type MetadataStore interface { |
There was a problem hiding this comment.
Don't like this. Cache/metadata package should not know about every single possible caller it might have. A new caller can define a unique name for itself and get access to store information. If there are helper methods they should be on caller side. Same for GetSharedKey/GetGitRemote/GetGitSnapshot/Etag methods.
There was a problem hiding this comment.
I liked it because it made it made it possible to easily know all the metadata that's being tracked for a given cache record by just checking one place (in the cache package) rather than searching all over the codebase. Understanding what metadata is being tracked for a given record can obviously be important when you are making behavioral changes to the cache package.
If you disagree with the approach, what would you think of exposing the generic methods for setting/getting arbitrary values and then having outside callers just use those? While not as convenient, you can at least still find usages of those generic getters/setters to track down metadata that's being associate with records.
There was a problem hiding this comment.
know all the metadata that's being tracked
There shouldn't be any need to manage this together. A caller only knows about its use case, it should not have dependency/knowledge of other callers. If new callers are added or removed it should not concern cache pkg or other callers. Cache package just provides an arbitrary amount of callers access to persistent storage for a ref.
what would you think of exposing the generic methods for setting/getting arbitrary values
Yes, generic helpers without fixed keys are fine. Helpers with fixed keys are also fine on caller side (should be private though if they are only used by one package).
cache/metadata.go
Outdated
| return si.SetValue(b, keyDiffID, v) | ||
| }) | ||
| return nil | ||
| type Metadata interface { |
There was a problem hiding this comment.
This is more like RefMetadata now. No changes needed if you disagree.
There was a problem hiding this comment.
Sure, that clarifies that it's specific to a record rather than metadata for the cache as a whole, will update
cache/manager.go
Outdated
| if err := MigrateV2( | ||
| context.TODO(), | ||
| filepath.Join(opt.MetadataStoreRoot, "metadata.db"), | ||
| filepath.Join(opt.MetadataStoreRoot, "metadata_v2.db"), |
There was a problem hiding this comment.
Not sure about this change. Why would the cache database need to have a constant name from the package standpoint?
There was a problem hiding this comment.
The idea is that the cache package is the only one with direct access to the raw store and then just provides an interface for interacting with it to other packages (as opposed to before where a bunch of different packages had direct access to the raw store). It still needs to be told where to put the data on disk though, so I switched to this parameter.
This approach makes more sense to me but if you disagree I don't have any issue with switching back to passing the store as long as NewManager is the only place it gets passed to.
There was a problem hiding this comment.
It doesn't make sense to me that you can't create two instances of manager with metadata.db in same dir. It is some internal structure limitation that leaks to the public API. I'm ok with for example just adding MetadataDBV1 or smth in manageropt if you want to move migration here. It is bit weird though that all other stores are initialized and passed in as interface, and these are now passed as strings.
cache/refs.go
Outdated
| type Ref interface { | ||
| Mountable | ||
| ID() string | ||
| Metadata |
There was a problem hiding this comment.
No change needed, but I kind of liked it as separate from the rest of the implementation. Guess it can still be nullable if implementation just defines embedded interface as well.
There was a problem hiding this comment.
I just found that having all the getters/setters as methods on refs rather than detached functions made it way easier to write code and remember what metadata exists. It works much better with IDE autocompletion and is slightly less verbose.
| Release(context.Context) error | ||
| Size(ctx context.Context) (int64, error) | ||
| Metadata() *metadata.StorageItem | ||
| IdentityMapping() *idtools.IdentityMapping |
There was a problem hiding this comment.
Why is IdentityMapping removed. It is not fully connected (used atm only in moby) but seems like required data
There was a problem hiding this comment.
I guess this can be accessed after Mount is called from snapshot.Mountable. Not sure if this is enough for all cases or something might need this info without actually doing a mount.
There was a problem hiding this comment.
Yeah I didn't see that it was ever used so I got rid of it. If moby needs it though then I'll just re-add it, not a big deal at all.
0336182 to
8cbffe8
Compare
cache/metadata.go
Outdated
| GetString(string) string | ||
| GetExternal(string) ([]byte, error) | ||
|
|
||
| SetValue(string, interface{}, string) error |
There was a problem hiding this comment.
Where is this used where it requires to be public and interface{}? Current interface is weird as there is GetString/SetValue that don't have a matching getter/setter.
There was a problem hiding this comment.
Agreed it didn't make sense anymore now that the methods were public, changed to SetString
| GetExternal(string) ([]byte, error) | ||
|
|
||
| SetValue(string, interface{}, string) error | ||
| SetExternal(string, []byte) error |
|
There was a single random test failure in CI: https://gist.github.com/sipsma/a171fd146fa01cee69175e5a69ca793c Failure is from this line: buildkit/client/client_test.go Line 2665 in f314c4b I can't reproduce locally, even running with |
There are a few goals with this refactor: 1. Remove external access to fields that no longer make sense and/or won't make sense soon due to other potential changes. For example, there can now be multiple blobs associated with a ref (for different compression types), so the fact that you could access the "Blob" field from the Info method on Ref incorrectly implied there was just a single blob for the ref. This is on top of the fact that there is no need for external access to blob digests. 2. Centralize use of cache metadata inside the cache package. Previously, many parts of the code outside the cache package could obtain the bolt storage item for any ref and read/write it directly. This made it hard to understand what fields are used and when. Now, the Metadata method has been removed from the Ref interface and replaced with getters+setters for metadata fields we want to expose outside the package, which makes it much easier to track and understand. Similar changes have been made to the metadata search interface. 3. Use a consistent getter+setter interface for metadata, replacing the mix of interfaces like Metadata(), Size(), Info() and other inconsistencies. Signed-off-by: Erik Sipsma <erik@sipsma.dev>
|
Rebased. There was another ephemeral test failure on a previous run, different from the one I mentioned previously: https://gist.github.com/sipsma/593107a30f4886ff2b3dbb02a4862361 I looked around and can't see any clear way those failure could be related to the changes here and can't reproduce them locally after many runs. I also looked through past CI runs and found similar failures from before this PR: So I don't think it's likely those ephemeral failures are related to the change here. |
There are a few goals with this refactor:
won't make sense soon due to other potential changes. For example,
there can now be multiple blobs associated with a ref (for different
compression types), so the fact that you could access the "Blob"
field from the Info method on Ref incorrectly implied there was just
a single blob for the ref. This is on top of the fact that there is
no need for external access to blob digests.
Previously, many parts of the code outside the cache package could
obtain the bolt storage item for any ref and read/write it directly.
This made it hard to understand what fields are used and when. Now,
the Metadata method has been removed from the Ref interface and
replaced with getters+setters for metadata fields we want to expose
outside the package, which makes it much easier to track and
understand. Similar changes have been made to the metadata search
interface.
the mix of interfaces like Metadata(), Size(), Info() and other
inconsistencies.
Signed-off-by: Erik Sipsma erik@sipsma.dev