LCOW: Service VM lifetime changes #33969
Conversation
Fixed golint errors.

More golint fixes to follow tomorrow....
Force-pushed 4917dfa to d1d1a76
ping @johnstep :D

@vieux Sorry, I had been reviewing some separately with @jhowardmsft and we got some lock issues addressed. I will finish up the review soon; so far, so good.
Force-pushed 71278e8 to bb64e86
johnstep left a comment:
LGTM on green, after changes from external review; thanks @jhowardmsft for real-time review.
Is green :)
@thaJeztah Who else could review this? (Windows-only code)

I'll see if someone is available; maybe @StefanScherer is also interested in having a look.
Rebased. @darrenstahlmsft Could you take a look? |
darstahl left a comment:
Not a maintainer, but LGTM other than one missing Unlock.
daemon/graphdriver/lcow/lcow.go (outdated)

The debug log indicates the mutex is being released, but unless I'm missing something it's still held.

Yes, good catch. Thanks. Updated.
daemon/graphdriver/lcow/lcow.go (outdated)

nit: This could be a defer unlock above.

RE offline discussion, this is fine as is for now.
@thaJeztah @vieux - OK with the above reviews with Windows eyes on it? I need to build on this code base, hence would like to get it merged sooner, as does Akash for the remote filesystem work. Thanks.
thaJeztah left a comment:
Started reviewing, and left some comments.
daemon/graphdriver/lcow/lcow.go (outdated)

Nit: it would be good to have this function return a type; 4 return variables is hard to read:

```go
type layerDetails struct {
	fileName  string
	fileSize  int64
	isSandbox bool
}
```

Could even be (if additional details of the file are useful):

```go
type layerDetails struct {
	fileInfo  os.FileInfo
	isSandbox bool
}
```

Not necessarily for this PR ^^

Yeah, let me do that in the current branch I'm working in for a future PR which builds on the lifetime changes in this PR. It will avoid a confusing rebase.
daemon/graphdriver/lcow/lcow.go (outdated)

If this is the package description, it should be at the top.

Sure, will move it above the package clause.
daemon/graphdriver/lcow/lcow.go (outdated)

Any reason for including this metadata here?

Before the likes of you start reading the comments too closely 🇬🇧
daemon/graphdriver/lcow/lcow.go (outdated)

Don't forget to update the documentation for this.

Actually, can we change this option to --storage-opt lcow.globalmode=1? Storage-driver specific options use a <driver>. prefix (see https://github.com/docker/cli/blob/ff2552f7a1073834e79793aeb5dc5a4e486956a4/docs/reference/commandline/dockerd.md#options-per-storage-driver).

Sure, will be in the next update.

BTW - documentation is going to come later. Let's get it working first.... docs will be a combined effort, the majority of which @PatrickLang will have to take on.
daemon/graphdriver/lcow/lcow.go (outdated)

Should this be a tmpdir if it's for "scratch"?

See my comment one line higher....
daemon/graphdriver/lcow/lcow.go (outdated)

nit: isMounted perhaps? hasBeenMounted is easily misinterpreted as "it was mounted at some point, but not anymore".
daemon/graphdriver/lcow/lcow.go (outdated)

Same here; actually, can you change this option to lcow.globalmode? Storage-driver specific options use a <driver>. prefix (see https://github.com/docker/cli/blob/ff2552f7a1073834e79793aeb5dc5a4e486956a4/docs/reference/commandline/dockerd.md#options-per-storage-driver).

Also, can you make this a boolean instead of 1 / 0, and check the value here? --storage-opt lcow.globalmode=false should disable it. This is important because we use these options inside daemon.json as well (for which we use booleans).

BTW - the statement about the prefix isn't universally true - see btrfs.go L569 for example.

Updated to true/false using strconv.ParseBool.
Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the LCOW driver to support both global SVM lifetime and per-instance lifetime. It also corrects the scratch implementation.
@thaJeztah Comments addressed, update pushed. Will address the return parameters in the follow-up PR once this is merged, as mentioned ^^

@thaJeztah and still green 💚
daemon/graphdriver/lcow/lcow.go

```go
return d.config.CreateSandbox(filepath.Join(d.dir(id), sandboxFilename), client.DefaultSandboxSizeMB, d.cachedSandboxFile)
```

Any reason you changed this to if err := ... ; err != nil {?
daemon/graphdriver/lcow/lcow.go

```go
logrus.Debugf("%s: releasing cacheMutex. Ref count has dropped to zero", title)
d.cacheMutex.Unlock()
// To reach this point, the reference count has dropped to zero. If we have
```

This is confusing, because entry.refCount will never actually reach zero.

I'll update the comment to something more like "there are no more references if we have reached this point". Is that what you mean?

Yes, or don't return after decrementing, but check whether refCount reaches 0. That may be worth combining with the refactoring; for now, updating the comment should do.
daemon/graphdriver/lcow/lcow.go

```go
logrus.Debugf("%s %s: refCount decremented to %d", title, id, entry.refCount)
d.cacheMu.Unlock()
logrus.Debugf("%s: releasing cache item for decrement and early get-out as refCount is now %d", title, entry.refCount)
entry.Unlock()
```

I realise there's a lot of debugging / WIP code in here still, but all the locking makes it quite easy to make slip-ups at some point. Once the code is a bit more mature, should we refactor and separate the whole cache out into something more standalone? For example:

```go
// GetRefCount returns the reference count.
func (e *cacheEntry) GetRefCount() int {
	e.Lock()
	defer e.Unlock()
	return e.refCount
}

// DecrementRefCount decrements and returns the reference count.
func (e *cacheEntry) DecrementRefCount() int {
	e.Lock()
	defer e.Unlock()
	if e.refCount > 0 {
		e.refCount--
	}
	return e.refCount
}
```

That's fair. Yes, the debugging stuff is deliberately paranoid right now. Once it's stable, I can make changes such as this.
Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the graphdriver to perform dynamic sandbox management. Previously, as a temporary 'hack', the service VM had a prebuilt sandbox in it. With this change, management is under the control of the client (docker), which executes a mkfs.ext4 on it. This enables sandboxes of non-default sizes too (a TODO previously in the code). It also addresses moby#33969 (comment)

Requires:
- go-winio: v0.4.3
- opengcs: v0.0.10
- hcsshim: v0.5.28
Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the graphdriver to perform dynamic sandbox management. Previously, as a temporary 'hack', the service VM had a prebuilt sandbox in it. With this change, management is under the control of the client (docker), which executes a mkfs.ext4 on it. This enables sandboxes of non-default sizes too (a TODO previously in the code). It also addresses moby#33969 (comment)

Requires:
- go-winio: v0.4.3
- opengcs: v0.0.12
- hcsshim: v0.6.x
Upstream-commit: 8c279ef
Component: engine
Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the LCOW driver to support both global SVM lifetime and per-instance lifetime, and makes the driver thread-safe. It corrects a number of previously existing bugs. It also bumps to the latest version of jhowardmsft/opengcs.

I have validated basic CI internally (building/pulling/committing/running/...) in both modes and verified there are no regressions from the previous behaviour.

@johnstep PTAL.