LCOW: Service VM lifetime changes#33969

Merged
thaJeztah merged 1 commit into moby:master from microsoft:jjh/lifetime
Jul 13, 2017

Conversation

@lowenna
Member

@lowenna lowenna commented Jul 5, 2017

Signed-off-by: John Howard jhoward@microsoft.com

This changes the LCOW driver to support both global SVM lifetime and per-instance lifetime, and makes the driver thread-safe. It corrects a number of previously existing bugs. It also bumps to the latest version of jhowardmsft/opengcs.

I have validated basic CI internally (building/pulling/committing/running/...) in both modes and confirmed there are no regressions from the previous behaviour.

@johnstep PTAL.

@lowenna
Member Author

lowenna commented Jul 6, 2017

Fixed golint errors

@lowenna
Member Author

lowenna commented Jul 6, 2017

More golint fixes to follow tomorrow....

@lowenna
Member Author

lowenna commented Jul 6, 2017

go vet errors fixed. Should be green now in CI

@lowenna lowenna force-pushed the jjh/lifetime branch 5 times, most recently from 4917dfa to d1d1a76 Compare July 6, 2017 20:02
@vieux
Contributor

vieux commented Jul 7, 2017

ping @johnstep :D

@johnstep
Member

johnstep commented Jul 7, 2017

@vieux Sorry, I had been reviewing some of this separately with @jhowardmsft and we got some lock issues addressed. I will finish up the review soon; so far, so good.

@lowenna lowenna force-pushed the jjh/lifetime branch 2 times, most recently from 71278e8 to bb64e86 Compare July 7, 2017 18:16
Member

@johnstep johnstep left a comment


LGTM on green, after changes from external review; thanks @jhowardmsft for real-time review.

@lowenna
Member Author

lowenna commented Jul 7, 2017

Is green :)

@lowenna
Member Author

lowenna commented Jul 8, 2017

@thaJeztah Who else could review this? (Windows only code)

@thaJeztah
Member

I'll have a look to see if there's someone available; maybe @StefanScherer is also interested in having a look.

@lowenna
Member Author

lowenna commented Jul 10, 2017

Rebased. @darrenstahlmsft Could you take a look?

Contributor

@darstahl darstahl left a comment


Not a maintainer, but LGTM other than one missing Unlock

Contributor

Debug log indicates the mutex is being released, but unless I'm missing something it's still held.

Member Author

Yes, good catch. Thanks. Updated.

Contributor

👍

Contributor

nit: This could be a defer unlock above

Contributor

RE offline discussion, this is fine as is for now.
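For context, the defer-unlock pattern the nit above refers to looks something like this; a minimal sketch, with hypothetical function and variable names that are not from the PR:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu    sync.Mutex
	count int
)

// incrementManual releases the lock explicitly, which must be done on
// every return path and is easy to get wrong when early returns are added.
func incrementManual() int {
	mu.Lock()
	count++
	n := count
	mu.Unlock()
	return n
}

// incrementDefer releases the lock via defer immediately after acquiring
// it, so every return path (including panics) is covered automatically.
func incrementDefer() int {
	mu.Lock()
	defer mu.Unlock()
	count++
	return count
}

func main() {
	fmt.Println(incrementManual())
	fmt.Println(incrementDefer())
}
```

The trade-off being discussed: defer keeps the lock held until the function returns, so it is not suitable when the lock must be dropped mid-function, which is why "fine as is for now" is a reasonable resolution.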

@lowenna
Member Author

lowenna commented Jul 12, 2017

@thaJeztah @vieux - are you OK with the above reviews, given Windows eyes have been on it? I need to build on this code base, so I would like to get it merged soon, and so does Akash for the remote filesystem work. Thanks.

Member

@thaJeztah thaJeztah left a comment


Started reviewing, and left some comments

Member

Nit: it would be good to have this function return a type; 4 return variables is hard to read

type layerDetails struct {
    fileName  string
    fileSize  int64
    isSandbox bool
}

Could even be (if additional details of the file are useful)

type layerDetails struct {
    fileInfo  os.FileInfo
    isSandbox bool
}

Member

Not necessarily for this PR ^^

Member Author

Yeah, let me do that in the current branch I'm working in for a future PR which builds on the lifetime changes in this PR. Will avoid a confusing rebase.

Member

If this is the package description, it should be at the top

Member Author

Sure, will move to above package.

Member

Any reason for including this metadata here?

Member Author

Before the likes of you start reading the comments too closely 🇬🇧

Member

Don't forget to update the documentation for this

Member

actually, can we change this option to --storage-opt lcow.globalmode=1?

Storage-driver specific options use a <driver>. prefix (see https://github.com/docker/cli/blob/ff2552f7a1073834e79793aeb5dc5a4e486956a4/docs/reference/commandline/dockerd.md#options-per-storage-driver)

Member Author

Sure, will be in the next update.

Member Author

BTW - documentation is going to come later; let's get it working first. Docs will be a combined effort, the majority of which @PatrickLang will have to take on.

Member

Should this be a tmpdir if it's for "scratch"?

Member Author

See my comment 1 line higher....

Member

nit: isMounted perhaps? hasBeenMounted is easily misinterpreted as "it was mounted at some point, but not anymore"

Member Author

Sure. Updated

Member

Same here; actually, can you change this option to lcow.globalmode?

Storage-driver specific options use a <driver>. prefix (see https://github.com/docker/cli/blob/ff2552f7a1073834e79793aeb5dc5a4e486956a4/docs/reference/commandline/dockerd.md#options-per-storage-driver)

Member

Also, can you make this a boolean instead of 1 / 0, and check the value here? --storage-opt lcow.globalmode=false should disable it. This is important because we use these inside daemon.json as well (for which we use booleans).

Member Author

Yup, lcow.globalmode

Member Author

BTW - the statement about the prefix isn't universally true - see btrfs.go L569 for example.

Member Author

Updated to true/false using strconv.ParseBool
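A minimal sketch of how such an option could be parsed with strconv.ParseBool; the option name lcow.globalmode is from this PR, but the function and variable names below are hypothetical, not the driver's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOptions is a hypothetical illustration of turning a
// "lcow.globalmode" storage option into a boolean. ParseBool accepts
// 1/0, t/f, true/false, TRUE/FALSE and so on, which is why it
// satisfies both the old 1/0 form and the requested true/false form.
func parseOptions(options []string) (globalMode bool, err error) {
	for _, v := range options {
		opt := strings.SplitN(v, "=", 2)
		if len(opt) != 2 {
			return false, fmt.Errorf("invalid option %q", v)
		}
		switch strings.ToLower(opt[0]) {
		case "lcow.globalmode":
			globalMode, err = strconv.ParseBool(opt[1])
			if err != nil {
				return false, err
			}
		}
	}
	return globalMode, nil
}

func main() {
	for _, o := range []string{"lcow.globalmode=1", "lcow.globalmode=false"} {
		mode, err := parseOptions([]string{o})
		fmt.Println(o, "->", mode, err)
	}
}
```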

Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the LCOW driver to support both global SVM lifetime and
per-instance lifetime. It also corrects the scratch implementation.
@lowenna
Member Author

lowenna commented Jul 13, 2017

@thaJeztah Comments addressed, update pushed. Will address the return parameters in the follow-up PR once this is merged as mentioned ^^

@lowenna
Member Author

lowenna commented Jul 13, 2017

@thaJeztah and still green 💚

Member

@thaJeztah thaJeztah left a comment


After discussing with @johnstep - we're good merging this; it's clearly WIP code, and it will require refactoring before it reaches its final stage.

"lgtm"

return err
}

return d.config.CreateSandbox(filepath.Join(d.dir(id), sandboxFilename), client.DefaultSandboxSizeMB, d.cachedSandboxFile)
Member

any reason you changed this to if err := ... ; err != nil {?
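For context, the idiom being asked about scopes the error variable to the if statement itself. A minimal sketch, where doWork is a hypothetical stand-in for the call under review:

```go
package main

import (
	"errors"
	"fmt"
)

// doWork is a hypothetical stand-in for the call under review.
func doWork(fail bool) error {
	if fail {
		return errors.New("boom")
	}
	return nil
}

func main() {
	// Scoped form: err exists only inside the if statement, so it
	// cannot be accidentally reused or shadowed further down.
	if err := doWork(true); err != nil {
		fmt.Println("scoped:", err)
	}

	// Unscoped form: err lives on in the enclosing scope after the check.
	err := doWork(false)
	if err != nil {
		fmt.Println("unscoped:", err)
		return
	}
	fmt.Println("ok")
}
```

The scoped form is generally preferred in Go when the error is only needed for the check; the unscoped form is needed when err (or another return value) is used after the if.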

logrus.Debugf("%s: releasing cacheMutex. Ref count has dropped to zero", title)
d.cacheMutex.Unlock()

// To reach this point, the reference count has dropped to zero. If we have
Member

This is confusing, because entry.refCount will never actually reach zero

Member Author

I'll update the comment to something more like "there are no more references if we have reached this point". Is that what you mean?

Member

Yes, or don't return after decrementing, but check whether refCount reaches 0. You may want to combine that with the refactoring; for now, updating the comment should do.

logrus.Debugf("%s %s: refCount decremented to %d", title, id, entry.refCount)
d.cacheMu.Unlock()
logrus.Debugf("%s: releasing cache item for decrement and early get-out as refCount is now %d", title, entry.refCount)
entry.Unlock()
Member

I realise there's a lot of debugging / WIP code in here still, but all the locking makes it quite easy to slip up at some point.

Once the code is a bit more mature, should we refactor and separate out the whole cache into something more standalone?

// GetRefCount returns the reference count.
func (e *cacheEntry) GetRefCount() int {
	e.Lock()
	defer e.Unlock()
	return e.refCount
}

// DecrementRefCount decrements and returns the reference count.
func (e *cacheEntry) DecrementRefCount() int {
	e.Lock()
	defer e.Unlock()
	if e.refCount > 0 {
		e.refCount--
	}
	return e.refCount
}

Member Author

That's fair. Yes, the debugging stuff is deliberately paranoid right now. Once it's stable, I can make changes such as this.

@thaJeztah thaJeztah merged commit f22cecf into moby:master Jul 13, 2017
lowenna pushed a commit to microsoft/docker that referenced this pull request Jul 18, 2017
Signed-off-by: John Howard <jhoward@microsoft.com>

This changes the graphdriver to perform dynamic sandbox management.
Previously, as a temporary 'hack', the service VM had a prebuilt
sandbox in it. With this change, management is under the control
of the client (docker) and executes a mkfs.ext4 on it. This enables
sandboxes of non-default sizes too (a TODO previously in the code).

It also addresses moby#33969 (comment)

Requires:
- go-winio: v0.4.3
- opengcs:  v0.0.10
- hcsshim:  v0.5.28
lowenna pushed a commit to microsoft/docker that referenced this pull request Jul 19, 2017
gupta-ak pushed a commit to microsoft/docker that referenced this pull request Jul 28, 2017
lowenna pushed a commit to microsoft/docker that referenced this pull request Aug 2, 2017
Requires:
- go-winio: v0.4.3
- opengcs:  v0.0.12
- hcsshim:  v0.6.x
lowenna pushed a commit to microsoft/docker that referenced this pull request Aug 3, 2017
@thaJeztah thaJeztah added the area/lcow Issues and PR's related to the experimental LCOW feature label Oct 30, 2018
silvin-lubecki pushed a commit to silvin-lubecki/engine-extract that referenced this pull request Mar 16, 2020
Upstream-commit: 8c279ef
Component: engine

Labels

area/lcow Issues and PR's related to the experimental LCOW feature status/2-code-review
