Skip to content

Conversation

@thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Aug 9, 2022

@thaJeztah thaJeztah added area/runtime Runtime status/2-code-review area/images Image Distribution containerd-integration Issues and PRs related to containerd integration labels Aug 9, 2022
@thaJeztah thaJeztah added this to the v-next milestone Aug 9, 2022
@thaJeztah thaJeztah force-pushed the containerd_contexts_step1 branch from 2c7339b to b477634 Compare August 9, 2022 12:56
@thaJeztah thaJeztah changed the title containerd integration: imageservice: add contexts to various methods containerd integration: imageservice: add contexts to various methods (step 1) Aug 9, 2022
@thaJeztah thaJeztah force-pushed the containerd_contexts_step1 branch 2 times, most recently from c6bb210 to eeb19b2 Compare August 9, 2022 13:15
@thaJeztah thaJeztah marked this pull request as ready for review August 9, 2022 14:15
@thaJeztah
Copy link
Member Author

whoop; at least this one is green; I updated #43818 based on this PR (to start with the minimum amount of changes for that PR to work)

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plumbing contexts through a function inevitably makes it possible for an operation to fail for a completely benign reason: the context was canceled. A function could misbehave if it is not prepared for the possibility that context-accepting function calls could be canceled. Best-case scenario it returns an inaccurate error if a context is canceled. Worst-case scenario is a timing-dependent bug which occurs only if the context is canceled at a specific point in the operation's execution. Given the risk of introducing bugs which could be really troublesome to debug—or consistently reproduce—I think we have to make sure all functions are cancelation-aware when we plumb contexts through them. Since Go convention is to signal that a function call was canceled by returning the ctx.Err() value, cancelation-awareness is necessarily an aspect of error handling. So I think we have to plumb error-handling up into the caller at the same time as we plumb contexts down into a callee function.

Comment on lines 47 to 54
func (i *ImageService) manifestMatchesPlatform(ctx context.Context, img *image.Image, platform specs.Platform) bool {
logger := logrus.WithField("image", img.ID).WithField("desiredPlatform", platforms.Format(platform))

ls, leaseErr := i.leases.ListResources(context.TODO(), leases.Lease{ID: imageKey(img.ID().Digest())})
ls, leaseErr := i.leases.ListResources(ctx, leases.Lease{ID: imageKey(img.ID().Digest())})
if leaseErr != nil {
logger.WithError(leaseErr).Error("Error looking up image leases")
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manifestMatchesPlatform returning false could mean the manifest does not match the platform, or it could mean ListResources returned an error. The caller can't tell them apart, so could erroneously proceed believing that it had been given accurate information when in fact the context happened to be canceled at an inopportune time. This kind of ambiguity has always existed, but was much less likely to be an issue on a healthy system before contexts were plumbed through. This is why I think we need to also plumb errors through whenever we plumb contexts through a function.

func (i *ImageService) manifestMatchesPlatform(ctx context.Context, img *image.Image, platform specs.Platform) (bool, error)

daemon/create.go Outdated

if opts.params.Platform == nil && opts.params.Config.Image != "" {
if img, _ := daemon.imageService.GetImage(opts.params.Config.Image, opts.params.Platform); img != nil {
if img, _ := daemon.imageService.GetImage(ctx, opts.params.Config.Image, opts.params.Platform); img != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring an error-return value on a function which takes a context argument could potentially lead to incorrect behaviour if the caller is unaware the function call was canceled. In this particular case, the image platform compatibility check could be skipped if the context is canceled, which could result in a container erroneously being created when the request should have been rejected.

When plumbing contexts through a function, at which stage do we want to audit the error handling? When a PR adds a context argument to a function's signature, should we audit the body of the function whose signature is modified (e.g. GetImage) or should we audit all the call sites of that function (e.g. the line this comment is attached to)? I personally would prefer the latter as the code won't compile until every call site of the plumbed-in function is touched, so the diff naturally highlights every call site which needs to be audited.

@thaJeztah
Copy link
Member Author

^^ also discussing this in the maintainers meeting; @corhere had some ideas on different approach on handling the context, and context-cancellation, which may be a good topic for our containerd-integration call ./cc @bsousaa @rumpl

I think next meeting is on a "whaleness" day, but perhaps it would be an idea to write down some approaches so that we can decide which one best fits our situation best.

@corhere
Copy link
Contributor

corhere commented Aug 11, 2022

A function must be prepared to handle the cancelation of any function calls it passes a (potentially) cancelable context into. As a corollary, functions for which it can be statically proven that under no circumstances can a cancelable context be passed into a function call do not need to be cancelation-aware. Therefore:

  • func Foo(_ context.Context) generally does not need to be cancelation-aware as it does not propagate its context argument into any function calls, and
  • a function call which passes in a statically provably not-cancelable context (context.Background() or context.TODO()) cannot be canceled.

I think it is possible to to plumb contexts through iteratively without getting overwhelmed or missing anything. The general idea is to always maintain some invariants while refactoring which do not require having to recursively audit all the callers or callees of a refactored function. Here's what I have come up with:

  1. A cancelable function must signal to its caller when its operation is canceled by returning (an error which wraps) ctx.Err().
  2. Any function which accepts a context argument and returns an error value must be assumed to be a cancelable function. (It could be refactored to become cancelable at any time, if it is not already.) Any call sites to that function must be cancelation-aware unless it can be statically proved that the context argument passed in is not cancelable. In most cases, cancelation-awareness implies that the calling function must itself be cancelable.

Refactoring could proceed iteratively, touching a chosen subset of the codebase at a time while maintaining the invariants. When a context argument is added to a function, all its call sites need only be modified to pass in context.TODO(), without needing to worry about cancelation-awareness. Reviewers can skim the call sites and focus on auditing only the function being refactored. When a context argument needs to be added to an interface method and the implementations need to be updated to match, the implementation functions' signatures can be modified by discarding the argument value (e.g. func (*Foo) Do(_ context.Context)) so that the function bodies do not need to be touched. Any gaps in context propagation can be found and iteratively refactored by searching for and replacing any context.TODO() calls.

The major flaw in the iterative approach is that values cannot be propagated across the context.TODO() gaps. There are ways to "derive" a context which propagates values but not deadlines or cancelation so that values can be propagated through cancelation-unaware function calls. The trivial but wrong way would be to create a custom Context type which stubs out the Deadline, Done, and Err methods. The right way would be to start a new context chain from context.Background() and explicitly copy the desired values into it, assuming you have access to the necessary context keys.

@thaJeztah thaJeztah force-pushed the containerd_contexts_step1 branch 2 times, most recently from 972db81 to d5c25c3 Compare August 25, 2022 23:39
@rumpl rumpl force-pushed the containerd_contexts_step1 branch from d5c25c3 to 2d59f8c Compare September 1, 2022 12:04
@thaJeztah
Copy link
Member Author

@rumpl looks like this may be related 🤔

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform/different_os (0.00s)
    create_test.go:484: assertion failed: expression is false: client.IsErrNotFound(err): Error response from daemon: namespace is required: failed precondition
    --- FAIL: TestCreateDifferentPlatform/different_os (0.00s)

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform/different_cpu_arch (0.00s)
    create_test.go:493: assertion failed: expression is false: client.IsErrNotFound(err): Error response from daemon: namespace is required: failed precondition
    --- FAIL: TestCreateDifferentPlatform/different_cpu_arch (0.00s)

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform (0.02s)

Comment on lines 79 to 85
_, err := c.imageBackend.GetImage(ctx, spec.Image, imagetypes.GetImageOpts{})
if err == nil {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_, err := c.imageBackend.GetImage(ctx, spec.Image, imagetypes.GetImageOpts{})
if err == nil {
return nil
}
_, err := c.imageBackend.GetImage(ctx, spec.Image, imagetypes.GetImageOpts{})
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return err
}
if err == nil {
return nil
}

@rumpl rumpl force-pushed the containerd_contexts_step1 branch from 2d59f8c to d1a6d70 Compare September 5, 2022 12:40
@thaJeztah
Copy link
Member Author

@rumpl looks like this may be related 🤔

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform/different_os (0.00s)
    create_test.go:484: assertion failed: expression is false: client.IsErrNotFound(err): Error response from daemon: namespace is required: failed precondition
    --- FAIL: TestCreateDifferentPlatform/different_os (0.00s)

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform/different_cpu_arch (0.00s)
    create_test.go:493: assertion failed: expression is false: client.IsErrNotFound(err): Error response from daemon: namespace is required: failed precondition
    --- FAIL: TestCreateDifferentPlatform/different_cpu_arch (0.00s)

=== FAIL: github.com/docker/docker/integration/container TestCreateDifferentPlatform (0.02s)

This one looks to be failing still (on windows-2022 / run / integration-test (builtin, ./...)). Not 100% sure what's happening there; it's a bit weird that it's failing on that one, because that's builtin, so should even use containerd? Error is coming from

return "", fmt.Errorf("namespace is required: %w", errdefs.ErrFailedPrecondition)

@rumpl rumpl force-pushed the containerd_contexts_step1 branch 3 times, most recently from 9a410ab to d1a6d70 Compare September 6, 2022 10:11
Comment on lines 75 to 76
}
if err != nil && img == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the other special cases where err != nil && img != nil? Why is silently ignoring the error the correct behaviour?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed

@thaJeztah
Copy link
Member Author

#44091 was merged, so this PR can be rebased (to drop the first commit)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@rumpl rumpl force-pushed the containerd_contexts_step1 branch from 6ba017b to 33d04c1 Compare September 7, 2022 08:49
@rumpl
Copy link
Member

rumpl commented Sep 7, 2022

#44091 was merged, so this PR can be rebased (to drop the first commit)

Done

@rumpl rumpl force-pushed the containerd_contexts_step1 branch from 33d04c1 to 89350f9 Compare September 7, 2022 12:59
Copy link
Member Author

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +78 to 80
if img != nil {
p := maximumSpec()
imgPlat := v1.Platform{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but (also related to the other question about nil error and nil image), perhaps we should look at GetImage to return a sentinel error in some of these conditions.

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :shipit:

Comment on lines 75 to 77
} else {
if err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
} else {
if err != nil {
return err
}
}
} else if err != nil {
return err
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL... I wanted to make that comment, and thought "let it be" 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised none of the linters complained!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to enable more linters; looks like a bunch were deprecated, and there's a couple of interesting new ones; was testing some of them already on our code.

Not sure if any of them picked up this one though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m very good at dodging linters

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/images Image Distribution area/runtime Runtime containerd-integration Issues and PRs related to containerd integration status/4-merge

Projects

Development

Successfully merging this pull request may close these issues.

3 participants