
idea: unbound cancel scopes #607

@njsmith

I've recently run into a few places where I want a cancel scope for some code that may or may not already be running. For example, in pytest-trio, if a fixture crashes you want to cancel the main test... but these run in different tasks, so it's tricky to find the main task's cancel scope and put it somewhere that a fixture can get at it, without race conditions.

It's possible to create an object that sort of acts like a cancel scope, but where you can call cancel before or after the code inside the scope starts running. But it's fairly tricky to get all the cases right, e.g.:

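```python
import attr
import trio


@attr.s
class UnboundCancelScope:
    # Remembers a cancel() call even if it arrives before the scope is entered
    cancel_called = attr.ib(default=False)
    # The real cancel scope; only exists once the with block has started
    _cancel_scope = attr.ib(default=None)
    # The context manager returned by open_cancel_scope(), needed for __exit__
    _cm = attr.ib(default=None)

    def cancel(self):
        self.cancel_called = True
        # If the body is already running, forward the cancellation right away
        if self._cancel_scope is not None:
            self._cancel_scope.cancel()

    def __enter__(self):
        # open_cancel_scope() returns a context manager that yields the scope,
        # so keep the manager itself around to exit it later
        self._cm = trio.open_cancel_scope()
        self._cancel_scope = self._cm.__enter__()
        # Apply a cancel() that arrived before we were entered
        if self.cancel_called:
            self._cancel_scope.cancel()
        return self

    def __exit__(self, *args):
        return self._cm.__exit__(*args)

# Creation:
unbound_cancel_scope = UnboundCancelScope()

# Entering:
with unbound_cancel_scope:
    ...
```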

Maybe we should make this just... how cancel scopes work, always? Right now `open_cancel_scope` is a context manager, so it forces you to immediately enter the scope it creates. But we could reinterpret it as returning an unbound cancel scope object, and then treat `with cancel_scope: ...` as entering that scope. It'd even be backwards compatible!

Implementation-wise, I think it'd be almost trivial. The one thing to watch out for is that it'd become possible to attempt to re-enter a scope that you're already inside, which would be complicated (e.g. instead of keeping a set of which tasks are inside the scope, we'd have to keep a dict of task → refcount). For now we should just error out if someone tries to do this. (OTOH, I think having multiple independent tasks entering the same scope is fine and would Just Work.)
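A minimal sketch of that re-entry rule, assuming a scope keeps a set of the tasks currently inside it. `current_task()` and `running_as()` are hypothetical helpers that fake task switching so the example runs without Trio.

```python
import contextlib

_current = ["main"]  # name of the toy "current task"


def current_task():
    return _current[0]


@contextlib.contextmanager
def running_as(name):
    # Pretend that a different task is executing inside this block
    saved = _current[0]
    _current[0] = name
    try:
        yield
    finally:
        _current[0] = saved


class CancelScope:
    def __init__(self):
        self._tasks = set()  # a plain set suffices once re-entry is forbidden

    def __enter__(self):
        task = current_task()
        if task in self._tasks:
            # Allowing this would need a dict of task -> refcount instead
            raise RuntimeError("this task is already inside this cancel scope")
        self._tasks.add(task)
        return self

    def __exit__(self, *exc_info):
        self._tasks.remove(current_task())
        return False


scope = CancelScope()
rejected = False
with scope:                      # "main" enters
    with running_as("fixture"):
        with scope:              # a different task may enter too
            assert scope._tasks == {"main", "fixture"}
    try:
        with scope:              # "main" again: rejected
            pass
    except RuntimeError:
        rejected = True
print(rejected, scope._tasks)  # True set()
```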

Maybe we should also make `CancelScope` public? Right now it's hidden in order to keep the constructor private, but that would be unnecessary in this approach; in fact `open_cancel_scope(...)` would just be `return CancelScope(...)`, so maybe we'd even want to deprecate it or something.
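If the constructor became public, `open_cancel_scope` could shrink to a one-line alias. A sketch, assuming the toy `CancelScope` below stands in for the real class (it keeps only the `deadline` and `shield` keyword arguments) and that we choose to emit a `DeprecationWarning`:

```python
import math
import warnings


class CancelScope:
    # Toy public class; a real one would also track tasks, cancellation, etc.
    def __init__(self, *, deadline=math.inf, shield=False):
        self.deadline = deadline
        self.shield = shield

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


def open_cancel_scope(*, deadline=math.inf, shield=False):
    # Old spelling kept as a thin alias; the warning is an optional choice
    warnings.warn(
        "open_cancel_scope() is deprecated; use CancelScope()",
        DeprecationWarning,
        stacklevel=2,
    )
    return CancelScope(deadline=deadline, shield=shield)


# Both spellings produce the same kind of object, so old code keeps working.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with open_cancel_scope(shield=True) as old_style:
        pass
with CancelScope(shield=True) as new_style:
    pass
```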

One limitation of this approach is that cancelled_caught would become ambiguous if multiple tasks can enter the same scope. It might not matter.

Alternatives

There's a larger design space here of course. Cancel scopes are inspired in part by C#'s cancellation system, which has "cancel sources" – which let you call .cancel(), and set deadlines – and "cancel tokens" – which are read-only objects that let you check whether the corresponding source has been cancelled and what its deadline is. You can also combine multiple tokens together to create a new cancel source, that automatically becomes cancelled when any of the original tokens are cancelled. (I'm not sure why this creates a new source, rather than creating a new token. I think it doesn't matter which way you define the API though, each version can basically be implemented in terms of the other. Also, for some reason C# doesn't actually provide any API for querying for the current deadline given a source or a token, but this is silly so I'm going to ignore it.)
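For readers who haven't used the C# API (`CancellationTokenSource`, `CancellationToken`, and `CancellationTokenSource.CreateLinkedTokenSource`), here is a rough Python analogue. The class names and the `linked()` method are hypothetical, deadlines are omitted, and the combining step returns a new source, as in C#:

```python
class CancelToken:
    """Read-only view of a CancelSource: can be queried, not cancelled."""

    def __init__(self, source):
        self._source = source

    @property
    def cancel_called(self):
        return self._source.cancel_called


class CancelSource:
    def __init__(self):
        self.cancel_called = False
        self._children = []  # linked sources to cancel along with this one
        self.token = CancelToken(self)

    def cancel(self):
        if not self.cancel_called:
            self.cancel_called = True
            for child in self._children:
                child.cancel()

    @classmethod
    def linked(cls, *tokens):
        # Like CreateLinkedTokenSource: cancelled when any input token is
        new = cls()
        for token in tokens:
            token._source._children.append(new)
            if token.cancel_called:
                new.cancel()
        return new


a, b = CancelSource(), CancelSource()
combined = CancelSource.linked(a.token, b.token)
b.cancel()
print(combined.token.cancel_called)  # True
```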

In Trio's current system, cancel scopes = cancel sources, and there is no reified object corresponding to cancel tokens – they're implicit on the cancel stack associated with a task, and you can query this implicit state using current_effective_deadline(). So in addition to introducing the idea of an "ambient" cancel token, we're also quite aggressive about collapsing together the different ideas here.
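The implicit "token" can be pictured as a walk over the task's cancel stack. The toy function below is an assumption about the general idea, not Trio's code: it takes the earliest deadline among the enclosing scopes, and stops at the first shielded scope (whose own deadline still counts), since nothing outside a shield can reach the code inside it.

```python
import math


class Scope:
    # Toy stand-in for a cancel scope: just a deadline and a shield flag
    def __init__(self, deadline=math.inf, shield=False):
        self.deadline = deadline
        self.shield = shield


def effective_deadline(cancel_stack):
    """Walk from innermost to outermost, taking the earliest deadline."""
    deadline = math.inf
    for scope in reversed(cancel_stack):
        deadline = min(deadline, scope.deadline)
        if scope.shield:
            break  # scopes outside a shield are invisible from inside it
    return deadline


stack = [Scope(deadline=10), Scope(deadline=30), Scope()]
print(effective_deadline(stack))                         # 10
print(effective_deadline(stack + [Scope(shield=True)]))  # inf
```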

If we wanted to fully decompose the space, you can imagine operations:

  • create a cancel source (like an unbound cancel scope, in the proposal above)
  • given a cancel source, produce a cancel token
  • use a with block to bind a given cancel token to the current task, which produces a "cancel binding"
  • given a "cancel binding", query for whether the code inside the block was actually cancelled (the cancelled_caught attribute)
  • given a task's ambient context, produce a new cancel token that becomes cancelled if the original context becomes cancelled.
  • given a cancel token, query for current deadline and cancelled state
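One way to see how these operations fit together is a toy model of the fully decomposed design. Everything here is hypothetical (`CancelSource`, `CancelToken`, `CancelBinding`, `Task`), and it runs single-threaded with explicit `checkpoint()` calls. Note that `cancelled_caught` lives on the binding, and that `ambient_token()` captures a token that another task could query later.

```python
import math


class Cancelled(Exception):
    pass


class CancelSource:
    """Can be cancelled; hands out read-only tokens."""

    def __init__(self, deadline=math.inf):
        self.cancel_called = False
        self.deadline = deadline

    def cancel(self):
        self.cancel_called = True

    @property
    def token(self):
        return CancelToken([self])


class CancelToken:
    """Read-only: cancelled as soon as any underlying source is."""

    def __init__(self, sources):
        self._sources = list(sources)

    @property
    def cancel_called(self):
        return any(s.cancel_called for s in self._sources)

    @property
    def deadline(self):
        return min((s.deadline for s in self._sources), default=math.inf)


class CancelBinding:
    """Produced by one `with` block; records what that block caught."""

    def __init__(self, task, token):
        self.task = task
        self.token = token
        self.cancelled_caught = False

    def __enter__(self):
        self.task.bindings.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.task.bindings.remove(self)
        # Only catch a Cancelled that was raised on behalf of this binding
        if exc_type is Cancelled and exc.args[0] is self:
            self.cancelled_caught = True
            return True
        return False


class Task:
    def __init__(self):
        self.bindings = []  # the task's ambient stack, outermost first

    def bind(self, token):
        return CancelBinding(self, token)

    def ambient_token(self):
        # New token that is cancelled if anything currently bound is
        return CancelToken(s for b in self.bindings for s in b.token._sources)

    def checkpoint(self):
        for binding in self.bindings:  # the outermost cancelled binding wins
            if binding.token.cancel_called:
                raise Cancelled(binding)


task = Task()
source = CancelSource(deadline=42)
with task.bind(source.token) as binding:
    captured = task.ambient_token()  # could be handed to another task
    source.cancel()
    task.checkpoint()                # raises Cancelled(binding)
print(binding.cancelled_caught, captured.cancel_called, captured.deadline)
# True True 42
```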

This is almost certainly too fine-grained a decomposition, but I find it useful to see it all laid out like that... and it does allow for things we can't do right now, like check whether another task's ambient context has been cancelled (by extracting its cancel token and then querying it later). Or a minor feature that curio has, and I envy: if you enter a thread with run_sync_in_worker_thread, and then the thread comes back into trio with BlockingTrioPortal.run, and the original run_sync_in_worker_thread is cancelled... it would be neat if this caused the code inside the BlockingTrioPortal.run call to raise a Cancelled error that propagated all the way back out of trio, through the thread, and back into trio.

Though actually... the "fully-decomposed" design is still not powerful enough to allow that! I was thinking you could do it by having `run_sync_in_worker_thread` capture the ambient token, and then inside `BlockingTrioPortal.run` we could do `with the_ambient_token: ...`. But this doesn't quite work, because it would create a new binding. If `run_sync_in_worker_thread` was cancelled, then the code inside the `BTP.run` call would raise `Cancelled`, but that exception would be caught at the `with the_ambient_token`, instead of propagating into the thread and then back into trio. `Cancelled` exceptions are associated with cancel bindings, not cancel tokens or cancel scopes. Hmm! Well, at least the decomposed design gives us useful vocabulary :-).
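The failure mode can be reproduced in a toy model: if `Cancelled` is tagged with the binding that raised it, a fresh `with token:` block in a new task catches it, and the caller on the thread side just sees a normal return. `portal_run` is a hypothetical stand-in for `BlockingTrioPortal.run`, and nothing here is Trio's actual code.

```python
class Cancelled(Exception):
    pass


class CancelSource:
    def __init__(self):
        self.cancel_called = False

    def cancel(self):
        self.cancel_called = True


class Binding:
    # One `with token:` block on one task's stack. A Cancelled is tagged
    # with the binding that raised it, and only that binding catches it.
    def __init__(self, stack, source):
        self.stack = stack
        self.source = source

    def __enter__(self):
        self.stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stack.remove(self)
        return exc_type is Cancelled and exc.args[0] is self


def checkpoint(stack):
    for binding in stack:  # the outermost cancelled binding wins
        if binding.source.cancel_called:
            raise Cancelled(binding)


def portal_run(fn, captured_source):
    # Hypothetical stand-in for BlockingTrioPortal.run re-binding the
    # captured ambient token in a brand-new task with its own stack.
    new_task_stack = []
    with Binding(new_task_stack, captured_source):
        checkpoint(new_task_stack)
        return fn()


ambient = CancelSource()  # what run_sync_in_worker_thread would capture
ambient.cancel()          # ...and then the outer call is cancelled
outcome = portal_run(lambda: "body finished", ambient)
print(outcome)  # None: the new binding swallowed Cancelled
```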

It's not clear whether propagating cancellation across threads is really that important. But if we do want to do it... [longish text split off into #606, since it doesn't seem to be too related to this issue after all].

Other things to consider: as noted in #285, we might want to capture actual exceptions for each binding, which has the same issues as cancelled_caught, but even more so.

I'm not sure how shielding fits into the above picture. In the fully-decomposed picture, I think a shield would be a separate kind of thing, where you just do `with shielded(): ...`, since it's about managing the binding stack. Having a `.shield` attribute on cancel sources or cancel tokens doesn't make much sense conceptually.
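A sketch of shielding as its own construct, assuming a hypothetical `shielded()` context manager that pushes a marker onto the task's binding stack, so that cancellation checks simply stop looking outward when they hit it. `bound()` and `is_cancelled()` are likewise toy helpers:

```python
import contextlib


class CancelSource:
    def __init__(self):
        self.cancel_called = False

    def cancel(self):
        self.cancel_called = True


SHIELD = object()  # marker pushed onto the stack by shielded()
stack = []         # toy: the current task's ambient binding stack


@contextlib.contextmanager
def bound(source):
    stack.append(source)
    try:
        yield
    finally:
        stack.pop()


@contextlib.contextmanager
def shielded():
    # Shielding as its own stack entry, not an attribute on a scope
    stack.append(SHIELD)
    try:
        yield
    finally:
        stack.pop()


def is_cancelled():
    # Look from innermost outward; nothing outside a shield is visible
    for entry in reversed(stack):
        if entry is SHIELD:
            return False
        if entry.cancel_called:
            return True
    return False


results = []
outer = CancelSource()
outer.cancel()
with bound(outer):
    results.append(is_cancelled())      # outer cancellation is visible
    with shielded():
        results.append(is_cancelled())  # hidden by the shield
        inner = CancelSource()
        with bound(inner):
            inner.cancel()
            results.append(is_cancelled())  # inside the shield: visible
print(results)  # [True, False, True]
```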

Given the above, I'm having trouble thinking of cases where capturing a task's ambient context state in the form of a token is actually useful.

I'm not sure how useful the source/token distinction is for trio, given that the actual message delivery is via the ambient state (unlike C#, where the token object is important because you have to manually examine it all the time to check if you're cancelled). And current_effective_deadline is sufficient for examining the current ambient state. Also, since a token's functionality is a subset of a source's functionality, we can always add tokens later without breaking anything. (So e.g. we'd still have to support `with source: ...`, but that's fine, it'd just be a shorthand for `with source.token: ...`.)

So I think the 'unbound scopes' idea captures most of the valuable parts of the "fully decomposed" design, except that I'm a little nervous about bindings – it's a little weird to have cancelled_caught / #285 state and shielding associated with the scope rather than with a with block.

If `CancelScope` became a public class with a public constructor, and we want to transition from `with open_cancel_scope() as scope: ...` to `scope = CancelScope(); with scope: ...` as the primitive operations... then we have the opportunity to make `scope.__enter__` return whatever we want; it doesn't have to return `self`. It could return something like a binding object. Or it could return `None` for now, and we reserve the right to add something like a binding object later.

This would cause some disruption for move_on_after etc., though, since they return the actual cancel scope object, and it is fairly ordinary to call .cancel() on this object, as well as to check .cancelled_caught. I suppose if we had to we could in the future declare that there's both CancelScope.cancelled_caught and CancelBinding.cancelled_caught, and the former says something like "did any binding catch something" and the latter is more specific.
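A toy sketch of that split, assuming `__enter__` returns a hypothetical `CancelBinding` rather than the scope, and that `CancelScope.cancelled_caught` means "did any binding catch something". For simplicity it pops bindings in LIFO order; a real version would have to find the exiting task's own binding.

```python
class Cancelled(Exception):
    pass


class CancelBinding:
    """One entry into a scope by one `with` block (hypothetical class)."""

    def __init__(self, scope):
        self.scope = scope
        self.cancelled_caught = False


class CancelScope:
    def __init__(self):
        self.cancel_called = False
        self._bindings = []  # every binding ever made, in entry order
        self._open = []      # bindings whose with block is still running

    def cancel(self):
        self.cancel_called = True

    @property
    def cancelled_caught(self):
        # Scope-level answer: did *any* binding catch a Cancelled?
        return any(b.cancelled_caught for b in self._bindings)

    def __enter__(self):
        binding = CancelBinding(self)
        self._bindings.append(binding)
        self._open.append(binding)
        return binding  # not self: each with block gets its own binding

    def __exit__(self, exc_type, exc, tb):
        binding = self._open.pop()  # toy: assumes LIFO exit order
        if exc_type is Cancelled and self.cancel_called:
            binding.cancelled_caught = True
            return True
        return False


scope = CancelScope()
with scope as first:   # e.g. one task, finishing before any cancel
    pass
scope.cancel()
with scope as second:  # e.g. another task, hitting a checkpoint afterwards
    raise Cancelled()
print(first.cancelled_caught, second.cancelled_caught, scope.cancelled_caught)
# False True True
```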

For shielding... it's a bit weird to have a shielded cancel scope you enter later, or in multiple tasks, or where your scope's shield attribute can get toggled by someone somewhere else who you wanted to let cancel you... but maybe there's no harm in allowing these things? I guess it's worth at least taking a peek at how hard it would be to split shielding off into its own thing. FWIW, currently every non-test use of shielding in trio is exactly `with trio.open_cancel_scope(shield=True): ...`. (This would also let us move shielding into hazmat!)

Possibly the shielding discussion should be split into a separate issue, too, since it's kind of orthogonal to the unbound cancel scopes idea. The cancelled_caught part is more closely related.

CC: @1st1, on the theory that you're probably thinking about similar issues
