Skip to content

Expose fiber primitives for continuation cloning#10894

Merged
Octachron merged 1 commit intoocaml:trunkfrom
dhil:expose-fiber-primitives
Sep 19, 2022
Merged

Expose fiber primitives for continuation cloning#10894
Octachron merged 1 commit intoocaml:trunkfrom
dhil:expose-fiber-primitives

Conversation

@dhil
Copy link
Copy Markdown
Contributor

@dhil dhil commented Jan 13, 2022

This PR makes available the alloc_stack_noexc and
rewrite_exception_stack from runtime/fiber.c in the C API. In
accordance with the C API the functions are renamed to
caml_alloc_stack_noexc and caml_rewrite_exception_stack,
respectively.

My motivation for this change is to be able to support multi-shot
continuations in OCaml via a library. I am interested in exploring
design and programming with multi-shot continuations in my own
research. To facility this research, I am in the process of building a
library that provides
multi-shot continuations on top of OCaml's linear continuations. Under
the hood, the library uses a variation of the clone_continuation
primitive that used to reside in the Obj module in previous versions
of Multicore OCaml. In order to implement clone_continuation as
library I need access to the two above mentioned fiber manipulation
primitives in runtime/fiber.c which are currently marked
static. It is not clear to me whether there is a good reason for
having them as static, whereas if they are non-static then I am able
to implement continuation cloning externally. I need access to
alloc_stack_noexc in order to allocate enough space for a
carbon-copy of an existing stack segment, and
rewrite_exception_stack in order to patch up exception pointers for
cloned stack segments.

In my current implementation I duplicate the implementations of these
primitives in order to get going. Obviously, it would be a lot better
and more robust if I could simply reuse the implementations in the
OCaml runtime directly.

As of writing, this PR exposes only the two primitives that I need to
get my minimal implementation of multi-shot continuations
going. However, I think there is a boarder discussion to be had about
whether we should make available the rest of the fiber manipulation
primitives in runtime/fiber.c too.

CC @kayceesrk and @dra27, both of whom I have had previous
conversations with about proposing this change.

CC @gasche and @gadmm with whom I have had discussions about adding
multi-shot continuations to OCaml via a library.

@dhil dhil force-pushed the expose-fiber-primitives branch from 55ac084 to 912757c Compare January 13, 2022 16:03
@kayceesrk
Copy link
Copy Markdown
Contributor

I am for this change.

The continuation cloning facility in Multicore OCaml has been used by researchers as a vehicle for exploring topics in effect handlers and delimited continuations. In particular, @dhil developed a multicore OCaml backend for links [1], Oleg and I developed an implementation of Eff language in Multicore OCaml [2]. Many other recent papers also use OCaml multishot continuations [3,4]. There is interest in the community for this feature [5] for exploring probabilistic programming languages and model checkers.

It was the right decision to remove support for continuation cloning from the compiler since it makes reasoning about resources difficult and breaks compiler optimizations. However, it would be useful to encourage innovation in the ecosystem. The ocaml-multicont library [6] from @dhil allows that by rebuilding the support in an external library. The PR makes it a little bit more robust.

Of course, it would be unwise to provide API stability for this feature and the runtime should be free to make any changes as it deems fit.

[1] https://dhil.net/research/papers/mlabstract2016.pdf
[2] https://arxiv.org/abs/1812.11664
[3] https://www.microsoft.com/en-us/research/publication/generalized-evidence-passing-for-effect-handlers/
[4] https://dl.acm.org/doi/pdf/10.1145/3408975
[5] https://discuss.ocaml.org/t/multi-shot-continuations-gone-forever/9072
[6] https://github.com/dhil/ocaml-multicont

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 14, 2022

This might require some advice for avoiding broken programs due to incorrect compiler optimizations. @kayceesrk Do you have any advice?

@kayceesrk
Copy link
Copy Markdown
Contributor

Do you have any advice?

The general advice is to avoid side-effects and linear resources in the computation closed over by the cloned continuation. Here is an example that breaks compiler optimization which converts a heap allocation to a stack allocation. Continuation cloning makes it harder to reason about programs that use linear resources. This advice is best included in the ocaml-multicont library.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 14, 2022

Avoiding side-effects might be too strict; there is nothing wrong with printf for instance. It also seems that the conversion of a heap allocation into a stack allocation can be disabled by hand as shown in the example, it is error-prone but possible. Are there other ways to turn off this optimisation?

@kayceesrk
Copy link
Copy Markdown
Contributor

I'm not sure what we're aiming for with the advice. If it is crash safety, then there should be no restriction. Perhaps you may have other features in mind.

Of course, there's the challenge of ensuring that the linear resources are used linearly. But that's a "feature" of continuation cloning. For example, you may end up double-freeing a malloc'ed memory region and crash, but I would consider this to be an improper use of the linear resource, and not an issue with continuation cloning.

For the ref optimization, perhaps Sys.opaque_identity would be enough? There may be other optimizations on references that may break similar to how weak memory models breaking optimizations, but I haven't thought about this enough.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 14, 2022

I am asking specifically about broken compiler optimisations (i.e. linearity assumptions made by the compiler), since this is not something that the programmers can guess. I think you answered my question.

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Jan 14, 2022

I think that technically the optimisation of references relies on an escape analysis, not linearity (although maybe the two are linked ?).
Effects, and in particular continuations are messing with this by making some variables escape without any way for the compiler to notice it. Typed effects might allow for an analysis correct in presence of continuations, but they're not there. Single-shot continuations are likely fine, since from the interrupted code's point of view performing the effect should be equivalent to calling a function that may or may not return, but not knowing the implementation details I can't be sure.
I don't think it's evident that it's the optimisation that is broken here: not allocating local references has been a key feature of OCaml for a long time, so I'd consider that the expectations from the linked example are wrong. Hopefully we'll have the occasion to discuss this in depth before 5.0 is released.

In any case, passing the reference to a non-simplified function call (typically Sys.opaque_identity) will force the compiler to consider the reference as escaping, disabling the optimisation.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 14, 2022

I think you mean that linearity of use (of the continuation) is different from linearity of reference (uniqueness), i.e. why we don't talk about linear “types”.
I agree that the program is broken, not the optimisation. So if you want a clear guideline for 5.0, it is fine to say that naked refs should be avoided.
It is going to be a bit of a bigger problem than solved by having an effect type system inform the escape analysis, because with an effect system you expect some polymorphism that lets you do fewer effects in a context that is ready to handle more effects. You cannot have such polymorphism here: whether the optimisation is done or not gives two different version of a function, and you do not want to use one instead of the other (concretely this would mean disabling the optimisation even when it is valid).

@kayceesrk
Copy link
Copy Markdown
Contributor

We all seem to agree that there is an issue with refs and continuation cloning. I think the compiler shouldn't attempt to give any recommendations. Such discussions and recommendations may be part of the multicont library.

Leaving this aside for the moment, and getting back to the proposed PR, do we have issues with the proposed changes of making caml_alloc_stack_noexc and caml_rewrite_exception_stack available externally?

CC @damiendoligez who has previously reviewed this code.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 23, 2022

Is the proposed approach the best way to support resumptions (GC-handled multishot continuations) in the medium-long term?

I'm asking because the current scenario looks a bit similar to what happened in delimcc, or parts of Coq. The project starts using internal primitives for good reasons; the internal primitives were not conceived as a robust API, they happen to be on the path to the least-effort external implementation of the desired feature. But then other time the internal runtime code evolves, and the external code breaks often and needs updating, to the point that (1) it could have been more robust to design a proper API from the start, provided we had a good vision on what the feature request really was (sometimes it's hard to know because downstream projects won't really tell us).
So we start from something that looks like a simple, minimal choice, and then again and again in the following decade we are forced to work on those downstream packages to support them, and it's delicate because there are few people around with the required internal runtime knowledge and time on their hands.

Are we exposing ourselves to the same issue here? Is there a better way to do things? One advantage we have is that we have a clear specification for the downstream need: multishot continuations.

  1. How much work would it be to implement (and maintain) the multishot primitives in the runtime?
  2. Would it make it easier to evolve the OCaml runtime and the stack handling, easier than exposing lower-level stack primitives as proposed?

The two primitives needed for "resumptions" (multishot continuations) are:

  • promote: converts a linear stack fragment (a continuation) into a GC-handled representation of the stack fragment (a resumption)
  • resume: copies a resumption back into a new linear stack fragment.

Remark: currently the implementation of ocaml-multicont is not to store the resumption stack on the heap, but to have a small resumption block on-heap own point to the (linear) resu mption stack, with a finalizer attached to drop the resumption stack if the resumption becomes dead. The code is available at https://github.com/dhil/ocaml-multicont/blob/3f371bc/multicont_stubs.c, where multicont_alloc_stack_noexc and multicont_rewrite_exception_stack are local copies of the runtime primitives that are currently not exported.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Jan 24, 2022

In fact I think the only primitive you need (if you go all-in on the "resumption = finalized GC-managed fiber" approach) is reinstating a fiber by copy, without consuming it. Implementation-wise it is the same, but as an API this reduces the surface area and maybe it makes it easier to define expectations.

Copy link
Copy Markdown
Contributor

@gadmm gadmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think people were in favour of this PR but it fell off the radar. It is a trivial change that exports existing functions. In terms of API there is no risk since they are still internal and can be changed later on. So I think it is harmless even for 5.0 (and having it in 5.0 will make it a bit simpler for @dhil).

The discussion went into "what API should OCaml export ideally?" but this should not side-track this PR. Still if you would like to export directly a more useful API, how about my suggestion of simply having a function that reinstates a fiber by copy, which is reasonably defined and is all that you need in the latest design (if I trust the comment by my younger self)? (Or perhaps it is too early to settle on this design.)

I think there is a boarder discussion to be had about whether we should make available the rest of the fiber manipulation primitives in runtime/fiber.c too.

What do you have in mind?

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Guillaume's reasoning. Unless someone objects, I think we can have this in 5.0.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Aug 27, 2022

Nobody objects

@dhil
Copy link
Copy Markdown
Contributor Author

dhil commented Aug 27, 2022

Sounds good to me. I will bring the patch up to date with trunk soon.

I think there is a boarder discussion to be had about whether we should make available the rest of the fiber manipulation primitives in runtime/fiber.c too.

What do you have in mind?

I don't recall now, hehe. The fiber implementation has changed sufficiently since my comment that it probably has become dated.

This PR makes available the `alloc_stack_noexc` and
`rewrite_exception_stack` from `runtime/fiber.c` in the C API. In
accordance with the C API the functions are renamed to
`caml_alloc_stack_noexc` and `caml_rewrite_exception_stack`,
respectively.

My motivation for this change is to be able to support multi-shot
continuations in OCaml via a library. I am interested in exploring
design and programming with multi-shot continuations in my own
research. To facility this research, I am in the process of building a
[library](https://github.com/dhil/ocaml-multicont) that provides
multi-shot continuations on top of OCaml's linear continuations. Under
the hood, the library uses a variation of the `clone_continuation`
primitive that used to reside in the `Obj` module in previous versions
of Multicore OCaml. In order to implement `clone_continuation` as
library I need access to the two above mentioned fiber manipulation
primitives in `runtime/fiber.c` which are currently marked
`static`. It is not clear to me whether there is a good reason for
having them as `static`, whereas if they are non-static then I am able
to implement continuation cloning externally. I need access to
`alloc_stack_noexc` in order to allocate enough space for a
carbon-copy of an existing stack segment, and
`rewrite_exception_stack` in order to patch up exception pointers for
cloned stack segments.

In my current implementation I duplicate the implementations of these
primitives in order to get going. Obviously, it would be a lot better
and more robust if I could simply reuse the implementations in the
OCaml runtime directly.

As of writing, this PR exposes only the two primitives that I need to
get my minimal implementation of multi-shot continuations
going. However, I think there is a boarder discussion to be had about
whether we should make available the rest of the fiber manipulation
primitives in `runtime/fiber.c` too.
@dhil dhil force-pushed the expose-fiber-primitives branch from 912757c to b6bd87a Compare August 29, 2022 12:43
@Octachron
Copy link
Copy Markdown
Member

Since this is a small tweak to an internal API, which would open some exploration space for experimental API,
I am planning to merge this change later today without further objections.

@Octachron
Copy link
Copy Markdown
Member

I would be in favor of having change entry with an explicit mention that this is an internal change aimed to experimental libraries.

@Octachron Octachron merged commit 78f2d18 into ocaml:trunk Sep 19, 2022
Octachron added a commit that referenced this pull request Sep 19, 2022
Expose fiber primitives for continuation cloning

(cherry picked from commit 78f2d18)
@Octachron
Copy link
Copy Markdown
Member

Merged and cherry-picked to 5.0 in 611253d .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants