-
Notifications
You must be signed in to change notification settings - Fork 1.2k
caml_result: an exception monad in C for resource-safe interfaces #13013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dbfff01 to
4e259db
Compare
runtime/caml/mlvalues.h
Outdated
| lets the caller implement proper cleanup and propagate the | ||
| exception themselves. */ | ||
| typedef struct { | ||
| _Bool is_exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that C99 _Bool cannot be used in public headers for C++ compatibility as C++ doesn't define it. We chose not to include stdbool.h in public headers out of fear of compatibility problems, so we don't have access to bool. The current consensus is to use int for public booleans. See #12787, #12702 for previous discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is also possible to test for C++ and use bool in this case. (This was not possible with #12787.) In the long run C is headed towards deprecating _Bool in favour of bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be tempting to be able to use "_Bool or bool", because this plays nicely (nicer than int) with FFI based on code generation from the C headers. @gadmm tried Rust's bindgen, it creates a nice-looking Rust structure with a bool field.
However, in #12772 (comment) @MisterDA hinted at ABI issues coming from the fact that _Bool and bool are not guaranteed to have the same size. This could be a deal-breaker, right?
I am unfamiliar with these aspects of language interoperability -- it is interesting to have to deal with the absence of a type of booleans in 2024 -- so I welcome guidance. My intuition would be to give up on the nice type and use int (or maybe intnat?) to avoid any potential issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use a C89 type it should probably be int, which is (still) the type of boolean expressions (such as (0 == 1), (a && b), or !p) in C23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use bool/_Bool or int; it should not make any change for Rust (nor to the caml_result type due to padding). I don't think ABI issues should be feared, as OCaml is unlikely to be supported on exotic platforms/compilers.
|
@gadmm told me (private communication) that he is fine with the overall approach if we can gather consensus around this. |
damiendoligez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea and the code looks good modulo a few remarks, but don't merge before @xavierleroy has a look at it.
runtime/signals.c
Outdated
| CAMLexport caml_result caml_process_pending_actions_result(void) | ||
| { | ||
| return caml_process_pending_actions_with_root_exn(Val_unit); | ||
| return caml_process_pending_actions_with_root_result(Val_unit); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much wrapping going on here... I think this function is the same as caml_do_pending_actions_result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just caml_do_pending_actions_result, as there is a check on caml_check_pending_actions first. I inlined/streamlined the code -- and took the opportunity to write a small bit of documentation on the with_root variants. This new commit will need a review because it is less of a mechanical refactoring than the rest of the patch.
|
This looks great, the encoded exception values returned by callback_exn are an easy trap to fall into.
This is the correct choice. The C ABI rules for returning a structure by value are tricky and vary significantly between platforms. It suits the C API (for which we have a compiler to implement those rules), but it would be unnecessarily complicated to follow these rules correctly in the per-platform assembly stubs. |
gadmm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think it's a step in a good direction. I have some comments below.
Also, I think one important part remaining is to update the documentation (intf-c.etex), which should show that it is simpler indeed for living in harmony with the GC.
runtime/callback.c
Outdated
| } | ||
|
|
||
| /* support code for caml_result */ | ||
| CAMLexport value caml_run_result(caml_result res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the allusion to runnable monads, but a name in the same vein as caml_raise_if_exception would be much less cryptic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered the following naming choices:
caml_raise_if_exceptionis only in CAML_INTERNALS, so I could reuse that name and usecaml_raise_if_encoded_exceptionfor the old version, which is now only used incallback.c.- I could use consistent names for both versions, for example
caml_raise_if_exception_{encoded,result} - or a new name altogether, for example
caml_get_value_or_raise.
My current preference is (3) -- I like that the name is explicit that it can raise (this is better than caml_run_result), but also that it returns the value otherwise (this is better than caml_raise_if_exception).
I will move caml_raise_if_encoded_exception out of fail.h and into callback.c right away, and I think I will go for (3) as well, but I am interested in feedback on the name in any case.
runtime/callback.c
Outdated
|
|
||
| /* Result-returning variants of the above */ | ||
|
|
||
| CAMLexport caml_result caml_callbackN_result( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am hoping to have _result variants for all public functions of the runtime at some point (hence for many private ones as well). Maybe some less noisy convention such as _res would turn out to be preferable in the long run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to have a slightly longer but clearer name. _res could mean various things to me without context, for example resource or resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless it is a codebase-wide convention, and it is already clear from the signature that it returns a caml_result? In addition, to me a complete word is more at risk of introducing ambiguity with words that precede: cf. run_callback_result in memprof.c.
runtime/caml/mlvalues.h
Outdated
| lets the caller implement proper cleanup and propagate the | ||
| exception themselves. */ | ||
| typedef struct { | ||
| _Bool is_exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is also possible to test for C++ and use bool in this case. (This was not possible with #12787.) In the long run C is headed towards deprecating _Bool in favour of bool.
runtime/fail_byt.c
Outdated
| { | ||
| caml_result result = caml_process_pending_actions_with_root_result(v); | ||
| v = result.data; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blanket v = result.data; is very unusual so it could benefit from a comment saying that this is really intended.
It looks like one can do without the braces here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version.
caml_result result = caml_process_pending_actions_with_root_result(v);
/* If the result is a value, we want to assign it to [v].
If the result is an exception, we want to continue raising it instead of [v].
The line below does both these things at once. */
v = result.data;
xavierleroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor remarks below. Overall, this looks good, but as discussed privately with @gasche I'd be reassured if this design was tested on the OCaml/Rust interface, to make sure it works well for this particularly demanding library.
|
This needs to be rebased on top of the statmemprof patches, and |
|
I rebased this PR on top of trunk, and took all the standing review comments into account. Thanks a lot for all this detailed feedback. Notable changes:
Besides the naming of |
|
Of course I forgot two points. I said that I would move from @xavierleroy asked whether this Input: h.h typedef unsigned long long value;
typedef struct caml_result {
_Bool is_exception;
value data;
} caml_result;
caml_result caml_process_pending_actions_res();Output: /* automatically generated by rust-bindgen 0.69.4 */
pub type value = ::std::os::raw::c_ulonglong;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct caml_result {
pub is_exception: bool,
pub data: value,
}
#[test]
fn bindgen_test_layout_caml_result() {
const UNINIT: ::std::mem::MaybeUninit<caml_result> = ::std::mem::MaybeUninit::uninit();
let ptr = UNINIT.as_ptr();
assert_eq!(
::std::mem::size_of::<caml_result>(),
16usize,
concat!("Size of: ", stringify!(caml_result))
);
assert_eq!(
::std::mem::align_of::<caml_result>(),
8usize,
concat!("Alignment of ", stringify!(caml_result))
);
}
extern "C" {
pub fn caml_process_pending_actions_res() -> caml_result;
}This suggests that the |
which gets you even closer to the result type in Rust or OCaml. Plus, it leaves open the possibility of adding other kinds of results later, such as |
NickBarnes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
runtime/caml/mlvalues.h
Outdated
| lets the caller implement proper cleanup and propagate the | ||
| exception themselves. */ | ||
| typedef struct { | ||
| _Bool is_exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use a C89 type it should probably be int, which is (still) the type of boolean expressions (such as (0 == 1), (a && b), or !p) in C23.
| CAMLexport caml_result caml_callback_result( | ||
| value closure, value arg) | ||
| { | ||
| return Result_encoded(caml_callback_exn(closure, arg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it would be more effort to make caml_callback_exn (etc) be a wrapper around caml_callback_result, rather than vice versa. But in future, since the "encoded" versions are frowned upon, I can imagine switching them over (and even, for that matter, pushing the change down into caml_callback_asm).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to keep using the "encoded exceptions" format for code that is defined in the assembly runtime, because there it is much easier to flip a boolean than to return a structure. Now assuming that all the caml_callback*_asm functions return an encoded exception, it is more logical I thnink to have the _exn version be primitive and the _result version defined on top of it -- otherwise the _exn version would go back and forth, from encoded exceptions to caml_result to encoded exception again.
| CAMLexport value caml_callback (value closure, value arg) | ||
| { | ||
| return caml_raise_if_exception(caml_callback_exn(closure, arg)); | ||
| return encoded_value_or_raise(caml_callback_exn(closure, arg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or caml_get_value_or_raise(caml_callback_result(closure, arg)) ?
runtime/callback.c
Outdated
|
|
||
| /* Result-returning variants of the above */ | ||
|
|
||
| CAMLexport caml_result caml_callbackN_result( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless it is a codebase-wide convention, and it is already clear from the signature that it returns a caml_result? In addition, to me a complete word is more at risk of introducing ambiguity with words that precede: cf. run_callback_result in memprof.c.
runtime/caml/memory.h
Outdated
| } | ||
|
|
||
| #define CAMLlocalresult(res) \ | ||
| res.data = Val_unit; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be
| res.data = Val_unit; \ | |
| caml_result res = Result_unit; \ |
or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CAMLlocalresult definition was wrong, and it was not used in the runtime. I believe that it is useful to have such a macro (it is tricky to write, and it is easy to write subtly wrong code by forgetting about it), so what I did was to (1) fix it and (2) tweak the code to use it somewhere (where it is not strictly necessary). As you have seen, #13061 is an offshoot of this work, and I may wait for that one to be resolved to decide what to do here.
I originally introduced CAMLlocalresult because I had a use for it in the runtime, but the previous use-case apparently was just in pre-PR experiments and never made it to the PR. It is notable that it is not strictly necessary in the runtime itself, where the cleanup code in the exception case does not require rooting, and the usage code in the value case is either (1) an immediate return or (2) a continuation assuming that the value is () and never using it again.
runtime/caml/mlvalues.h
Outdated
| lets the caller implement proper cleanup and propagate the | ||
| exception themselves. */ | ||
| typedef struct { | ||
| _Bool is_exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use bool/_Bool or int; it should not make any change for Rust (nor to the caml_result type due to padding). I don't think ABI issues should be feared, as OCaml is unlikely to be supported on exotic platforms/compilers.
| CAMLexport caml_result caml_process_pending_actions_result(void) | ||
| { | ||
| return caml_process_pending_actions_with_root_exn(Val_unit); | ||
| if (caml_check_pending_actions()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see what's wrong with return caml_process_pending_actions_with_root_result(Val_unit);. It's better to not duplicate the code in this case.
|
@gasche has already explored the idea of having the result type extensible in the future. However I don't see how it fits with the stability guarantees needed. I don't think there is a decisive argument for/against any of the suggested representations, just pick any one. |
In the context of ocaml#13013, I would like to be able to use CAMLxparam1 on a lvalue that is not a variable: CAMLxparam1(result.data); This is not currently supported, as the definition of CAMLxparam1 uses its argument to build a variable name: #define CAMLxparam1(x) \ struct caml__roots_block caml__roots_##x; \ CAMLunused_start int caml__dummy_##x = ( \ (caml__roots_##x.next = *caml_local_roots_ptr), \ (*caml_local_roots_ptr = &caml__roots_##x), \ (caml__roots_##x.nitems = 1), \ (caml__roots_##x.ntables = 1), \ (caml__roots_##x.tables [0] = &x), \ 0) \ CAMLunused_end Let us break down this definition: 1. What the macro does is to add a new element to a singly-linked list stored in `*caml_local_roots_ptr`. 2. This element has type `struct caml__root_block`, and the structure is initialized by first declaring it and then initializing its field. We use the macro argument `x` to derive the structure name, `caml__roots_##x`. 3. We want to initialize `caml__roots_##x` and also assign `*caml_local_roots_ptr`, but instead of a standard `do { ... } while (0)` block we place these assignments inside the declaration of an unused variable `caml__dummy_##x`. This is done to respect the C89 rule that all declarations must be grouped together at the beginning -- putting a statement here would break this property for declarations coming right after the macro invocation. Now that we use C99, we can use a structure initializer to build the structure value without first declaring a variable of this type, and we don't need to group all definitions at the beginning so our macro can have arbitrary statements. The present commit changes the definition of `CAMLxparam1` to become: #define CAMLxparam1(x) \ *caml_local_roots_ptr = &(struct caml__roots_block){ \ .next = *caml_local_roots_ptr, \ .nitems = 1, \ .ntables = 1, \ .tables = { &(x), } } which supports `CAMLxparam1(result.data)` just fine.
2cbf39a to
483e1f8
Compare
|
I did a round of modification following the reviews, thanks! For now I have used
I did, in a previous iteration of this PR, use an enum with two values: CAML_VALUE and CAML_EXCEPTION (I am not sure if enum names need namespacing in C, but I thought that they did). The result was fairly heavy/cumbersome, because I would then feel tempted to use The only form of extensibility that I could think of was an eventual future version of What I would ideally like to do to have something more future-proof than the current definition is the following:
|
|
Also: @gadmm proposed changing the convention from |
|
Thanks for the feedback, I was going to insist about it :) In the related PR, the _result suffix often leads to ambiguities
|
(statmemprof still uses _exn internally, to be converted next)
Suggested-by: Damien Doligez <damien.doligez@inria.fr>
Suggested-by: Guillaume Munch-Maccagnoni <Guillaume.Munch-Maccagnoni@inria.fr>
Suggested-by: Guillaume Munch-Maccagnoni <Guillaume.Munch-Maccagnoni@inria.fr>
|
I propose documentation for |
|
Note to self: the Coq implementation uses To fix this, Coq would need conditional compilations depending on the OCaml version, or we would have to restore |
|
This is part of the public interface indeed. In fact, I checked that you preserved the functions part of the public interface so it did not raise alarm: |
I made a mistake when introducing caml_process_pending_actions_res (ocaml#13013), currently in trunk `caml_process_pending_actions_exn` is exported in signals.h but its implementation is removed. This function is not CAML_INTERNALS, and Coq uses it. To restore this function I reintroduced a helper function to transform `caml_result` values back into encoded exception (earlier ocaml#13013 designs had this). It is named `caml_result_get_encoded_exception`, in fail.h, CAML_INTERNALS-only.
|
I restore |
I made a mistake when introducing caml_process_pending_actions_res (ocaml#13013), currently in trunk `caml_process_pending_actions_exn` is exported in signals.h but its implementation is removed. This function is not CAML_INTERNALS, and Coq uses it. To restore this function I reintroduced a helper function to transform `caml_result` values back into encoded exception (earlier ocaml#13013 designs had this). It is named `caml_result_get_encoded_exception`, in fail.h, CAML_INTERNALS-only.
In the OCaml runtime, some C functions that in turn call OCaml code come in two variant:
caml_callback, will raise exceptions directly, as OCaml code wouldcaml_callback_exn, will not raise exceptions but encode them into the return valueThe 'encoded exception' variant is important for resource-safe interfaces because the caller may want to run some cleanup/release code before propagating the exception, which the 'raising' variant cannot express -- C code cannot create OCaml exception handlers. On the other hand, the encoding of exceptions into the
valuetype is a big hack that is unsafe in two different ways:valueencoding exceptions are not valid OCaml values, the GC will blow up if it encounters them; callers have to follow a strict discipline of checking for an exception right away and discarding the fakevaluevaluetype means that the C compiler cannot check that we use these exceptions as intended, and will happily let us mix them with real values by mistake, making the dynamic unsafety harder to guard againstIn #12407, @gadmm proposed to extend the approach of 'encoded exceptions' to more parts of the runtime, to make it possible to write resource-safe code in more places, but @xavierleroy pushed against the design due to the unsafety of encoded exceptions.
To be able to move forward, we need an approach for resource-safe APIs that is consensual; the present PR, which grew out of the discussios in #12407 and in particular discussions with @gadmm, is a proposed approach.
The PR proposes a new 'result' variant:
caml_callback_resultis similar tocaml_callback_exn, but instead of returning an unsafevalueit returns a new, distinct C typecaml_result, that properly represents either a value or an exception. This is more safe:value, reminding us to deal with the exception in the callerThe present PR:
caml_resulttype,_resultvariants for all functions that previously had an_exnvariant, andFor the
_exnvariants, the ones that were part of the public API are kept for compatibility, and those that were under CAML_INTERNALS are removed.(The places where encoded exceptions are retained are the implementation of the
caml_callback*_exnfunctions, which touch the bytecode interpreter and the assembly code in the runtime system. I preferred to leave them as-is, and wrap them insidecaml_resulton the outside.)