Skip to content

Ppx: add support for begin%lwt e1; e2; end in ppx extension#525

Closed
kandu wants to merge 7 commits intoocsigen:masterfrom
kandu:ppx_sequence
Closed

Ppx: add support for begin%lwt e1; e2; end in ppx extension#525
kandu wants to merge 7 commits intoocsigen:masterfrom
kandu:ppx_sequence

Conversation

@kandu
Copy link
Copy Markdown
Contributor

@kandu kandu commented Dec 25, 2017

An alternative for the old >> sequence operator.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 25, 2017

Don't mind the AppVeyor failure, that is because ocamlfind.1.7.3-1 is not yet moved into the Windows opam repository.

I tried this branch locally, but it doesn't seem to generate correct code:

let () =
  let p : unit Lwt.t =
    begin%lwt
      Lwt_unix.sleep 1.;
      Lwt.return ()
    end
  in
  Lwt_main.run p

This doesn't wait for one second. However, deleting the Lwt.return () causes it to wait, because there is no sequence.

Is this similar to #307, "Support ;%lwt as sequence syntax -- backwards incompatible in some cases?" I am assuming that begin%lwt ... ; ... end, [%lwt ... ; ...], and ... ;%lwt ... are all indistinguishable to the PPX. If these are more or less the same, can you comment on any of the issues raised in #307? I don't think we have to be as cautious as we thought back in #307, because IIRC there aren't many users of [%lwt ...], but nonetheless.

cc @hcarty @raphael-proust

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 26, 2017

I rechecked the code and some tests(including your 'sleep 1 sec' code). It works properly.

let pl= Lwt_io.printl
let pi= Lwt_io.printf "%d\n"
let pe= print_endline

let t=
  begin%lwt
    pl "1";
    pi 1;
    begin
      begin%lwt
        pl "2";
        pi 2;
      end |> ignore;
      pe "s1";
      pl "s2";
    end;
    pl "3";
  end

expands to

let pl = Lwt_io.printl
let pi = Lwt_io.printf "%d\n"
let pe = print_endline
let t =
  Lwt.bind (pl "1")
    (fun () ->
       Lwt.bind (pi 1)
         (fun () ->
            Lwt.bind
              ((Lwt.bind (pl "2") (fun () -> pi 2)) |> ignore;
               pe "s1";
               pl "s2") (fun () -> pl "3")))

and for

let () =
  let p : unit Lwt.t =
    begin%lwt
      Lwt_unix.sleep 1.;
    end
  in
  Lwt_main.run p

expands to

let () =
  (* the new mapper doesn't expand non-sequence expr and the old style [%lwt ...] wrap works *)
  let p : unit Lwt.t = Lwt.catch (fun () -> Lwt_unix.sleep 1.) Lwt.fail in
  Lwt_main.run p

it's mostly the same as #307 except that it expands all immediately nested sequences at once.
i.e. we don't have to write ;%lwt after all the exprs, the code below just fine

begin%lwt
  e1;
  e2;
  e3;
  e4;
end

and for compatibility. Maybe the scheme that for all unmatched Pexp_, we just wrap it with a Lwt.catch is not a good idea since it brakes furthuer extension or new introduced syntax. For such compatibility, we may break it as soon as possible(in a major version) to prevent the costs from increasing in the future.
The sequence semantics is essential in logical flow. In my opinion, it's a good idea to take a place for it. If necessary, take some trivial guys's place.

#307 and this pr are dealing with the same problem, as you mentioned, talk could continues in that thread.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 26, 2017

You're right, it works. I must have forgotten eval `opam config env` at some point, or something like that. Sorry about that.

This PR is nice, but a thing that bothers me a little is that with this PR,

e1;%lwt
e2;
e3;

propagates the effect of ;%lwt to all the sequence expressions. Maybe this is not a big deal, and we can deliberately fail to mention ;%lwt as an option for triggering this rewriting, and/or discourage it.

It also defeats this advantage of #307:

This is a particularly useful addition for 4.04.0 and on as it allows non-Lwt ; sequence and Lwt ;%lwt sequence operations to intermingle without extra parens.

I'm not sure how important that is, @hcarty?

I just did a revdeps search in opam. [%lwt ...] is used in only two packages, ocsigen and sqlexpr. They use the current exception-handling expansion, and don't have any nested sequence expressions inside [%lwt ...].

Interestingly, I found ;%lwt used in csv and csv-lwt (https://github.com/Chris00/ocaml-csv). I'm not sure what's going on there, as there is preprocessing and generated files.

I'd say the effects on users from changing this will be pretty small, but let's discuss a bit.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 26, 2017

Hm, I wonder if it's possible to reliably check for ;%lwt vs. [%lwt ...] or begin%lwt ... end by checking the location of the extension point.

distinguish the help msg from cmd args by inserting a space before.
…e expansion

{| begin%lwt e1; e2; ... end |} expands all immediately nested sequences while {| e1;lwt e2 |} only takes effect between e1 and e2
…wt.ppx works in compatible mode. i.e. Lwt.catch all [%lwt ...] expressions
@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 27, 2017

To write code that combines regular sequence and lwt sequence, ;%lwt is really handy. We often encounter this situation when we are gluing up non-lwt code into lwt-thread.

As you suggested, ;%lwt is appended after the head expression inside the Eexp_sequence expr, [%lwt e1;e2 ] and e1;%lwt e2 can be distinguished by their positions.
Sadly, I didn't find a solution to distinguish [%lwt ... ] between begin%lwt .. end, so I add a new cmd option to lwt.ppx: -begin-sequence.

When -begin-sequence is not triggered, the ppx extension works in compatible mode plus the ability of e1;%lwt e2 t expansion. 'Compatible mode' means that lwt.ppx still works as old behavior, Lwt.catch [%lwt expr] works, even for [%lwt e1;e2; ... ] or begin%lwt e1; e2; ... end.

After the trigger, e1;%lwt e2 and begin%lwt ... end both take effect, while e1;%lwt e2 is intended for gluing code and begin%lwt ... end is for normal lwt sequence code.
Now, lwt.ppx doesn't generate Lwt.catch for [%lwt Pexpr_sequence (e1, e2) ] or begin%lwt ... end. The new sequence syntax takes the place of it. Unlike e1;%lwt e2, begin%lwt e1; e2; ... end propagates the effect to all immediately nested sequences to prevent us from wagging the ;%lwt tail after all the sequence statements which is not elegant.

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 27, 2017

But I still think that mixing non-lwt and lwt sequence with ;%lwt is not a good idea. We don't take too much advantage of it.
We are not supposed to write glue code all the day and when writing, clarity is more important than convenience, as they are gluing up between inside/outside lwt world, which is likely to introduce bugs.

;%lwt looks messy and is likely to cause messy buggy code when refactoring code.

let t=
  unit;
  unit;%lwt
  unit;
  unit;%lwt
  unit;
  'a lwt

compiles,
swap line 1 with line 2

let t=
  unit;%lwt
  unit;
  unit;
  unit;%lwt
  unit;
  'a lwt

still compiles with no warning。

In fact, ;%lwt allow you to write whatever you want in any sequence since it has no clean boundary, (in other words, it automatically make a boundary between former and later statements, which hides logical error silently)
compiler doesn't help in ;%lwt polluted code.

I doubt if It is worth introducing the ;%lwt syntax.

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 27, 2017

shall we add -semicolon-sequence to cmd args? by default is turned off.

Copy link
Copy Markdown
Collaborator

@aantron aantron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks pretty good. I'd like to wait for @hcarty's opinion as well, as he had submitted #307. Concerning compatibility, given that 3.2.0 was just released, we can probably sneak in an extra announcement that we are going to disable [%lwt ...] (#527). We can then adjust the PPX so that if anything other than sequence, let, etc., is found under [%lwt ...], it will be an error. This should be enough to catch all mistaken uses of it, and force them to be changed.

We can then unconditionally enable begin%lwt ... ; ... end and ;%lwt.

I'm not sure what you mean by that sequence code compiling without warnings – I wouldn't expect it to compile at all. Doesn't all the stuff before ;%lwt have to be of type unit Lwt.t, not just unit? Maybe I am not reading the pseudocode correctly.

"-log", Set log, " enable logging";
"-no-log", Clear log, " disable logging";
"-no-sequence", Clear sequence, " disable sequence operator";
"-no-strict-sequence", Clear strict_seq, " allow non-unit sequence operations";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I just tried looking at the PPX output, ouch.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help* output.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kandu, I cherry-picked this into Lwt master, see 35a7270. Thanks again!

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 28, 2017

ah yes, the pseudocode needs some tweaking

let t=
  unit;
  unit Lwt.t;%lwt
  unit;
  unit Lwt.t;%lwt
  unit;
  'a Lwt.t

e.g.

open Lwt
let t=
  ignore ();
  return ();%lwt
  ignore ();
  return ();%lwt
  ignore ();
  return "bug"

let main ()=
  t >>=
  Lwt_io.printl

let ()= Lwt_main.run @@ main ()

@Drup
Copy link
Copy Markdown
Member

Drup commented Dec 28, 2017

propagates the effect of ;%lwt to all the sequence expressions. Maybe this is not a big deal, and we can deliberately fail to mention ;%lwt as an option for triggering this rewriting, and/or discourage it.

Errk, I'm sorry, this is a deal breaker, as far as semantics goes. It's just completely non-sensical. If that's the case, You are better off removing ;%lwt.

I'll get a look at things in more details later, but that particular semantics is just plain wrong.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 28, 2017

@Drup, @kandu fixed that behavior in a subsequent commit. In its current form, the PR detects whether the %lwt is post-fixed to the ;, or surrounding it, by comparing the locations of the %lwt and the ; (%lwt will have a lesser offset if it surrounds ;, a higher offset if it is ;%lwt). In case it is ;%lwt, it doesn't propagate the effect to nested sequences, which gives more sensible behavior. I'm not sure how "wise" it is to rely on locations and perform this check, but that's how it is now :)

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 28, 2017

@kandu In that pseudocode, I don't see how you could rearrange the ; and ;%lwt in such a way that the code would become wrong, yet you would get no warning.

Also, @kandu, did you use the >> syntax?

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 29, 2017

@aantron errors may occur in the logical flow, to be specific, critical section.

The sequence nested in begin end is atomic in lwt thread.
With begin end and begin%lwt end, the boundary is set up, and the compiler is aware of it, which prevent us from breaking the atomicity inadvertently.

It's more convenient using begin%lwt end in the scenario which we write pure lwt sequence embed begin end nested atomic regular sequence. And the only scenario in which ;%lwt is more convenient is when we are gluing up regular and lwt sequence together. As the code shows above, ;%lwt has no clean boundary between outside/inside lwt world, after rearranging some lines of our code, the atomicity breaks silently. In this scenario, 'no warning' itself is a warning.

semicolon-sequence was enabled by default, given the above, commit Ppx: -semicolon-sequence to trigger ;%lwt expansion was submitted in order that lwt users can choose their coding style themselves. And by defaut, the ;%lwt expansion is disabled now. With this ppx option, it's just fine to keep the expansion there.

As for >> syntax, it confused me and wasted a lot of my time before. Recently, I've been adding lwt support for some monadic libraries, which are very dependent on >>= >> operators, so >> expansion is and will be always disabled.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 29, 2017

With begin end and begin%lwt end, the boundary is set up, and the compiler is aware of it, which prevent us from breaking the atomicity inadvertently.

Not sure what you mean here, but with begin%lwt end, all the nested ; immediately below become places where atomicity is broken. So, just adding the %lwt to begin converts an atomic sequence of ; to one where Lwt can interrupt the control flow after every ;.

As the code shows above, ;%lwt has no clean boundary between outside/inside lwt world, after rearrange some lines of our code, the atomicity breaks silently.

I'm not sure about this, the only parts of the code that aren't executed atomically are exactly at the places where ;%lwt is used. I don't think that's silent, it's actually very syntactic, even more so than begin%lwt end. It is visually localized. Also, if a user rearranges the code too carelessly, they should get warnings (or errors), as ;%lwt and ; expect the left side to be of different types, and ;%lwt also requires that the right side have a type in _ Lwt.t, whereas ; only allows that (though this second condition is less likely to be violated when a user is rearranging code).

On that subject, we should probably make begin%lwt end and ;%lwt insensitive to -strict-sequence: they should always be strict.

And, I think Lwt users can choose the coding style by simply choosing to use ;%lwt or not – I don't think it requires an option, at least not for that specific purpose. I think with the location check you added (thank you), there is no way for a user to trigger the behavior that ;%lwt triggers, except by literally writing ;%lwt, so the option is redundant with simply not writing that syntax.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 29, 2017

Also, cc @paurkedal due to work on ocaml/ocaml#1474.

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Dec 29, 2017

with begin%lwt end, all the nested ; immediately below become places where atomicity is broken. So, just adding the %lwt to begin converts an atomic sequence of ; to one where Lwt can interrupt the control flow after every ;.

I know what you mean here. lwt sequence is interruptable semantically. And what I was concerning about is regular sequence, which is seen as semantically atomic in a lwt thread.

coding style 1

open Lwt
let t=
  begin%lwt
    do_lwt 1;
    do_lwt 2;
    begin
      do_atomic 1;
      do_atomic 2;
      return ();
    end;
    do_lwt 3;
    do_lwt 4;
  end

coding style 2

open Lwt
let t=
  do_lwt 1;%lwt
  do_lwt 2;%lwt
  do_atomic 1;
  do_atomic 2;
  do_lwt 3;%lwt
  do_lwt 4;

They are logically equivalent.
do_atomic 1 get a value from a place, calc(value), do_atomic 2 put the calc(value) back to the place.

In the former style, lwt sequence is forbidden in the begin end chunk(except the last return ()), users can not break the atomicity inadvertently.

In the later style, users are allowed to insert, e.g. Lwt_unix.sleep 1. or any unit Lwt.t expression between do_atomic 1 and do_atomic 2, and then the compiler says, 'It's ok, no warning.' let alone their indentation are the same.

thread 1 do_atomic 1 -> interrupt;%lwt -> thread 2 do_atomic 1
Boom! you hit the critical section bomb.

In vim or spacemacs editor, we just type the keybord three times, dd p, and then:
no warning -> Boom!

humans make mistakes. I suggest keeping the ppx option there. Or we can give users some tips about it in the doc.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 30, 2017

Yes, I think we agree that inserting ;%lwt will disrupt a critical section, but my point is that it's already easy for the user not to use ;%lwt at all. I don't see why we need an option to disallow it. We could simply not add ;%lwt at all in this PR (i.e., check the location, and if the offset is greater than the sequence expression's, don't rewrite at all, but raise an error). We could leave ;%lwt to be discussed in #307, or @hcarty may give an opinion in 2018.

But I don't see why we should add ;%lwt, which is a syntax you have to choose to write, so it is inherently opt-in, and then make it double opt-in by making it conditional on an option. I guess arguments can be made about policies in large projects, but all that seems easier to deal with than complicating build systems by having users add options.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Dec 30, 2017

Also the point about no warnings is a bit odd. It is visible to the user that the user is adding ;%lwt. I thought you meant there was some way to disrupt a critical section, that was not syntactically obvious, and the compiler did not give a warning.

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Jan 2, 2018

One thing about the begin%lwt ... ; ... end syntax that may be an issue, is that, I believe, refactoring the inside so that the first expression is a let, makes that let effectively a let%lwt, and makes the rest of the semicolons back into regular semicolons:

begin%lwt
  let foo = bar in
  blah;
  blah;
end

This seems undesirable. I think this kind of change is pretty common. I make it often.

In fact, this makes me think that we should take the location check in this PR, that distinguishes between begin%lwt ... ; ... end and ;%lwt, and see if we can apply it to all the other syntax as well, and completely ban rewriting [%lwt ... ] and begin%lwt ... end due to having non-obvious effects on their nested subexpressions, and due to being resistant to reasonable refactoring.

I'm not fully sure about this, discussion welcome. @Drup, what do you think?

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Jan 2, 2018

A slightly curious thing is that with the location check in this PR, ;%lwt doesn't interfere with existing uses of [%lwt ...] anymore, so adding it doesn't involve any breaking changes. cc @hcarty again (sorry :p)

aantron added a commit that referenced this pull request Jan 2, 2018
This commit still doesn't detect [%lwt let ...], etc. That is currently
being discussed in #525.
@aantron aantron mentioned this pull request Jan 2, 2018
@paurkedal
Copy link
Copy Markdown
Contributor

I think there is merits to the arguments for begin%lwt ... end, but I worry about the need to distinguish [%lwt ...] from what is meant to be equivalent shortcuts. An alternative solution might be to introduce another extension point id for this, e.g.lwt.serial:

begin%lwt.serial
  let a = ... in
  let%lwt b = ... in
  blah;
  blah;
end

One might analogously have a begin%lwt.join ... end, though any internal lets would be serialised.

I think the main weakness of using begin is that it's overly verbose for the common case of only a few serialised statements. Using a longer extension point ID doesn't help, of course, though at least it does not take up any precious space as formatted above.

@paurkedal
Copy link
Copy Markdown
Contributor

Introducing more extension point IDs, the begin construct could be dropped. E.g. @kandu's nested example could we written:

let t =
  [%lwt.serial
    do_lwt 1;
    do_lwt 2;
    [%lwt.atomic
      do_atomic 1;
      do_atomic 2;
      return ()
    ];
    do_lwt 3;
    do_lwt 4
  ]

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Jan 3, 2018

and then make it double opt-in by making it conditional on an option. I guess arguments can be made about policies in large projects, but all that seems easier to deal with than complicating build systems by having users add options.

I think the main weakness of using begin is that it's overly verbose for the common case of only a few serialised statements.

I agree. Commit 74f2aae is reverted

refactoring the inside so that the first expression is a let, makes that let effectively a let%lwt

It's an endogenous issue from the implementation of the OCaml compiler, that begin end is not semantically strict to be a surround of sequences but a generic one.
Not a big issue, compiler will interrupt the effect by compaining about type mismatching.

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Jan 3, 2018

@aantron as you are working on PR #534, subsequent modifications for this PR(removing the compatible mode etc.) can be done by you if it's going to be merged.

@Drup
Copy link
Copy Markdown
Member

Drup commented Jan 3, 2018

Honestly, I think using locations to distinguish how ppx nodes are placed is a horribly broken idea. It's very brittle to code transformation tools, other ppxs, or simply other syntaxes like reason. It's basically begging to be broken.

The fact that [%lwt let ... ] is indistinguishable from let%lwt ... is one of ppx's (dubious, if you ask me) design decision and other tool might rely on it, we just have to live with that.

In essence, this pr/discussion is about introducing a do notation specialized for lwt. Adding a new ppx keyword for this seems perfectly acceptable to me, and it's going to be a lot more robust than whatever you can cook by playing with locations. There are multiple ppx do notations around that we can take inspiration from, or we could just bless one in particular.

@kandu
Copy link
Copy Markdown
Contributor Author

kandu commented Jan 3, 2018

At this point, I agree with @Drup

Honestly, I think using locations to distinguish how ppx nodes are placed is a horribly broken idea.

and @paurkedal

An alternative solution might be to introduce another extension point id for this, e.g.lwt.serial

This PR is going to be closed if no objection.

The location info may help in other ways:

Sadly, I didn't find a solution to distinguish [%lwt ... ] between begin%lwt .. end, so I add a new cmd option to lwt.ppx: -begin-sequence.

After looking into the parsetree, I think It could be done by simply comparing the location of exp and trigger
i.e. appending a line of code

    and pos_exp= exp.pexp_loc.Location.loc_start.Lexing.pos_cnum 

after eba5ec0#diff-caac464b5d25fba7598ee925a51b5c01R239

and because exp and trigger_loc are paramters of lwt_expression

let lwt_expression mapper exp attributes (trigger_loc:Location.t)=

we can distinguish [%lwt keyword] between keyword%lwt before any specific keyword expression expansions(the third note in #534).

@aantron
Copy link
Copy Markdown
Collaborator

aantron commented Jan 3, 2018

There are multiple ppx do notations around that we can take inspiration from, or we could just bless one in particular.

We should probably take a survey of them.

So the proposal is:

@kandu I think we want to move away from comparing locations completely :)

@kandu kandu closed this Jan 3, 2018
@hcarty
Copy link
Copy Markdown
Contributor

hcarty commented Jan 3, 2018

I'm just starting to look at this after being away from a computer for a while.

I think ;%lwt may be worth adding now that the current [%lwt ...] interpretation is being renamed to [%lwt.catch ...] or similar. It's still ugly but I really like the symmetry I mentioned in #307 (comment). Code can be adapted to/from an Lwt context simply by adding/removing the %lwt suffix.

If we do keep [%lwt ...] my preference would be to treat everything in ... as if it has %lwt applied. For example these would be equivalent:

[%lwt
Lwt_io.printl "Hello";
Lwt_io.printl "world"
]
Lwt_io.printl "Hello";%lwt
Lwt_io.printl "world"

These would be equivalent as well:

[%lwt
Lwt_io.printl "Here we go";
match i'm_a_promise with
| Some x -> Lwt.return x
| None -> Lwt.fail Exit
]
Lwt_io.printl "Here we go";%lwt
match%lwt i'm_a_promise with
| Some x -> Lwt.return x
| None -> Lwt.fail Exit

I don't know if that's a good change to make. The explicit %lwt suffixes keep context nice and local. That is, however, what I assumed [%lwt ...] would mean when Lwt's ppx syntax was first introduced.

@Drup
Copy link
Copy Markdown
Member

Drup commented Jan 4, 2018

@aantron I agree with the proposal. :) Let's just drop "%lwt as Lwt.catch", add ;%lwt, and then look at the situation for do notations.

@paurkedal
Copy link
Copy Markdown
Contributor

If we do keep [%lwt ...] my preference would be to treat everything in ... as if it has %lwt applied. For example these would be equivalent:

@hcarty Without location-hacks, that would mean that the let%lwt, match%lwt, and if%lwt would also propagate their bind-behaviour into sub-expressions, which would be backwards-incompatible and probably not what we want.

@hcarty
Copy link
Copy Markdown
Contributor

hcarty commented Jan 4, 2018

@paurkedal Ah, of course - thank you for pointing that out. Then I take back that suggestion as it is even worse than I originally though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants