Skip to content

Allow toplevel value to be used for preprocessing#386

Closed
Kakadu wants to merge 8 commits intoocaml:trunkfrom
Kakadu:trunk
Closed

Allow toplevel value to be used for preprocessing#386
Kakadu wants to merge 8 commits intoocaml:trunkfrom
Kakadu:trunk

Conversation

@Kakadu
Copy link
Copy Markdown
Contributor

@Kakadu Kakadu commented Dec 31, 2015

I was building a custom js_of_ocaml toplevel recently and I needed to embed PPX syntax extension. By default it uses system call to invoke preprocessor which doesn't suite js_of_ocaml. The patch allows toplevel values of type Ast_mapper.mapper to be used while preprocessing the source.

P.S. Happy New Year :)

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 1, 2016

cc @diml @whitequark @alainfrisch : what do you think?

@whitequark
Copy link
Copy Markdown
Member

I acknowledge the usefulness of such a feature (even without js_of_ocaml!), but I have a few concerns. The main one is that we will have two distinct ways of adding a ppx. How do they interact with each other? This is especially important, given that I think I'd like to see ocamlfind activating extensions like this.

The minor ones are that the messages are inconsistent with the behavior of #ppx, the name #ppx_mapper does nothing to distinguish itself from #ppx in a useful way, #ppx_mappers_clear has no equivalent for the out-of-process mappers, and #d... directives are very awkwardly named.

I propose to rename #ppx_mapper to something that better describes its function (but I am unsure what: #inproc_ppx? #internal_ppx?); remove #ppx_mappers_clear or add a #ppx equivalent (I do not have an opinion on which option is preferable); and remove the #d... directives, since their full functionality is exposed from the command line and can also be accessed if you #require "compiler-libs.common"; it's a niche feature with fairly obtuse output which, I think, doesn't warrant a directive.

@Kakadu Kakadu force-pushed the trunk branch 2 times, most recently from a685a87 to b15f556 Compare January 3, 2016 23:04
@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 3, 2016

The main one is that we will have two distinct ways of adding a ppx. How do they interact with each other? This is especially important, given that I think I'd like to see ocamlfind activating extensions like this.

At the moment toplevel applies all external ppx preprocessors and ad-hoc preprocessors after that. Demo output looks like this:

➜  asp  ocaml
        OCaml version 4.03.0+dev12-2015-11-20

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

#require "compiler-libs.common";;
/home/kakadu/.opam/4.03.0+trunk+fastppx/lib/ocaml/compiler-libs: added to search path
/home/kakadu/.opam/4.03.0+trunk+fastppx/lib/ocaml/compiler-libs/ocamlcommon.cma: loaded
#require "ppx_const";;
# /home/kakadu/.opam/4.03.0+trunk+fastppx/lib/ppx_const: added to search path
/home/kakadu/.opam/4.03.0+trunk+fastppx/lib/ppx_const/./ppx_const: activated
open Parsetree;;
# open Ast_mapper;;
# 
let dumb_mapper =
  { default_mapper with expr = fun mapper e ->
      match e with
      | {pexp_desc=Pexp_constant (PConst_char 'a');_} as e ->
          {e with pexp_desc=Pexp_constant (PConst_char 'A')}
      | e -> default_mapper.expr mapper e
    };;
val dumb_mapper : Ast_mapper.mapper =
  {attribute = <fun>; attributes = <fun>; case = <fun>; cases = <fun>;
   class_declaration = <fun>; class_description = <fun>; class_expr = <fun>;
   class_field = <fun>; class_signature = <fun>; class_structure = <fun>;
   class_type = <fun>; class_type_declaration = <fun>;
   class_type_field = <fun>; constructor_declaration = <fun>; expr = <fun>;
   extension = <fun>; extension_constructor = <fun>;
   include_declaration = <fun>; include_description = <fun>;
   label_declaration = <fun>; location = <fun>; module_binding = <fun>;
   module_declaration = <fun>; module_expr = <fun>; module_type = <fun>;
   module_type_declaration = <fun>; open_description = <fun>; pat = <fun>;
   payload = <fun>; signature = <fun>; signature_item = <fun>;
   structure = <fun>; structure_item = <fun>; typ = <fun>;
   type_declaration = <fun>; type_extension = <fun>; type_kind = <fun>;
   value_binding = <fun>; value_description = <fun>; with_constraint = <fun>}
# #dsource true;;
# let f () = if%const true then fun () -> 'a' else fun () -> 'b';;

let f () () = 'a';;
val f : unit -> unit -> char = <fun>
# #ppx_mapper dumb_mapper;;

#ppx_mapper  dumb_mapper;;
Extension is added. There are 1 extensions at the moment
# let f () = if%const true then fun () -> 'a' else fun () -> 'b';;
Applying mapper

let f () () = 'A';;
val f : unit -> unit -> char = <fun>

The minor ones are that the messages are inconsistent with the behavior of #ppx, the name #ppx_mapper does nothing to distinguish itself from #ppx in a useful way, #ppx_mappers_clear has no equivalent for the out-of-process mappers, and #d... directives are very awkwardly named.

#print_parsedtree, #print_source ?

I propose to rename #ppx_mapper to something that better describes its function (but I am unsure what: #inproc_ppx? #internal_ppx?); remove #ppx_mappers_clear or add a #ppx equivalent (I do not have an opinion on which option is preferable);

Maybe fast_ppx (the same as a file created for this stuff). Adding the directive #ppx_clear should not be difficult.

... and remove the #d... directives, since their full functionality is exposed from the command line and can also be accessed if you #require "compiler-libs.common"; it's a niche feature with fairly obtuse output which, I think, doesn't warrant a directive.

I like that I am able to enable/disable source output when I want it. Also, #recflags is already supported by the toplevel and these three options are not worse than then #recflags.

@whitequark
Copy link
Copy Markdown
Member

Maybe fast_ppx

That's completely uninformative.

these three options are not worse than then #recflags.

# #recflags;;
Unknown directive `recflags'.

Did you mean #rectypes? #rectypes is useful for more than debugging; also, it's exposed in a way different than what you propose (it doesn't accept an argument).

I still really dislike that we have two completely independent ways of applying ppx extensions. How about refactoring the codepath behind #ppx so that both out-of-process and in-process rewriters are applied in the order that you add them? And there would also be only one #ppx_clear.

@whitequark
Copy link
Copy Markdown
Member

I like that I am able to enable/disable source output when I want it.

You can still do that if using Clflags, though.

@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 4, 2016

Yeah, #recflags.

How about refactoring the codepath behind #ppx so that both out-of-process and in-process rewriters are applied in the order that you add them?

I will take a look

@alainfrisch
Copy link
Copy Markdown
Contributor

I share all @whitequark's comments.

Having a single set of ppxs to be applied (with two different calling modes) seems much better. Cookies should be properly threaded between all of them (they can be used to pass information from one to the other).

Instead of taking a Ast_mapper.mapper value, one could consider relying on the registration mechanism in Ast_mapper, i.e. simply load a .cmo/.cma file implementing the ppx logic and registering the mapper through Ast_mapper.register, and then use a toplevel directive with string argument(s) to refer to registered mapper (and pass them arguments). This would make it easier to reuse existing code of ppx that don't necessarily export the mapper value.

@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 8, 2016

Folks, do you think that changing type clflags.mli to

+type ppx_cfg = ExternalPPX of string | LocalPPX of Obj.t
+
...
-and all_ppx = ref ([] : string list)        (* -ppx *)
+and all_ppx = ref ([] : ppx_cfg list)        (* -ppx *)

will be bad idea?

@alainfrisch
Copy link
Copy Markdown
Contributor

will be bad idea?

Yes, indeed. I'd find much better to keep using Clflags.all_ppx to represent command-line parameters and introduce another list (e.g. in Pparse) to represent the two possible kinds of ppx calling mode. There, instead of Obj.t, you could directly refer to the actual type of mapper (or just a local record of mapping functions).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 8, 2016

Note that it should be possible to give the two kinds of ppx the same type by using a GADT and existential packing. You should be able to reuse the type 'a ast_kind that I developped (not merged in trunk) for #351 (note: I'm interested in feedback on this PR):

type ppx_transform = Ppx_val : 'a ast_kind * ('a -> 'a) -> ppx_transform

Having a single sequence of ppx transformers (even if some are skipped dynamically when processing a certain kind of ASTs) would make it easier to ensure that the pass ordering is preserved.

@alainfrisch
Copy link
Copy Markdown
Contributor

I'm not sure to see what you mean with the existential. Registered PPXs all have to support both kinds of AST types (signatures/structures), not only one of them. (Well, technically, in the context of the toplevel, in-process mappers don't need to deal with signature, but let's be more general to allow linking special compiler drivers with built-in mappers.) The two kinds of PPXs are those implemented by an external command (which can be abstracted as "string -> string -> unit", the two strings representing the input and output files) and those implemented by an in-process mapper (of type Ast_mapper.mapper).

@damiendoligez
Copy link
Copy Markdown
Member

I think the name should be plugin_ppx for this feature.

@damiendoligez damiendoligez added this to the 4.04-or-later milestone Jan 22, 2016
@Kakadu Kakadu force-pushed the trunk branch 4 times, most recently from acf5fab to 53698ed Compare January 28, 2016 09:42
@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 28, 2016

Okay.

  1. I renamed directive as Damien recommended
  2. I left dump options as they are.
  3. I adopted gasche's GPR
  4. After parsing command line options they are passed to pparse.ml and don't get away from there.
  5. In process PPX objects are added to pparse.ml too.
  6. The implementation in pparse.ml seems to be very straightforward. I don't understrand why we need new GADT there or why should we abstract everything as function
  7. Test added

@whitequark
Copy link
Copy Markdown
Member

Looks good to me otherwise--though there are a few leftover changes from the rebase (Fastppx reference and makefile dep) and a bunch of unrelated commits pulled in before that.

@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 28, 2016

I probably should rebase everything after release of 4.03...

@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Jan 28, 2016

Fastppx reference and makefile dep

Are you reading outdated diff, @whitequark ? Don't see anything like that.

@whitequark
Copy link
Copy Markdown
Member

@Kakadu I was reading individual commits.

@gasche
Copy link
Copy Markdown
Member

gasche commented Aug 17, 2016

It seems fine to me -- sorry for not replying earlier. In general, with GADT-based APIs, when the code type-checks then it is also reasonable code. (In the same way, patches over the new GADT-based format implementations are much easier to review because type-checking gives much stronger confidence in their correctness than before.)

@mshinwell mshinwell modified the milestones: 4.05-or-later, 4.04 Sep 7, 2016
From now on Clflags.all_ppx only stores options after parsing command
line arguments.

Some values are exposed in pparse.mli to manage PPX preprocessors:
  val clear_ppx : unit -> unit
  (* disable all ppxs *)
  val add_external_ppx : path:string -> unit
  (* Add PPX specified by command line option *)
  val add_in_process_ppx : Ast_mapper.mapper -> unit
  (* Add ast mapper specified in the toplevel *)
Add directives to remove in-process ppxs by path.
1) External PPX don't cause any additional IO.
2) Tool name is set when applying in-process PPXs.

Also slightly rewritten code related to applyign rewriters:
function for applying on structures and signatures merged into
single one.
@Kakadu
Copy link
Copy Markdown
Contributor Author

Kakadu commented Nov 8, 2016

Can we do one more iteration of review?

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 8, 2016

I'm not making decisions on this PR (I'm not qualified), and would rather delegate to @alainfrisch and @whitequark that seems to have opinions about it. They seem rather "meh" on the proposal for now, and I don't think we would merge before a somewhat more enthusiastic consensus emerges.

@whitequark
Copy link
Copy Markdown
Member

Not much has changed from my POV:

  • I would like to see a #remove_ppx command to remove an external PPX, like for in-process PPX.
  • I still think that #clear_in_process_ppxs will be a footgun and/or effectively unusable since it will remove ubiquitously used ppxs, once we have in-process ppx support in ocamlfind.
  • Pparse.clear_ppx should be removed in favor of something more rigorous, or at least gain a clear rationale for existence.

@alainfrisch
Copy link
Copy Markdown
Contributor

I haven't reviewed the new implementation, but I still need to be convinced about the usefulness of unloading a ppx. You'd typically set up your REPL environment (loading libraries, registering printers, and now installing in-process ppx) after startup and then use it for the current session. What would be a typical use case for unloading a ppx?

@damiendoligez damiendoligez modified the milestone: 4.05-or-later Feb 15, 2017
@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 27, 2017
@alainfrisch
Copy link
Copy Markdown
Contributor

Latest comments from @whitequark and me were left unaddressed for more than 1Y. I suspect the OP lost motivation to push the proposal forward, so I'm close this PR for now.

@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone May 24, 2018
lukemaurer added a commit to lukemaurer/ocaml that referenced this pull request Apr 23, 2021
This reverts the parts of commit
9816841 that have been split off into
PRs ocaml#383 and ocaml#386, as well as the Bottom stuff (likely to be unnecessary
as we're considering making `Coercion.compose` a total function).
poechsel pushed a commit to poechsel/ocaml that referenced this pull request Apr 29, 2021
Print aliases for debug.

Various improvements

* Use `Coercion.t` rather than `Coercion.t option` in many places
* Use `Rec_info.t` again where sensible (in making inlining decisions,
  for instance), but redefine it as `unit` pending the next phase of
  `Rec_info` overhaul
* Add simple `int` fields to `Coercion.t` for changing recursion depth;
  this is simpler than what we'll actually need but specific enough that
  we can test meaningfully (e.g., we can tell whether coercions are
  composed in the right order)

Address a CR

Update .depend

Overhaul Aliases_of_canonical_element; add Alias_set

Also return Bottom from more places (generally anything involved in a
meet).

Use Rec_info.unknown instead of unit

Add comments; simplify `Aliases` API a bit

Revert (most of) "Overhaul Aliases_of_canonical_element; add Alias_set"

This reverts the parts of commit
9816841 that have been split off into
PRs ocaml#383 and ocaml#386, as well as the Bottom stuff (likely to be unnecessary
as we're considering making `Coercion.compose` a total function).
poechsel pushed a commit to poechsel/ocaml that referenced this pull request Apr 29, 2021
Print aliases for debug.

Various improvements

* Use `Coercion.t` rather than `Coercion.t option` in many places
* Use `Rec_info.t` again where sensible (in making inlining decisions,
  for instance), but redefine it as `unit` pending the next phase of
  `Rec_info` overhaul
* Add simple `int` fields to `Coercion.t` for changing recursion depth;
  this is simpler than what we'll actually need but specific enough that
  we can test meaningfully (e.g., we can tell whether coercions are
  composed in the right order)

Address a CR

Update .depend

Overhaul Aliases_of_canonical_element; add Alias_set

Also return Bottom from more places (generally anything involved in a
meet).

Use Rec_info.unknown instead of unit

Add comments; simplify `Aliases` API a bit

Revert (most of) "Overhaul Aliases_of_canonical_element; add Alias_set"

This reverts the parts of commit
9816841 that have been split off into
PRs ocaml#383 and ocaml#386, as well as the Bottom stuff (likely to be unnecessary
as we're considering making `Coercion.compose` a total function).
poechsel pushed a commit to poechsel/ocaml that referenced this pull request May 4, 2021
Print aliases for debug.

Various improvements

* Use `Coercion.t` rather than `Coercion.t option` in many places
* Use `Rec_info.t` again where sensible (in making inlining decisions,
  for instance), but redefine it as `unit` pending the next phase of
  `Rec_info` overhaul
* Add simple `int` fields to `Coercion.t` for changing recursion depth;
  this is simpler than what we'll actually need but specific enough that
  we can test meaningfully (e.g., we can tell whether coercions are
  composed in the right order)

Address a CR

Update .depend

Overhaul Aliases_of_canonical_element; add Alias_set

Also return Bottom from more places (generally anything involved in a
meet).

Use Rec_info.unknown instead of unit

Add comments; simplify `Aliases` API a bit

Revert (most of) "Overhaul Aliases_of_canonical_element; add Alias_set"

This reverts the parts of commit
9816841 that have been split off into
PRs ocaml#383 and ocaml#386, as well as the Bottom stuff (likely to be unnecessary
as we're considering making `Coercion.compose` a total function).
poechsel pushed a commit to poechsel/ocaml that referenced this pull request May 7, 2021
A `Name.Set.t` or `'a Name.Map.t` is a fine thing, but a `Simple.Set.t`
or `'a Simple.Map.t` is much sketchier, since whether we want to treat
two simples that differ only in `Rec_info.t` as equivalent depends on
context. Fortunately, we usually just need a set or map of names. The
one place where this gets tricky is in the uses of `Aliases.get_alias`,
so there's now `Aliases.Alias_set.t` to encapsulate the needed
operations.
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Jan 4, 2022
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Signed-off-by: Christine Rose <professor.rose@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants