Matching: refactor to clarify the argument-passing structure#13084
Matching: refactor to clarify the argument-passing structure#13084gasche merged 3 commits intoocaml:trunkfrom
Conversation
ncik-roberts
left a comment
There was a problem hiding this comment.
Looks great to me — simple_arg would have helped me unconfuse myself in my review of the mentioned PR.
lambda/matching.ml
Outdated
| (** The argument in first position is about to be matched upon, | ||
| it has already been bound to a variable | ||
| or it is a tuple of variables in the weird do_for_multiple_match case. *) |
There was a problem hiding this comment.
I find that simple_args is missing documentation explaining why it exists as an alternative to args. I would suggest introducing this comment with a little more framing:
(** [simple_args] is a more specific version of [args], to be used when...
(and the rest of the comment as-is).
*)
There was a problem hiding this comment.
I would argue that another relevant property of simple_args is that all of the lambda values it represents are pure. That's what allows us to drop the mutability on the floor in make_line_matching and make_matching (and my misunderstanding of this fact led to the conversation that you mention in the PR text).
I suggest mentioning this in the comment. Or, if you think of a good way to mention the purity in the relevant type names, even better! (I couldn't think of a good way to do this naming.)
There was a problem hiding this comment.
I added more comments along the lines you suggested.
| (m : Simple.clause pattern_matching) = | ||
| match m with | ||
| | { cases = []; args = [] } -> comp_exit ctx m | ||
| | { args = ((Lvar v as arg), str) :: argl } -> | ||
| bind_match_arg str v arg ( | ||
| let args = (arg, Alias) :: argl in | ||
| let m = { m with args } in | ||
| let first_match, rem = split_and_precompile_simplified m in | ||
| combine_handlers ~scopes repr partial ctx first_match rem | ||
| ) | ||
| | _ -> assert false |
There was a problem hiding this comment.
I'm not sure I understand why this PR allows this simplification or makes it more obviously correct. Does the new simple_arg shape tell us something more specific about m.args? Or was this simplification possible before?
(Separately, I'll note that bind_match_arg is now called only once, and so could be inlined if you found that more readable.)
There was a problem hiding this comment.
Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now. The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.
One aspect of the change that is subtle is that we now allow the Tuple case to reach here, but in fact I believe that this case cannot happen. (So we could add an assert false if we wanted to, as we do here.) I think that the Tuple case can only flow through the precompilation logic, never to this actual compilation code due to the different compile_flattened code path. I tried to check this statically, but the cost in complexity is quite high (polymorphic variants, polymorphic recursion, oh my!) and I did not finish it. I prefer the new version which might implicitly generate less-good-than-we-expected code if a Tuple argument somehow found its way here.
There was a problem hiding this comment.
This seems fine then.
Before we would need a dynamic check to ensure that the list of arguments is not empty, and we know this statically now.
Agreed, this is clear.
The code was also written as if the argument needed to be bound, while we can now statically reason about the fact that it is pure and its let_kind does not matter.
It seems to me this was already true as of this refactor: 5bffef5. The call to bind_match_arg is guarded by a match on Lvar v. I'm saying this just to check my understanding, not to request changes.
There was a problem hiding this comment.
Yes, I think that I could have simplified the compile_match_simplified part of the change by removing the bind logic at that point. I feel more comfortable with the change now that the types make it clear that the argument list has already been "split" by the time we reach this point in the code -- this is a sharp distinction with compile_match_nonempty above.
222c026 to
7b661ba
Compare
The "arguments" in pattern-matching compilation are a list of
scrutinees that are being matched upon, that can most of the time be
formed of arbitrary expressions. The first argument is bound to a
variable when we are about to generate code to switch on the
corresponding column, that is, in the
`compile_match_{nonempty,simplified}` functions. When new arguments
are generated by applying projections to an existing argument, it
is (almost) always on this bound-variable form. In other words, most
arguments can be complex expressions, except the first argument in
some parts of the pattern-matching logic.
The code relies on this simple-argument structure for the first
argument, but would not encode the corresponding invariants in the
type. This is what the current commit does; we can get rid of some
runtime assertions, and simplify code that was more complex than
necessary -- in `compile_match_simplified` in particular.
We don't need to pass an `~arg` parameter to functions that also take a `split_args` argument, as they respect the invariant that this `arg` argument is always the expression form of the first argument.
7b661ba to
6dc533f
Compare
ncik-roberts
left a comment
There was a problem hiding this comment.
I think the changes look good, but a maintainer will need to approve.
trefis
left a comment
There was a problem hiding this comment.
Approving on behalf of Nick Roberts.
The changes look good overall. I spent a few minutes trying to make the changes around toplevel_handlers less ugly, and now understand why you did things this way. ( #sad )
I also find the Tuple of lambda a bit distasteful, but not sure what could be done about it.
Anyway, this PR is an improvement.
|
Thanks both! Merging. |
This refactoring PR is born out of a discussion with @ncik-roberts in #12555 (comment) . "Arguments" in pattern-matching compilation are expressions that access parts of the scrutinee. They can contain complex expressions, in particular field projects of the form
x.f, but neverx.f.g-- the projection necessarily operates on a variable, or sometimes a tuple of variables in weird corner cases ofdo_for_multiple_match. There is a specific moment in pattern-matching compilation that turns complex expressions into variables, and then the rest of the pipeline assumes that it gets a variable, and this is important for correctness, but not obvious to see in the code at all.The present PR uses finer-grained types to statically encode this structure of arguments, continuing on a tradition started with @trefis a few years ago to clarify the pattern-matching compiler. There is no change to the behavior, and the diff consists almost entirely of very minor simplifications, but it becomes much easier to reason about this transition from arbitrary expressions to simple arguments in the code -- and I will rely on this property in a follow-up PR to fix The Pattern-Matching Bug.
My favorite parts of the diff are the following:
The worst part of the diff is in
do_for_multiple_match(a small hack due totoplevel_handlernot being polymorphic over the argument structure), but it's basically fine.A second commit also simplifies some functions that would take both a dedicated
~argargument (representing the first argument known to be present) and anargsvector that is now known to be simple; there is no need to track~argspecifically as it can now be recovered fromargs.first.(cc @trefis and @lthls who are on fire with pattern-matching refactoring reviewing)