Skip to content

Refreshing parmatch#1488

Merged
trefis merged 31 commits intoocaml:trunkfrom
trefis:cleanup-parmatch-revisited
Dec 4, 2017
Merged

Refreshing parmatch#1488
trefis merged 31 commits intoocaml:trunkfrom
trefis:cleanup-parmatch-revisited

Conversation

@trefis
Copy link
Copy Markdown
Contributor

@trefis trefis commented Nov 23, 2017

tl;dr: adding comments, renaming things and deduplicating code

A previous version of this work has already been reviewed by @gasche and @maranget, the comments that follow (apart from some obvious exceptions) are meant for them and might not make sense without a log of the previous discussions.
That being said the commits should all be somewhat atomic and a majority of them contain no meaningful code change (so they should be fairly easy to read).

External review required

Some commits in the list would be better reviewed by @garrigue than by Gabriel or Luc.
Jacques could you have a look at the following commits:

  • "update comment": the previous comment was stale, but perhaps I misread the condition and wrote a wrong comment. Or perhaps it could be written/explained better.
  • "remove stale adhoc fix for printf.ml": I'm betting that this was needed before the switch to GADTs, although I cannot easily verify that.
    The stdlib, and the testsuite compile fine without that line. So I just removed it.
    @gasche would like to keep the line "for safety", but I still would like to see it removed. We have currently no example of code requiring that line (but perhaps you can produce one), and I don't think it's good to keep code that has no visible impact and that no one (at least in Europe, although I haven't asked @yallop) understands. The "good" thing is that the worst that can happen if we remove this line is that we might encounter some code that needs it, at which point the assert false in orify_many will trigger and we can then reintroduce the line and update the testsuite!
    What do you think / what can you tell us about this?

Updates w.r.t. the previous version

(this section is targeted at Gabriel and Luc)

  • I implemented the discussed change to [build_specialized_submatrices], I did that in a separate commit (build_specialized_submatrices returns the default matrix) to make it easier to review (but it should be squashed eventually).
    I also removed build_default_matrix as (as we guessed) it became unused.
    @gasche: you said you didn't remember why we didn't keep the "omega group" even when we had constructors in the first column, as we would need it if the signature is not complete. And we do indeed need it but were recomputing it (i.e. we were calling build_default_matrix).

  • @gasche you had suggested that do_match could be factorized. Were you talking about the changes that are done in "remove some code duplication"?

  • I unmerged satisfiable and satisfiables as Luc requested. But I did keep the new name (list_satisfying_vectors) for satisfiables.

I believe I addressed every other comment present in the notes you sent, but you should double check.

PS: my initial plan was to merge all of these commits, but @gasche seemed to be against it. So I delegate the decision to him.

@trefis trefis force-pushed the cleanup-parmatch-revisited branch from cee345a to 91f05d9 Compare November 23, 2017 06:44
@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 23, 2017

@gasche would like to keep the line "for safety", but I still would like to see it removed. [...] the worst that can happen if we remove this line is that we might encounter some code that needs it, at which point the assert false in orify_many will trigger and we can then reintroduce the line and update the testsuite!

Yes, I think that risking breaking our users' code as a way to recover information about our codebase is not a good idea. I would rather keep the line unless someone can prove (by reasoning on the code statically) that is never useful -- or @garrigue remembers why it's there and reassures us that it can safely be removed.

Maybe it's not too hard to use git blame or git log to understand when the line was added, and look at what printf.ml was like at the time to get a reproduction case. (It was probably full of Obj.magic, but even code full of Obj.magic (for example the code produced by Coq) should type-check correctly.) @trefis, have you considered this?

let rec remove_first_column = function
| (_::ps)::rem -> ps::remove_first_column rem
| _ -> []
in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same thing as List.map List.tl?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly: what about matrices which contains some rows but no columns?
I don't know off-hand whether we can have some in that places and the current implementation was meant to handle them, or if the code was just written this way because it was written this way.

Not sure it's worth changing.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 23, 2017

PS: my initial plan was to merge all of these commits, but @gasche seemed to be against it. So I delegate the decision to him.

To clarify, you are talking of squashing all the commits. Yes, I think it's a bad idea: countless time people thought "oh it's painful to prepare a proper patchset and it won't matter anyway", and we find the giant commit while bisecting something or blaming to understand the reason for a change, and it's a pain.

Here all your changes are reasonable as self-contained commits, there is nothing that obviously needs to be squashed into previous commits. Would you have a semi-automated way to check that the compiler remains buildable and passes the testsuite after each commit? If you do, and if it passes, then I would support merging the PR with the separate commits.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 23, 2017

(One way to do the check is to write a script that does clean; world.opt || exit 3; tests || exit 4, and then use the x/exec feature of git rebase -i trunk -- see documentation -- to run it after each commit)

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 23, 2017

The build_specialized_submatrices change is very nice, thanks!

My understanding of the discussion with @maranget is that this is good to merge modulo the two points that @garrigue should comment on. (I still think that 9cc2261 should not be in the PR by default, and its presence is what discouraged me from "approving" already.)

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Nov 24, 2017

Maybe it's not too hard to use git blame or git log to understand when the line was added, and look at what printf.ml was like at the time to get a reproduction case. (It was probably full of Obj.magic, but even code full of Obj.magic (for example the code produced by Coq) should type-check correctly.)
@trefis, have you considered this?

I had actually, blaming gave me no insights. There was no Obj.magic, I tried building the files of then with my branch, but failed to. What I didn't do though (and I don't know why now) is to remove the line in the compiler of the time, try compiling the stdlib and see what breaks. Which I just did this morning and though I do not understand why it seems the matching on abstract types was the cause of the problem (so nothing printf specific).

Here is a minimal reproduction case:

type abstract

type opt =
  | N
  | S of abstract


let name_of_input = function
  | N -> ()
  | S _ -> ()

OCaml 4.00

With the following patch applied:

diff --git a/typing/parmatch.ml b/typing/parmatch.ml
index 0873f2b6d..36f011331 100644
--- a/typing/parmatch.ml
+++ b/typing/parmatch.ml
@@ -1187,7 +1187,18 @@ let exhaust_gadt ext pss n =
     Rnone -> Rnone
   | Rsome lst ->
       (* The following line is needed to compile stdlib/printf.ml *)
-      if lst = [] then Rsome (omegas n) else
+    if lst = [] then (
+      let () =
+        let loc =
+          try let pat = List.hd (List.hd pss) in pat.pat_loc
+          with _ -> Location.none
+        in
+        Format.fprintf Format.err_formatter
+          "%a: empty witness list (will return _×%d)\n%!"
+          Location.print_error loc n
+      in
+      Rsome (omegas n)
+    ) else
       let singletons =
         List.map
           (function

Here is what I get:

$ ./boot/ocamlrun ./ocamlc -w A -nostdlib -nopervasives ./test.ml
File "./test.ml", line 9, characters 4-5:
Error: : empty witness list (will return _×1)
$

This branch

With the following diff:

diff --git a/typing/parmatch.ml b/typing/parmatch.ml
index e723c2c32..c7754a75f 100644
--- a/typing/parmatch.ml
+++ b/typing/parmatch.ml
@@ -1234,6 +1234,18 @@ let exhaust ext pss n =
   match ret with
     No_matching_value -> No_matching_value
   | Witnesses lst ->
+    if lst = [] then (
+      let () =
+        let loc =
+          try let pat = List.hd (List.hd pss) in pat.pat_loc
+          with _ -> Location.none
+        in
+        Format.fprintf Format.err_formatter
+          "%a: empty witness list (will return _×%d)\n%!"
+          Location.print_error loc n
+      in
+      Witnesses (omegas n)
+    ) else
       let singletons =
         List.map
           (function

Here is what I get:

$ ./boot/ocamlrun ./ocamlc -w A -nostdlib -nopervasives ./test.ml
$

Conclusion

I do not understand what was going wrong, but I'm still in favor of removing the line since we know what it was trying to catch. I'd be happy to update the patch to add this small example to the testsuite to make sure it is not broken in the future.

@alainfrisch
Copy link
Copy Markdown
Contributor

Yes, I think that risking breaking our users' code as a way to recover information about our codebase is not a good idea. I would rather keep the line unless someone can prove (by reasoning on the code statically) that is never useful -- or @garrigue remembers why it's there and reassures us that it can safely be removed.

Keeping cruft that nobody understand is not a good solution either. I agree we should track the original reason, but if one cannot find it and if removing the irregularity does not break any known code, I'm in favor of doing the clean up.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 24, 2017

I don't disagree. You have my encouragement to remove it if someone checks that the whole of opam builds without it. Unless this is done, please keep it in.

@maranget
Copy link
Copy Markdown
Contributor

maranget commented Nov 28, 2017

About the controversial line | Rsome lst -> if lst = [] etc.). I have checked all occurrence of Rsome e expressions: (1) there is one base case Rsome [omegas n]; and several inductive case, with either preserve argument length (map) or increase it (append). Hence I think tat we face leftover code and that the risk of deleting this line is minimal.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 28, 2017

But Thomas above had a reproduction case where the conditional fires in OCaml 4.00. Does his repro case not get into this path anymore on trunk?

@maranget
Copy link
Copy Markdown
Contributor

maranget commented Nov 28, 2017

@garrigue, Now let us look at function exhaust (about line 1200 in file parmatch.ml). The comment (which I wrote) does not apply to the code (which @garrigue wrote, I guess). The comment refers to the original code that relied on the described property not to performed inductive calls over specialized matrices when the constructors collected from the argument matrix do not form a complete signature. The code has probably been changed for the purpose of GADT or whatever.

If Jacques confirms that the code should stay as it is, I suggest deleting the comment.

* D non-exhaustive => we have a non-filtered value
*)
let r = exhaust_gadt ext (filter_extra pss) (n-1) in
(* As the default matrix is included in [pss] one can avoid recursive
Copy link
Copy Markdown
Contributor

@maranget maranget Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one contradiction: the recursive calls are not avoided, Even more, they are performed in all situations, the result being bound to before.

@maranget
Copy link
Copy Markdown
Contributor

maranget commented Nov 28, 2017

As far as I know, @gasche Thomas repro case is for OCaml 4.0, not for trunk.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Nov 28, 2017

Here all your changes are reasonable as self-contained commits, there is nothing that obviously needs to be squashed into previous commits.

All the commits do compile on their own, except for the one preceding "make depend". I should probably just squash the "make depend" commit into it.

I have checked all occurrence of Rsome e expressions: (1) there is one base case Rsome [omegas n]; and several inductive case, with either preserve argument length (map) or increase it (append).

Note that that wasn't the case in 4.04 (search for "dug").

@maranget
Copy link
Copy Markdown
Contributor

maranget commented Nov 28, 2017

Note that that wasn't the case in 4.04 (search for "dug").

Sorry I do not see your point. Looking for dug in parmatch.ml for 4.04, I see a map over list r extracted by pattern Rsome r. Thus, a case of length conservation.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Nov 28, 2017

That’s because I should have written 4.00...
Apologies.

@maranget
Copy link
Copy Markdown
Contributor

Ok I see, Rsome [] was possible in 4.00, not now. So I'd feel relatively confident to assume this is no more possible now. At worst, I suggest an assert.

My other remark on the discrepancy between code and comment is still valid.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 28, 2017

If both of you are convinced through static reasoning that this code/case cannot be executed with the current compiler codebase, then of course it is reasonable to remove it.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Nov 29, 2017

My other remark on the discrepancy between code and comment is still valid.

Indeed.

To sum up: @garrigue could you please comment on the following two points:

  • Luc pointed out that this comment doesn't match the code as recursion on specialized submatrices already happened (the result being bound to before).
    Should the comment be removed or can the code be updated?
  • I updated that comment. Does it seem correct to you?

Btw, I'm not fond of the wording of that last comment, I'd appreciate it if someone (@gasche?) could make improvement suggestions. Perhaps the comment should simply be removed and the conditional made easier to read, again, suggestions would be appreciated.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we shouldn't wait for Jacques feedback (which may come at an arbitrary later time) or my ability to review the comment and suggest improvements (same absence of timeline), and merge now. The risk of merging is that two marginally wrong/improvable comments stay around, but the overall effect of the patch is positive (we fix more such comments). The cost of delaying the merge is that this has to be rebased over and over again, and that everyone willing to understand the codebase today is reading the harder-to-understand version.

Here is a formal approval. I think we could merge now, but I'm waiting for at least @trefis opinion on the merge timing/strategy.

(I ignored the issue of buildability and make depend in my approval decision. If you rebase to restore the property, great, otherwise people will just have to do git bisect skip on this one.)

@maranget
Copy link
Copy Markdown
Contributor

Ok for merging, at least mark the comments as being obsolete.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Nov 30, 2017

I'll do a proper last rebase in the next few days, squashing the few things that needs to be squashed and marking as obsolete the comment pointed out by Luc.
I'll update this PR once I'm done.

@trefis trefis force-pushed the cleanup-parmatch-revisited branch from 9c5b502 to 8096b22 Compare December 1, 2017 03:18
trefis added a commit to trefis/ocaml that referenced this pull request Dec 1, 2017
This line was here to handle the case where "Rsome []" was being
returned, which seemed to happen (at least on OCaml 4.00) when matching
on values of abstract types (or with abstract parts).

This cannot happen anymore as all the sites returning "Rsome _" return a
non-empty list.
See ocaml#1488 (comment) for
a lengthier discussion on the matter.
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Dec 1, 2017

This is now done. I've also update the description of the commit removing the line talking about printf to summarize the situation and link to this discussion.

@gasche: good to merge whenever.

@garrigue
Copy link
Copy Markdown
Contributor

garrigue commented Dec 4, 2017

@trefis
Concerning point 1 (the comment in exhaust), it should indeed be removed.
For GADTs we need a complete list of missing cases, so we cannot dispense with recursing.

Concerning point 2 (in check_unused), indeed the condition is now much more complex than just "there are no other lines". Your updated comment seems accurate.

For the discussion about the relation with a past bug inPrintf, I'm afraid I cannot comment. You should ask the other Jacques (Le Normand) @rathereasy , if he remembers...

@maranget
Copy link
Copy Markdown
Contributor

maranget commented Dec 4, 2017

Concerning point 1 (the comment in exhaust), it should indeed be removed. For GADTs we need a complete list of missing cases, so we cannot dispense with recursing.

Fine. Thanks Jacques.

For the discussion about the relation with a past bug inPrintf, I'm afraid I cannot comment. You should ask the other Jacques (Le Normand) @rathereasy , if he remembers...

We can now consider that the Printf comment and the if lst = [] then ... are leftover comment and code, and forget about them.

trefis added 24 commits December 4, 2017 11:11
…as well

This is a post merge cleanup. [build_specialized_submatrices] always builds the
default bmatrix as part of its work, but was previously only returning it when
there were no specialized matrices and when the [return_omega_group] parameter
was [true] (yes, this was ugly).
As it turns out, all the places calling this function also ended up building
the default matrix as a follow up.
The function now always returns the default matrix along with the (possibly
empty) list of specialized matrices.

As a consequence [build_default_matrix] became redundant and was removed.

Thanks to Gabriel for noticing this.
Previously, after exhaust had produced a pattern not matched by the matrix
formed of the unguarded clauses of the matching, we would try to match the
pattern with the matrix formed of the full matching.
This implies more work than necessary: we already know the unguarded clauses
don't match the pattern, so we only need to test it on the guarded ones.
We avoid doing some redundant work in the process.
The other option was to simply use [option].
[satisfiable] could be expressed in terms of [list_satisfying_vectors] as it
only cares about the existance of such a vector.
However
- it might have a non negligeable impact on performances
- satisfiable is a simple function that can be seen as a reference. There is
  value in keeping it.
@trefis trefis force-pushed the cleanup-parmatch-revisited branch from 8096b22 to 875e73b Compare December 4, 2017 11:14
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Dec 4, 2017

Thanks!

I rebased one last time to take into account the lasts comments. I'll merge once travis/appveyor confirm I haven't broken anything.

@trefis trefis merged commit 69ca9e2 into ocaml:trunk Dec 4, 2017
trefis added a commit that referenced this pull request Dec 4, 2017
This line was here to handle the case where "Rsome []" was being
returned, which seemed to happen (at least on OCaml 4.00) when matching
on values of abstract types (or with abstract parts).

This cannot happen anymore as all the sites returning "Rsome _" return a
non-empty list.
See #1488 (comment) for
a lengthier discussion on the matter.
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
This imports the Principles and Users sections which have been discussed on Discuss. The actual roadmap (i.e. the development workflows) are still in progress and will be shared with the community for feedback before they are imported.

---------

Co-authored-by: Christine Rose <christinerose@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants