Skip to content

Do not trigger the "missing rec" hint for simple values#1678

Closed
xclerc wants to merge 1 commit intoocaml:trunkfrom
xclerc:ocaml-rec-hint
Closed

Do not trigger the "missing rec" hint for simple values#1678
xclerc wants to merge 1 commit intoocaml:trunkfrom
xclerc:ocaml-rec-hint

Conversation

@xclerc
Copy link
Copy Markdown
Contributor

@xclerc xclerc commented Mar 26, 2018

The "You are probably missing the `rec' keyword" hint is mildly
annoying when triggered for simple values because it is almost
always wrong, and very often suggests a fix that will then trigger
an error. For instance:

# let myvar1 = 0;;
val myvar1 : int = 0
# let myvar2 = myvar2 + 1;;
Error: Unbound value myvar2.
       Hint: You are probably missing the `rec' keyword on line 1.
# let rec myvar2 = myvar2 + 1;;
Error: This kind of expression is not allowed as right-hand side of `let rec'

This PR slightly changes the condition so that the hint is triggered iff
there are multiples definitions (i.e. let...and), or the value is an
arrow type. The previous toplevel session then becomes:

# let myvar1 = 0;;
val myvar1 : int = 0
# let myvar2 = myvar2 + 1;;
Error: Unbound value myvar2
Hint: Did you mean myvar1?

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Mar 26, 2018

Ping @Armael :)

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch sounds like a good idea to me, and it contains an important fix to fold_values that you did not mention in your description.

I think that it would make sense to relax "of function type" to "of function type or lazy type".

Are we sure that the support for let .. and ... (multiple in the patch) is worth it? It makes the patch more complex, changes the type declarations, and I am not convinced that it is actually that useful. It would pay for its cost if mutual value recursions were somewhat common, but are they?

Finally, at some point I thought that this patch would also silence useful-warnings, and I had a type signature such as type 'a seq = Seq of (unit -> ('a * 'a seq) option) in mind; there would be recursive declarations, I thought, at the type 'a seq, starting with the Seq constructor, and the warnings would be silenced by this patch. In fact it is almost impossible to build a useful sequence without maintaining some state as an argument of the recursive call, so the recursion happens on a function type:

let nats = nats_from 0
and nats_from n = Seq (fun () -> Some (n, nats_from (n+1)))

and value_unbound_reason =
| Val_unbound_instance_variable
| Val_unbound_ghost_recursive
| Val_unbound_ghost_recursive of bool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this aspect of the patch, I would prefer to see an interface that more explicitly gives a meaning to the bool payload, for example using a record type (possibly an inline record) with a field name. Currently it is very unclear to a reader of the .mli what this bool would mean.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; I wanted to push this draft to get some feedback
and explore the design space. For instance @lpw25
suggested that we could use the recursive check to ensure
that the hint actually makes sense.

(fun name _path descr acc ->
match descr.val_kind with
| Val_unbound _ -> acc
| _ -> f name acc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important omission of the previous patch (cc @Armael) that we need to merge independently of the rest, it could have gone into a separate commit.

Copy link
Copy Markdown
Member

@Armael Armael Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this was not detected earlier because in the case of an unbound value with a corresponding ghost binding, the "missing let rec" hint would trigger, and therefore no spellchecking would be attempted. Only the present patch adds the possibility of not printing the hint, and uses this -- buggy -- codepath. Am I correct?
In any case I agree this fix is required.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely sure, but I think @Armael is right.

@Armael
Copy link
Copy Markdown
Member

Armael commented Mar 26, 2018

Finally, at some point I thought that this patch would also silence useful-warnings

Do you mean "a rec may be missing" hints, instead of "warnings"?

@Armael
Copy link
Copy Markdown
Member

Armael commented Mar 26, 2018

@gasche I didn't understand fully your point about 'a seq. What conclusions should we draw?

@xclerc
Copy link
Copy Markdown
Contributor Author

xclerc commented Mar 26, 2018

This patch sounds like a good idea to me, and it contains an important fix to fold_values that you did not mention in your description.

Indeed, but I am not completely sure it could be observed in presence
of the original missing-rec hint.

I think that it would make sense to relax "of function type" to "of function type or lazy type".

Maybe. Are you referring to the "trick" using lazy on the right-hand side
of multiple declarations to accommodate the syntactic restrictions on
recursive definitions?

Are we sure that the support for let .. and ... (multiple in the patch) is worth it? It makes the patch more complex, changes the type declarations, and I am not convinced that it is actually that useful. It would pay for its cost if mutual value recursions were somewhat common, but are they?

(I should have commented this aspect in my original post.)
The problem is that for multiple definitions, the bindings are originally
populated with type variables; this leads the check for an arrow to
return false and hence to silence the hint for cases such as:

let f x = ... g x
and g x = ...

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 26, 2018

@Armael my point was that you can have recursion at datatypes that are not functions but contain functions, but then I realized that the recursive definitions usually occur at function types (as in my example) even when we are building one of these datatypes.

@xclerc there is something odd in the "type variable" story that I would like to understand better. Are the typing environments populated differently for let .. in .. and let .. and .. in? (This would be strange.) Your code checks the type after the error was raised, that is after type inference runs on the body; of course the in-definition type of f is not unified with its external type (this only happens when rec is present), but I would expect the outer fun x -> ... to create a function type.

Are you referring to the "trick" using lazy on the right-hand side
of multiple declarations to accommodate the syntactic restrictions on
recursive definitions?

More generally, values that are typically recursive have a function or a lazy type: fun and lazy are the two things you can put on the right-hand-side of a let rec and know that it will always work.

@xclerc
Copy link
Copy Markdown
Contributor Author

xclerc commented Mar 26, 2018

Well, my understanding of the type checker is far too limited
to precisely understand what is going on, but the fact is that
(Ctype.repd desc.val_type).desc is Tvar None when calling
Typeopt.is_function_type.

@Armael
Copy link
Copy Markdown
Member

Armael commented Mar 27, 2018

Btw, the idea of using the recursive check on the hint sounds tempting. But I wonder how much refactoring work it would take to plug it at this stage?

@damiendoligez
Copy link
Copy Markdown
Member

Well, my understanding of the type checker is far too limited
to precisely understand what is going on, but the fact is that
(Ctype.repd desc.val_type).desc is Tvar None when calling
Typeopt.is_function_type.

Maybe @lpw25 or @garrigue can shed some light on this question?

Personally, I was expecting the reason for the let ... and exception to be the fact that we almost never need let ... and for non-recursive bindings, so the rec hint is rarely wrong.

@damiendoligez damiendoligez added this to the consider-for-4.07 milestone Mar 28, 2018
@Armael
Copy link
Copy Markdown
Member

Armael commented Apr 3, 2018

Maybe @lpw25 or @garrigue can shed some light on this question?

So, I investigated a bit, and I think I can explain (this is mostly guesswork so maybe I'm wrong though).

@xclerc describes that this PR enables the hint if "the value is an arrow type". This is a bit imprecise. To be more precise, I think the criteria we want is that, when a ghost identifier appears in the body of a definition, then the missing rec hint is displayed if the body of the definition is a function.

However, what this PR currently implements is a bit different: faced with a ghost identifier, it does not check if the type of the enclosing body is a function, but if the type of the ghost identifier is an arrow type.
(indeed, ghost identifiers do have a type, which is the same as the type of the regular identifier they correspond to).

For single let-bindings, this is equivalent. Indeed, in that case, the type of the ghost binding is the same as the type of the definition body it appears in. Then, at the time the typechecking function discovers the ghost identifier, its type has already been refined to an arrow type by inspecting the definition body.

For multiple let-bindings, this does not work, as it has been noticed. On an example:

let f () = g ()
and g () = f ()
in ...

Trying to typecheck the body of f discovers a ghost binding g. But at that time, since (I assume) the bindings are processed in order, the type of g is still a fresh type variable; so not an arrow type, since the definition of g has not been analyzed yet. It is also not the type of the enclosing definition, which we would want to check (that would be the type of f).

This I think explains the current behaviour.

Idea for a fix: what about associating to a ghost binding the type of its enclosing definition? I guess this would be an argument to the Val_unbound_ghost_recursive constructor. For single let-definitions, it would be the same as the type of the ghost binding, but not for multiple let-definitions. Then, instead of checking the type of the ghost binding, we would check the type of the enclosing definition for an arrow type.

Do you think that would work?

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 3, 2018

That's a nice analysis, thanks!

I'm a bit unsure about adding arguments to Val_unbound_ghost_recursive that are only implementation-defined. Besides, this means that you have to construct an unbound-value-environment for all the values in the "let .. and ..." block, while one would usually expect to type-check them all under the same typing environment.

Here is a rough idea for an alternative implementation strategy: catch the Unbound_value_missing_rec exception at the let-type-checking site, and degrade it to a simple Unbound_value exception if we decide we are not confident enough given the let-body shape or type. (We could also have the type-checker code raise an internal exception, always catch it at let-checking boundaries are turn it into a reraise of one or the other exception.)

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Apr 4, 2018

Not sure whether it helps but if you changed the if recursive then ... here:

https://github.com/xclerc/ocaml/blob/dcbf787d66189254a1748de3879c461d0f369697/typing/typecore.ml#L4779

to if recursive || multiple then ... then the types for syntactic functions would be function types thanks to type_approx.

However, as @Armael said, the check is currently checking the type of the value being used, whereas really it is the type of the value being defined that matters. (Also the multiple value in maybe_add_pattern_variables seems wrong -- I think it would consider let (x, y) = ... in as a multiple binding). Although really even the type of the value being defined is just being used as a proxy for the actual thing we care about: that the use is in a position where a recursive use is valid.

It is a shame that the current recursive values check is implemented as a separate pass over a typed tree, rather than implemented as part of the main typing pass (which I think would be an equally viable approach) because then you could just check the desired property directly.

How about only adding the ghost bindings for let definitions that are obviously functions? You could implement a simple function similar to type_approx that just checked whether an expression was obviously a function.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 9, 2018

I had a discussion about this PR with Armaël a few days ago. Here is a summary of the opinion we converged to:

  1. The heuristic proposed in this PR is controversial; the original implementation is flawed (see discussion above), and it's not clear what is a better heuristic (looking at the term shape seems better, but not fully satisfying; calling the recursive-value test is not possible without a larger reengineering effort).
  2. I'm not excited about rushing a debatable heuristic in trunk. I would rather take note of the fact that this message generates more false warnings than we thought, and weaken the wording: change "You are probably missing" into "You might be missing", or better, "If this is a recursive declaration, you should add". This can be done quickly. (We can also continue discussing a better heuristic.)
  3. The part of the PR that filters unbound values from the value folding is nice and we can merge it quickly.
  4. It is frustrating that this warning silences the typo-advice heuristic, and it would be nicer to have them both printer or combine them in some way. I was not aware of the interaction with the typo-advice logic before, and I don't know whether improving this is a trivial change or a delicate change.

I encouraged Armaël to submit a PR with the simple changes (2) and (3) above, but he is busy with other things right now.

@mshinwell
Copy link
Copy Markdown
Contributor

@gasche Can we maybe revert the change which added this hint and reconsider for 4.08?

The dev Flambda branch tracks trunk and whilst working there I have seen many occurrences of this hint all of which, except for one, were wrong.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 9, 2018

Well yes, we can certainly do this. Could you list some of those examples of mistaken hints in the discussion of #1472?

@Armael
Copy link
Copy Markdown
Member

Armael commented Apr 9, 2018

I have some time today, I could submit PRs for 2) 3) and 4) if that's judged satisfactory. (on xclerc original example it would display "if this is a recursive declaration..." and then the spellchecking hint (or maybe first the spellchecking hint?))

@mshinwell mshinwell removed this from the consider-for-4.07 milestone Apr 9, 2018
Armael pushed a commit to Armael/ocaml that referenced this pull request Apr 11, 2018
Imported from Xavier Clerc patch in ocaml#1678
@Armael Armael mentioned this pull request Apr 11, 2018
Armael pushed a commit to Armael/ocaml that referenced this pull request Apr 26, 2018
Imported from Xavier Clerc patch in ocaml#1678
Armael pushed a commit to Armael/ocaml that referenced this pull request Apr 26, 2018
Imported from Xavier Clerc patch in ocaml#1678
@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 30, 2018

I think this PR is now superseded by #1720. Thanks for the problem report!

@gasche gasche closed this Apr 30, 2018
Armael pushed a commit to Armael/ocaml that referenced this pull request May 6, 2018
Imported from Xavier Clerc patch in ocaml#1678
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants