Env: always freshen persistent signatures before using them#2231
Env: always freshen persistent signatures before using them#2231trefis merged 6 commits intoocaml:trunkfrom
Conversation
Otherwise:
$ cat foo.mli
type t
module M : sig
type s
module type S = sig
type nonrec t = t
type nonrec s = s
val to_s : t -> s
end
end
module N : M.S
val x : t
$ cat bar.ml
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let a = 0
let _ =
Foo.N.to_s Foo.x
$ ocamlc -c -nostdlib -nopervasives foo.mli
$ ocamlc -c -nostdlib -nopervasives bar.ml
Fatal error: exception Stack overflow
ac0fa9f to
9cc8568
Compare
|
I'd also like to point out that travis is unhappy simply because I'm missing a changelog entry. But I'm waiting for a reviewer and an approval for 4.08 inclusion before adding it. |
Drup
left a comment
There was a problem hiding this comment.
For a while, I really couldn't understand why this refreshing was needed on the theoretical level, until I re-read note 1:
Even though this code is exercised in other situations than the loading of cmis, the issue can only happen in that case, because in the other cases all the idents involved were generated in the current run of the compiler, so the stamps won't ever conflict.
So, the bug is that, due to insufficient freshening in recursive cases when loading signatures, you can make the compiler observe colliding stamps between different runs of the typechecker. To solve it, you split the refreshing and the prefixing so that refreshing can ensure all the recursive knots are properly solved, and prefixing doesn't have to deal with them.
Given all that, I'm pretty happy with your solution and it's quite clear to me that the patch implements what you described. The code looks as good as env.ml can look. The issue is pretty subtle, so I think @garrigue should take a look as well.
Does it has a perf impact ? The third commit should ensure we only pay the refresh when loading a new cmi, so I expect it to be fine.
It does indeed have a small perf impact.
So: it has an impact that I can notice at janestreet, but it's smaller than the one that the cache of prefixed signatures had. Also, as a follow up to this, I experimented with adding lazyness to the prefixing of module types. Which, at janestreet, brings the compilation time back down to Thanks for the review! |
|
Re. 4.08: yes, it's a bugfix, it's fine to cherry-pick into 4.08. I read your initial description twice and still couldn't understand exactly what the problem was, so I won't review, but given that @Drup has approved I think this is good to go (in the Changes, put it in the 4.08 section). |
|
Thanks @gasche! I'll wait another day before to give Jacques a chance to have a look if he wants to (unless Damien wants to do a beta with this patch today). |
|
Could you add a testsuite entry? |
I don't think so: the test that I gave in my initial message is in my opinion too brittle, despite reproducing the issue on several versions of the compiler. The issue is that the stamps have to line up for the infinite loop to appear, and any change to the number of stamps generated would make the symptom disappear while the bug would remain. That's also why it was there for several years without ever causing any issue to anyone. It feels to me like that kind of issues (using idents / signatures in an incorrect context) is more likely to be avoided by some static checks, than by testing. |
|
Maybe there is another way to think of it, in the long term (not this PR): is there a dynamic check, that would be costly but only enabled in some special extra-safe mode, that would raise a proper error when this kind of issues happen, in a non-heisenbug way? (For example, we could add a special mark on all stamps before serializing them into cmi files (by making them negative for example), and pervasively using a stamp-comparison function that aborts if a marked stamp is compared to a non-marked stamp.) |
|
Actually, it's what |
Not quite. It's changing the Anyway, I'll merge now and cherry-pick on 4.08. Thanks everyone! |
|
Right. But if we're only interested by the signature, doing it for all IDs in it wouldn't be much more violent. |
|
I agree, we could easily adapt |
When accessing components of a module, the typechecker will prefix the siblings components by the name of the module.
That is, if you have:
and you access
M.to_s, it will have the typeM.t -> M.s.This prefixing is done lazily by
components_of_moduleandcomponents_of_module_maker.When loading a .cmi, one needs to refresh all the identifiers it defines, so as to avoid any clash with identifiers already present in the environment.
For toplevel identifiers, this is done by this call to
Subst.For all the other identifiers, e.g. the ones that are defined in submodules, this freshening is done lazily by
components_of_module_maker, as a side effect of the prefixing.However, when populating the environment with the components of that signature, it sometimes needs to traverse indirections in that signature (cf. the call to
scrape_aliasat the beginning ofcomponents_of_module_maker).For example:
If we were to load
Fooin another module, when trying to add the components ofFoo.Nto the environment, the compiler needs to resolveM.S. Which means it needs to get the components ofM(that is:M, notFoo.M), and that it will run the same little loop again, and prefix all references to the sibling idents byM.So, for instance, the updated declaration of
M.Swill be:because
sis defined inM, buttcomes from the outside.Note that when doing that prefixing, we've assigned a new stamp to
t-- the/x-- and tos(that is: we have freshened them). But the/ypart is still the "old" stamp, that comes directly from the cmi.At this point, everything could be fine:
t/xcan becomeFoo.N.t,t/ycan becomeFoo.t, etc.But everything could also be pretty bad: because
xmight happen to be equal toy¹, sot/ywould also be replaced byFoo.N.t. Which is a very good way to get the typechecker stuck in an infinite loop (insideexpand_abbrev) when you try to use the type.Here is a dummy program exercising that issue:
This can be reproduced with any compiler from 4.02 to trunk, and probably before that, but I haven't tried.
We ran into this problem recently at Jane Street.
A solution
For performance reasons, we can't afford to eagerly freshen the signature.
So what we must do instead, if we are to keep environments with unprefixed things in them, is to make sure that all the idents that go into these environments have been refreshed².
Which is what this PR does.
Notes
[1]: Even though this code (
components_of_module_maker,prefix_idents) is exercised in other situations than the loading of cmis, the issue can only happen in that case, because in the other cases all the idents involved were generated in the current run of the compiler, so the stamps won't ever conflict.[2]: More precisely, and this is related to [1], we only need to refresh idents coming from external units. The other ones can stay the same.
PS: This PR is based on #2229, there is no hard dependency, it was just written this way.
PPS: This PR appears to conflict with trunk, I assume it's because of the PRs of diml and gasche that were merged recently.
I think it can be looked at before I rebase though, there shouldn't be any substantial changes.