Refactor typing/env to separate the filesystem-related logic#2228
Refactor typing/env to separate the filesystem-related logic#2228gasche merged 13 commits intoocaml:trunkfrom
Conversation
366d3c3 to
a733665
Compare
|
I just handled a somewhat-painful rebase on top of #2041 (now merged in trunk). Nothing much to declare (I wasn't sure whether |
|
P.S.: I made sure that the PR is split in several clean commits, and it should be easiest to review commit-by-commit. |
|
@gasche sure, I'm having a look |
ghost
left a comment
There was a problem hiding this comment.
The changes look correct to me, I just left a couple of suggestions. I don't know either whether the changes related to add_import are correct or not.
|
@diml Thanks! I'll adress your comments shortly. Do you have an opnion on the interest of the refactoring -- do you agree that it makes the final code nicer, or does it seems roughly equivalent to you? |
bd2f90e to
6fa23da
Compare
|
(I pushed two additional commits to implement the suggestions of @diml) |
|
I forgot to say: I grepped the code of |
|
I agree that it makes the final code look nicer. I find |
|
BTW, in utop we use cppo to be compatible with several versions of the compiler, so we can adapt to breaking changes in the compiler API. |
|
I haven't looked at the code in this PR in details yet but:
|
|
Good point on the copyright; in the past I've just put my copyright when creating new files, but they were completely new. Re. #2231: given that the present PR is non-urgent, I guess that the most reasonable thing is for me to wait until those recent PRs are settled to merge -- and deal with the rebase pains on my end. Grmbl... |
| try Env.import_crcs ~source:filename cu.cu_imports | ||
| with Consistbl.Inconsistency(name, user, auth) -> | ||
| fprintf ppf "@[<hv 0>The files %s@ and %s@ \ | ||
| disagree over interface %s@]@." |
There was a problem hiding this comment.
Why not put the try .. with in Env.import_crcs itself? Given that it's the one calling Consistbl.check.
Also, it looks like Toploop calls that function during initialization, without any try ... with; apparently it can't fail, but if it ever did, it would be with an uncaught exception, which doesn't seem very nice.
There was a problem hiding this comment.
I prefer for the refactoring commit to stay as close to the existing behavior as possible, and I don't know what Toploop is doing with this function, so I did not change anything here.
There is a small change of behavior in this patch due to a different handling of weak dependencies (those with crco=None); in Env.check_consistency, only non-weak dependencies would get [Env.add_import] called, while the `toplevel/` implementations would also call [Env.add_import] on weak dependencies. After this patch, we systematically call [add_import] only on non-weak dependencies, even in `toplevel/`. ([Gabriel:] As far as I can see, the use of [add_import] in the toplevel never leads to a use of [Env.imports()] for producing a dependency list, as the toplevel does not produce cmi/cmo files; are they just no-ops?)
This small change of behavior simplifies the internal plumbing of env
by avoiding the need to passes the 'current_unit_name' state to
cmi-checking exceptions -- it allows to separate the cmi/crc logic to
a separate module in a future commit.
We believe that the change does not actually reduce error message
clarity, as the name of the offending unit appears in the location
filename anyway (see how these exceptions are handled by
Location.error_of_printer_file in the error printer).
Before:
File "a.mli", line 1:
Error: Unit A imports from B, which uses recursive types.
The compilation flag -rectypes is required
After:
File "a.mli", line 1:
Error: Invalid import of B, which uses recursive types.
The compilation flag -rectypes is required
|
I just rebased on top of trunk. I'll re-read the rebased diffs, and my plan is to merge once the CI passes. |
…module Persistent_env is a new module that handles the relation between the type-checking state and the "persistent" typing information laying in .cmi files on the filesystem. In particular, it handles the collection and production of CRC information for the .cmi files being read and written to the filesystem; the using modules (in our case, only Env) are in charge of turning the cmi files into higher-level information (components and signatures). Persistent_env exposes a type `'a t` of a persistent environment, which acts as a mutable store of `'a` values. There is no global state in the module itself: while Env (and thus the OCaml type-checker) uses a single global persistent environment, it should be possible to create several independent environments to represent, for example, several independent type-checking sessions.
The hope is that the (env => persistent_env) refactoring does not break reasonable user code; the fact that this test had to be updated is a bad sign. On the other hand, we believe that utop is unaffected by the change, which suggests that real-world toplevel are less likely to be affected.
(suggested by Jérémie Dimino)
The hope is that a tailor-made algebraic datatype is more readable /
less confusing than using ('a option) directly -- one may confuse
getting None when looking in a table with the Not_found case.
(Suggested by Jérémie Dimino)
|
Obviously very late to this, but I confirm that the CRC related change is correct. |
The debugger reimplements its own error-reporting logic without using the reporter-registration mechanism of the compiler, so it needs to be adapted after the split between `Env` and `Persistent_env` in ocaml#2228. (Interestingly, this forced me to expose the `Error of error` exception in the Persistent_signature, which was not the case before. It was probably a mistake to not expose an exception value that can be raised by (correctly-written) consumers of the module.) I noticed the issue while inspecting a testsuite failure (ocaml#8544). Before this patch: ``` $ cat tests/tool-debugger/find-artifacts/_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/debuggee.byte.output Loading program... done. Breakpoint: 1 10 <|b|>print x; Uncaught exception: Persistent_env.Error(_) ``` After: ``` $ cat tests/tool-debugger/find-artifacts/_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/debuggee.byte.output Loading program... done. Breakpoint: 1 10 <|b|>print x; Debugger [version 4.09.0+dev0-2019-01-18] environment error: The files /usr/local/lib/ocaml/stdlib.cmi and [...]_ocamltest/tests/tool-debugger/find-artifacts/debuggee/ocamlc.byte/out/blah.cmi make inconsistent assumptions over interface Stdlib ```
This PR, which builds on top of #2227, is again only a refactoring PR: it should not affect the semantics of the compiler (there are two small changes in behavior that I mention below; the rest should be strictly identical, only moving code and data around).
The PR came as I was trying to understand the code of Env that is related to compilation units.
typing/env.mlhas more than 2K lines, and about 300 of them are related to compilation units (reading and writingcmifiles, consistency checking, various caches, etc.), but they are blended in with the rest in a way that makes it difficult (for me) to follow the logic of this part of the code.The present PR separates this persistent-files logic to a separate module, Persistent_env, that Env then uses internally. The last commit does the actual file split, all other preceding commits are small changes that were necessary for it, either because they abstract away certain details of the Env module that are affected by the split, or because they disentangle the mess of dependencies within the module that makes the split difficult -- it took me 3 attempts to discover everything that could create a cyclic dependency.
The change might be useful to the Merlin people (cc @trefis, @let-def), as Merlin handles cmi files in a different way from the upstream type-checker.
Now on the two small behavior changes:
The
avoid external uses of {add_import,crc_units}commit merges together three loops that are almost identical, but have a difference in the wayEnv.add_importis called or not on transitive dependencies that have no CRC. This specific change should be reviewed by @lpw25. (Understanding the mess of direct/indirect and weak/strong and transparent/opaque dependencies is my next goal.) Making this merge only makes the overall code cleaner, it's easy to separate into two functions if it turns out to be incorrect.The
remove the current unit name from some exceptionscommit removes some information from Persistent_env exception that was constructed, at raising sites, by using internal state from Env (the name of the current compilation unit). This changes the error messages slightly (see the commit message for an example), but I believe the change is harmless. On the other hand, trying to preserve the previous behavior exactly would be doable, but make the code more complex (I could register the exception-handler inEnvinstead ofPersistent_env, to have the information available at printing-time, but the irregular structure would be a bit vexing).