Move contains_calls and num_stack_slots from Proc to Mach.fundecl#8928
Move contains_calls and num_stack_slots from Proc to Mach.fundecl#8928gasche merged 7 commits intoocaml:trunkfrom
Conversation
|
(cc @Octachron who might be interested in learning about backend things.) |
Octachron
left a comment
There was a problem hiding this comment.
From an external point of view, the change looks really nice: it reduces quite a bit the amount of mental state tracking required to follow the backend pipeline. I have few remarks about points where it seemed possible to go one step further without massive changes, but I may be reasonable to postpone those changes to another PR.
| Some res | ||
|
|
||
| let contains_calls = ref false | ||
|
|
There was a problem hiding this comment.
As far as I can see contains_calls is an internal variable for the selector_generic class. It seems like it could be declared inside the class as a mutable instance value.
There was a problem hiding this comment.
A mutable boolean instance value wouldn't work because the traversal uses functional object copies ({< ... >}). Making the reference an instance variable (val contains_call = ref false) would still be a good idea, though; this way, it is implicitly reset in Selection.fundecl when the new class instance is created.
There was a problem hiding this comment.
(In that case please include a comment explaining why it's a per-instance reference instead of an instance mutable variable, it's a bit tricky.)
There was a problem hiding this comment.
Thank you very much for the explanations about mutable vs reference instance variables! I've just pushed a commit that implements it.
| (* The entry point *) | ||
| end | ||
|
|
||
| val num_stack_slots: int array |
There was a problem hiding this comment.
This reference seems to run contrary to the PR's goal: it introduces a global state used by {Coloring,Linscan}.allocate_registers to communicate their result. It seems to me that it would be more in line with the PR goal to make the allocate_registers functions returns the computed value of num_stack_slots and avoid this global state altogether.
There was a problem hiding this comment.
I considered it, but it would be the only result of allocate_registers that is returned directly. Other inputs/outputs are communicated via Reg.t.interf and Interval/Interf.
Putting num_stack_slots in Reloadgen is not entirely contrary to this PR's goal if we think of Asmgen.regalloc as a single pass. At the end of each iteration, Reloadgen uses num_stack_slots to create the new Mach.fundecl, without the need to know which register allocation algorithm was used and where to find these results. It's also nice that we don't need to duplicate this array in coloring/linscan and keep the copies in sync.
The downside of having num_stack_slots in reloadgen is that coloring/linscan now depend on it.
|
I welcome this PR as a move in the right direction, but I'd like to be sure that the new references added by this PR are meant to be removed in the short term. The fact that some code uses global references isn't an excuse to add more of them, in my opinion, but I can accept that this will be done in a later PR. |
@lthls The only new references are the ones added in Emit. The others are merely existing references moved around :) Moving them "localizes" their uses and indeed will help us get rid of them. To remove num_stack_slots, I am happy to implement what @Octachron suggested in #8928 (comment), if there is no deep reason for keeping allocate_registers as unit -> unit. To remove contains_calls from selectgen, I did try an instance variable first, but couldn't get it to work. I'll give it another go. I would really like to collects all references in Emit into an environment and pass it to emit_instr. I'll submit a separate PR for it. |
|
I addressed all the comments. This version of the PR is much cleaner than the initial version. Thanks @Octachron @lthls @gasche for your constructive feedback. Do you think it is ready to be merged? |
|
|
||
| method emit_fundecl f = | ||
| Proc.contains_calls := false; | ||
| contains_calls := false; |
There was a problem hiding this comment.
Currently, emit_fundecl is always called (from Selection.fundecl) with a fresh instance, so the instance variable contains_call is always set to its initial value, containing false. So this initialisation should be redundant. I'm not against keeping it, though (it's safe).
There was a problem hiding this comment.
Thanks, I've just updated the commit to removed it. It's better not to duplicate the initialization code.
525350a to
04996b7
Compare
|
It looks like you forgot to recompute the dependencies after the change in |
fixed, thanks! |
|
The current code looks good to me. For my part, I think it can be merged once the Ci on precheck passes. |
|
The CI failed on i386 with an error from apt-get, which seems unrelated to this PR. I've rerun the latest version of this PR on precheck (build 299) and it is all fine. |
I am working on a tool [1] that manipulates, via compiler-libs, the intermediate representation created by Linearize pass. The design of ocamlopt backend makes it possible, thanks to the separation of code of passes into modules. However, I found it tricky to use the intermediate representation independently, because different passes communicate by storing some function-specific information in the global state. This PR makes small changes to store this information in the record that describes the individual functions.
Global variables Proc.contains_calls and Proc.num_stack_slots hold per-function information, used by Linearize and Emit. These variables are updated every time Asmgen.compile_fundecl is invoked to compile another function. This PR moves the variables contains_calls and num_stack_slots from Proc to the passes that produce these values: Selectgen and Reloadgen, respectively. The values are then stored per-function in Mach.fundecl, and propagated to Linearize and Emit as needed.
This approach is not ideal, because the individual passes still use global variables, but there are other such variables there already and the intention is that the uses should be limited to a particular pass (rather than communication between passes). Information produced in each pass that need to be propagated further is then recorded in Mach.fundecl. This design makes it easier to understand the code in the backend, and save and restore IR after each pass.
To make the intermediate representation stand-alone, the remaining global state is Cmm.label_counter, which is per compilation unit and not per function. It is being addressed separately.
This PR is on top of (PR #8870) to split Linearize into two modules.
Passes on precheck:
https://ci.inria.fr/ocaml/job/precheck/292/
[1] Feedback-directed optimizations for Ocaml, mentioned in the context of PR #8526 on function sections (see #8526 (comment))