Optimise Switch code generation on booleans#8672
Optimise Switch code generation on booleans#8672stedolan wants to merge 1 commit intoocaml:trunkfrom
Conversation
|
Naively I would have the following expectations:
Both assumptions are violated by the situation you point out, and not restored by the fix you propose. Is there a reason why we can't have these nice things? |
|
I don't really understand either of these points, could you explain a bit more?
Switch already optimises this: If there is only one possible action, no tests are generated. (In Switch, "case" means possible input value, and "action" means possible branch to take. There may be multiple cases yet only a single action, e.g.
What do you mean by "behaves in the same way" here? The type of branch is selected according to the values that are being tested. Isn't that the right thing to do here? |
|
Looks good. To be honest I would prefer @maranget to give the thumb up on that one given that he was the one who wrote the |
|
(If @maranget doesn't give an opinion here I'll ask him in person next week.) |
|
Any updates on this? |
|
I tend to approve the change, especially since I have always thought that != 0 and true were the same thing. So good catch by @stedolan. I'll merge happily. |
If the value being matched is known to be 0 or 1, we need not do a further comparison with 0 before dispatching.
|
I have just merged the PR. |
|
Thanks! |
This patch optimises Switch to avoid introducing an extra comparison when matching on booleans (or other types with two constant constructors). If the value being matched is known to be 0 or 1, we need not do a further comparison with 0 before dispatching.
There was already a special case to detect branches that distinguish constructor 0 from other constructors (these are compiled as a comparison against 0, rather than the usual less-than comparison). Here, this special case is extended to detect the case when there is only one other constructor, with value 1.
An example of matching on a boolean is this function:
Currently, it is compiled to this
Lambda:With this patch, the
!= 0is removed, and it becomes:This doesn't make much difference for the non-Flambda compilation pipeline. With Flambda enabled, the resulting Cmm is much simpler. Instead of:
we get:
This translates into smaller code:
vs.
I haven't found a significant speed difference, but this seems worthwhile for the code size reduction alone.