Conversation
|
Does this come from aesthetic considerations, or did you observe noticeable reductions in code size? If so, please share your findings! |
|
These changes are part of a larger patch we successfully |
|
It just occurred to me that the following rewrites should probably
(We could also ban |
Good to know, but that wasn't @alainfrisch's question. Does all this reduce the code size significantly? By how much? |
|
Sorry, I misread Alain's question. Here are the sizes for the binaries from the distribution:
|
|
And do these tiny reductions lead to any observable performance improvement? Smaller code is usually better, except if you start using less-optimized instructions. I'm trying to find reasons to add complexity to the compiler and risk performance regressions (on specific versions of the CPU). |
|
I will measure the gains in tight loops; in the meantime, here is an illustration The sections refer to "Intel 64 and IA-32 Architectures Optimization Manual", |
|
I had a quick look at the new instruction patterns. Using The other patterns for "load integer constant" are less convincing to me. Replacing one |
(The submitted patch considered only the values in the range of |
|
Here is the comparison against trunk mentioned above: |
|
An instrumented version of the compiler gives the following distribution
As @xavierleroy pointed out, there is no risk of performance regression if we Would you consider such a patch for merge? |
|
Sorry to insist, but would you accept a patch reduced to the following changes?
|
Definitely yes.
Yes, but this is trading increased code size for reduced dependencies, so perhaps keep inc / dec if fast_code is false. |
I have removed the complex patterns around the
As suggested, I have made the code emitting |
|
Travis is not happy. The |
Just write |
Of course. My gut feeling was that it would hide a real issue. |
|
Travis is still not happy, but this time it is not even clear why. |
|
You have another occurrence of |
|
How embarrassing... I guess I was puzzled by the change/lack However, isn't this occurrence a bit more annoying? As it is an |
Fix minor typo
…1578) Before, cyclic dependencies were reported as a warning, and ocamldep -sort would exit with code 0. Now, the message says "error" and the exit code is nonzero.
Except for the Camlinternal* modules and the new Stdlib module, all modules in the stdlib now compile to Stdlib__<module>. Pervasives is renamed to Stdlib and now contains a list of aliases from the long names to the short ones, so that from inside and outside the stdlib we can refer to the standard modules as just List or Stdlib.List rather than Stdlib__list. In order to avoid printing the long names in error messages and in the toplevel, the following heuristic is added to Printtyp: given a path Foo__bar, if Foo.Bar exists and is a direct or indirect alias to Foo__bar, then prefer Foo.Bar. A bootstrap step was required to replace Pervasives by Stdlib as the module opened by default.
awk is symbolic link in Cygwin, which means it can't be used in -pp for a native Windows build. Just use gawk instead, as no other package provides the awk command on Cygwin.
which is used by both ocamldoc and the reference manual.
Since the reflector auxiliary program used by this test is now written in OCaml rather than in C, the redirections father process must take care to pass on OCAMLRUNPARAM, othewise the test fails when run with the debug runtime.
The function argument ocamlrunparam should actually be called systemenv.
…nto amd64-emit-tweaks
|
@xclerc: Looks like you merged trunk into your branch, and this resulted in the pull request being messed up. Can you remove the merge commit and rebase instead? Some other ideas (for a future PR):
|
|
ping? |
|
ping @xclerc |
|
Ping again. What is the status here? If @xclerc is not interested in working on this anymore, maybe @smuenzel you would be interested in rebasing the restricted version of this PR and proposing it again, as a new PR? (I am planning to close next time I encounter this PR if there has not been any progress.) |
|
Closing. We can always reopen if there is interest. |
This PR suggests the following changes to
amd64/emit.mlpin order toemit shorter instructions when possible:
These changes are based on the fact that when using a 32-bit register,
the lower part is used to performed the actual computation while the
higher part is cleared.
This PR also disables
inc/decinstructions, as suggested by the Intelmanual because, contrary to
add/subinstructions, they do not set allflags, which can in turn add data dependencies.
(The PR is split into multiple commit to easily apply only a subset.)