Optimization contributions should be justified with empirical evidence #8584
Conversation
|
Just for context, are there some specific PRs which led to this? The idea of having this paragraph seems fine, but I think the language could be friendlier - I don't think someone should be discouraged from opening a PR with an optimisation; instead they could be asked for help in gathering more numbers, or given an indication of whether, with numbers, the optimisation might then be accepted. |
|
I can think of a couple specific optimization PRs that were submitted without giving, at first, concrete evidence of performance improvements, but I would rather not point at any specific one right now because I wouldn't want the authors to feel guilty about it.
My understanding is that we don't want to accept optimization PRs that don't provide clear performance improvements in well-understood scenarios. Asking for performance numbers upfront sounds quite reasonable to me. Raising the barrier of entry for optimization PRs might be the right choice if that means less time collectively spent (for submitters and reviewers) discussing optimizations that don't make it in the end. (It feels a bit weird to write this as I just opened a stdlib PR without testsuite coverage for the new functions, which is just as bad.) |
|
I'm not really sure why it's necessary to restrict contributions. If a particular PR doesn't come with sufficient evidence, then the reviewers can always ask for such. As I have mentioned before, it is also not always possible to determine that a particular optimisation is beneficial by using numbers, as the improvement may (for example) be within noise. However multiple such optimisations may bring noticeable value, especially when considering second-order effects, such as code size and cache behaviour. |
|
In that case we could review and benchmark/measure these "multiple such optimisations" together. |
|
Except that you might not have written all of the optimisations at the time when one is ready for submission. I really don't see why this cannot be left to the discretion of the reviewers. |
|
Here is what Alain wrote:
I don't see a situation where it would be reasonable to accept an optimization, with the regression risks that it implies, without any factual evidence that it actually makes something better for someone -- which is the minimal standard being demanded in this text. Clarifying this fairly minimal and reasonable standard in CONTRIBUTING.md sounds like an excellent idea to me, and the somewhat surreal discussion we are having about it would tend to strengthen this opinion. |
|
From my point of view, based on a fair amount of experience in this area, this isn't a surreal discussion. I would have appreciated being able to reach an agreement before the pull request was approved. The discussion above and the text differ as to whether actual numbers are required. Was the intention for them to be required? |
|
I don't have a problem with requiring some evidence that the optimisation has a benefit, but I do think we should allow for indirect evidence of the benefit -- for example, concrete numbers for a measure like icache misses when hoping to improve execution time -- simply because sometimes the direct measurement is much noisier than the indirect measurement. Do people think that the wording allows for that or not? |
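(As an aside on the noise issue raised above: one common way to cope with run-to-run variance is to repeat the timing many times and report robust statistics such as the minimum and median rather than a single measurement. Below is a minimal sketch of such a harness in OCaml; the function names, workload, and iteration counts are purely illustrative and not taken from any PR under discussion.)

```ocaml
(* Hypothetical micro-benchmark harness: run a workload [runs] times,
   each sample doing [iters] iterations, then report the minimum and
   median sample, which are less sensitive to noise than a single run. *)
let time_runs ~runs ~iters f =
  let sample () =
    let t0 = Sys.time () in
    (* [Sys.opaque_identity] keeps the compiler from optimising the
       workload away entirely. *)
    for _ = 1 to iters do ignore (Sys.opaque_identity (f ())) done;
    Sys.time () -. t0
  in
  let samples = List.init runs (fun _ -> sample ()) in
  let sorted = List.sort compare samples in
  let min_t = List.hd sorted in
  let med_t = List.nth sorted (runs / 2) in
  Printf.printf "min %.6fs  median %.6fs\n" min_t med_t

let () =
  (* Example workload: summing a list; this stands in for whatever code
     path the optimisation under review is supposed to improve. *)
  let xs = List.init 10_000 (fun i -> i) in
  time_runs ~runs:11 ~iters:1_000 (fun () -> List.fold_left ( + ) 0 xs)
```

Pasting the min/median from such a harness, before and after the change, is roughly the kind of upfront evidence being discussed here.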
|
In my mind cache misses (or branch mispredictions, cycle count, etc.) fall under the scope of "something which can be measured in an objective way" in the first sentence above:
So I would say "yes they are in scope". Would you like to propose a better wording to capture your idea of "indirect evidence"? |
The current wording is probably fine. I just wanted to make sure that people were on the same page about that aspect. |
|
The ultimate goal of optimizations is usually to make things faster; reducing code size and cache misses usually goes in the right direction, so it seems fine to optimize for them, but the case for including a PR will be more easily made if the thing being measured is the thing that end users actually care about. Anyway, the goal of this PR is not to set rigid constraints in stone, but to encourage a minimal standard of empirical testing; I assume that people working on optimization look at results upfront anyway, and it's a bit unproductive when the reviewer needs to ask for them. |
|
(An interesting case is #8596. I believe it's a useful one, but reviewing such a PR takes time, and the discussion shows that the safety of the optimization is not completely straightforward. Luckily, the gains mentioned in #6148 are significant and, in my opinion, justify the time spent on the PR and the increased fragility of the code w.r.t. future evolutions. If the gains were around 1 or 2%, I'd say it would probably have been better not to do the optimization.) |
|
Let's not waste too much time on this possibly surreal discussion. @mshinwell: are you ok with the current text? If not, do you have suggestions to improve it, or are you globally against the idea of formalizing minimal expectations? |
|
I'm not really happy with the text as it stands. I'm also a bit concerned that there seem to be undertones in this discussion suggesting that people are just writing optimizations for the sake of it, without due thought being given to their applicability -- in general I would be surprised if that were the case. Here is a revised proposal:
Contributions to improve the compiler's optimization capabilities are welcome. However, due to the potential risks involved with such changes, we ask the following of contributors when submitting pull requests:
A major criterion in assessing whether to include an optimisation in the compiler is the balance between the increased complexity of the compiler code and the expected benefits of the optimisation. Contributors are asked to bear this in mind when making submissions. The use of formal methods to increase confidence is encouraged. |
No, really, I'm convinced that people actually do the benchmarks. The main idea here is to make it clear that sharing those results upfront, without the reviewer having to ask for them, improves the decision process. There is also the idea that micro-benchmarks are good at showing potential benefits, perhaps better than larger-scale existing applications (where performance-critical code might have been written in a style aware of previous limitations of the optimizer). I'm ok with your alternative proposal, except for the claimed rationale ("However, due to the potential risks involved with such changes"), which sounds overly limited. Taking up reviewer time and increasing code complexity and maintenance costs are also important criteria (they are shared with any kind of contribution, though, so perhaps this is obvious). Waiting for others to comment before switching the PR to your version. |
|
I tried to address the code complexity issue in the last paragraph. I don't think we need to explicitly mention reviewers' time; as you say, that's the same for any submission. |
|
I also like Mark's proposal. I would suggest adding a list item about testing and correctness:
|
|
I think Mark's proposal is good. The rationale ("Due to the risks ...") seems reasonable enough to me: if there weren't any risks then it would be fine to accept changes without evidence that they improve things. (Wasting reviewers' time and increasing code complexity are among the possible risks, which could be enumerated for clarity.) More generally, it certainly seems appropriate to insist on some kind of measurement and at least one improvement. If there's no way to show an improvement then "optimization" isn't really the right term. Concretely, I'd drop the sentence beginning "If the improvement provided by an optimisation is difficult to measure on its own" from Mark's proposal, and perhaps instead say explicitly that measurements showing clear benefits when combined with some other optimization/change are acceptable. |
|
FTR I also like Mark's proposal. |
|
@alainfrisch there seems to be a consensus around Mark's proposed revision. Could you adopt it, and then we merge? |
(Force-pushed 320bff0 to 28d5817.)
|
Ok, I've switched to Mark's version, including the suggestions from Gabriel (put together with Mark's final sentence) and Jeremy. I'm still unsure about:
(Code size is something which can be measured directly, I don't see why one would want to measure cache misses instead.) |
|
I think the intention is to say that measuring cache misses rather than execution time can be used to justify changes that reduce code size. As you say, you could just measure code size itself and say that is good enough -- but some would argue that code size is also an indirect measure since it is probably execution time that you actually care about. Maybe you can improve the wording to make the point clearer. |
|
Thanks for the explanation. But I suspect that in the vast majority of cases, achieving a significant reduction of cache misses (through reduced code size or other means) should lead to observable speedups, at least on micro-benchmarks (and if one cannot exhibit such benchmark, perhaps the optimization is not worth the trouble). Anyway, if everyone else likes the current version, I'm not opposed to merging as is. |
gasche left a comment:
Personally I think that the fine details of this discussion do not matter as much as the fact that it now exists, so we shouldn't be too perfectionist about it.
|
Thanks @gasche and to everyone involved in this discussion. Now let's all make OCaml faster/smaller/better! |
I suggest documenting in CONTRIBUTING.md that we expect "optimization" contributions to come with at least some empirical results to illustrate their usefulness.