
Optimization contributions should be justified with empirical evidence#8584

Merged
gasche merged 1 commit into ocaml:trunk from alainfrisch:contributing_optimizations
Oct 1, 2019

Conversation

@alainfrisch
Contributor

I suggest documenting in CONTRIBUTING.md that we expect "optimization" contributions to come with at least some empirical results to illustrate their usefulness.

@dra27
Member

dra27 commented Apr 9, 2019

Just for context, are there some specific PRs which led to this?

The idea of having this paragraph seems fine, but I think the language could be friendlier - someone shouldn't be discouraged from opening a PR with an optimisation; instead they could be invited to request help gathering more numbers, or an indication of whether, with numbers, the optimisation might then be accepted.

@gasche
Member

gasche commented Apr 9, 2019

I can think of a couple specific optimization PRs that were submitted without giving, at first, concrete evidence of performance improvements, but I would rather not point at any specific one right now because I wouldn't want the authors to feel guilty about it.

I don't think someone should be discouraged from opening a PR with an optimisation

My understanding is that we don't want to accept optimization PRs that don't provide clear performance improvements in well-understood scenarios. Asking for performance numbers upfront sounds quite reasonable to me. Raising the barrier of entry for optimization PRs might be the right choice if that means less time collectively spent (for submitters and reviewers) discussing optimizations that don't make it in the end.

(It feels a bit weird to write this as I just opened a stdlib PR without testsuite coverage for the new functions, which is just as bad.)

@mshinwell
Contributor

I'm not really sure why it's necessary to restrict contributions. If a particular PR doesn't come with sufficient evidence, then the reviewers can always ask for such. As I have mentioned before, it is also not always possible to determine that a particular optimisation is beneficial by using numbers, as the improvement may (for example) be within noise. However multiple such optimisations may bring noticeable value, especially when considering second-order effects, such as code size and cache behaviour.

@gasche
Member

gasche commented Apr 9, 2019

In that case we could review and benchmark/measure these "multiple such optimisations" together.

@mshinwell
Contributor

Except that you might not have written all of the optimisations at the time when one is ready for submission. I really don't see why this cannot be left to the discretion of the reviewers.

@gasche
Member

gasche commented Apr 9, 2019

Here is what Alain wrote:

In general, contributions which are intended to optimize something
which can be measured in an objective way should come with empirical
evidence to illustrate the expected gains. This includes optimizing
the speed of generated code, the speed of the compiler, or the
compactness of the generated code. At the very minimum, the
contributor should provide one concrete example where the optimization
gives a significant improvement, even if this comes in the form of a
specially crafted micro-benchmark. This gives an estimated upper
bound to the expected gains on realistic payloads, which is already
very useful to appreciate the usefulness of the contribution and the
tradeoffs it involves (added complexity/maintenance burden, possible
degradations in other cases, etc). Of course, larger-scale
experiments, with realistic and various code bases are even
better. (See Benchmarking section above.)

I don't see a situation where it would be reasonable to accept an optimization, with the regression risks that it implies, without any factual evidence that it actually makes something better for someone -- which is the minimal standard being demanded in this text. Clarifying this fairly minimal and reasonable standard in CONTRIBUTING.md sounds like an excellent idea to me, and the somewhat surreal discussion we are having about it would tend to strengthen this opinion.
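As an illustration of the "one concrete example" standard in the text quoted above, the evidence can be as small as a timing harness comparing two implementations of the same function. The sketch below is hypothetical: the functions (`sum_fold`, `sum_rec`) and the workload are invented for illustration and are not from any PR discussed here.

```ocaml
(* Hypothetical micro-benchmark sketch: timing two implementations of
   the same function, the minimal kind of empirical evidence the
   proposed CONTRIBUTING.md text asks for. *)

let sum_fold xs = List.fold_left ( + ) 0 xs

let sum_rec xs =
  let rec go acc = function [] -> acc | x :: rest -> go (acc + x) rest in
  go 0 xs

(* Run [f] [n] times and report CPU time; [Sys.opaque_identity]
   prevents the compiler from optimizing the call away. *)
let time label n f =
  let t0 = Sys.time () in
  for _ = 1 to n do ignore (Sys.opaque_identity (f ())) done;
  Printf.printf "%s: %.3fs for %d runs\n" label (Sys.time () -. t0) n

let () =
  let xs = List.init 1_000_000 (fun i -> i) in
  assert (sum_fold xs = sum_rec xs);  (* same observable result *)
  time "fold_left" 100 (fun () -> sum_fold xs);
  time "rec      " 100 (fun () -> sum_rec xs)
```

Reporting the printed timings for the tree with and without the patch, alongside the benchmark source, would satisfy the "specially crafted micro-benchmark" minimum.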

@mshinwell
Contributor

From my point of view, based on a fair amount of experience in this area, this isn't a surreal discussion. I would have appreciated being able to reach an agreement before the pull request was approved.

The discussion above and the text differ as to whether actual numbers are required. Was the intention for them to be required?

@lpw25
Contributor

lpw25 commented Apr 9, 2019

I don't have a problem with requiring some evidence that the optimisation has a benefit, but I do think we should allow for indirect evidence of the benefit -- for example concrete numbers for a measure like icache misses when hoping to improve execution time -- simply because sometimes the direct measurement is much more noisy than the indirect measurement.

Do people think that the wording

This includes optimizing the speed of generated code, the speed of the compiler, or the
compactness of the generated code.

allows for that or not?
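One concrete form such indirect evidence can take, when the direct wall-clock delta is within noise, is a deterministic proxy metric. The sketch below uses allocation volume as the proxy, via the standard library's `Gc.allocated_bytes` counter; the two concatenation functions are invented for illustration only.

```ocaml
(* Sketch: indirect, low-noise evidence via allocation counts.
   [concat_naive] builds an extra intermediate string; [concat_better]
   does not. Timing the difference may be within noise, but the
   allocation delta is deterministic and easy to report. *)

let concat_naive a b = a ^ (b ^ "")   (* extra intermediate string *)
let concat_better a b = a ^ b

(* Bytes allocated while running [f] once. *)
let alloc_of f =
  let before = Gc.allocated_bytes () in
  ignore (Sys.opaque_identity (f ()));
  Gc.allocated_bytes () -. before

let () =
  let a = String.make 1000 'x' and b = String.make 1000 'y' in
  let naive = alloc_of (fun () -> concat_naive a b) in
  let better = alloc_of (fun () -> concat_better a b) in
  Printf.printf "naive: %.0f bytes, better: %.0f bytes\n" naive better;
  assert (better <= naive)
```

The same shape works for other proxies (minor collections, instruction counts from an external tool) when the metric the user ultimately cares about is too noisy to measure directly.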

@gasche
Member

gasche commented Apr 9, 2019

In my mind cache misses (or branch mispredictions, cycle count, etc.) fall under the scope of "something which can be measured in an objective way" in the first sentence above:

contributions which are intended to optimize something
which can be measured in an objective way should come with empirical
evidence to illustrate the expected gains

So I would say "yes they are in scope". Would you like to propose a better wording to capture your idea of "indirect evidence"?

@lpw25
Contributor

lpw25 commented Apr 9, 2019

Would you like to propose a better wording to capture your idea of "indirect evidence"?

The current wording is probably fine. I just wanted to make sure that people were on the same page about that aspect.

@alainfrisch
Contributor Author

The ultimate goal of optimizations is usually to make things faster; reducing code size and cache misses usually go in the right direction, so it seems fine to optimize for them, but the case for including a PR will be more easily made if the thing being measured is the thing that end users actually care about.

Anyway, the goal of this PR is not to set in stone rigid constraints, but to encourage a minimal standard of empirical testing; I assume that people working on optimization already look at results upfront, and it's a bit unproductive when the reviewer needs to ask for them.

@alainfrisch
Contributor Author

(An interesting case is #8596. I believe it's a useful one, but reviewing such a PR takes time, and the discussion shows that the safety of the optimization is not completely straightforward. Luckily, the gains mentioned in #6148 are significant and, in my opinion, justify the time spent on the PR and the increased fragility of the code w.r.t. future evolutions. If the gains were around 1 or 2%, I'd say it would probably have been better not to do the optimization.)

@alainfrisch
Contributor Author

Let's not waste too much time on this possibly surreal discussion. @mshinwell: are you ok with the current text? If not, do you have suggestions to improve it, or are you against the whole idea of formalizing minimal expectations?

@mshinwell
Contributor

I'm not really happy with the text as it stands. I'm also a bit concerned that there seem to be undertones in this discussion suggesting that people are just writing optimizations for the sake of it, without due thought being given to their applicability -- in general I would be surprised if that were the case.

Here is a revised proposal:

Contributions to improve the compiler's optimization capabilities are welcome. However, due to the potential risks involved with such changes, we ask the following of contributors when submitting pull requests:

  • Explain the benefits of the optimization (faster code, smaller code, improved cache behaviour, lower power consumption, increased compilation speed).

  • Explain when the optimization does and does not apply.

  • Explain when, if ever, the optimization may be detrimental.

  • Provide benchmark measurements to justify the expected benefits. Which kinds of measurements are appropriate will vary depending on the optimization; some optimizations may have to be measured indirectly (for example, by measuring cache misses for a code size optimization). Measurements should ideally include experiments with full-scale applications as well as with microbenchmarks. If the improvement provided by an optimisation is difficult to measure on its own, for example because it is within noise, this should be explained together with a convincing argument as to the optimization's merits.

  • At least some of the measurements provided should be from experiments on open source code.

  • If assistance is sought with benchmarking then this should be made clear on the initial pull request submission.

A major criterion in assessing whether to include an optimisation in the compiler is the balance between the increased complexity of the compiler code and the expected benefits demonstrated by the benchmarks. Contributors are asked to bear this in mind when making submissions. The use of formal methods to increase confidence is encouraged.

@alainfrisch
Contributor Author

I'm not really happy with the text as it stands. I'm also a bit concerned that there seem to be undertones in this discussion suggesting that people are just writing optimizations for the sake of it

No, really, I'm convinced that people actually do benchmark. The main idea here is to make it clear that sharing those results upfront, without the reviewer having to ask for them, improves the decision process. There is also the idea that micro-benchmarks are good at showing potential benefits, perhaps more so than larger-scale existing applications (where performance-critical code might have been written in a style aware of previous limitations of the optimizer).

I'm ok with your alternative proposal, except for the claimed rationale ("However, due to the potential risks involved with such changes") which sounds overly limited. Taking reviewer resources and increasing the code complexity and maintenance costs are also important criteria (they are shared with any kind of contributions, though, so perhaps this is obvious).

Waiting for others to comment before switching the PR to your version.

@mshinwell
Contributor

I tried to address the code complexity issue in the last paragraph. I don't think we need to explicitly mention reviewers' time; as you say, that's the same for any submission.

@gasche
Member

gasche commented May 9, 2019

I also like Mark's proposal. I would suggest adding a list item about testing and correctness:

  • Justify the correctness of the optimization, and discuss a testing strategy to ensure that it does not introduce bugs.

@yallop
Member

yallop commented May 9, 2019

I think Mark's proposal is good. The rationale ("Due to the risks ...") seems reasonable enough to me: if there weren't any risks then it would be fine to accept changes without evidence that they improve things. (Wasting reviewers' time and increasing code complexity are among the possible risks, which could be enumerated for clarity.)

More generally, it certainly seems appropriate to insist on some kind of measurement and at least one improvement. If there's no way to show an improvement then "optimization" isn't really the right term. Concretely, I'd drop the sentence beginning "If the improvement provided by an optimisation is difficult to measure on its own" from Mark's proposal, and perhaps instead say explicitly that measurements showing clear benefits when combined with some other optimization/change are acceptable.

@damiendoligez
Member

FTR I also like Mark's proposal.

@alainfrisch alainfrisch mentioned this pull request Sep 30, 2019
@gasche gasche self-requested a review September 30, 2019 13:42
@gasche
Member

gasche commented Sep 30, 2019

@alainfrisch there seems to be a consensus around Mark's proposed revision. Could you adopt it, and then we merge?

@alainfrisch alainfrisch force-pushed the contributing_optimizations branch from 320bff0 to 28d5817 Compare October 1, 2019 09:51
@alainfrisch
Contributor Author

Ok, I've switched to Mark's version, including the suggestions from Gabriel (put together with Mark's final sentence) and Jeremy. I'm still unsure about:

some optimizations may have to be measured indirectly
(for example, by measuring cache misses for a code size
optimization).

(Code size is something which can be measured directly, I don't see why one would want to measure cache misses instead.)

@lpw25
Contributor

lpw25 commented Oct 1, 2019

I think the intention is to say that measuring cache misses rather than execution time can be used to justify changes that reduce code size.

As you say, you could just measure code size itself and say that is good enough -- but some would argue that code size is also an indirect measure since it is probably execution time that you actually care about.

Maybe you can improve the wording to make the point clearer.

@alainfrisch
Contributor Author

Thanks for the explanation. But I suspect that in the vast majority of cases, achieving a significant reduction of cache misses (through reduced code size or other means) should lead to observable speedups, at least on micro-benchmarks (and if one cannot exhibit such benchmark, perhaps the optimization is not worth the trouble). Anyway, if everyone else likes the current version, I'm not opposed to merging as is.


@gasche gasche left a comment


Personally I think that the fine details of this discussion do not matter as much as the fact that it now exists, so we shouldn't be too perfectionist about it.

@gasche gasche merged commit c82405b into ocaml:trunk Oct 1, 2019
@alainfrisch
Contributor Author

Thanks @gasche and to everyone involved in this discussion. Now let's all make OCaml faster/smaller/better!
