Skip to content

Inline and specialize Stdlib.{min,max}#10251

Closed
alainfrisch wants to merge 7 commits intoocaml:trunkfrom
alainfrisch:afrisch_inline_min_max
Closed

Inline and specialize Stdlib.{min,max}#10251
alainfrisch wants to merge 7 commits intoocaml:trunkfrom
alainfrisch:afrisch_inline_min_max

Conversation

@alainfrisch
Copy link
Copy Markdown
Contributor

@alainfrisch alainfrisch commented Feb 24, 2021

Fix #5541, fix #4808.

Stdlib.{min,max} are currently not inlined; this is because their size is above the default inlining threshold. Compiling Stdlib with -inline 2 would be enough to have them inlined, but since type-specialization of primitives happen before inlining, one ends up with a C call to the generic comparison, even if arguments are integers or floats.

This PR implements Stdlib.{min,max} with two new %-primitives %min, %max, applies the same specialization as for comparison primitives, then expands them to their definition when generating the Lambda code. (No new runtime primitive has been added, so no need to adapt js_of_ocaml for instance.)

Some micro-bencharks:

(* int64 *)
let r = ref 0L in for i = 1 to 1_000_000_000 do r := max !r (Int64.of_int i) done
(* float *)
let r = ref 0. in for i = 1 to 1_000_000_000 do r := max !r (float i) done
(* int *)
let r = ref 0 in for i = 1 to 1_000_000_000 do r := max !r i done
(* string *)
let r = ref "" in for i = 1 to 1_000_000_000 do r := max !r (if i mod 2 = 1 then "Hello" else "World") done
trunk PR speedup
int64 15s 0.875s x17
float 13.6s 0.98s x13
int 6.0s 0.48s x12
string 16.6s 5.1s x3

No noticeable impact for non-specialized cases.

Notes for the reviewer:

  • The first two commits are general cleanup on the primitive specialization code; they potentially lose some sharing of primitive descriptions in lambda code, but I suspect this is completely negligible.
  • The real stuff are the third and fifth commits, with a bootstrap of the compiler in-between.
  • There is no explicit performance test (I haven't seen any other test doing explicit time measurement).

Also, one could consider further improvements:

  • Better code generation in the backend (a branchless implementation for integers, for instance).
  • Explicit type specialized versions {Int,Float,...}.{min,max}

@alainfrisch alainfrisch force-pushed the afrisch_inline_min_max branch from 386c1d6 to 90cf5e7 Compare March 8, 2021 08:12
@alainfrisch
Copy link
Copy Markdown
Contributor Author

@xavierleroy : since you were the one suspending #5541 do you have an opinion on this PR? ("I'm not expecting a solution soon, which is why I'm putting this PR in suspended state." => indeed, that was 9 years ago!)

@xavierleroy
Copy link
Copy Markdown
Contributor

#5541 was 9 years ago... What I meant at the time is that type specialization after inlining the (OCaml) definition of min and max was far away in the future, and still is (?). You sidestep the problem by turning min and max into primitives. On the one hand it does the job. On the other hand, it feels sad that operations that are perfectly definable in OCaml are being turned into primitives just to get better optimization.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

I perfectly agree, and I'll be happier with an inliner that would having to do that. But it also feels sad to know that the polymorphic comparison will be involved when min/max are used on simple numbers. Do you think the workaround is decent enough? If the inliner ever gets better, we can always get rid of it. Or perhaps one should just expose specialized functions {Int,Int32,Int64,Float}.{min,max} (and deprecate the functions in Stdlib?).

(One could argue that not is also definable in OCaml! And all comparison could be derived from the single %compare operation -- except that's not really the case, but oh well.)

@alainfrisch
Copy link
Copy Markdown
Contributor Author

@xavierleroy : what do you think we should do here? Merge this PR, expose explicit functions like Int.{min,max} (also in Float, Int64, etc), accept the bad performance when people use min/max on numbers until we get a better inliner?

@xavierleroy
Copy link
Copy Markdown
Contributor

I'm still not enthusiastic about adding "min" and "max" as primitives with specific rules for type specialization.

Half of the dev team wants to kill polymorphic equality. While I stand up for polymorphic equality, I agree that polymorphic "min" and "max" are questionable and it would make sense to remove them. Adding special compiler support for them goes against this grain.

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me. Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

@dbuenzli
Copy link
Copy Markdown
Contributor

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

@xavierleroy
Copy link
Copy Markdown
Contributor

xavierleroy commented Apr 30, 2021

Plus, for Float.min and Float.max we'll get to argue about their semantics w.r.t. NaN.

That already happened, the discussion starts here.

Right, I forgot about it. The current semantics (return NaN as soon as one of the arguments is NaN) looks healthy to me (*). IEEE 754-2008 made different choices, which were partially reverted in IEEE 754-2019, leaving me confused.

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

@xavierleroy
Copy link
Copy Markdown
Contributor

If anyone is interested, here is a lot of background info on FP min/max: http://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf

@dbuenzli
Copy link
Copy Markdown
Contributor

(*) Even though it doesn't map to any "fmin" or "fmax" processor instruction I known of. so don't expect a super fast implementation any time soon.

Note that this PR also introduced the equivalent to the C99 fmin and fmax (NaN as missing data) under the names Float.{min,max}_num.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

Having specialized min and max functions in Float, Int, Int32, etc, would make more sense to me.

Ok, this makes sense! I've opened #10389 accordingly.

The first two commits are general code cleanup (which was useful to implement this PR), but I don't think it's worth spending reviewing time on them, so I'm closing this PR.

@alainfrisch alainfrisch closed this May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pervasives.min/max are not inlined better compilation of min and max

3 participants