Skip to content

Configure-time option for float array optimization.#1294

Merged
damiendoligez merged 3 commits intoocaml:trunkfrom
damiendoligez:float-hack-option
Sep 15, 2017
Merged

Configure-time option for float array optimization.#1294
damiendoligez merged 3 commits intoocaml:trunkfrom
damiendoligez:float-hack-option

Conversation

@damiendoligez
Copy link
Copy Markdown
Member

This PR introduces a configure-time option to remove the float array optimization. Removing this optimization makes float array operations slower, but all other polymorphic array operations slightly faster. It also removes some restrictions on let rec and especially on unboxed types.

This is only the minimal changes to the runtime and compiler to enable experimenting. What's missing is a new Floatarray module to give operations on the new floatarray type that represents flat-allocated arrays of unboxed float. Because it is user-visible, this new module will generate a lot of discussion on naming, which is not yet relevant: we should first run some benchmarks on this PR to decide if we want to shift the trade-off away from float array optimization.

See also #163 #1169

This PR hat two commits, you will need to bootstrap (make core && make coreboot) after applying the first commit.

The test suite passes on my machine, but I haven't tried OPAM packages yet and I don't know how many libraries with C bindings will be impacted.

There is one design decision that should be discussed: should the Obj.field and Obj.set_field functions remain ad hoc (i.e. work on both regular arrays and flat float arrays) or should they be low-level primitives that assume regular arrays (and then we need to introduce double_field and set_double_field) ? I've done the latter but this is open for discussion.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Aug 18, 2017

and then we need to introduce double_field and set_double_field)

double_field and set_double_field already exist in Obj.

I've done the latter but this is open for discussion.

I firmly believe that this is the right approach (and was planning to make a pull request that did just this change anyway).

@damiendoligez
Copy link
Copy Markdown
Member Author

double_field and set_double_field already exist in Obj.

Oh yes that's right. I had to add them to the OBJ module type in toplevel, which also impacts the debugger.

external create : int -> floatarray = "caml_floatarray_create"
external get : floatarray -> int -> float = "caml_floatarray_get"
external set : floatarray -> int -> float -> unit = "caml_floatarray_set"
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need more functions here to handle this type (map, fold, etc), right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That's the user-facing part that I want to do in another PR because it will take longer to discuss and it's not really needed for benchmarking.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sounds good.

byterun/alloc.c Outdated
/* [size] is a [value] representing number of floats. */
CAMLprim value caml_alloc_dummy_float (value size)
{
#ifdef FLAT_FLOAT_ARRAY
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look correct to me. caml_alloc_dummy_float is used to compile rec-bindings, in two cases: (i) unboxed float arrays; (ii) unboxed float records. AFAICT, (i) cannot happen any more (there are not Floatarray litterals) but (ii) can still happen. So I think caml_alloc_dummy_float should be left unchanged.

byterun/alloc.c Outdated
CAMLassert (tag < No_scan_tag || tag == Double_array_tag);

Tag_val(dummy) = tag;
#ifdef FLAT_FLOAT_ARRAY
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. caml_update_dummy might still be called (for unboxed float record) on a Double_array_tag block, no?

else
return caml_array_get_addr(array, index);
#endif
CAMLassert (Tag_val(array) != Double_array_tag);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting the assertion under a #elseif to avoid it in legacy mode? (But perhaps the C compilers are clever enough to avoid it.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point? The assertion is also true in legacy mode, and more assertions are always better. Assertions are only compiled in the debug runtime anyway.

stdlib/array.ml Outdated

module Floatarray = struct
external create : int -> floatarray = "caml_floatarray_create"
external get : floatarray -> int -> float = "caml_floatarray_get"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is WIP, but do we agree that this should be a %-primitive instead?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely should. Otherwise it will be very slow.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course.

external create_float: int -> float array = "caml_make_float_vect"
let make_float = create_float

module Floatarray = struct
Copy link
Copy Markdown
Contributor

@alainfrisch alainfrisch Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the naming side, I don't like Array.Floatarray. I'd prefer any of Array.FloatArray, Array.Float, Float.Array, or a toplevel FloatArray.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of these I like Float.Array best, since it's easiest to extend to additional types later (e.g. a specialized Int64.Array or Bool.Array).

Of course, we'd need a Float module first.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yallop Array.Float has my preference, and I'd say it's just as easy to extend.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damiendoligez: I think type-specialized array representation is a bit like type-specialized compare in this respect. Rather than enumerating all the type-specialized compare functions in a single module, we've put each instance alongside its type: Int64.compare, Char.compare, etc.

Doing things that way has the advantage that it extends well to types outside the standard library, too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, it would avoid awkward dependency cycles. Array is towards the start of the dependency chain right now. Adding Array.* for each base type of the stdlib would need to move it towards the end.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Float.Array is most satisfactory. It matches up with existing usage in the compiler of Foo.Map, Foo.Set, etc.

return caml_array_get_addr(array, index);
}

CAMLprim value caml_floatarray_get(value array, value index)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in another note, I believe this should become a %-primitive and be handled directly by the compiler.

return caml_array_set_addr(array, index, newval);
}

CAMLprim value caml_floatarray_set(value array, value index, value newval)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, this should be a %-primitive I think.

#define Double_field(v,i) Double_field_flat(v,i)
#define Store_double_field(v,i,d) Store_double_field_flat(v,i,d)
#else
static inline double Double_field (value v, mlsize_t i) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the need for Double_field is only for the transition period, to allow C stubs to be compatible with both modes. In a world without the legacy mode, one would not need it, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm clarifying this part by adding comments, and I'm finding bugs and redundancies. Stay tuned for a new commit soon.

byterun/interp.c Outdated
accu = Atom(*pc++); Next;

Instruct(MAKEBLOCK): {
Instruct(MAKEBLOCK):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stylistic change is at odds with other similar cases around.

@alainfrisch
Copy link
Copy Markdown
Contributor

alainfrisch commented Aug 23, 2017

Firstly: hoorah!

See some inline comments (note: I haven't reviewed everything). Some other general points:

  • In the legacy mode, do we want to add a manifest from the new explicit floatarray currently-abstract type to float array? On the one hand, this might help people migrate their code base incrementally to using the explicit Floatarray module. On the other hand, it might makes it more difficult to identify which code needs to be adapted. Could this deserve a third "intermediate" mode?

  • We need to think about ways to help people migrate their code base. Contrary to e.g. the immutable string case, enabling the new mode would not break compilation, but can degrade performance significantly. How will people identify code that needs to be adapted? What about warning when one use generic arrays on the float type? There could be both a compile-time warning (when the type is known statically, using approach previously used to switch to specialized float-array primitives; or for any instantiation of the 'a array on float) and a (conditional) runtime one (when creating a statically-generic array which happens to hold floats). An alternative to the static warning would a tool parsing .cmt file to detect uses of float array, but a warning would be simpler to use (and could be applied after inlining). I believe thinking about a good migration path is critical to the success of the new mode.

  • For immutable strings, I think there is a consensus that the long-term goal is to get rid of the old mode -- we would do it right now if all community projects were ready for it. For the current PR, I was under the impression that @xavierleroy was still reluctant to getting rid of automagically unboxed float arrays. Has this changed? Do we have some hope of dropping the legacy mode at some point? Part of the benefit of the new mode is that it will remove quite a bit of complexity (and edge cases) from the compiler code base and runtime system. If we think that the legacy mode is here to stay, we won't benefit from these simplifications (e.g. the code for deciding if a GADT single-constructor sum type case be unboxed).

  • We might want some protection to avoid linking together code compiled with old and new modes, or using the wrong runtime system.

  • Let me quote a comment from @garrigue on https://www.lexifi.com/blog/about-unboxed-float-arrays. If course, this improvement can come later.

This is just a reminder that the special representation of float arrays is also the reason we cannot
have a Top type in OCaml. Recovering Top would allow to generalize not only covariant type
variables but also contravariant ones.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Aug 23, 2017

Recovering Top would allow to generalize not only covariant type variables but also contravariant ones.

Beyond that it would also allow generalizing all covariant type variables, rather than just "strongly covariant" ones.

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Aug 23, 2017

Recovering Top would allow to generalize not only covariant type variables but also contravariant ones.

Beyond that it would also allow generalizing all covariant type variables, rather than just "strongly covariant" ones.

I'd love some explanation of this for those of us not as well-versed in type theory.

First, @garrigue's comment: how does the float optimization prevent us from using top (the universal supertype)? Does it apply only to subtyping in OCaml ie. row types? Second, @lpw25's comment: what is the current limitation with regard to strong covariant types?

@damiendoligez
Copy link
Copy Markdown
Member Author

New version with bugs removed and primitives added. @alainfrisch

| (Parraysetu Pgenarray, p1 :: _) -> Parraysetu(array_type_kind env p1)
| (Parrayrefs Pgenarray, p1 :: _) -> Parrayrefs(array_type_kind env p1)
| (Parraysets Pgenarray, p1 :: _) -> Parraysets(array_type_kind env p1)
| (Parraylength Paddrarray, [p])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this chunk needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As evidenced by testsuite/tests/translprim/array_spec.ml, without this code we lose some type specialization: without -no-flat-float-array, the compiler will specialize some accesses from gen to int, while with the option they are not gen but addr, and the compiler doesn't specialize them. This block of code recovers this optimization in the -no-flat-float-array case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this specialization from gen to int happen? Woudl'nt this be the place to change to make it happen from addr to int as well? (Sorry if I miss something obvious.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... it happens right here in this pattern matching.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yes, sorry. I would find it easier to follow if the cases were merged:

   | (Parraylength old_kind, [p])   -> Parraylength(cap_array_kind old_kind (array_type_kind env p))

v_time = allocate_outside_heap_with_tag(sizeof(double), Double_tag);
Double_field(v_time, 0) = time;
/* CR doligez: either this will have to change for no-flat-float-array,
or the ML-side type will change to floatarray */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just supposed to be floats, not float arrays -- I think I may have used the wrong macro here. Could you fix it? (I think you can remove the CRs then.)

-e 's|%%WITH_PROFINFO%%|$(WITH_PROFINFO)|' \
-e 's|%%WITH_SPACETIME%%|$(WITH_SPACETIME)|' \
-e 's|%%WITH_SPACETIME_CALL_COUNTS%%|$(WITH_SPACETIME_CALL_COUNTS)|' \
-e 's|%%FLAT_FLOAT_ARRAY%%|$(FLAT_FLOAT_ARRAY)|' \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more descriptive and less potentially confusing, for the C defines and the variable in Config, to just use float_array_hack.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to use "hack" because it has a negative connotation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that the most important thing is to have a name that really won't be confused with anything else. There are various different concepts in the code now relating to various kinds of "float array"s and the risk of accidental confusion is I suspect quite high. The "float array hack" may not be the ideal term, but it is the standard terminology, and so seems the right thing to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mshinwell: I do find the "hack" terminology offensive, and it is "standard" only in your mind. Do you still want to argue pointlessly?

Copy link
Copy Markdown
Contributor

@bluddy bluddy Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other option I could think of was float array optimization, but then you need to contract it to float_array_opt or some such thing and it can be confused with option.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The thinking is that maybe "unboxed" is a more common term for this kind of representation than "flat". I suppose if doing that it would make sense to rename "flat" to "unboxed" throughout the patch.)

Copy link
Copy Markdown
Member Author

@damiendoligez damiendoligez Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I start renaming everything, @xavierleroy do you agree with unboxed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAGICALLY_UNBOXED_FLOAT_ARRAY? (It's not like we're getting rid of unboxed float arrays.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of UNBOXED_FLOAT_ARRAY (same for FLAT_FLOAT_ARRAY, I imagine, too) is that it means "unboxed float array" (as opposed to "unboxed floatarray"). As such I'm not sure any extra qualification is needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they're both unboxed/flat, and that's where the confusion stems from. The difference is that one is demarcated by the type system, and one isn't.

let res = Array.Floatarray.create (List.length fields) in
List.iteri (fun i f -> Array.Floatarray.set res i (float_of_string f))
fields;
Obj.repr res
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Can we change Array.Floatarray to Float.Array? (Then we can also add Float.pi which will close another GPR too.)
  2. This is Float.Array.of_list, let's just move the code to the stdlib.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, because there is no Float module in the stdlib. But I agree this naming is temporary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't introduce a Float module? Otherwise we'll be stuck with Array.Floatarray for evermore, no?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in this PR it contains only the minimum needed to patch the compiler and to experiment with the new type. No reason we can't move or duplicate it to a proper stdlib module when we decide the option should be the standard.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

byterun/array.c Outdated
#undef Restore_after_gc
Store_double_val(res, d);
return res;
#else /* FLAT_FLOAT_ARRAY */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing -- it looks like the "else" branch is when the float array hack is enabled, which it's not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This has been addressed)

return (Tag_val(array) == Double_array_tag);
}

/* [ 'a array -> int -> 'a ] where 'a != float */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is wrong when the float array hack is disabled: in that case, 'a can be float. (At least one other occurrence of this below.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are intended to be true in both cases, as a guide to write C code that works with and without the option.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps clarify the comments along those lines? I think it's potentially misleading at the moment.

}
}
#else
return init;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be worth a comment explicitly noting that a copy is not required, since the name of the function makes it sound like it should always copy. Or maybe you should change the name of the function, in fact.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This has been addressed)

if (!vv || ft == Forward_tag || ft == Lazy_tag || ft == Double_tag){
if (!vv || ft == Forward_tag || ft == Lazy_tag
#ifdef FLAT_FLOAT_ARRAY
|| ft == Double_tag
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to update the comments in stdlib/lazy.ml.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This has been addressed)

byterun/interp.c Outdated
Instruct(GETFLOATFIELD): {
double d = Double_field(accu, *pc);
/* CR doligez: change to Double_flat_field ? Need to check how the
compiler uses this instruction. */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are only used for accesses to values known to have tag Double_array_tag. (So yes.)

#define Double_field(v,i) Double_val((value)((double *)(v) + (i)))
#define Store_double_field(v,i,d) do{ \

/* The [_flat_field] macros are for [floatarray] values and float-only records.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere there should be a note observing that record types "obviously of all floats" are still represented using tag Double_array_tag. There may otherwise be a misconception that this optimisation has also been removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This has been addressed)

@damiendoligez
Copy link
Copy Markdown
Member Author

@mshinwell @alainfrisch I have changed the code according to most of your comments and answered the others.

@mshinwell
Copy link
Copy Markdown
Contributor

I will read the new diff first thing tomorrow, Friday.

double d = Double_field(accu, *pc);
/* CR doligez: change to Double_flat_field ? Need to check how the
compiler uses this instruction. */
double d = Double_flat_field(accu, *pc);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed the SETFLOATFIELD case (below).

@DemiMarie
Copy link
Copy Markdown
Contributor

I wonder if it would be possible to use a limited form of monomorphization to bring up the performance of polymorphic code, at least when the entire program is available. OCaml as a language does not allow full monomorphization, but in many cases I suspect that it should be possible.

@damiendoligez
Copy link
Copy Markdown
Member Author

It's not just a question of performance, though. There is also typechecker complexity and runtime-system complexity. It also spills into the type system itself (see the remark about the top type) and in strange places like restrictions on let rec and [@@unboxed].

@mshinwell
Copy link
Copy Markdown
Contributor

@damiendoligez For avoidance of doubt I'm happy with this now once you fix the SETFLOATFIELD case. I'd rather see the name change as discussed above, but that's up to you.

Copy link
Copy Markdown
Contributor

@mshinwell mshinwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK subject to SETFLOATFIELD fix as per other comment

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Sep 15, 2017

The gain versus using an explicit Float.Array or a functor over the Array signature (or using modular implicits over the Array signature) is fairly minimal, especially with flambda.

Not sure about this claim. I don't think it's practical in OCaml, but it's Rust's approach, and Rust is the fastest language out there right now. They have other tricks up their sleeves, but this is one of the big ones.

@mshinwell
Copy link
Copy Markdown
Contributor

This all looks fine to me now.

@alainfrisch
Copy link
Copy Markdown
Contributor

What are the next steps?

  • The current API for Array.Float is clearly not ready for prime time (many missing functions, and there is agreement on the name of the module). Should we start working to complete this part?

  • There hasn't been much discussion about possible migration strategies. But it seems critical to the success of the switch to offer tools to help with the migration: compilation/runtime warnings or external tools when generic arrays are used on floats; perhaps a mode where floatarray is an alias for float array?

  • One of the benefits of the change is to simplify some tricky parts in the compiler code base. This means planning for the actual removal of the original mode. This won't happen overnight, but before embarking on the journey, it would be good to hear about @xavierleroy's position on the matter. It could make sense to say that "we will try and see if the community switches", but if we don't send the signal that the goal is to get rid of the old mode, we remove some incentive for the community to switch.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 15, 2017

They have other tricks up their sleeves, but this is one of the big ones.

I'm pretty sure unboxing floats in arrays is not the source of Rust's performance. In fact Rust doesn't do the suggested optimisation -- they monomorphise everything but they don't use that to optimise unboxing, the user indicates what data to unbox by hand themselves.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Sep 15, 2017

they monomorphise everything but they don't use that to optimise unboxing

This statement is probably confusing. I mean they do not take advantage of monomorphisation to optimise the layout of data in memory, a la Mlton. Instead the user has complete control over how data is represented.

@damiendoligez
Copy link
Copy Markdown
Member Author

@alainfrisch The next step is benchmarking.

@damiendoligez damiendoligez merged commit c11d94c into ocaml:trunk Sep 15, 2017
@alainfrisch
Copy link
Copy Markdown
Contributor

Did you merge on prupose? At least we should say that Array.Float is experimental, no? Or are we confident that this module will only be extended but not be moved or renamed?

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Sep 15, 2017

Will we be ok with breaking software using Array.Float when the time comes to switch to Float.Array?

@damiendoligez
Copy link
Copy Markdown
Member Author

We can keep Array.Floatarray with a [@@deprecated] flag when we add Float.Array.

@alainfrisch
Copy link
Copy Markdown
Contributor

Ok, I missed that Array.Floatarray is explicitly excluded from the documentation, and should thus not be relied upon.

The next step is benchmarking.

Do you agree that benchmarking numerical code is rather useless (performance sensitive code bases make sure to avoid generic accesses, and will need to be rewritten to use floatarray and regain exactly the same performance as today)? It would indeed be interesting to get numbers from non-numerical code using arrays intensively to see the gains. Coq?

@xavierleroy
Copy link
Copy Markdown
Contributor

Do you agree that benchmarking numerical code is rather useless

I disagree. I'd like to see numbers for OCaml code as it exists today in the wild, not for future rewrites of this code. That will help determining how much code needs rewriting, among other things.

It would indeed be interesting to get numbers from non-numerical code using arrays

I agree we need numbers both for numerical and symbolic code.

@bluddy
Copy link
Copy Markdown
Contributor

bluddy commented Sep 18, 2017

As an aside, with floatarray code, it should be relatively easy to start generating vectorized instructions, which should significantly improve performance. Perhaps we want to create annotations for this purpose as a simple first measure.

@xavierleroy
Copy link
Copy Markdown
Contributor

with floatarray code, it should be relatively easy to start generating vectorized instructions

No easier than today with float array.

@DemiMarie
Copy link
Copy Markdown
Contributor

DemiMarie commented Sep 19, 2017 via email

@damiendoligez damiendoligez deleted the float-hack-option branch September 22, 2017 11:56
@ejgallego
Copy link
Copy Markdown

ejgallego commented Nov 18, 2017

@gasche suggested to try Coq with the -flat-float-array. Here are some preliminary results for building Coq's stdlib:

Time{4.06.0                   |master}: 237.319752
Time{4.06.0+flambda           |master}: 216.754369
Time{4.06.0+flat_float        |master}: 238.061874
Time{4.06.0+flambda+flat_float|master}: 218.273305

I guess Coq could also really benefit from some more [@@inline ...] annotations, I think @ppedrot wanted to add a few.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 18, 2017

@ejgallego I should have given more information, sorry. I think that the -flat-float-array option corresponds to the default (float arrays are "flat" because they are unboxed), what you should measure is the -no-flat-float-array switch which performs the non-standard behavior of disabling unboxed float arrays.

(You can test either (4.06 -flat-float-array against 4.06 -no-flat-float-array) or (4.06 against 4.06 -no-flat-float-array).)

@ejgallego
Copy link
Copy Markdown

Indeed sorry, I got it right at first but then got confused when feeding my bench script. Relaunching.

@ejgallego
Copy link
Copy Markdown

No significative change in timing that I could observe when using -no-flat-float-array.

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 18, 2017

cc @ppedrot: this is a flag to disable the float-array optimization (and correspondingly speedup generic array creation and access), and it preliminary testing suggests that it does not seem to change Coq performances (on the stdlib). Do you think there is a test that would reveal a speedup for Coq users? Do you know which OCaml functions are critically more expensive this way (maybe those don't take the flag into account properly)? Do we need to revert some of your manual library-specialization work to see the difference?

@ejgallego
Copy link
Copy Markdown

@gasche I have added this particular flag to the main benchmark script so soon it will get tested against quite a few 3rd party developments, that should provide us with some more data as their performance profile is fairly diverse.

@bobzhang
Copy link
Copy Markdown
Member

Is this configuration used in some production environment (heavily tested)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.