Conversation
|
Random question: are immutable polymorphic arrays ( |
|
Good question. I can't remember if I remembered to fix that. |
|
Looks like I was on the ball this time: they are indeed covariant. |
|
Even if the proposed set of primitives is complete in theory, it does not allow for an efficient implementation of e.g. the blit operation, except if we rely on the compiler to remove bound checks on such a function implemented naively (with explicit assertions on the lengths) -- but this is another story. What about polymorphic algorithm such as sorting, or even map? Would the recommended approach be to pass a dictionary of functions (explicitly or with modular implicits)? An alternative design could be to have a new construction at the module expression level, which would produce not only a fresh array type, but also a (larger) set of operations to work on it as regular functions (which would hopefully be inlined when used directly). E.g. module IntArray = [|mutable int|]This would produce a module matching module type In your proposal, you reuse the syntax of field projection for the length. It seems you could similarly reuse the syntax of constructors for the init operation, with the same type-based lookup: |
Indeed. To be complete whilst keeping efficiency in terms of bound checks and use of memcpy we would require additional primitives for handling subarrays:
Yes. Operations that are polymorphic over arrays with different representations are essentially ad-hoc polymorphism and so they would be best handled with modular implicits.
Actually, the PR allows you to specify the appropriate constructors and destructors using paths, just like with records and variants:
Labels are a reasonable syntax for |
Would you need to create the instances manually for each fresh array type? What about my proposal of extending the module layer to return directly the instance?
Ok, but this is already at the level of the module system. Two array types in the same structure cannot work without type-based selection. This is already the case with two record types sharing the same labels, but the user is free to choose the labels; for array type, the overloading would mandated by the design. |
This is identical to generic operations over records and variants and the same solutions should be used. At the moment, that probably means
Personally, that seems like a pretty ugly solution. It is completely inconsistent with how other data types are handled. I also don't see how it would work for patterns or with type-based disambiguation. What module type are you expecting to give these modules? It wouldn't really help you avoid
True, but I would be surprised if that was a common use-case. |
I don't think this needs to be supported; arrays are so rare (strings excluded, but they would keep some ad hoc treatment anyway). Same for array literals.
We don't need that, we just use
module IMMUTABLE_ARRAY = sig
type t
type elt
val init: int -> (int -> elt) -> t
val get: t -> int -> elt
(* and perhaps some more fields, unsafe_get, sub, blit, etc *)
end |
This would mean anyone hoping to switch from To be honest, I see no real benefit in the approach you are proposing, whilst there is an obvious cost in the usability of the array types. |
|
It looks like you're not storing anything in the object tag. So could this approach support "inline" array types, like this? type 'a tree =
Leaf of 'a
| Branch of [| 'a |] |
I don't immediately see a reason why not, and that certainly seems like a useful feature. |
It's local syntactic changes, not a different style. Getting rid of the syntax for literals and array access seems very minor to me (and a new indexing syntax could be introduced, and combined with modular implicits, if really desired). Switching a few With your proposal, one would already need to add type annotations and replace Array by FloatArray. And this works because FloatArray comes with a lot of hand-written code. For another array type, one would need to use a new syntax for init and we loose unsafe array access and efficient blit (except by piling up some more syntax). Relying on the module system seems much more natural to me. This is the standard way to create ad hoc monomorphic data structures ( |
Or just use
That's not some minor difference: it is fundamental. Set and Hashtbl are parametric in their parameter, whilst these array types aren't -- that is the whole point. Using the module system for something like this is completely alien to the rest of the language. Adding expression syntax is cheap, the only cost is that people need to know what the new syntax means. That is why I have been emphasising that the new syntax is common in existing languages: people already know what it means. This makes the cost of the new syntax very low, whilst there are obvious benefits in terms of usability to having array syntax (otherwise why have array syntax in the first place). |
I don't think it's fundamental. For instance one can write a functor which chooses a custom representation of arrays based on a type witness: type _ ty =
| Char: char ty
| Pair: 'a ty * 'b ty -> ('a * 'b) ty
| Other: 'a ty
module type TY = sig
type t
val ty: t ty
end
module type ARRAY = sig
type t
type elt
val init: int -> (int -> elt) -> t
val get: t -> int -> elt
end
let rec make_array: type t. t ty -> (module ARRAY with type elt = t) =
function
| Char ->
(module struct
type t = string
type elt = char
let init = String.init
let get = String.get
end)
| Other ->
(module struct
type elt = t
type t = elt array
let init = Array.init
let get = Array.get
end)
| Pair (ty1, ty2) ->
let module A1 = (val make_array ty1) in
let module A2 = (val make_array ty2) in
(module struct
type elt = A1.elt * A2.elt
type t = A1.t * A2.t
let init n f =
let res = Array.init n f in
A1.init n (fun i -> fst res.(i)),
A2.init n (fun i -> snd res.(i))
let get (a1, a2) i =
A1.get a1 i,
A2.get a2 i
end)
module MyArray(X : TY)() : ARRAY with type elt = X.t = (val make_array X.ty) |
|
Fair enough. Generative functors do allow the creation of fresh non-parametric types. This discussion seems to come down to whether the syntaxes Personally, I think the expression syntaxes are much easier for people to learn since they already exist in other languages and are very similar to existing syntaxes. Whereas, I would expect most people to respond to the module syntax with some variation on "WTF is that". Combined with the additional benefit of supporting standard array syntax on fresh array types, I do not see the advantage of the module-based approach. |
|
Can you give a concrete example on how we would create "instances" (for modular implicit, or explicit passing of dictionnaries) for generic array-based algorithms, in your proposal? My current understanding is that you would need to write something like (in the explicit case): type int_array = [| mutable int |]
module IntArray = struct
type t = int_array
type elt = int
let init n f = [| f x for x = 0 to n - 1 |]
let length (a : t) = a.length
let get (a : t) i = a.(i)
let set (a : t) i x = a.(i) <- x
end
....
let sort_int_array = generic_sort (module IntArray)and you would need to duplicate the code above for each instance.
Rather between type int_array = [| mutable foo |]
let a0 = ([| x for x = 0 to 10 |] : int_array)
module IntArray = struct
(* see above *)
end
let a1 = generic_sort (module IntArray) a0and module IntArray = [| mutable foo |]
let a0 = IntArray.init 11 (fun x -> x)
let a1 = generic_sort (module IntArray) a0 |
|
As a pretty heavy array user, I'd encourage further progress on this issue. I find @lpw25 arguments convincing. |
|
Maybe I missed something but what's the status of passing (mutable) array elements to other functions? Last I checked it is not be possible to pass unboxed records or tuples to other functions the way values are passed. Instead a pointer + index would be needed. Or copying. And copying doesn't work with mutable and destroys physical equality. Overall I see this only working when no functions are called on elements or those functions can be inlined. Otherwise there is a performance penalty and breakage. |
We don't pass a "mutable array element " to functions anymore than we do today. We can get an element out of an array and pass it to a function. If we want to pass a reference to an element so that the function can mutate it, we pass the array + the index, as we do today. |
Except that is not true. Till now an array contains immutable primitive values or pointers and that pointer is simply passed to other functions. If the pointer is to a record or object with mutable fields it can simply be mutated in place. As you say, for array data types, you have to pass array + index to functions to get that behaviour and that needs something new. A function 'a -> unit doesn't handle 'a = 'b array * 'b index on its own. |
I'm not sure what you're talking about, but no, one cannot pass a pointer to some middle of an array to a function. One extracts the value stored in some array cell and pass this value to a function. And the current proposal does not change anything in this respect. |
|
It's not a pointer to the middle of an array that is passed, it is the pointer stored in the array cell that is passed. Problem is that with the unboxing suggested there is no more pointer. And "extracting" the array cell then means copying the record or object stored there. And that breaks mutability. Or is the array data type now restricted to immutables? |
|
Ok, the problem is not the mutability of the arrays, but the mutability within its elements. This PR (as far as I remember) does not implement the unboxing representation yet. When/if this is added, it is likely that this unboxed representation cannot be used for a record type with mutable fields (or alternatively that we will just document the copying semantics in that case). |
|
Note that you started the discussion with this example: type point = { x: int; y : int } type point_array = [| mutable point [@unboxed] |]here there is no problem. The array is mutable but the elements are not. So there is no observable change in semantics compared to today's regular arrays. |
|
In the interest of getting the actual feature merged, I think the notion of adding all this syntax to the core language needs to be abandoned and debated independently. I'd much rather see the type (...) t = [| ... |]
[@@deriving array]solution implemented, whereby magic functions are created for |
|
What's the status of this? I would really like immutable arrays in my compiler. |
|
Last year, I accidentally raised a similar issue (#8927) which now redirects to this PR. I'm not competent to discuss the implementation requirements, or whether it would require unreasonable maintenance costs from the ocaml maintainers. That said, as a user, I like the following:
I don't mind new syntax if it's natural, i.e. it's easy to guess what it means. I tend to dislike solutions that require memory efforts from the author or from the reader. Simply said, if we have to apply a functor to benefit from read-only arrays, I probably would not use it. Overall, I would like the benefit of not writing a custom module on my end to have a private array in my application. Consistency of private arrays with other private types was the main thing I had in mind originally and still today. |
|
I just looked over this PR again and I still like it. The syntax with comprehension seems clear as well. But one problem remains and has not been addressed: @unboxed. @unboxed can not work with elements that have mutable fields. Will you make sure the compiler only allows @unboxed when the elements have no mutable field? Or what is the plan there? Without a solution for how @unboxed is going to work in the future or abandoning unboxing for generic types in arrays I think this PR is stuck. PS: I really would love to have arrays of unboxed records, mutable fields or not, even if that means code has to be rewritten to work with |
|
re. unboxed types: the plan is to integrate this with the results of the unboxed types proposal. |
|
Nice! Good to see this being borrowed from haskell. |
|
That seems to still be problematic although it will just fail to compile I think or be inefficient. Say you have The RFC then says mutate is called with each field of #t passed in a register. If #t is large that will put a lot of data on the stack. Then mutate changes something and returns all the fields in registers (and the stack) again. And then it has to be copied over s.bar. There is no way I can see to make this efficient if mutate is not inlined. Only the trivial cases of will produce efficient code. I think that effectively means you can only unbox types with up to 4 fields. Beyond that boxed is better. PS: That's still a big step forward to more efficient code and probably acceptable. |
* Refactor elf notes (amd64) * Emit labels for trap-handling code locations (push, pop, enter) Record the labels in a new dedicated trap notes sections * Emit at most one .stapsdt.base symbol in each compilation unit Assembler error even though the symbols are both .weak
Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>
This PR adds support for defining fresh array types:
Motivation
The benefit of fresh array types over instances of the standard
'a arraytype is that they can use optimised representations without resorting to a non-uniform representation. OCaml already has a couple of such types builtin:stringandbytes, in addition to using a non-uniform representation to optimisefloat array.The intention is that in the future one will be able to write definitions such as:
to create an array type using an unboxed representation of record and tuple types (e.g.
point).This patch only adds support for unboxing
floatcontents (which is done without an annotation just like in records). The support for an[@unboxed]attribute would be better done as part of a large patch to support it for all data types (i.e. records fields, variant constructors, and array contents).FloatArrayThe addition of fresh array types allows an implementation of the
FloatArray.ttype from #163. This provides an unboxed float array type, without the dynamic boxing offloat array. This means it is more efficient thanfloat array(due to fewer dynamic checks) and would serve as a good first step to eliminating the dynamic boxing offloat array.Array primitives.
Since these fresh array types are not equal to
'a arraythey require their own primitives for construction, mutating and destructing them. As with the other data types (variants and records) these primitives are scoped and support type-based disambiguation. The primitives are as follows:x.(5).x.(5) <- "hello"[|1; 2; 3|]x.length[| x * 2 for x = 0 to 10 |]The first three are syntactically identical to the OCaml's existing array primitives. The last two fill the roles traditionally taken by
Array.lengthandArray.init. They are needed because we cannot have a function which is polymorphic across all array data types, and they are sufficient for implementing the remaining functions in theArraymodule.I imagine some people will be against the array comprehensions on various aesthetic grounds. I would argue that they provide exactly the required primitive, and that they are already well known (and liked) by programmers familiar with other languages. Both
forand arrays are from the imperative half of OCaml, so it seems natural to me to use afor-based syntax in connection with arrays.Notes
bytesandstringdue to the difficulties around handling-safe-string.[@dynamic_boxing]attribute is used to get the traditionalarraybehaviour. This is not really intended for public use: it is mostly just there so that theArraymodule can reexport the definition ofarray.