First steps at adding raw pointers operations to the stdlib #652
DemiMarie wants to merge 7 commits into ocaml:trunk
Conversation
This includes C primitives for bytecode and for loading 64-bit ints on 32-bit platforms.
|
Why are your pointers 64bit on a 32bit system? |
|
I feel the type parameter |
|
@nojb The big advantage is that a higher-level library could do something like this:

    type _ referent =
      | Char : char referent
      | Int32 : Int32.t referent
      | Int64 : Int64.t referent
      (* ... *)

    (* [type a] is a locally abstract type, refined in each branch *)
    let load (type a) (a : a referent) (ptr : a Ptr.t) : a =
      match a with
      | Char -> Ptr.load8 ptr
      | Int32 -> Ptr.aligned_load32 ptr
      | Int64 -> Ptr.aligned_load64 ptr
      (* ... *)

@mrvn They are
On 64-bit systems I could potentially optimize on the basis that no existing CPU has a true 64-bit address space, thus I could store the pointer in a plain
Function pointers are not supported because I don't know how to implement them efficiently in
Also, this is very incomplete – I have not implemented loads to pointers yet! The
I deliberately did not implement pointer subtraction, because it is error-prone. You can implement it yourself if you wish:

    let ptr_diff : 'a Ptr.t -> 'b Ptr.t -> nativeint = fun a b ->
      Nativeint.sub (Ptr.to_nativeint a) (Ptr.to_nativeint b)

and this is guaranteed to be correct. Also, for FFI purposes I am wondering if a pointer and a C
The end goal is to be able to bind C functions and data structures just like how Haskell's FFI does it: with the C data structures manipulated by OCaml code.
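The combination of a GADT witness and a phantom-typed pointer can be tried out without the proposed primitives. The following is a minimal sketch in which `Ptr` is a hypothetical stand-in backed by a `Bytes` buffer, with offsets playing the role of addresses. The names `memory`, `of_offset`, `load32` and `load64` are assumptions made for illustration and are not the API of this PR.

```ocaml
(* Hypothetical stand-in for the proposed Ptr module: "memory" is a Bytes
   buffer and a pointer is an offset into it. The phantom parameter 'a
   records what the pointer is supposed to point at. *)
module Ptr : sig
  type 'a t
  val memory : Bytes.t
  val of_offset : int -> 'a t
  val load8 : char t -> char
  val load32 : int32 t -> int32
  val load64 : int64 t -> int64
end = struct
  type 'a t = int
  let memory = Bytes.make 16 '\000'
  let of_offset off = off
  let load8 p = Bytes.get memory p
  let load32 p = Bytes.get_int32_le memory p
  let load64 p = Bytes.get_int64_le memory p
end

(* Each constructor fixes the type of value found behind the pointer. *)
type _ referent =
  | Char : char referent
  | Int32 : int32 referent
  | Int64 : int64 referent

(* Matching on the witness refines [a], so each branch may call a
   differently typed load function. *)
let load : type a. a referent -> a Ptr.t -> a = fun r ptr ->
  match r with
  | Char -> Ptr.load8 ptr
  | Int32 -> Ptr.load32 ptr
  | Int64 -> Ptr.load64 ptr

let () =
  Bytes.set_int32_le Ptr.memory 4 42l;
  let p : int32 Ptr.t = Ptr.of_offset 4 in
  assert (load Int32 p = 42l)
  (* [load Char p] would be rejected by the type checker. *)
```

In this sketch the phantom parameter costs nothing at run time, since `'a Ptr.t` is just an `int` behind the abstraction boundary.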
|
I still don't see why it is necessary/useful to have the phantom type. In your example you could instead use the signature val load: 'a referent -> Ptr.t -> 'a
|
@nojb There the phantom type does help: it prevents someone from passing the wrong type of pointer to the
|
I think for a first step you should keep things simple: remove the phantom type and move the functions to a submodule of
If thanks to this raw pointer view we can factorize some of these operations, I think it would be a great achievement in addition to support for C bindings in OCaml. |
|
@bobot Neither of these two can be implemented in terms of the other, at least not safely with respect to GC. The However, the bigarray and bigstring functions most definitely could be implemented in terms of the |
|
As far as I remember, the unsafe versions of the string get/set functions do not make any assumptions about the structure of the value. I would also suggest that if writing a raw pointer module, performance should be a prime concern, which means not representing pointers as boxed values. There are reasonable ways of representing them that do not require this, for example setting the bottom bit on a suitably-aligned pointer, and using |
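The bottom-bit trick mentioned above can be illustrated in plain OCaml. This is a minimal sketch, assuming addresses are modelled as `nativeint` and are at least 2-byte aligned; `encode` and `decode` are hypothetical helper names. An OCaml `int` n is stored in a machine word as 2n + 1, so storing `addr asr 1` as an `int` yields exactly the word `addr lor 1`, which the GC treats as an immediate.

```ocaml
(* Hypothetical helpers: pack a 2-byte-aligned address into an immediate
   OCaml int and recover it. Addresses are modelled as nativeint. *)
let encode (addr : nativeint) : int =
  if Nativeint.logand addr 1n <> 0n then
    invalid_arg "encode: unaligned address";
  (* Arithmetic shift keeps the sign, so the result fits in an int. *)
  Nativeint.to_int (Nativeint.shift_right addr 1)

let decode (p : int) : nativeint =
  Nativeint.shift_left (Nativeint.of_int p) 1

let () =
  assert (decode (encode 0x1000n) = 0x1000n);
  (* The highest bit survives the round trip as well. *)
  assert (decode (encode Nativeint.min_int) = Nativeint.min_int)
```

Odd addresses cannot be represented this way, which is why the sketch rejects them instead of silently dropping the low bit.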
|
Please keep the phantom type. Otherwise any pointer type can be passed as any other pointer type to a function expecting one.
@DemiMarie The problem with function pointers is that they need not be (and sometimes are not) the same size as data pointers. They do not fit into a value. A function pointer needs to be boxed. As for misusing intptr_t to pass pointers, I think that would fail on at least m68k (does OCaml support m68k?) since addresses are passed in A0/A1 while integers are passed in D0/D1, as far as I remember from AmigaOS. Why don't you want to cast pointers to (void*)? Have you looked at Ctypes and how it does it?
|
@mrvn The reason for wanting to misuse Also, on what architectures do function pointers not fit in a nativeint, that are 32 or 64 bit? @mshinwell The problem is that C pointers, unlike OCaml pointers, cannot be assumed to be aligned. On some 32-bit systems, user code can use 3GB of address space, so a pointer can use all 32 bits. On 64 bits, sign extending a 63-bit value can point to the entire address space. |
|
@DemiMarie I thought function pointers were always larger than void* but I was confusing them with C++ method pointers (which need extra space to handle virtual table lookup). Cases where function pointers differ from data pointers are when you have segmented memory like on MS-DOS. I don't know what other architectures OCaml supports but Google shows there is an OCaml for MS-DOS. As for boxing pointers: Could we have a [@@Aligned] attribute that would keep them unboxed?
|
OCaml does not support 16 bit machines. The ideal solution to boxing of pointers (and of other types) is As far as workarounds, I am not sure how to make the representation of a I could have a separate type for pointers that are guaranteed to be 16-bit
|
|
@DemiMarie I think you should have one type for 16-bit aligned pointers (int) and one for misaligned pointers (nativeint). Almost all pointers will be aligned to 16 bits (actually 32 bits) on most platforms AFAIK, with the exception of char*. Even in structs, you'll get padding unless you specifically disable it (with __attribute__((packed))). Also, why do you have options for loading Pint32/64/nativeint? AFAIK all platforms nowadays match their nativeint size to the pointer size.
|
Will implement (along with writes to pointers and possibly other
|
|
For fun I played with accessing bigarray using
Remarks:
|
|
A C pointer has 2 hidden attributes:
|
|
I don't think we should bother to unbox pointers. When one knows that a pointer is 2-aligned, they can always convert it to an |
|
I thought a bit more about this, I think we should do the same as for the string/bigstring primitives: just add compiler support - i.e. the primitives - and let users write their own API with phantom types if they want. @DemiMarie do you think you can finish the primitives, and remove the stdlib additions? I'll then review the patch and merge it. We have a use case at Jane Street where using the string primitives wouldn't be safe so I'd like to have this feature in. Can you also add some tests for these primitives? You can copy the tests for the string primitives in
    if aligned then
      Cop(Cload Word_val, [unboxed_arg])
    else unaligned_load_64 unboxed_arg (Cconst_int 0))
    | Pload(Pnativeint, aligned) ->
We don't have primitives for loading/storing nativeints for strings/bigstrings so we don't need ones for raw pointers. One can implement them in terms of the int32/int64 primitives and it should be as efficient.
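As an illustration of building a nativeint load from the fixed-width ones, here is a minimal sketch that uses the standard `Bytes` accessors in place of the raw-pointer primitives discussed in this PR. The function name `get_nativeint_ne` is hypothetical.

```ocaml
(* Load a native-endian nativeint from a byte buffer by dispatching on the
   word size and reusing the fixed-width accessors. *)
let get_nativeint_ne (b : Bytes.t) (off : int) : nativeint =
  if Sys.word_size = 64 then Int64.to_nativeint (Bytes.get_int64_ne b off)
  else Nativeint.of_int32 (Bytes.get_int32_ne b off)

let () =
  let b = Bytes.make 8 '\000' in
  (* Store one machine word using the accessor that matches the platform. *)
  if Sys.word_size = 64 then Bytes.set_int64_ne b 0 7L
  else Bytes.set_int32_ne b 0 7l;
  assert (get_nativeint_ne b 0 = 7n)
```

Because `Sys.word_size` is a constant for a given build, the unused branch never runs on that build.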
|
@bobot, I'm not sure at all about the
What I'm going to do is start from your branch, clean up the history a bit, clean up the bytecode stubs (they should be shared with the string stubs) and add some tests, then submit a new PR
@diml seems good, and a worthwhile cleanup with @DemiMarie's technique. The last cleanup would be, once all these PRs are accepted, to remove the bigstring primitive and code everything directly in OCaml.
    do {\
      CAMLparam1(ptr);\
      u##type##_t res;\
      memcpy(&res, (void*)Nativeint_val(ptr), sizeof res);\
That will not work on big-endian platforms; str.c is in fact right to be so complicated.
Sorry, I'm wrong: @DemiMarie's code is good and the one in str.c is overly complicated. But we should provide an API for %bswap16, %bswap_int32, %bswap_int64 and %bswap_native, so that people can take endianness into account in their OCaml code.
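To show what taking endianness into account in OCaml code could look like, here is a minimal sketch. It assumes the `%bswap_int32` primitive named above can be bound with an `external` declaration in user code, as the standard library does internally. The names `swap32` and `le32_of_native` are hypothetical.

```ocaml
(* Bind the compiler's byte-swap primitive (assumed to be available under
   this name in the compiler version being used). *)
external swap32 : int32 -> int32 = "%bswap_int32"

(* Interpret a natively loaded 32-bit word as little-endian data. *)
let le32_of_native (v : int32) : int32 =
  if Sys.big_endian then swap32 v else v

let () =
  assert (swap32 0x11223344l = 0x44332211l);
  (* Bytes 78 56 34 12 spell 0x12345678 in little-endian order. *)
  let b = Bytes.of_string "\x78\x56\x34\x12" in
  assert (le32_of_native (Bytes.get_int32_ne b 0) = 0x12345678l)
```

With a native-endian load plus a conditional swap, the same OCaml code gives the same result on both kinds of host.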
However the CAMLparam1 and CAMLreturn can be removed like in str.c since nothing is done after the first allocation.
|
Also @diml you can change the primitives a bit to assume a nativeint as index argument. This could allow removing some untagging in some cases. The usual way to use them would probably still be
|
By the way I now consider it a mistake to have used |
|
@bobot, if we implement some bigarray primitives in OCaml, we need to be quite careful to have some ways to keep references to the bigarray value alive while the bigarray memory is used:

    let ba_get ba i =
      let data = Ptr.loadnative ba 0 in
      Ptr.loadnative data i

    let iter ba f =
      for i = 0 to Array1.dim ba - 1 do
        f (ba_get ba i)
      done

    let stuff ba =
      let f x = Sys.opaque_identity (x, x) in
      f (ba_get ba 0);
      iter ba f

Supposing that everything is inlined, f will allocate, but won't invalidate pointer loads, hence all the
This means that we need some way to tell that
Could we have another more general way? In tests for the GC, it often happens that we also want to keep a value alive for some time and to let it go later. We could have something like:

    external keep_alive: 'a -> (unit -> 'b) -> 'b = "%keep_alive"
    (** [keep_alive handled f] ensures that [handled] is alive until the
        end of the computation of [f]. It returns the result of [f]. *)
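The `%keep_alive` primitive above is a proposal and does not exist. Its intended behaviour can be approximated in plain OCaml, as in the following minimal sketch, which relies on `Sys.opaque_identity` being a use of the value that the optimizer will not remove. This is an approximation under that assumption, not a guarantee from the compiler.

```ocaml
(* Approximation of the proposed keep_alive: [handled] is captured by the
   [finally] closure, which runs only after [f] has returned or raised. *)
let keep_alive (handled : 'a) (f : unit -> 'b) : 'b =
  Fun.protect f
    ~finally:(fun () -> ignore (Sys.opaque_identity handled))

let () =
  let buf = Bytes.make 4 'x' in
  (* [buf] stays reachable for the whole duration of the callback. *)
  let n = keep_alive buf (fun () -> 1 + 1) in
  assert (n = 2)
```

Using `Fun.protect` means the value is still referenced when `f` raises, at the cost of a closure allocation per call.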
|
@bobot for the user facing part yes, but we would still need something else to represent that in Mach. |
|
Should I change the documentation of |
|
@bobot I don't like the keep_alive function since nothing ensures it actually gets used. I think at least for bigarray (and similar structure access) it should be impossible to screw this up by accident. For that I think we need a temporary object that contains a) the original pointer to the bigarray block, b) the offset into the block (0 here). And then a function to load from an indirect pointer, e.g. Ptr.loadnativeref, that takes the temporary object and returns the data. A more general form could be to store the pointer to the OCaml value keeping the data alive + C pointer to the data.
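The owner-plus-offset idea can be sketched with the standard `Bigarray` module. The type `ref_ptr` and the functions `of_bigstring`, `advance` and `load8` are hypothetical names chosen for illustration; in particular `load8` stands in for something like the `Ptr.loadnativeref` suggested above and uses a bounds-checked bigarray read, not a raw load.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* The pointer carries the OCaml value that owns the memory, so the data
   cannot be freed while a ref_ptr to it is still reachable. *)
type ref_ptr = { owner : bigstring; offset : int }

let of_bigstring (owner : bigstring) : ref_ptr = { owner; offset = 0 }

let advance (p : ref_ptr) (n : int) : ref_ptr =
  { p with offset = p.offset + n }

(* Load through the owner, never through a bare address. *)
let load8 (p : ref_ptr) : char = Bigarray.Array1.get p.owner p.offset

let () =
  let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4 in
  Bigarray.Array1.fill ba 'x';
  Bigarray.Array1.set ba 2 'z';
  let p = advance (of_bigstring ba) 2 in
  assert (load8 p = 'z')
```

The cost of this design is one extra word per pointer and an allocation for each derived pointer, which is the trade-off against forgetting to keep the owner alive.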
|
@mrvn said:
That's true, but only the developers of the bigarray library must remember to use it. Raw pointers are already not for casual users and are very unsafe; they should be used only indirectly, behind a nice API (e.g. bigarray) or with the help of a library similar to ctypes. So I don't see the need to over-engineer the API provided by the compiler. PS: just for simplifying
This is what ctypes currently does. It'd be handy to have support in the standard library. |
|
How does Have you considered passing |
It's a use of the value that can't be optimized away. Whenever |
|
The 'efficient' version of |
|
Is there a reason to keep this PR open, considering #724? |
|
I suppose there is no need to have both PRs |
…trunk Make young_start/end/ptr pointers to value
This adds support to the standard library for operations on raw pointers, along with associated compiler support.
The intended use case is to make libraries like ctypes simpler and faster – they can manipulate C data structures in OCaml code, instead of calling out to a C primitive. I believe that this should be part of the standard library, since it cannot be implemented as efficiently outside. (The operations are %-primitives and depend on compiler support.)
This patch is very incomplete. It compiles, but any use of pointers in bytecode will fail, as will reading 64-bit pointers on 32-bit systems, due to missing support in the runtime. Writes to pointers are not yet implemented. No new tests are present either.
Nevertheless, I am asking for review of this patch, in particular the compiler changes and the (so far implemented) API. This is my first time modifying any production compiler, so please pardon any silly mistakes! Smoke testing (make natruntop followed by entering some simple operations in the toplevel) has already revealed some bugs, so I am sure I screwed up somewhere.