Conversation
|
/cc @seliopou |
|
We'll also need an opam package |
Since when do we expect all changes to the stdlib to be made available to older versions of OCaml through OPAM? Why is adding a module different from adding a function to an existing one? |
|
It would useful to have some indexing operators. One way of accomplishing that is to directly expose the equality between It would also be useful to have the Finally, the naming is perhaps confusing, since strings are immutable, but bigstrings are mutable. |
Bigbyte? |
That was my initial intention, but it complicates the definition of functions on this type in |
True. I chose this name since it is the one historically given to this concept, but I am open to suggestions! Some existing libraries that use this name: |
Let me rephrase with |
|
I see, but doing the same for Option, Result, and all new functions would be nice as well. Of course, I've no problem (and I'm actually grateful) if people want to maintain backported forks of the stdlib for older versions of OCaml, as long as this does not slow down evolutions of the stdlib itself. |
|
Re: backporting stdlib improvements, see also https://github.com/thierry-martinez/stdcompat |
This will be done in |
| val output_bigstring : out_channel -> bigstring -> int -> int -> unit | ||
| (** Same as {!output}, but takes the data from a big string. | ||
|
|
||
| @since 4.08.0 *) |
There was a problem hiding this comment.
Give that it's easier to add new modules to the stdlib now, should we consider not polluting the global namespace with "IO" functions today, and instead add them to a separate module? (possibly later and outside of this MR)
There was a problem hiding this comment.
Yes, we should consider it. However:
- I don't see anyone working on this hypothetical
IOmodule. - If and when such a module is introduced we will need to deprecate existing I/O functions in
Stdlibanyway; adding one more function will not change anything much. - Personally, I am not super confident that such a module will land soon because, contrary to all the recently considered new modules (
Float,Seq,Option,Result), the design space seems much larger and closely related to things such as effects and multicore, which will complicate matters.
Having said that, of course I am open to discussion, and if anyone is sitting on a IO proposal, I am happy to delay the inclusion of this function.
There was a problem hiding this comment.
the design space seems much larger and closely related to things such as effects and multicore
We are indeed working on an effects-based IO library that supports direct-style function calls using effects. It's probably not worth doing a large stdlib rearrangement before that prototype is complete.
There was a problem hiding this comment.
Well, conveniently this just came up: #640 (comment). And I think this is another point in favour of not adding the IO functions in Stdlib:
One way of accomplishing that is to directly expose the equality between bigstring and Bigarray.Array1.t
That was my initial intention, but it complicates the definition of functions on this type in Stdlib/Pervasives (e.g. output_bigstring, etc).
|
I've looked over the interface, and it looks suitable to port Mirage's cstruct over to use this module instead. In particular, we need the efficient conversion functions to/from int16/32 that were formerly the domain of Bigarray only. |
|
I'll again suggest "Bigbytes", as it's much more bytes than string... |
|
|
||
| let set_int64_be b i x = | ||
| if not Sys.big_endian then set_int64_ne b i (swap64 x) | ||
| else set_int64_ne b i x |
There was a problem hiding this comment.
Could these be written as:
let set_int64_be =
if not Sys.big_endian then set_int64_swap
else set_int64_neI.e., moving the endianness check to the top-level?
There was a problem hiding this comment.
Yes, they could, but I don't think it amounts to anything: under ocamlopt both variants generate the same code thanks to inlining. And even under ocamlc you are trading a branch on Sys.big_endian for an extra function call to set_int64_swap.
There was a problem hiding this comment.
In this particular case, the two ways to write the code give exactly the same result: the test is reduced at compile-time, and the branches in your version are external primitives, so they are eta-expanded when they are not applied.
| [dst]. *) | ||
|
|
||
| val blit_bytes : bytes -> int -> t -> int -> int -> unit | ||
| (** Same as {!blit}, but read data from a byte sequence. *) |
There was a problem hiding this comment.
You should precise than this function (instead blit) use internally memcpy.
|
The name I agree the main motivation for these bigarrays-of-bytes is interface with systems code. But we have 3rd-party llibraries that do this very well already, like cstruct (https://github.com/mirage/ocaml-cstruct). Is the need so strong that it must be put in the standard library? How will nonexpert users know how to choose between bytes and externally-allocated-bytes (what you call "bigstrings")? Finally, the last time I mentioned using bigarrays-of-bytes internally in the compiler, @yminsky lectured me on how bad memory management for bigarrays is. I know we're making progress on this front, and the Mirage people don't seem to care, as they allocate small bigarrays like crazy. Still, before blessing the use of bigarrays-of-bytes as an alternative to bytes for interfacing with C code, we need to be very sure they no longer cause memory management issues. |
As a Mirage guy, this is my biggest problem about some implementations like |
|
@dinosaure: the bigarray-allocation issues should have been largely fixed by GPR #1476, which was part of the 4.07 release. Have you tried again with 4.07 to see if you still had memory-management problems with bigarrays? No matter what the answer to this question is, the discussion a bit off-topic here, could we continue discussion bigarray (and bigstring) memory-allocation policies in the proper Mantis ticket, MPR#7100? |
|
BTW, I get that I'm in the minority, but I still think having inconsistent naming (i.e. naming a thing with mutability "string") is unfortunate. |
|
Thanks for the discussion everybody! While personally I think that having bigarrays-of-bytes in the stdlib as the standard way to interface with C code would be useful, its addition looks premature at this time. As a bare minimum, a confirmation that memory management issues have been solved is necessary to move forward. Since I don't have the time to confirm this right now, I am closing this PR. Discussion can be re-opened in the future as needed. |
This PR proposes the addition of a
Bigstringmodule to the standard library (see MPR#7772). "Big strings" refer to one-dimensional big arrays of characters, and they are widely used for interfacing with C code, and in other contexts.The new module
Bigstringexposes an abstract type, and provides roughly the same functions asBytes, a non-copyingsuboperation, plus binary numeric functions (as in #1864), and conversions to- and from-bytes,string, and bigarray types.In addition, the new type is integrated with existing stdlib modules:
Bytes.sub_bigstringis defined as well as I/O functions in bothStdlibandUnix. In order to define these functions in their natural modules and to bring the bigstring type to the same level asbytesandstring, a new built-in, predefined typebigstringis defined in the compiler.Missing:
Unixfunctions under Windows.Feedback welcome! (in particular, whether the more invasive changes to the existing standard library and compiler are worthwhile).
/cc @hhugo @diml