Conversation
Of course, the first reactions will be to question my choice of Zlib, instead of criticizing the overall design. I chose Zlib because it's available everywhere and I'm somewhat familiar with its API. For unmarshaling, compressed data is inflated in one go (we know the sizes of the compressed and uncompressed blocks), so any compression library could fit there. A streaming API could be used when unmarshaling from an input channel, but I did not implement this to keep the code simple. For marshaling, I use the streaming API of Zlib to compress block by block, freeing the input blocks as we go, so that memory requirements decrease during compression. I like this code :-)
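For readers who want to see the user-facing side of all this, here is a minimal round-trip sketch. The `Marshal.Compression` flag is the one proposed in this PR and does not exist in released compilers, so the code passes `[]` and names the flag only in a comment.

```ocaml
(* A minimal Marshal round-trip, sketching where the [Marshal.Compression]
   flag proposed in this PR would slot in.  Stock compilers do not have the
   flag yet, so we pass [] and mention the flag only in a comment. *)
let data = List.init 1000 (fun i -> (i, string_of_int i))

(* with this PR: Marshal.to_string data [Marshal.Compression] *)
let buf = Marshal.to_string data []

let round : (int * string) list = Marshal.from_string buf 0

let () =
  assert (round = data);
  Printf.printf "marshaled %d bytes\n" (String.length buf)
```

With the PR applied, only the flag list changes; readers of the data need no change at all, since decompression is transparent.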
If brave souls would like to experiment with this approach on the scale of a full OPAM installation, I think (but haven't checked) that this PR should apply mostly cleanly to OCaml 5.0.
Testing shows a rare segmentation fault in

I suspect that the impact on
> but slows down marshaling and, to a lesser extent, unmarshaling.
> Compression is not supported on some platforms; in this case,
> the [Marshal.Compression] flag is silently ignored and uncompressed
> data is written to channel [chan].
I think this is good default behaviour. However, for clients that want to warn or behave differently when compression is unsupported, it would be nice to have compression_supported, or compression_available, or a better name, with:
val Marshal.compression_supported : bool
(** [compression_supported] is [true] if and only if compressed (un)marshaling is
    supported on the platform. *)
There are no technical difficulties to provide such a compression_supported value. I still like the idea that compression is an implementation detail that we can hide (almost) completely from our users. But if there's consensus for compression_supported, I'm fine with it.
> I still like the idea that compression is an implementation detail that we can hide (almost) completely from our users.
Your users will be happy to have a means to understand or detect why they specified Compression when marshaling but the result isn't compressed. Also, since this new behaviour somehow breaks this promise:

> The bytes can then be read back later, possibly in another process, and decoded back into a data structure. The format for the byte sequences is compatible across all machines for a given version of OCaml.

I think it's good to have a flag that allows reconciling this view modulo the boolean.
Regarding the "read back on all machines" guarantees, my understanding is that the intent is to support configurations that do not support compression, but try hard to have all configurations support at least decompression. If we manage to get there, we can have the property that all machines can read each other's marshalled data at a given version.
> Regarding the "read back on all machines" guarantees, my understanding is that the intent is to support configurations that do not support compression, but try hard to have all configurations support at least decompression.
That's great.
But as long as this remains an optional dependency for compression, I'd really like to have a way to understand whether my program is going to compress or not.
The design of a system may rely on this; there are various third parties I don't control that may be in charge of installing and configuring the toolchain on a given machine. I'd like to be able to assert this information in my program and take appropriate actions (log a warning, bail out, etc.) if compression is not enabled.
Added Marshal.compression_supported in commit c41b4cb. It's a unit -> bool function instead of a plain Boolean because the implementation has a small but non-zero cost.
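To make the discussed behaviour concrete, here is a hedged sketch of how a client could react when compression is unavailable. The local `compression_supported` is a stub standing in for the `Marshal.compression_supported : unit -> bool` added in this PR, so the sketch runs on any OCaml.

```ocaml
(* Sketch of client-side handling when compression may be unavailable.
   [compression_supported] is a LOCAL STUB standing in for the
   [Marshal.compression_supported : unit -> bool] added in this PR. *)
let compression_supported : unit -> bool = fun () -> false

(* Choose marshaling flags once, warning if compression was wanted
   but is not available on this installation. *)
let marshal_flags =
  if compression_supported () then
    [] (* with this PR: [Marshal.Compression] *)
  else begin
    prerr_endline
      "warning: compressed marshaling unavailable; writing uncompressed data";
    []
  end

let () =
  let s = Marshal.to_string (1, "two", 3.0) marshal_flags in
  Printf.printf "wrote %d bytes\n" (String.length s)
```

This is exactly the "log a warning, bail out, etc." pattern asked for above: the program can assert the property once at startup instead of discovering it from the output bytes.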
Can you observe any negative performance impact of compression on the jsoo side?
I didn't manage to build jsoo with trunk yet (opam switch + jsoo deps), I might try again later. Once this is resolved, we should be able to measure the perf impact when loading the in-browser toplevel, reading compressed cmis.
I've just tried to unmarshal some data marshaled using this PR in my 5.0.0 switch.
I was able to run more tests after pinning the right branch of ppxlib.
I didn't observe any negative perf impact, while the size of the generated files was reduced by 25M (due to embedded cmi files).
I finally read the Zstd API documentation, and the API is actually quite reasonable. So I just pushed a commit that replaces Zlib by Zstd for compression and decompression of marshaled data. Compression is clearly faster, so much so that the time overhead of compressing .cmt files is now lost in measurement noise. More measurements are needed, of course.
For jsoo, I think we could support decompression (of zstd) only by relying on https://www.npmjs.com/package/fzstd.
We're observing the same failure in the CI in #12001 (comment). See https://github.com/ocaml/ocaml/actions/runs/4163285265/jobs/7203489175#step:6:5191.
Another option is to vendor into OCaml the MIT-licensed zlib replacement miniz.c https://github.com/richgel999/miniz which is a single C file, so compression would always be available. It has an optimised-for-speed compressor which is faster than zlib's. (As a side effect, this would allow CamlZip to be brought into the Standard Library for everyone to use, if that is considered desirable.)
Good to know! Thanks for the pointer. Speaking of license, the file miniz.c is weird: at the top there's an MIT license, and at the bottom it claims it's in the public domain?
Do you have an idea of how it compares with zlib, speed-wise? I found https://code.google.com/archive/p/miniz/wikis/miniz_performance_comparison_v110.wiki but I cannot make sense of the numbers.
Concerning "vendoring" of C libraries inside the OCaml runtime system in general: it can be considered if the library is small enough, e.g. a single .c file and a single .h file are fine. However, there's still the question of maintenance and bug fixing, esp. when security holes are found in the original library.
I think this page has lost some formatting; the version in https://web.archive.org/web/20150220064400/https://code.google.com/p/miniz/wiki/miniz_performance_comparison_v110 has the original formatting and is more understandable. |
That file is not the actual miniz.c. The actual built miniz.c is 7500 lines, not 650, and has only the MIT license at the top; nothing at the bottom. See the releases here: https://github.com/richgel999/miniz/releases
I consider that there are two intertwined questions: a) should this API be supported in a (much) wider scope than the original topic of compressing .cm* files? b) should an underlying C library be fully exposed?

I am in favor of making all of this as small as possible. Marshal is unsafe. Reading untrusted marshaled data is a known security issue (I expect it can easily lead to code execution). As such, using a C library in that setting does not add any new concern. If this API is instead meant for general use, it adds all the known issues with C code. There is no middle-ground situation where some uses would introduce fewer concerns than others: limiting this to Marshal means no new issue, and making it usable for anything else means the whole set of issues right away.

As for vendoring, I'm not sure which benefits it would bring. I don't think there are many places where zlib or another library can be used but cannot be obtained/installed. The main concerns we've had so far are mirage and jsoo, but their issue is C code no matter where it is. Vendoring wouldn't help them. Moreover, and if I'm not mistaken, integrating such C libraries as static libraries will also pollute the symbol namespace, making it an additional concern.

By the way, exposing compression APIs is also a lot more work for relatively minor gains. Zstd is probably the compression library that we want the least to expose, due to its frequent changes and to the size of its API. This is better served by a dedicated library. If there are efficiency gains to be found with a more-integrated approach than what is available today (maybe by chaining libraries in C-land), this is probably better solved by adding the ability to do plumbing and data shuffling in a new way (that's really just an example), because sometimes zstd is more appropriate but sometimes it's lz4, sometimes it's xz, sometimes an intermediate compression filter will do wonders (I've had a delta filter make everything ten times better). The list goes on.

NB: if you want to go the zlib route, you might want to look at zlib-ng (after checking that upstream is focused enough on API stability); for an idea of the performance changes, see https://www.megastormsystems.com/news/benchmarks-results-of-zlib-and-zlib-ng-running-on-some-amd-and-intel-cpus

I'm also wondering how this change will impact bootstrapping: without vendoring, the bootstrap compiler cannot support decompression, but at the same time, I don't think we'd want to vendor zstd for that purpose because of how big it is (700KB for the .so on my machine). On a related note, zstd is C++ and this can have a noticeable effect on platform support and build requirements.
I just pushed a version of this PR where compression is also applied to .cmi files and to the debug info in .cmo and .cma files. All this data compresses well, and the impact on compilation times seems lost in the noise (it will need a better analysis later).
Replying to @adrien-n
I generally agree! In particular, providing a compression API in the stdlib should be considered separately and later.
You really believe I didn't think of that? :-)

The bootstrap bytecode executables

If we wanted to compress bytecode data segments in the future, the

The other marshaled data in bytecode executables is the debug information, which I have kept uncompressed so far. I now propose to compress debug info in .cmo / .cma files, but it may be preferable to keep bytecode executables 100% portable, including uncompressed debug info.
It is, indeed! Thanks! My quick reading is that miniz is slightly faster than zlib, but nothing like zstd (which is not on this page but is 10x faster for similar compression rate in my experience). Also, I'm getting bad vibes from the fact we have to go to archive.org to get readable documentation.
I just
gasche left a comment:
I had a first quick look at the peripheral logic (not the main compression stuff).
runtime/caml/intext.h (outdated):
> 5 and following: 5 variable-length integers, in VLQ format (1 to 10 bytes each)
>   - length of compressed marshaled data
>   - length of uncompressed marshaled data
Suggestion: length of ..., in bytes.
runtime/caml/intext.h (outdated):
> 24: size in words when read on a 64-bit platform
> The 3 numbers are 64 bits each, in big endian.
>
> Header format for the "compressed" model:
is it "10 to 55 bytes"? You give the size for the other formats; no big deal, but it's not completely obvious to compute it from the description.
It is 10 to 55 bytes, indeed. Added as part of 507b0a9.
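The "10 to 55 bytes" figure is easy to re-derive from the quoted layout: bytes 0-4 are fixed (magic number etc.), followed by 5 VLQ fields of 1 to 10 bytes each. A back-of-the-envelope check, assuming nothing else in the header varies:

```ocaml
(* Recomputing the header-size bounds from the quoted layout:
   5 fixed leading bytes, then 5 VLQ fields of 1 to 10 bytes each. *)
let fixed_prefix = 5
let vlq_fields = 5
let min_header = fixed_prefix + vlq_fields * 1  (* 10 bytes *)
let max_header = fixed_prefix + vlq_fields * 10 (* 55 bytes *)

let () = Printf.printf "header: %d to %d bytes\n" min_header max_header
```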
runtime/extern.c (outdated):
> len = storevlq(header + pos, s->obj_counter); pos += len;
> len = storevlq(header + pos, s->size_32); pos += len;
> len = storevlq(header + pos, s->size_64); pos += len;
> CAMLassert(len < 64);
I don't understand the assertion here, and I am uneasy about the overflow behavior of the storevlq function.
At this point len is the number of base-128 digits of the last field read. The check len < 64 on just this last length is odd.
At first I thought that maybe you meant to check that pos < MAX_INTEXT_HEADER_SIZE. But then this check would also be odd, because only runs that perform out-of-bound writes would fail the test -- so in particular the compiler may erase it.
I would feel more at ease if the storevlq and readvlq functions took as extra parameter the "end" of the input buffer, and abort if they reach the end while looking for the end of the digit.
(For storevlq this is arguably academic as we can statically bound the number of base-128 digits needed for an uintnat on today's machines.)
The assertion was left over from an earlier design. Currently, it is wrong (it should be about pos, not len) and useless (as you noticed), so I removed it in 754b1a2.
For the foreseeable future, uintnat is at most 64 bits, so storevlq is guaranteed to write at most 10 bytes (since 9*7 < 64 <= 10*7). I don't think we need any more bounds checking.
readvlq can be abused by artificially-padded input 0x80 ... 0x80 <the actual number>. However, the whole unmarshaler can be abused a zillion different ways by malicious input, so what the heck...
Besides, if we were to do bounds checking in intern.c, we'd have checks in read8u first, and then readvlq would automatically be checked.
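To make the format under discussion concrete, here is an OCaml model of a VLQ codec, assuming base-128 digits stored most significant group first with a 0x80 continuation bit on every byte but the last. This is an illustrative re-implementation, not the runtime's actual storevlq/readvlq, which are in C.

```ocaml
(* Illustrative VLQ codec: base-128 digits, most significant group first,
   continuation bit 0x80 on every byte but the last. *)
let storevlq (n : int) : int list =
  let rec groups n acc =
    if n < 128 then n :: acc
    else groups (n lsr 7) ((n land 0x7f) :: acc)
  in
  let rec mark = function
    | [] -> []
    | [last] -> [last]                  (* final byte: high bit clear *)
    | b :: rest -> (b lor 0x80) :: mark rest
  in
  mark (groups n [])

let readvlq (bytes : int list) : int =
  List.fold_left (fun acc b -> (acc lsl 7) lor (b land 0x7f)) 0 bytes

let () =
  List.iter
    (fun n -> assert (readvlq (storevlq n) = n))
    [0; 1; 127; 128; 300; 16384; max_int];
  (* a native int needs at most 10 base-128 digits, matching the
     "1 to 10 bytes each" bound in the quoted header comment *)
  assert (List.length (storevlq max_int) <= 10)
```

The bound discussed above falls out of the encoding: since 9*7 < 64 <= 10*7, a 64-bit value always fits in 10 digits, which is why storevlq needs no extra bounds checking.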
> CAMLassert (ofs < s->obj_counter);
> CAMLassert (s->intern_obj_table != NULL);
> v = s->intern_obj_table[s->obj_counter - ofs];
> v = s->intern_obj_table[ofs];
I reviewed this logic and found it correct, including the assert changes. (But confusing; I wish we could have a pos variable instead of changing the meaning of "offset" in the middle.)
> /* #4056: using absolute references for shared objects improves
>    compressibility. */
> uintnat d = s->extern_flags & COMPRESSED ? pos : s->obj_counter - pos;
> extern_shared_reference(s, d);
I have mixed feeling about using conditional logic here. It makes the code more complex, and we lose the property that the only difference between both payloads is compression. Are we sure that it is worth it? (Are the space saving in the uncompressed case substantial enough to justify relative offsets, and the space saving in the compressed case substantial enough to justify absolute offsets?)
As mentioned in 6f0b636, switching to absolute references reduced the size of a big .cmt file (probably camlinternalFormat.cmt) by 15% on top of the savings obtained by zlib compression. I haven't re-done the measurements since switching to zstd. But it's clear that the insight in #4056 is correct and that the compressed format must use absolute references.
On the other hand, you don't change the existing uncompressed formats lightly. Switching to absolute references there would mean new magic numbers, inability to read persistent data marshaled with older OCaml versions, and perhaps some "fun" in the bootstrapping process. Or, if you want backward compatibility, you'd need to support both absolute and relative references, with different magic numbers, and the "conditional logic" you dislike would be there again. I don't even want to know which of relative or absolute references perform better without compression, because switching to absolute references for the uncompressed formats is not on my roadmap.
configure.ac (outdated):
> ## ZSTD compression library
>
> AS_IF([test x"$with_zstd" = "xyes"],
>   AS_IF([pkg-config --atleast-version 1.4 libzstd 2>/dev/null],
Just noting that I think this adds a new dependency on pkg-config for the OCaml compiler build, but that I think it's probably fine. Although, at a glance on most common distros zstd is pretty standalone, and so just using AC_SEARCH_LIBS might be sufficient and avoid this dependency.
Right. I'm not fixated on using pkg-config, and the quick-and-dirty test above just fails silently if pkg-config is not available. Moreover, the test will certainly have to be reworked to handle the Windows ports.
Although, at a glance on most common distros zstd is pretty standalone
Linux distros, yes. For macOS, zstd is in bizarre Homebrew or MacPorts locations, and I don't know of any better way to find those locations than pkg-config.
Also, because the zstd streaming API is in constant flux, we need to check for version >= 1.4, version 1.3 (as found in Ubuntu 18.04 LTS for instance) doesn't work. Again, pkg-config makes it trivial to do this check, but AC_SEARCH_LIBS will need a bit more work.
I pushed improved zstd detection code: it tries pkg-config first, and if that fails it then uses AC_CHECK_LIBS and AC_CHECK_DECL to see if zstd is in the standard location and defines the functions we need. On our CI, most of the machines have pkg-config, and those that don't have no zstd library or an outdated version, so the new code is not fully exercised.
It's quite likely, but this is not relevant to this PR. Can you move this comment to https://github.com/ocaml/opam/ ?
I'm still trying to gather performance data. For a build and installation of the core OCaml system (
I think the benefits of compression are clear, and there are no major downsides so far. So, please REVIEW this PR. Thanks.
(FYI, zstd support (decompression only) has been merged into jsoo)
Some numbers from testing this PR on a substantial subset of Jane Street's code base using Flambda1. Building while compressing artifacts took 2h41min versus 2h37min for a build without compression. These numbers are within noise, meaning that there wasn't any noticeable slowdown caused by this PR.
Thanks for the measurements. I don't understand the second line of the table:
Xavier Leroy (2023/02/25 07:43 -0800):
> The pkg-config package provides a `PKG_CHECK_MODULES` you may want to
> use rather than invoking `pkg-config` manually. This would actually be
> helpful also in other cases where we could take advantage of pkg-config,
> I think.
Yes, I saw these pkg.m4 macros, but they don't quite do what I need,
e.g. I saw no way to check for the version of a package.
I think it's not so much in autoconf's culture to check for versions.
They rather encourage testing for features rather than for versions, actually.
> One other point is about how the `--with-zstd` option is supported.
> Currently, if `zstd` is not found you get warning, whether you did
> specify `--with-zstd` explicitly or not.
> I think this should not be so. If nothing is specified then yes, we are
> in best effort mode and it is legitimate that `zstd` not being there
> just triggers a warning.
> If it has been explicitly requested, though, then the inability of
> configure to find it should, IMO, trigger an error. That's exactly the
> behaviour we have for other options.
That's reasonable. I tried to implement this kind of 3-way behavior
(yes/no/do your best) in commit d46f542.
I couldn't find that commit but I assume some rebasing / squashing
occurred, meanwhile.
I just noticed the `Compute `Config.configuration_variables` on demand`
commit: does it really belong here? I remember you being a bit strict
recently on another PR and saying this PR should do only what it was
aiming for. Independently of the contents of this commit, wouldn't it make
sense to submit it independently?
> To further elaborate my comment above, reporting an error (rather than a
> warning) when zstd support is missing whereas it has explicitly been
> requested would make it easier to do CI on the feature. It would indeed be
> sufficient to add --with-zstd to the flags passed to configure on our CI
> script to get a failure as soon as something goes wrong on one platform.
Noooo!
(there is no need to shout. Really.)
We want (and have today) a mixture of CI platforms with and without
ZSTD, and we want to test autodetection, because that's the default
(and most complicated) mode...
It is not at all obvious to me that we want to test *only*
autodetection. I'm actually rather of the opinion that it would be good
to test both autodetection and requested support. We do have the
flexibility in our CI infrastructure to use different configure flags on
different platforms, so why not use it to make sure both modes work?
> Finally, and although this is not so much emphasized in the documentation of
> the AC_ARG_WITH macro, in theory there is some freedom in the form
> it can take, so e.g. --with-zstd=/path/to/zstdpackage is legitimate and it
> may be useful to be able to deal with such a call. In other words the
> withval variable may contain something else than yes, no, etc. as is
> already the case for the --with-flexdll option (although I am not sure how
> well we deal with it). Another example that we already have is
> target-bindir where the argument has to be a directory.
I chose to interpret `--with-zstd=<something other than "yes" or
"no">` as "no option given", i.e. "do your best".
You may want to also have special exceptions for "true" and "false".
I was curious about
Regarding the size of cmx files, I assume that Jane Street tends to use flambda with
Xavier Leroy (2023/02/26 06:10 -0800):
> just for fun I pushed a commit that adds compression_supported to ocamlrun -config, on the suggestion of @dra27.
Hey, I was adding it in parallel :-) I also added it to `ocamlc -config`, since it's no big deal.
My modest improvement (?) to the config.ml contraption is commit
091d18b. There's a big list `Config.configuration_variables` that,
currently, is always computed every time the compilers start up, even
though it is used only if the `-config` or `-config-variable` options
are given, and then it is used once and the compiler exits. My modest
improvement is to evaluate this list when needed (call by name).
Just to say, if you want to keep it in this PR, I'm fine.
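The "compute on demand" change described above can be sketched in a few lines. The names here are illustrative, not the real `Config` module: the point is just that turning an eagerly built list into a function (call by name) moves the cost from every startup to the rare `-config` invocation.

```ocaml
(* Sketch of computing a configuration list on demand instead of at
   startup.  [build_count] lets us observe when the list is built. *)
let build_count = ref 0

(* before: [let configuration_variables = [...]] built at every startup;
   after: call by name, built only when actually requested *)
let configuration_variables () =
  incr build_count;
  [ ("version", "5.x"); ("compression_supported", "false") ]

let () =
  assert (!build_count = 0);                 (* nothing built at startup *)
  let vars = configuration_variables () in   (* e.g. handling -config *)
  assert (List.mem_assoc "version" vars);
  assert (!build_count = 1)
```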
Hello Sébastien, the PR has already been merged :) Cheers!
Oops sorry to arrive so late! :)
Just a comment about the interface: it might be nice if users can specify the compression level they want. Last time I read about fast compressors/decompressors, lz4 and lzo came up. |
The final implementation uses zstd.
The Marshal header size was changed in ocaml/ocaml#12006.
- Remove compressed marshaling support from runtime/{extern.c,intern.c}
- Remove the `Marshal.Compression` flag
- Remove its uses in the compiler
runtime/extern.c still accepts the old Compression flag and just ignores it.
This avoids a bootstrap at this point. The support will be removed later.
This is a proof-of-concept for ocaml/RFCs#23.
It adds optional compression of marshaled data, with transparent decompression during unmarshaling.
It adds a `Marshal.Compression` flag to the `Marshal.to_*` functions, instructing them to produce compressed marshaled data.

Compressed marshaled data is identified with a new magic number and uses a new, compact format for its header; see runtime/caml/intext.h.

Unmarshaling functions (`input_value`, `Marshal.from_*`) recognize compressed marshaled data and transparently uncompress before unmarshaling.

The compression library currently used is Zlib (the one used by zip and gzip), at the default compression level of 7. It is autodetected in `configure` and can be turned off with `configure --without-zlib`.

If the Zlib library is not available or was turned off, the `Marshal.Compression` flag is silently ignored and uncompressed marshaled data is produced. The only failure case is if compressed marshaled data produced on another OCaml installation is read back in an installation that doesn't have Zlib.

When compression is on, the format of shared object references is changed from relative to absolute, as suggested in #4056. This does reduce the size of compressed data noticeably (e.g. -15% on a typical .cmt file).

Finally, the compiler was changed to use compression for data marshaled in `.cmt` and `.cmti` files. The resulting files are 35-40% the size of the uncompressed files. This comes at a slight increase in compilation times: on my machine, byte-compiling stdlib/camlinternalFormat.ml takes 10% longer; `make world.opt` takes 3% longer. I have not tried to compress other bulky marshaled data, such as debugging info in `.cmo` files, or persistent signatures in `.cmi` files.