Conversation
Of course, the first reactions will be to question my choice of Zlib, instead of criticizing the overall design. I chose Zlib because it's available everywhere and I'm somewhat familiar with its API. For unmarshaling, compressed data is inflated in one go (we know the sizes of the compressed and uncompressed blocks), so any compression library could fit there. A streaming API could be used when unmarshaling from an input channel, but I did not implement this to keep the code simple. For marshaling, I use the streaming API of Zlib to compress block by block, freeing the input blocks as we go, so that memory requirements decrease during compression. I like this code :-)
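For readers who want to see the user-facing side of all this, here is a minimal round-trip sketch. The `Marshal.Compression` flag is the one proposed in this PR and does not exist in released compilers, so the code passes `[]` and names the flag only in a comment.

```ocaml
(* A minimal Marshal round-trip, sketching where the [Marshal.Compression]
   flag proposed in this PR would slot in.  Stock compilers do not have the
   flag yet, so we pass [] and mention the flag only in a comment. *)
let data = List.init 1000 (fun i -> (i, string_of_int i))

(* with this PR: Marshal.to_string data [Marshal.Compression] *)
let buf = Marshal.to_string data []

let round : (int * string) list = Marshal.from_string buf 0

let () =
  assert (round = data);
  Printf.printf "marshaled %d bytes\n" (String.length buf)
```

With the PR applied, only the flag list changes; readers of the data need no change at all, since decompression is transparent.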
If brave souls would like to experiment with this approach on the scale of a full OPAM installation, I think (but haven't checked) that this PR should apply mostly cleanly to OCaml 5.0.
Testing shows a rare segmentation fault in

I suspect that the impact on
> but slows down marshaling and, to a lesser extent, unmarshaling.
> Compression is not supported on some platforms; in this case,
> the [Marshal.Compression] flag is silently ignored and uncompressed
> data is written to channel [chan].
I think this is good default behaviour. However, for clients that want to warn or behave differently when compression is unsupported, it would be nice to have compression_supported, or compression_available, or a better name, with:
val Marshal.compression_supported : bool
(** [compression_supported] is [true] if and only if compressed (un)marshaling is
    supported on the platform. *)
There are no technical difficulties to provide such a compression_supported value. I still like the idea that compression is an implementation detail that we can hide (almost) completely from our users. But if there's consensus for compression_supported, I'm fine with it.
> I still like the idea that compression is an implementation detail that we can hide (almost) completely from our users.
Your users will be happy to have a means to understand or detect why they specified Compression when marshaling but the result isn't compressed. Also, since this new behaviour somehow breaks this promise:

> The bytes can then be read back later, possibly in another process, and decoded back into a data structure. The format for the byte sequences is compatible across all machines for a given version of OCaml.

I think it's good to have a flag that allows reconciling this view modulo the boolean.
Regarding the "read back on all machines" guarantees, my understanding is that the intent is to support configurations that do not support compression, but try hard to have all configurations support at least decompression. If we manage to get there, we can have the property that all machines can read each other's marshalled data at a given version.
> Regarding the "read back on all machines" guarantees, my understanding is that the intent is to support configurations that do not support compression, but try hard to have all configurations support at least decompression.
That's great.
But as long as this remains an optional dependency for compression, I'd really like to have a way to understand whether my program is going to compress or not.
The design of a system may rely on this; there are various third parties I don't control that may be in charge of installing and configuring the toolchain on a given machine. I'd like to be able to assert this information in my program and take appropriate actions (log a warning, bail out, etc.) if compression is not enabled.
Added Marshal.compression_supported in commit c41b4cb. It's a unit -> bool function instead of a plain Boolean because the implementation has a small but non-zero cost.
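To make the discussed behaviour concrete, here is a hedged sketch of how a client could react when compression is unavailable. The local `compression_supported` is a stub standing in for the `Marshal.compression_supported : unit -> bool` added in this PR, so the sketch runs on any OCaml.

```ocaml
(* Sketch of client-side handling when compression may be unavailable.
   [compression_supported] is a LOCAL STUB standing in for the
   [Marshal.compression_supported : unit -> bool] added in this PR. *)
let compression_supported : unit -> bool = fun () -> false

(* Choose marshaling flags once, warning if compression was wanted
   but is not available on this installation. *)
let marshal_flags =
  if compression_supported () then
    [] (* with this PR: [Marshal.Compression] *)
  else begin
    prerr_endline
      "warning: compressed marshaling unavailable; writing uncompressed data";
    []
  end

let () =
  let s = Marshal.to_string (1, "two", 3.0) marshal_flags in
  Printf.printf "wrote %d bytes\n" (String.length s)
```

This is exactly the "log a warning, bail out, etc." pattern asked for above: the program can assert the property once at startup instead of discovering it from the output bytes.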
Can you observe any negative performance impact of compression on the jsoo side?
I didn't manage to build jsoo with trunk yet (opam switch + jsoo deps), I might try again later. Once this is resolved, we should be able to measure the perf impact when loading the in-browser toplevel, reading compressed cmis.
I've just tried to unmarshal some data marshaled using this PR in my 5.0.0 switch.
I was able to run more tests after pinning the right branch of ppxlib.
I didn't observe any negative perf impact, while the size of the generated files was reduced by 25M (due to embedded cmi files).
I finally read the Zstd API documentation, and the API is actually quite reasonable. So I just pushed a commit that replaces Zlib by Zstd for compression and decompression of marshaled data. Compression is clearly faster, so much so that the time overhead of compressing .cmt files is now lost in measurement noise. More measurements are needed, of course.
For jsoo, I think we could support decompression (of zstd) only by relying on https://www.npmjs.com/package/fzstd.
We're observing the same failure in the CI in #12001 (comment). See https://github.com/ocaml/ocaml/actions/runs/4163285265/jobs/7203489175#step:6:5191.
Another option is to vendor into OCaml the MIT-licensed zlib replacement miniz.c https://github.com/richgel999/miniz which is a single C file, so compression would always be available. It has an optimised-for-speed compressor which is faster than zlib's. (As a side effect, this would allow CamlZip to be brought into the Standard Library for everyone to use, if that is considered desirable.)
Good to know! Thanks for the pointer. Speaking of license, the file miniz.c is weird: at the top there's an MIT license, and at the bottom it claims it's in the public domain?
Do you have an idea of how it compares with zlib, speed-wise? I found https://code.google.com/archive/p/miniz/wikis/miniz_performance_comparison_v110.wiki but I cannot make sense of the numbers.
Concerning "vendoring" of C libraries inside the OCaml runtime system in general: it can be considered if the library is small enough, e.g. a single .c file and a single .h file are fine. However, there's still the question of maintenance and bug fixing, esp. when security holes are found in the original library.
I think this page has lost some formatting; the version in https://web.archive.org/web/20150220064400/https://code.google.com/p/miniz/wiki/miniz_performance_comparison_v110 has the original formatting and is more understandable. |
That file is not the actual miniz.c. The actual built miniz.c is 7500 lines, not 650, and has only the MIT license at the top; nothing at the bottom. See the releases here: https://github.com/richgel999/miniz/releases
I consider that there are two intertwined questions: a) should this API be supported in a (much) wider scope than the original topic of compressing .cm* files? b) should an underlying C library be fully exposed?

I am in favor of making all of this as small as possible. Marshal is unsafe. Reading untrusted marshaled data is a known security issue (I expect it can easily lead to code execution). As such, using a C library in that setting does not add any new concern. If this API is instead meant for general use, it adds all the known issues with C code. There is no middle-ground situation where some uses would introduce fewer concerns than others: limiting this to Marshal means no new issue, and making it usable for anything else means the whole set of issues right away.

As for vendoring, I'm not sure which benefits it would bring. I don't think there are many places where zlib or another library can be used but cannot be obtained/installed. The main concerns we've had so far are mirage and jsoo, but their issue is C code no matter where it is. Vendoring wouldn't help them. Moreover, and if I'm not mistaken, integrating such C libraries as static libraries will also pollute the symbol namespace, making it an additional concern.

By the way, exposing compression APIs is also a lot more work for relatively minor gains. Zstd is probably the compression library that we want the least to expose, due to its frequent changes and to the size of its API. This is better served by a dedicated library. If there are efficiency gains to be found with a more-integrated approach than what is available today (maybe by chaining libraries in C-land), this is probably better solved by adding the ability to do plumbing and data shuffling in a new way (that's really just an example), because sometimes zstd is more appropriate but sometimes it's lz4, sometimes it's xz, sometimes an intermediate compression filter will do wonders (I've had a delta filter make everything ten times better). The list goes on.

NB: if you want to go the zlib route, you might want to look at zlib-ng (after checking that upstream is focused enough on API stability); for an idea of the performance changes, see https://www.megastormsystems.com/news/benchmarks-results-of-zlib-and-zlib-ng-running-on-some-amd-and-intel-cpus

I'm also wondering how this change will impact bootstrapping: without vendoring, the bootstrap compiler cannot support decompression, but at the same time, I don't think we'd want to vendor zstd for that purpose because of how big it is (700KB for the .so on my machine). On a related note, zstd is C++ and this can have a noticeable effect on platform support and build requirements.
I just pushed a version of this PR where compression is also applied to .cmi files and to the debug info in .cmo and .cma files. All this data compresses well, and the impact on compilation times seems lost in the noise (it will need a better analysis later).
Replying to @adrien-n
I generally agree! In particular, providing a compression API in the stdlib should be considered separately and later.
You really believe I didn't think of that? :-)

The bootstrap bytecode executables

If we wanted to compress bytecode data segments in the future, the

The other marshaled data in bytecode executables is the debug information, which I have kept uncompressed so far. I now propose to compress debug info in .cmo / .cma files, but it may be preferable to keep bytecode executables 100% portable, including uncompressed debug info.
It is, indeed! Thanks! My quick reading is that miniz is slightly faster than zlib, but nothing like zstd (which is not on this page but is 10x faster for similar compression rate in my experience). Also, I'm getting bad vibes from the fact we have to go to archive.org to get readable documentation.
I just
gasche left a comment:
I had a first quick look at the peripheral logic (not the main compression stuff).
runtime/caml/intext.h (outdated):
> 5 and following: 5 variable-length integers, in VLQ format (1 to 10 bytes each)
>   - length of compressed marshaled data
>   - length of uncompressed marshaled data
Suggestion: length of ..., in bytes.
runtime/caml/intext.h (outdated):
> 24: size in words when read on a 64-bit platform
> The 3 numbers are 64 bits each, in big endian.
>
> Header format for the "compressed" model:
is it "10 to 55 bytes"? You give the size for the other formats; no big deal, but it's not completely obvious to compute it from the description.
It is 10 to 55 bytes, indeed. Added as part of 507b0a9.
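The "10 to 55 bytes" figure is easy to re-derive from the quoted layout: bytes 0-4 are fixed (magic number etc.), followed by 5 VLQ fields of 1 to 10 bytes each. A back-of-the-envelope check, assuming nothing else in the header varies:

```ocaml
(* Recomputing the header-size bounds from the quoted layout:
   5 fixed leading bytes, then 5 VLQ fields of 1 to 10 bytes each. *)
let fixed_prefix = 5
let vlq_fields = 5
let min_header = fixed_prefix + vlq_fields * 1  (* 10 bytes *)
let max_header = fixed_prefix + vlq_fields * 10 (* 55 bytes *)

let () = Printf.printf "header: %d to %d bytes\n" min_header max_header
```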
runtime/extern.c (outdated):
> len = storevlq(header + pos, s->obj_counter); pos += len;
> len = storevlq(header + pos, s->size_32); pos += len;
> len = storevlq(header + pos, s->size_64); pos += len;
> CAMLassert(len < 64);
I don't understand the assertion here, and I am uneasy about the overflow behavior of the storevlq function.
At this point len is the number of base-128 digits of the last field read. The check len < 64 on just this last length is odd.
At first I thought that maybe you meant to check that pos < MAX_INTEXT_HEADER_SIZE. But then this check would also be odd, because only runs that perform out-of-bound writes would fail the test -- so in particular the compiler may erase it.
I would feel more at ease if the storevlq and readvlq functions took as extra parameter the "end" of the input buffer, and abort if they reach the end while looking for the end of the digit.
(For storevlq this is arguably academic as we can statically bound the number of base-128 digits needed for an uintnat on today's machines.)
The assertion was left over from an earlier design. Currently, it is wrong (it should be about pos, not len) and useless (as you noticed), so I removed it in 754b1a2.
For the foreseeable future, uintnat is at most 64 bits, so storevlq is guaranteed to write at most 10 bytes (since 9*7 < 64 <= 10*7). I don't think we need any more bounds checking.
readvlq can be abused by artificially-padded input 0x80 ... 0x80 <the actual number>. However, the whole unmarshaler can be abused a zillion different ways by malicious input, so what the heck...
Besides, if we were to do bounds checking in intern.c, we'd have checks in read8u first, and then readvlq would automatically be checked.
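To make the format under discussion concrete, here is an OCaml model of a VLQ codec, assuming base-128 digits stored most significant group first with a 0x80 continuation bit on every byte but the last. This is an illustrative re-implementation, not the runtime's actual storevlq/readvlq, which are in C.

```ocaml
(* Illustrative VLQ codec: base-128 digits, most significant group first,
   continuation bit 0x80 on every byte but the last. *)
let storevlq (n : int) : int list =
  let rec groups n acc =
    if n < 128 then n :: acc
    else groups (n lsr 7) ((n land 0x7f) :: acc)
  in
  let rec mark = function
    | [] -> []
    | [last] -> [last]                  (* final byte: high bit clear *)
    | b :: rest -> (b lor 0x80) :: mark rest
  in
  mark (groups n [])

let readvlq (bytes : int list) : int =
  List.fold_left (fun acc b -> (acc lsl 7) lor (b land 0x7f)) 0 bytes

let () =
  List.iter
    (fun n -> assert (readvlq (storevlq n) = n))
    [0; 1; 127; 128; 300; 16384; max_int];
  (* a native int needs at most 10 base-128 digits, matching the
     "1 to 10 bytes each" bound in the quoted header comment *)
  assert (List.length (storevlq max_int) <= 10)
```

The bound discussed above falls out of the encoding: since 9*7 < 64 <= 10*7, a 64-bit value always fits in 10 digits, which is why storevlq needs no extra bounds checking.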
> CAMLassert (ofs < s->obj_counter);
> CAMLassert (s->intern_obj_table != NULL);
> v = s->intern_obj_table[s->obj_counter - ofs];
> v = s->intern_obj_table[ofs];
I reviewed this logic and found it correct, including the assert changes. (But confusing; I wish we could have a pos variable instead of changing the meaning of "offset" in the middle.)
> /* #4056: using absolute references for shared objects improves
>    compressibility. */
> uintnat d = s->extern_flags & COMPRESSED ? pos : s->obj_counter - pos;
> extern_shared_reference(s, d);
I have mixed feeling about using conditional logic here. It makes the code more complex, and we lose the property that the only difference between both payloads is compression. Are we sure that it is worth it? (Are the space saving in the uncompressed case substantial enough to justify relative offsets, and the space saving in the compressed case substantial enough to justify absolute offsets?)
As mentioned in 6f0b636, switching to absolute references reduced the size of a big .cmt file (probably camlinternalFormat.cmt) by 15% on top of the savings obtained by zlib compression. I haven't re-done the measurements since switching to zstd. But it's clear that the insight in #4056 is correct and that the compressed format must use absolute references.
On the other hand, you don't change the existing uncompressed formats lightly. Switching to absolute references there would mean new magic numbers, inability to read persistent data marshaled with older OCaml versions, and perhaps some "fun" in the bootstrapping process. Or, if you want backward compatibility, you'd need to support both absolute and relative references, with different magic numbers, and the "conditional logic" you dislike would be there again. I don't even want to know which of relative or absolute references perform better without compression, because switching to absolute references for the uncompressed formats is not on my roadmap.
configure.ac (outdated):
> ## ZSTD compression library
>
> AS_IF([test x"$with_zstd" = "xyes"],
>   AS_IF([pkg-config --atleast-version 1.4 libzstd 2>/dev/null],
Just noting that I think this adds a new dependency on pkg-config for the OCaml compiler build, but that I think it's probably fine. Although, at a glance on most common distros zstd is pretty standalone, and so just using AC_SEARCH_LIBS might be sufficient and avoid this dependency.
Right. I'm not fixated on using pkg-config, and the quick-and-dirty test above just fails silently if pkg-config is not available. Moreover, the test will certainly have to be reworked to handle the Windows ports.
Although, at a glance on most common distros zstd is pretty standalone
Linux distros, yes. For macOS, zstd is in bizarre Homebrew or MacPorts locations, and I don't know of any better way to find those locations than pkg-config.
Also, because the zstd streaming API is in constant flux, we need to check for version >= 1.4, version 1.3 (as found in Ubuntu 18.04 LTS for instance) doesn't work. Again, pkg-config makes it trivial to do this check, but AC_SEARCH_LIBS will need a bit more work.
I pushed improved zstd detection code: it tries pkg-config first, and if that fails it then uses AC_CHECK_LIBS and AC_CHECK_DECL to see if zstd is in the standard location and defines the functions we need. On our CI, most of the machines have pkg-config, and those that don't have no zstd library or an outdated version, so the new code is not fully exercised.
It's quite likely, but this is not relevant to this PR. Can you move this comment to https://github.com/ocaml/opam/ ?
I'm still trying to gather performance data. For a build and installation of the core OCaml system (
I think the benefits of compression are clear, and there are no major downsides so far. So, please REVIEW this PR. Thanks.
(FYI, zstd support (decompression only) has been merged into jsoo)
Some numbers from testing this PR on a substantial subset of Jane Street's code base using Flambda1. Building while compressing artifacts took 2h41min versus 2h37min for a build without compression. These numbers are within noise, meaning that there wasn't any noticeable slowdown caused by this PR.
Thanks for the measurements. I don't understand the second line of the table:
Xavier Leroy (2023/02/25 07:43 -0800):
> The pkg-config package provides a `PKG_CHECK_MODULES` you may want to
> use rather than invoking `pkg-config` manually. This would actually be
> helpful also in other cases where we could take advantage of pkg-config,
> I think.
Yes, I saw these pkg.m4 macros, but they don't quite do what I need,
e.g. I saw no way to check for the version of a package.
I think it's not so much in autoconf's culture to check for versions.
They rather encourage testing for features rather than for versions, actually.
> One other point is about how the `--with-zstd` option is supported.
> Currently, if `zstd` is not found you get warning, whether you did
> specify `--with-zstd` explicitly or not.
> I think this should not be so. If nothing is specified then yes, we are
> in best effort mode and it is legitimate that `zstd` not being there
> just triggers a warning.
> If it has been explicitly requested, though, then the inability of
> configure to find it should, IMO, trigger an error. That's exactly the
> behaviour we have for other options.
That's reasonable. I tried to implement this kind of 3-way behavior
(yes/no/do your best) in commit d46f542.
I couldn't find that commit but I assume some rebasing / squashing
occurred, meanwhile.
I just noticed the `Compute `Config.configuration_variables` on demand`
commit: does it really belong here? I remember you being a bit strict
recently on another PR and saying this PR should do only what it was
aiming for. Independently of the contents of this commit, wouldn't it make
sense to submit it independently?
> To further elaborate my comment above, reporting an error (rather than a
> warning) when zstd support is missing whereas it has explicitly been
> requested would make it easier to do CI on the feature. It would indeed be
> sufficient to add --with-zstd to the flags passed to configure on our CI
> script to get a failure as soon as something goes wrong on one platform.
Noooo!
(there is no need to shout. Really.)
We want (and have today) a mixture of CI platforms with and without
ZSTD, and we want to test autodetection, because that's the default
(and most complicated) mode...
It is not at all obvious to me that we want to test *only*
autodetection. I'm actually rather of the opinion that it would be good
to test both autodetection and requested support. We do have the
flexibility in our CI infrastructure to use different configure flags on
different platforms, so why not use it to make sure both modes work?
> Finally, and although this is not so much emphasized in the documentation of
> the AC_ARG_WITH macro, in theory there is some freedom in the form
> it can take, so e.g. --with-zstd=/path/to/zstdpackage is legitimate and it
> may be useful to be able to deal with such a call. In other words the
> withval variable may contain something else than yes, no, etc. as is
> already the case for the --with-flexdll option (although I am not sure how
> well we deal with it). Another example that we already have is
> target-bindir where the argument has to be a directory.
I chose to interpret `--with-zstd=<something other than "yes" or
"no">` as "no option given", i.e. "do your best".
You may want to also have special exceptions for "true" and "false".
I was curious about
Regarding the size of cmx files, I assume that Jane Street tends to use flambda with
Xavier Leroy (2023/02/26 06:10 -0800):
> just for fun I pushed a commit that adds compression_supported to ocamlrun -config, on the suggestion of @dra27.
Hey, I was adding it in parallel :-) I also added it to `ocamlc -config`, since it's no big deal.
My modest improvement (?) to the config.ml contraption is commit
091d18b. There's a big list `Config.configuration_variables` that,
currently, is always computed every time the compilers start up, even
though it is used only if the `-config` or `-config-variable` options
are given, and then it is used once and the compiler exits. My modest
improvement is to evaluate this list when needed (call by name).
Just to say, if you want to keep it in this PR, I'm fine.
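The "compute on demand" change described above can be sketched in a few lines. The names here are illustrative, not the real `Config` module: the point is just that turning an eagerly built list into a function (call by name) moves the cost from every startup to the rare `-config` invocation.

```ocaml
(* Sketch of computing a configuration list on demand instead of at
   startup.  [build_count] lets us observe when the list is built. *)
let build_count = ref 0

(* before: [let configuration_variables = [...]] built at every startup;
   after: call by name, built only when actually requested *)
let configuration_variables () =
  incr build_count;
  [ ("version", "5.x"); ("compression_supported", "false") ]

let () =
  assert (!build_count = 0);                 (* nothing built at startup *)
  let vars = configuration_variables () in   (* e.g. handling -config *)
  assert (List.mem_assoc "version" vars);
  assert (!build_count = 1)
```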
Hello Sébastien, the PR has already been merged :) Cheers!
Oops sorry to arrive so late! :)
Just a comment about the interface: it might be nice if users can specify the compression level they want. Last time I read about fast compressors/decompressors, lz4 and lzo came up. |
The final implementation uses zstd.
The Marshal header size was changed in ocaml/ocaml#12006.
- Remove compressed marshaling support from runtime/{extern.c,intern.c}
- Remove the `Marshal.Compression` flag
- Remove its uses in the compiler
runtime/extern.c still accepts the old Compression flag and just ignores it.
This avoids a bootstrap at this point. The support will be removed later.
This is a proof-of-concept for ocaml/RFCs#23.
It adds optional compression of marshaled data, with transparent decompression during unmarshaling.
It adds a `Marshal.Compression` flag to the `Marshal.to_*` functions, instructing them to produce compressed marshaled data.

Compressed marshaled data is identified with a new magic number and uses a new, compact format for its header; see runtime/caml/intext.h.

Unmarshaling functions (`input_value`, `Marshal.from_*`) recognize compressed marshaled data and transparently uncompress before unmarshaling.

The compression library currently used is Zlib (the one used by zip and gzip), at the default compression level of 7. It is autodetected in `configure` and can be turned off with `configure --without-zlib`.

If the Zlib library is not available or was turned off, the `Marshal.Compression` flag is silently ignored and uncompressed marshaled data is produced. The only failure case is if compressed marshaled data produced on another OCaml installation is read back in an installation that doesn't have Zlib.

When compression is on, the format of shared object references is changed from relative to absolute, as suggested in #4056. This does reduce the size of compressed data noticeably (e.g. -15% on a typical .cmt file).

Finally, the compiler was changed to use compression for data marshaled in `.cmt` and `.cmti` files. The resulting files are 35-40% the size of the uncompressed files. This comes at a slight increase in compilation times: on my machine, byte-compiling stdlib/camlinternalFormat.ml takes 10% longer; `make world.opt` takes 3% longer. I have not tried to compress other bulky marshaled data, such as debugging info in `.cmo` files, or persistent signatures in `.cmi` files.