Skip to content

Conversation

@regiskuckaertz
Copy link
Contributor

Overview

Add two new built-ins to the base library:

Bytes.zstd.compress : Bytes -> Bytes
Bytes.zstd.decompress : Bytes -> Bytes

To offer basic support for compression using the ZStandard algorithm (http://facebook.github.io/zstd/).

Implementation notes

I added a new dependency to the unison-util-bytes project: https://hackage.haskell.org/package/zstd which is a thin wrapper over the C library.

Interesting/controversial decisions

To keep the same interface as gzipCompress, I'm using the same default compression level as the zstd CLI, i.e. 3, which offers a good compromise between speed/size and memory/cpu consumption.

Test coverage

I saw one tests for the other two compression functions which I blatantly added to, please let me know if there is anything else I missed.

@regiskuckaertz regiskuckaertz force-pushed the rk-zstd branch 2 times, most recently from dcff362 to 26b1abe Compare September 30, 2025 12:22
@pchiusano
Copy link
Member

Cool!

I'd add a Nat parameter to the compression function. We can still offer a convenience function that calls it with a value of 3.

@aryairani is there anything we should be aware of regarding: integrating this into builds or releases?

@aryairani aryairani requested a review from dolio September 30, 2025 14:47
@aryairani
Copy link
Contributor

aryairani commented Sep 30, 2025

@aryairani is there anything we should be aware of regarding: integrating this into builds or releases?

Err, no — thanks for taking a look. I do like getting an external opinion whenever we add builtins as far as whether we want to make that commitment in the runtime vs leave it to pure Unison.

As far as integrating it into builds or releases, the only caveat I can think of that it may not be on Share until we cut the next UCM release, as it's only at those times that we force a "sync" for new builtins.

@regiskuckaertz If you could, go to https://github.com/regiskuckaertz/unison/actions/workflows/update-transcripts.yaml#rgh-run-workflow and run that workflow on your rk-zstd branch. That should fix up the failing tests;

or alternatively if you're still at your dev desk, just run ./scripts/check.sh and then push the .md files that get updated to your branch

Copy link
Contributor

@dolio dolio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look okay to me.

@regiskuckaertz regiskuckaertz force-pushed the rk-zstd branch 2 times, most recently from b05e1b0 to 408e5f4 Compare September 30, 2025 19:49
@pchiusano
Copy link
Member

pchiusano commented Sep 30, 2025

@regiskuckaertz Here's the transcript error in builtins.md

This bytes literal isn't valid syntax: 0xs28badeffffd045861000048656c6c6f20776f726c640a8ca0389a

   27 |           isLeft (zstd.decompress 0xs28badeffffd045861000048656c6c6f20776f726c640a8ca0389a)

I was expecting an even number of hexidecimal characters (one of
0123456789abcdefABCDEF) after the 0xs.

That's a lexer / parser error before getting to typechecking.

You can use stack build && stack exec unison -- transcript unison-src/transcripts/idempotent/builtins.md to run / refresh just a single transcript.

@regiskuckaertz
Copy link
Contributor Author

ah, thank you I'll do that! yeah I fixed the length and also the integer literal, still don't have the muscle memory of typing +

@regiskuckaertz
Copy link
Contributor Author

After looking at the implementation of gzip, I noticed they throw on error so I did the same—had to introduce a custom data type though, maybe best to have the same exception for all 3? Anyway after that, I'm still puzzled by the failing transcript. Is there a way to output the result of an expression in the transcript? I tried Debug.trace but no luck.

@aryairani
Copy link
Contributor

Hey @regiskuckaertz, I'm taking a look at it currently. My approach to debugging it was to just work on it interactively:

$ stack exec -- unison --create-codebase zstd

Then

scratch/main> builtins.mergeio lib.builtin

^ cryptic, but this installs all the builtins including your new one into lib.builtin.

Then in my scratch.u file, I narrowed the failure down to

isLeft = cases
    Left _  -> true
    Right _ -> false

> isLeft (zstd.decompress 0xs28badffffd045861000048656c6c6f20776f726c640a8ca0389a)
> zstd.decompress 0xs28badffffd045861000048656c6c6f20776f726c640a8ca0389a
    5 | > isLeft (zstd.decompress 0xs28badffffd045861000048656c6c6f20776f726c640a8ca0389a)
          ⧩
          false
  
    6 | > zstd.decompress 0xs28badffffd045861000048656c6c6f20776f726c640a8ca0389a
          ⧩
          Right 0xs

So I guess decompress isn't failing for some reason, it's just returning an empty Bytes.

btw I'm @aryairani on our discord if you wanna try to get me more interactively, but I'll try to keep an eye out here also.

@aryairani
Copy link
Contributor

aryairani commented Oct 2, 2025

I'm not really sure what's happening here, but I do get Skip when I try to Zstd.decompress that String.

$ stack ghci
ghci> :set -package base16-bytestring
ghci> import Data.ByteString.Base16 qualified as Base16
ghci> import Data.ByteString.Char8 qualified as BS
ghci> import Codec.Compression.Zstd qualified as Zstd
ghci> Zstd.decompress <$> Base16.decode (BS.pack "28badffffd045861000048656c6c6f20776f726c640a8ca0389a")
Right Skip

and then zstdDecompress turns a Skip into empty Bytes. 😬

@aryairani
Copy link
Contributor

I'm even more confused now, because this change didn't seem to have any effect:

    getOrThrow Zstd.Skip = throw $ ZstdDecompressException "Zstd.decompress returned Skip"

I'm still seeing the empty Bytes.

@regiskuckaertz
Copy link
Contributor Author

hi @aryairani many thanks for your help, I should have thought about going straight to Haskell 🤦 but also the way of loading the unison codebase with the base library is super valuable. Leave this with me, I must be able to fashion a payload that will produce an error. The weird thing is I tried with the CLI first, maybe I did introduce a skipped frame without knowing.

@regiskuckaertz
Copy link
Contributor Author

(I can't work on this today, but tomorrow I'll keep on investigating)

@aryairani

This comment was marked as outdated.

@aryairani

This comment was marked as outdated.

@aryairani
Copy link
Contributor

@regiskuckaertz I went ahead and deleted the failure test, I'm not sure if it's worth the trouble to force having one. The round-trip tests work, so I'm willing to merge this as-is now.

Let me know what you think.

@regiskuckaertz
Copy link
Contributor Author

Fixed. The Haskell implementation only produces an error when the size in the header is too large or there are leftovers after decompression. Skippable frames are supposed to start with 0x184D2A5?though, so the current behaviour is clearly wrong :-/

@regiskuckaertz
Copy link
Contributor Author

@aryairani all done Sir, thanks heaps for your help on this! Do I need to add documentation somewhere? iiuc, it is generated from an already published version on the library.

@pchiusano pchiusano merged commit 2a759f3 into unisonweb:trunk Oct 3, 2025
1 check passed
@regiskuckaertz regiskuckaertz deleted the rk-zstd branch October 19, 2025 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants