Skip to content

Conversation

@runarorama
Copy link
Contributor

@runarorama runarorama commented Aug 13, 2025

Overview

This adds builtin little-endian equivalents for all the big-endian read operations on mutable and immutable byte arrays.

Implementation notes

I changed the way these reads are implemented. Previously, e.g. the 64-bit read would be eight one-byte reads, assembled into a Nat with shifts. The new implementation is a single 64-bit read. It takes into account the native endianness of the machine we're running on and re-orders the bytes if necessary, using a single additional primop.

Interesting/controversial decisions

The new implementation of the read/index functions use some GHC internals, so this code is very much not portable.

Test coverage

Every combination of reads and writes is tested in builtins.md.

Dependencies

Depends on #5832 but probably doesn't have to.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds little-endian equivalents for all the big-endian read operations on mutable and immutable byte arrays. It introduces more efficient implementations using direct memory operations instead of byte-by-byte assembly, along with support for pinned byte arrays and buffer-based I/O operations.

Key changes include:

  • Added little-endian read/write functions for 16, 24, 32, 40, and 64-bit values
  • Introduced PinnedByteArray type and related operations
  • Implemented buffer-based socket send/receive functions
  • Enhanced I/O operations to work with pinned buffers

Reviewed Changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
unison-src/transcripts/idempotent/*.md Updated builtin counts to reflect new functions
unison-src/transcripts-using-base/net.md Added buffer-based networking examples
unison-runtime/src/Unison/Runtime/Foreign/Function/Type.hs Added new foreign function types for little-endian operations
unison-runtime/src/Unison/Runtime/Foreign/Function.hs Implemented little-endian read/write operations using GHC primitives
unison-runtime/src/Unison/Runtime/Builtin.hs Declared new foreign functions
unison-core/src/Unison/Type.hs Added PinnedByteArray type reference
parser-typechecker/src/Unison/KindInference/Generate.hs Updated constraint tree for new type
parser-typechecker/src/Unison/Builtin.hs Added type signatures for new operations

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

withMutableByteArrayContents :: (PA.PrimBase m) => PA.MutableByteArray (PA.PrimState m) -> (Ptr Word8 -> m a) -> m a
{-# INLINE withMutableByteArrayContents #-}
withMutableByteArrayContents mba =
PA.keepAlive (PA.mutableByteArrayContents mba)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't understand what's going on here. It doesn't look like it should even type check. Is Github showing me the wrong diff or something?

Copy link
Contributor

@dolio dolio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my confusion mentioned above, this looks good. The thing with the native endianness is nice; I didn't know that information was exposed by GHC.

@runarorama
Copy link
Contributor Author

Other than my confusion mentioned above, this looks good. The thing with the native endianness is nice; I didn't know that information was exposed by GHC.

Looks well-typed to me given the type of keepAlive in Prim 8.0.0

@dolio
Copy link
Contributor

dolio commented Aug 15, 2025

Oh, that explains it. They changed the type of keepAlive in 0.9 and I was looking at the latest docs.

I guess I have one more concern about this, though.

You are calling keepAlive on the Ptr. I think that might not actually keep the memory alive. A Ptr isn't a GC root, because it might have been created with malloc or something that doesn't participate in GC. So keeping the pointer alive, I think, doesn't prevent a GC-able byte array from being collected.

So, I think you might need to be calling keepAlive on the byte array, then within the block getting the byte array contents. That ensures that the byte array part doesn't get collected, causing the pointer to become invalid. This is similar to what they do to implement your function in the 0.9 series:

https://hackage-content.haskell.org/package/primitive-0.9.1.0/docs/src/Data.Primitive.ByteArray.html#withMutableByteArrayContents

I guess there's also caveats there about the boxed keepAlive. Maybe we should just see if we can upgrade primitive to 0.9.1.0, overriding stackage, and use the function from there?

@dolio dolio self-requested a review August 15, 2025 18:49
@runarorama runarorama merged commit 791e98b into trunk Aug 15, 2025
31 checks passed
@runarorama runarorama deleted the runarorama/littleEndianByteArray branch August 15, 2025 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants