@dolio
Contributor

@dolio dolio commented Nov 18, 2025

This PR implements FFI to dynamically linked libraries. At the moment, it's a bit limited as far as types are concerned, but the basic functionality works, and types aren't difficult to add (I'll continue to do so after this writeup).

The functionality is provided by several new builtins.

  • The DLL type represents a handle to a dynamic library
  • openDLL : Text ->{IO, Exception} DLL allows opening a dynamic library by file path
  • Signatures for DLL functions are built using the FFI.Type and FFI.Spec types. Each has a parameter giving the Unison type it corresponds to.
  • int64 : Type Int and uint64 : Type Nat are base cases
  • base : Type a -> Type r -> Spec (a -> r) is the base case of a specification of a pure function
  • baseIO : Type a -> Type r -> Spec (a ->{IO} r) is the base case of an effectful function
  • arr : Type a -> Spec b -> Spec (a -> b) allows adjoining more arguments onto a specification
  • getDLLSym : DLL -> Text -> Spec a ->{IO, Exception} a imports the function with the given symbol name in the DLL, declaring that its signature matches the specification.
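
Going by the types above, a two-argument int64 function would be described by something like arr int64 (base int64 int64), which has type Spec (Int -> Int -> Int). The C sketch below models how such a specification could accumulate its argument list. It is only an illustration of the combinator structure, not the PR's implementation, and every name in it (c_type, spec, spec_base, spec_base_io, spec_arr, MAX_ARGS) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the two C types the PR currently supports. */
typedef enum { TY_INT64, TY_UINT64 } c_type;

#define MAX_ARGS 8

/* A flattened signature: argument types in call order, plus the result. */
typedef struct {
  c_type args[MAX_ARGS];
  size_t nargs;
  c_type result;
  int effectful; /* 0 models `base`, 1 models `baseIO` */
} spec;

/* Models `base a r`: a pure one-argument function. */
spec spec_base(c_type a, c_type r) {
  spec s = { .args = { a }, .nargs = 1, .result = r, .effectful = 0 };
  return s;
}

/* Models `baseIO a r`: the same shape, marked as effectful. */
spec spec_base_io(c_type a, c_type r) {
  spec s = spec_base(a, r);
  s.effectful = 1;
  return s;
}

/* Models `arr a rest`: adjoin one more argument in front of `rest`. */
spec spec_arr(c_type a, spec rest) {
  assert(rest.nargs < MAX_ARGS);
  spec s = rest;
  for (size_t i = s.nargs; i > 0; i--) {
    s.args[i] = s.args[i - 1]; /* shift existing arguments right */
  }
  s.args[0] = a;
  s.nargs++;
  return s;
}
```

With this model, spec_arr(TY_INT64, spec_base(TY_INT64, TY_INT64)) yields a two-argument signature, mirroring the nesting of arr around base.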

Using this, I have successfully called the following C function from a DLL:

#include <stdint.h>

int64_t test(int64_t m, int64_t n) {
  return 1 + m + n;
}

The implementation uses libffi. This is a library for dynamically arranging for calls into libraries using C calling conventions. Apparently this is used by GHC in some capacity, and the Haskell library just uses whatever version of the library is bundled with GHC. I didn't use the high level API, which involves some redundancy and use of lists. Instead, I wrote my own calls directly into the low level stuff the library imports, with some lower level copying to/from our stacks and such.

As you can see, right now I only have support for 64-bit signed and unsigned integers. But adding more C types isn't difficult. One thing I'd like some thoughts on is how smaller values should be handled. E.g., should there be uint32 : Type Nat that just automatically chops/promotes between 32 and 64 bits? That's the best we can do right now, I think, because we only have 64-bit types in Unison.
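
To make the chop/promote question concrete, here is a small C sketch of what those conversions would do at the boundary. The function names are hypothetical and this is not code from the PR; it only shows the standard C semantics that a uint32 : Type Nat mapping would inherit.

```c
#include <stdint.h>

/* Chop a 64-bit Nat to 32 bits: keeps the low 32 bits (value mod 2^32). */
uint32_t chop_u32(uint64_t nat) {
  return (uint32_t)nat;
}

/* Promote a 32-bit unsigned result back to 64 bits: zero-extends, lossless. */
uint64_t promote_u32(uint32_t v) {
  return (uint64_t)v;
}

/* Promote a 32-bit signed result: sign-extends, so negatives stay negative. */
int64_t promote_i32(int32_t v) {
  return (int64_t)v;
}

/* Returns 1 when chopping would lose nothing, 0 when high bits are set. */
int fits_u32(uint64_t nat) {
  return nat <= UINT32_MAX;
}
```

Promotion is always safe, but chopping silently discards the high bits, so a check like fits_u32 is one option if silent truncation turns out to be too surprising.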

Another thing I'll need help with once this is cleaned up a bit more is testing on other platforms. I can only really test on Linux, so we need people to try Mac and Windows. I wrote a wrapper library to provide a common API on top of the DLL loading differences between Unix and Windows, but I haven't actually built the Windows version yet.
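
For readers unfamiliar with the platform difference, a wrapper of this kind typically looks something like the sketch below. This is not the wrapper library from the PR; lib_handle, lib_open, lib_sym, and lib_close are hypothetical names, and the only real APIs used are dlopen/dlsym/dlclose on Unix and LoadLibraryA/GetProcAddress/FreeLibrary on Windows.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

typedef HMODULE lib_handle;

/* Returns NULL if the library cannot be loaded. */
lib_handle lib_open(const char *path) {
  return LoadLibraryA(path);
}

/* Returns NULL if the symbol is not exported by the library. */
void *lib_sym(lib_handle lib, const char *name) {
  return (void *)GetProcAddress(lib, name);
}

void lib_close(lib_handle lib) {
  FreeLibrary(lib);
}

#else
#include <dlfcn.h>

typedef void *lib_handle;

/* Returns NULL if the library cannot be loaded. */
lib_handle lib_open(const char *path) {
  return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}

/* Returns NULL if the symbol is not found. */
void *lib_sym(lib_handle lib, const char *name) {
  return dlsym(lib, name);
}

void lib_close(lib_handle lib) {
  dlclose(lib);
}
#endif
```

On older glibc versions the Unix branch may need to be linked with -ldl.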

The signature specifiers are obviously not great to use in their form here. But it should be pretty easy to write a parser that takes a nicer looking signature and produces the right terms.

@aryairani
Contributor

Awesome! I saw you mentioned you could use help with manual testing, but can you discuss a bit what automated testing could look like? We have access to many platforms via GitHub runners, so that's where my head is at.

@dolio
Contributor Author

dolio commented Nov 18, 2025

CI will at least make sure that my library builds on Windows.

Transcripts that import DLLs could also be written, of course. But that requires a DLL. I guess we could also build one of those in CI, but I'd have to do some reading on how to get that done.

Member

@pchiusano pchiusano left a comment

Nice! I like the API. One thing about it, though, is that the types don't prevent you from loading a value, getting back a pure function, closing over it, hopping to another node, and attempting to call it there. What happens when you attempt to serialize one of these FFI references?

One option is it bombs - basically, "don't do that". I think this could be fine.

Or could these function references serialize just fine and work on the other end as long as the other end has the "same" symbol loaded? If achievable, the sandbox-checking should definitely treat all foreign functions as tracked - you do not want to be able to load a library locally, then hop to the cloud cluster and call the same C functions if the same C libraries happen to be installed in the same place.

Definitely adding more types is good - I'd be interested in supporting arrays of various types, at least as an input type. Could be as a follow up to this PR though.

@dolio
Contributor Author

dolio commented Nov 19, 2025

It's just going to bomb if you try to send one of these to another machine.

I don't really see how you could support that with any kind of assurance. DLLs are just files, so even if you sent the information used to load from the DLL on one machine, there's no guarantee that the file with the same name is actually the same library, or the same version of it with identical calling conventions, etc. And if you send something from Windows to Linux, the file names for 'the same' library are probably different.

I would say that if you want to send code around, you send code that uses some API that it gets passed in (possibly via abilities), and that API is implemented locally by foreign functions. The code you send around is portable, but the local implementation of the API isn't. Sending around DLL references that somehow adapt to the local system doesn't seem like a realistic expectation.
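
As a rough analogy in C, the pattern is the same as handing portable code a table of function pointers rather than a pointer to one specific implementation. The sketch below is only that analogy, not Unison abilities and not code from the PR; combine_api, run_portable, local_combine, and local_api are hypothetical names, and local_combine stands in for a function that would really be resolved from a DLL.

```c
#include <stdint.h>

/* The interface the portable code is written against. */
typedef struct {
  int64_t (*combine)(int64_t m, int64_t n);
} combine_api;

/* Portable logic: it only knows the interface, so it can run anywhere
   that some implementation of combine_api is available. */
int64_t run_portable(const combine_api *api, int64_t x) {
  return api->combine(x, x);
}

/* Local, non-portable implementation. Here it is plain C; in practice it
   would wrap a foreign function looked up on this particular machine. */
int64_t local_combine(int64_t m, int64_t n) {
  return 1 + m + n;
}

const combine_api local_api = { local_combine };
```

Only run_portable would ever need to travel between machines; each machine supplies its own combine_api.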

@aryairani
Contributor

Just brainstorming: we could have a GitHub job that installs the current build of ucm, installs whatever binary libs (zlib, libcurl, libmysqlclient, whatever we feel is going to exercise the FFI well), and then runs Unison programs against them.

@aryairani
Contributor

> Or could these function references serialize just fine and work on the other end as long as the other end has the "same" symbol loaded? If achievable, the sandbox-checking should definitely treat all foreign functions as tracked - you do not want to be able to load a library locally, then hop to the cloud cluster and call the same C functions if the same C libraries happen to be installed in the same place.

Could a library bundle some kind of library-specific test to determine if the compatible function exists on the other side?

@pchiusano
Member

> I would say that if you want to send code around, you send code that uses some API that it gets passed in (possibly via abilities), and that API is implemented locally by foreign functions. The code you send around is portable, but the local implementation of the API isn't. Sending around DLL references that somehow adapt to the local system doesn't seem like a realistic expectation.

I'm pretty sold by this argument. Just pass around functions that use an ability, much more portable.

So I think bombing is fine during serialization, though I'd like it to be with a meaningful error that tells the user what they did wrong.

@pchiusano
Member

Okay, decision here: we're not going to do anything fancy for transferring FFI pointers over the network; serialization will just bomb with an error. In userland, you could build something which is more portable (using abilities, say).
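
As an illustration of "bomb with a meaningful error", the C sketch below rejects a closure as soon as any captured value is a foreign function, and says which one. It is a toy model under assumed data structures, not the runtime's serializer; value, val_kind, check_serializable, and the message text are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical runtime value kinds: plain data vs. a foreign function. */
typedef enum { VAL_INT, VAL_FOREIGN_FN } val_kind;

typedef struct {
  val_kind kind;
  long long int_value; /* meaningful when kind == VAL_INT */
  const char *symbol;  /* meaningful when kind == VAL_FOREIGN_FN */
} value;

/* Walk the values a closure captures. Returns 0 if all of them can be
   serialized; otherwise returns -1 and writes an explanation into err. */
int check_serializable(const value *captured, size_t n,
                       char *err, size_t err_len) {
  for (size_t i = 0; i < n; i++) {
    if (captured[i].kind == VAL_FOREIGN_FN) {
      snprintf(err, err_len,
               "cannot serialize: captured value %zu is the foreign "
               "function '%s', which only exists on the machine that "
               "loaded it",
               i, captured[i].symbol);
      return -1;
    }
  }
  return 0;
}
```

Note that the check fails on the mere presence of the foreign function in the captured environment, whether or not it would ever be applied on the other side.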

@dolio is going to get CI fixed up and at least doing smoke tests on Linux / Mac / Windows. @aryairani can advise on this.

Other types like additional primitives and arrays will likely be a follow-up PR.

@dolio dolio requested a review from a team as a code owner November 20, 2025 19:49
@ChrisPenner
Member

I wonder if there's any reasonable way we can use the type-system (or some other compiler mechanism) to tag a block as being unable to be serialized.

There's already a good amount of frustration with things like Debug.trace hiding inside pure expressions, which prevents them from being evaluated in Share docs. I could see similar frustrations where you're writing normal Unison code, but one of your calls into a library uses FFI and all of a sudden your whole app bricks, because now you can't send it over Remote or serialize that continuation like you could before, and you don't find out until runtime.

We have the workaround that you can just express your FFI stuff as an ability and run that FFI outside of where the serialization happens, but I think it'd be a lot more user-friendly if we could enforce or hint at that pattern so newer Unison users don't get surprised or bitten by this.

I can imagine a system ability like IO, e.g. {Unserializable}, which can't be handled except in main.

Not sure exactly how that'd work, and it may not need to be in the first iteration, but I think if we're breaking Unison's promise that you can serialize anything, it's good to think about the UX.

@pchiusano
Member

pchiusano commented Nov 20, 2025

> I wonder if there's any reasonable way we can use the type-system (or some other compiler mechanism) to tag a block as being unable to be serialized.

It's definitely possible but I do not want to do that in this PR. :) I really want this PR and the core FFI functionality to be low level and unsafe and as out of the way as possible.

Adding fancy types could instead be done in various ways as separate userland layers. Like if you're planning to do distributed stuff, I think it's a reasonable thing to pass around functions that use an ability rather than direct pointers to the foreign functions. But again, I don't want to get fancy with this PR or the basic FFI primitives.

@dolio
Contributor Author

dolio commented Nov 20, 2025

Also, it's not clear to me that abilities are able to track this. The problem with serialization is that any reference to the value in the closure will fail, not just if you apply the function.

I suppose you could arrange to fail remotely, by serializing something and only failing if you try to apply the function remotely. Then perhaps you could argue that an annotation on the imported function is showing you that its effects won't be supported remotely. But I'm not sure how good that is, either.

dolio and others added 2 commits November 21, 2025 12:51
- Added support for void types, both for unit results and nullary functions.
- Exposed doubles in the API
- Improved errors from FFI failures
@dolio
Contributor Author

dolio commented Nov 21, 2025

BTW, it looks like Runar added pinned arrays as a separate type, so they should be relatively easy to add as FFI arguments without people having to worry about whether it's safe to pass. It requires a little tweaking of the implementation (not the interface), though. So I'll leave it to a future PR.
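
For context on why pinning matters here: a C function that takes an array receives only a raw pointer and a length, and it reads through that pointer for the whole duration of the call, so the buffer must not be moved by the garbage collector in the meantime. A minimal example of such a foreign function, with the hypothetical name sum_i64, might be:

```c
#include <stddef.h>
#include <stdint.h>

/* Sums `len` 64-bit integers starting at `xs`. The caller must keep the
   buffer alive and at the same address until the function returns. */
int64_t sum_i64(const int64_t *xs, size_t len) {
  int64_t total = 0;
  for (size_t i = 0; i < len; i++) {
    total += xs[i];
  }
  return total;
}
```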

@dolio
Contributor Author

dolio commented Nov 21, 2025

I think this is ready to go, unless anyone has objections.

@aryairani
Contributor

Yeah looks good to me.

@pchiusano pchiusano merged commit 29e1d3a into trunk Nov 21, 2025
32 checks passed
@pchiusano pchiusano deleted the topic/actual-ffi branch November 21, 2025 21:00
5 participants