Skip to content

mirage-entropy functorized#1009

Closed
dinosaure wants to merge 5 commits intomasterfrom
mep
Closed

mirage-entropy functorized#1009
dinosaure wants to merge 5 commits intomasterfrom
mep

Conversation

@dinosaure
Copy link
Copy Markdown
Member

@dinosaure dinosaure commented Oct 23, 2019

Hi @mirage/core!

This PR is one patch to paving the way for the dunification of MirageOS. But, at least, it rollbacks the MirageOS eco-system to something more homogeneous. Indeed, the choice of MirageOS is to take the advantage of functors to be able to choose an implementations of an abstraction - like choose irmin or crunch as a read-only KV-store.

functoria serves this purpose and orchestrates application of functors according definitions.

The black duck

However, a part of the MirageOS decided to choose an other way to specialize an implementation from another by a non-conventional way: mirage-entropy with mirage-os-shim. These packages use the linking-trick:

  • mirage-entropy expects at the link time a CMX which follows CMI provided by mirage-os-shim
  • nocrypto.mirage and mirage-stdlib-random are done on top of that
  • a dummy implementation exists to ensure that this CMX is not missed when we configure an unikernel - to avoid any struggle at the build-time

According discussion on mirage/mirage-os-shim#8, the goal of this PR is to delete this non-explicit dependency path of the MirageOS eco-system and provide a new way to have a random generator implementation.

The solution

This new way is, of course, the functor way. Random generator must initializes an entropy engine at the beginning of the process - so, just before start. mirage-entropy trusts on a CMI to have an access to at_enter_iter and fill a state used by the entropy engine.

However, as I said, implementation of this CMI provided by mirage-os-shim does not exist - not yet. Then, mirage-os-shim provided a specialization of implementations of targets (eg. mirage-solo5, mirage-xen & mirage-unix) under the same CMI.

At the end, mirage-entropy will be linked with the right specialization/implementation of mirage-os-shim (with an help of META file/ocamlfind - hint dunification) and will be have an access to at_enter_iter. However, functoria is not able to follow this kind of dependency - or precisely not conventionally.

The choice was made to provide a new interface Mirage_main.S (see mirage/mirage-os-shim#9) into the mirage-os-shim package and only an interface - so specialization of targets under this interface should not exist anymore.

Then, by this way, we are able to reverse the dependency graph between targets and mirage-os-shim. targets must depend on mirage-os-shim and implement this interface - the choice was made to provide a new sub-package main:

mirage-entropy will be functorized over this interface (mirage/mirage-entropy#44) and, by cascading, mirage-random-stdlib will be functorized too (mirage/mirage-random-stdlib#2). nocrypto.mirage is not needed anymore when a new package mirage-entropy-nocrypto follows the same implementation but, again, in a functorized way.

Random generators expect now an implementation which provides entry-points of the target - see Mirage_impl_entry_points. Of course, any random occurrence must be updated to, at least, choose the default random implementation (eg. mirage-random-stdlib) with the right default target.

Finally, to ensure the existence of the CMX at the link time when we currently compile an unikernel, a dummy implementation was made to track this dependency globally and ensure that CMX of the target to provide at_enter_iter exists at the link time.

Because we follow in this PR an usual pattern (with functoria and dependencies with package), this dummy implementation is not needed anymore - I think. But I don't know real implication of this deletion.

Status of this PR

At first, this PR wants to show what is going on currently and what we will do to start proper discussion of all of that when, even if it concerns few packages, update seems deep enough to avoid to click with blinded eyes.

Secondly, it breaks compatibility specially about random, so we need to take care across several others packages and still have a consistent eco-system. pins was not done, versions should be cleaned and we should start a release plan.

Third, entry_points/mirage-os-shim/*.main use bad names or non-explicit purposes names and we should find a consensus to be clear about all of that to avoid any mis-conception/mis-comprehension.

functor design, linking-trick and virtual-library

As I said into mirage/mirage-os-shim#8, virtual_libraries provided by dune is interesting for us when mirage should be only an interface provider. However, the goal, at this time, is to dunify MirageOS - so I prefer to post-pone this goal.

Design of mirage-entropy/mirage-os-shim is not bad and is an example of how to use virtual_libraries outside the scope of dune - and mirage/mirage-os-shim#8 provides a compatible way across packages of this idea independently of the build-system. However, this patch highlight how linking-trick (at the end, or more generally we can talk about the linking-trick) is fragile specially with functoria (which was not developed with pattern in our mind).

It shows that, even if we can find good aspects of it, it's hard to follow what is really going on when I spend finally 2 weeks to really understand the dependency graph of all with one of my smallest unikernel entropieur.

I think, we should stay far from all of that at the beginning and start a full documentation of all and may be a migration to this kind of design then, but I consider it as orthogonal to the dunification of MirageOS first.

Name of this implementation is not clear but it wants to abstract
some entry-points to underlying target (eg. unix, xen, solo5).
_entry-points_ implements:
- `at_enter`
- `at_exit`

It lets end-user to register functions at the beginning and at the
end of the MirageOS process. It is currently used to initialize
an entropy engine (eg. nocrypto).
an abstraction of entry-points of the underlying target.

This specific patch wants to provide a more conventional way (a
_functor_ way) to generate random generator according an entropy
engine which should register something at the beginning of the
procress. This is why random implementations expect an
entry-points abstraction.

This patch removes a nocrypto implementation which is a dummy
implementation to check if, at the link time, `Entropy` module is
correctly available - and avoid a linking error.
For each implementation which wants a random implementation, we add a
new optional argument `entry_points` to specify which entry-points
implementation you want to use to generate a random generator.

Then, this optional argument is used to produce, by default, the default
random generator described here: Mirage_impl_random.default_random but
can still be specified by the end-user. In this last case, optional
argument entry_points does not have any others impacts.
This last patch removes nocrypto implementation (again, a dummy
implementation to check if an entropy module is available) from conduit
and conduit-connector.

Implication of this patch is not clear but highlights the weird
dependency path of the entropy engine at the end because mirage-entropy
was on top of the linking-trick instead the _functor_ way.
@dinosaure
Copy link
Copy Markdown
Member Author

/cc @Drup

@samoht
Copy link
Copy Markdown
Member

samoht commented Oct 23, 2019

Hi there, thanks for starting that discussion.

There are various things going on there: first mirage-os-shim has a weird position of trying to do things differently from the rest of the packages (and failing to compose properly with the rest of the ecosystem). I do think that this package has to disappear as it doesn't fit properly with the rest of the ecosystem so I am welcoming any change going in that direction.

However, let's step back a little: as far as I can see, mirage-os-shim is used only by mirage-entropy which is calling only the function at_enter_iter. I am fairly convinced that we can remove that dependency with a less intrusive patch :-) Also I strongly suggest we do not try to functorise over the scheduler implementation as I don't expect to see any program running with two different schedulers (especially after #1004). So we have two ways left:

  • first, we can expose a Mirage_main function, which exposes only the scheduler hooks that Lwt_main is exposing (run, at_enter and at_exit). -- so using using the linking trick here should work and be legitimate. For names, Mirage_main is kind of similar to Lwt_main so it's ok, but Scheduler could also work (as it is exactly what this module is: hooks to interact with the thread scheduler).

  • an other option (I am still not totally sure if it's a good idea) would just be to pass the scheduler around as a first class value. The only used function so far is at_enter_iter so that should be fairly easy to construct and pass around.

As usual, let me know what you think :-)

@dinosaure
Copy link
Copy Markdown
Member Author

dinosaure commented Oct 23, 2019

first, we can expose a Mirage_main function, which exposes only the scheduler hooks that Lwt_main is exposing (run, at_enter and at_exit). -- so using using the linking trick here should work and be legitimate. For names, Mirage_main is kind of similar to Lwt_main so it's ok, but Scheduler could also work (as it is exactly what this module is: hooks to interact with the thread scheduler).

This is the current solution used by mirage-entropy and mirage/mirage-os-shim#8 still continues to use it but provides a compatible solution with dune (eg. mirage-unix) and a plan to dunify MirageOS - and, by this way, avoids orchestration by ocamlfind.

As @hannesm said, if we look into the MirageOS eco-system, this solution does not really fit smoothly when a hack with functoria is needed (eg. deletion on mirage_impl_random.ml) and it's hard to really understand what is going on, like:

  • where is mirage_os.cmx? (mirage-os-shim)
  • who provides it? (mirage-solo5, mirage-unix or mirage-xen)
  • what is the step to compile mirage_os.cmx with entropy.cmx? (-no-keeps-locs and ocamlfind)

So we can document it but the hack with functoria (which concerns ocaml-conduit at the end) still is needed. The hack is like: add a dependency between nocrypto and nocrypto-random (in terms of functoria) as dependency like application of functor Nocrypto_random (Nocrypto) but don't try to emit this application because I will do that myself my way.

On the other hand, as I said, playing with the linking-trick currently - with the current state of the MirageOS eco-system - is not so good when we have nothing (or, at least, ocamlfind) to orchestrate implementations under an interface. digestif is an example of that when a _tags is needed at the end to choose the right implementation - and it's not convenient to see a linking error otherwise.

But, as I said too, dune tries to solve that and if we do the migration to dune, see the usability of virtual_libraries and variants into the MirageOS eco-system should definitely be a proper and a good solution. But for this to work, we must compile a unikernel with dune first.

But at this stage, I would like to point another incompatibility: if we use virtual_library in dune, we don't have a way to trick or emit an other META file. mirage-os-shim and mirage-entropy still work because they continue to use topkg. Into details, a move to dune should be fine for mirage-entropy but a move of mirage-os-shim to dune will be more tricky (orchestration, if we look into mirage/mirage-os-shim#8, will be done by the mirage tool - but again, as the _tags file for digestif, it's a hands-written orchestration).

an other option (I am still not totally sure if it's a good idea) would just be to pass the scheduler around as a first class value. The only used function so far is at_enter_iter so that should be fairly easy to construct and pass around.

I'm more convinced by this solution when we can generate a main.ml with the right at_enter_iter and pass it by values however. This solution will completely remove mirage-os-shim and Mirage_random.S.initialize will take an at_enter_iter as an argument.

It still is a special case but as you said - and I agree with this - "we do not try to functorise over the scheduler implementation", The patch should be applied only into mirage-random-stdlib, mirage-entropy and mirage-entropy-nocrypto in the end, I think.

@Drup
Copy link
Copy Markdown
Member

Drup commented Oct 24, 2019

I'm very much in favor of removing the horrible hack around nocrypto/random/entroy in the mirage tool. It's very non-uniform with the rest and the fact that it works to begin with is an implementation accident.

I do not have much of an opinion on how to remove it, so I'll let you guys decide on that.

Regarding the name: when @dinosaure first showed me what he was doing, the name Mirage_main confused the hell out of me, because as far as I'm concerned, it already exists, it's this guy, and it makes no sense to functorize over that. So, let's pick another name.

I do like Scheduler, and @samoht idea to just provide the scheduler as first class value kinda make sense, notably since it already sort-of-happen as part of functoria: bind and return are part of the ambient scope, it would be an extension/formalization of that.

@hannesm
Copy link
Copy Markdown
Member

hannesm commented Oct 24, 2019

  • TL;DR: let's leave mirage-os-shim alone and start a mirage-scheduler interface (with the four respective hooks at_enter_iter/at_exit_iter/at_enter/at_exit), release that, adapt mirage-entropy and the mirage utility, and get it out!

first of all: thanks @dinosaure, this is IMHO the way to go forward. some nitpicks:

  • I like the name scheduler better than main as well (and now wondering whether Mirage_time -- providing the single sleep_ns function should be part of it? -- furthermore I had at various points the need for a cron like scheduling, i.e. Lwt_engine.on_time for MirageOS -- which would fit into a scheduler as well!) -- if we all agree on scheduler, this should live in its own repository (as our other module types)
  • the run -- I'm no longer sure whether we should expose this or not, in the end we don't expect a user to care about it (neither call it ever)
  • about functors vs first-class modules / linking trick: I like homogenous environments, functors are used everywhere in MirageOS, let's continue with that. first-class modules / linking trick comes out of nowhere and is hard to explain that I have a Scheduler.on_enter binding in my unikernel.ml -- this is esp. important if you think about tooling for developing unikernels (e.g. merlin) -- i don't quite understand how the first-class module approach would scale (for other devices -- which ones to have as first class modules, which ones as functor arguments) -- am happy to be convinced otherwise
  • it's imho fine to have run/bind/map special (using whatever trick) since a user is not confronted with this - not supposed to call it. let's reduce the tricks we need in mirage/functoria and improve the tooling ;)
  • while toying around with noconfig (rough prototype, but already working on lots of mirage-skeleton examples), I'm more and more convinced that config.ml contains mainly redundant information (that can be derived from unikernel.ml), and we can improve usability (developer experience, merlin integration) quite a bit by even further using functors in unikernel.ml -- i.e. is there a reason why boot arguments and app_info are not done via functor arguments? -- both are atm rather unfriendly in my emacs since key_gen is generated and merlin won't know much about it -- but that's something to discuss elsewhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants