ukvm: Refactor monitor to support multi-{arch, backend}#171
ukvm: Refactor monitor to support multi-{arch, backend}#171mato merged 11 commits intoSolo5:masterfrom
Conversation
This change refactors the ukvm monitor to support multiple processor
architectures and hypervisor backends. It also includes preparatory work to
allow for using monitor code as a library in the future.
Changes visible to ukvm guests:
1. Guest-side APIs are now defined in ukvm_guest.h. ukvm.h has been
repurposed for monitor-side use.
2. A new, architecture-independent API for invoking hypercalls is
introduced: ukvm_do_hypercall(int UKVM_HYPERCALL_nnn, void *arg).
Internal changes:
General conventions:
1. Monitor code should use ukvm_gpa_t to represent guest physical
addresses.
2. Use of macros is discouraged as they are not type-safe.
GUEST_CHECKED_PADDR() has been rewritten appropriately as
UKVM_CHECKED_GPA_P(), the macro is used only to provide __FILE__ and
__LINE__.
Monitor source files have been reorganised as follows:
ukvm.h: Monitor APIs. Arch-independent monitor code does not need to
include any other header files.
ukvm_main.c: Main program. Arch and backend independent.
ukvm_core.c: Core functions. Maintains tables of modules, hypercall
handlers and vmexit handlers. Implements core hypercalls (polling and
console) which are always present. Arch and backend independent.
ukvm_elf.c: ELF loader. Arch and backend independent, will be extended
internally to support future archs as this is just a set of #ifdefs
around hdr.e_machine.
ukvm_cpu_{arch}.[ch]: Common arch-dependent code for backend
implementations.
ukvm_hv_{backend}.[ch]: Arch-independent part of backend implementation.
ukvm_hv_{backend}_{arch}.[ch]: Arch-dependent part of backend
implementation.
ukvm_module_*.c: Module implementations.
Build system changes:
ukvm-configure now accepts a UKVM_CC environment variable specifying the
compiler to use. This is intended to support cross-compilation in the
future.
To ensure that all monitor code and modules build, there is now an
ukvm/Makefile which builds a ukvm-bin with all possible modules configured.
Use as a library:
The code has been cleaned up to only export symbols prefixed with "ukvm_".
Currently all monitor APIs use the "fail fast" principle for error
handling. For production use as a library this will need to be changed to
report (but *not* handle) errors to the top-level caller; this will be done
as a separate change later.
There was a problem hiding this comment.
Hi Mato, looks good, just a couple of questions:
ukvm_elf_loadshould be different depending on the architecture, or at least the checks like ukvm_elf.c:124 should change. Maybe this should be abstracted as well?ukvm-binand the unikernelxxx.ukvmwill be named the same irrespective of the architecutre and platform. Wouldn't it be better to generate something likeukvm-bin-x86_64-linuxandhello.x86_64-linux.ukvm?- Cross-compilation or scenarios like running an i686 guest on a x86_64 host (includeos style) are performed by picking a specific UKVM_CC right?. I really think that naming the binary based on the arch might avoid surprises.
|
On Thursday, 23.03.2017 at 06:45, Ricardo Koller wrote:
ricarkol commented on this pull request.
Hi Mato, looks good, just a couple of questions:
1. `ukvm_elf_load` should be different depending on the architecture, or at least the checks like ukvm_elf.c:124 should change. Maybe this should be abstracted as well?
There's nothing to abstract, it just needs a set of `#ifdef`s around that
line.
2. `ukvm-bin` and the unikernel `xxx.ukvm` will be named the same irrespective of the architecutre and platform. Wouldn't it be better to generate something like `ukvm-bin-x86_64-linux` and `hello.x86_64-linux.ukvm`?
Why?
3. always using the host architecture and platform as the target might be too strict (in ukvm-configure). What about cross-compilation or scenarios like running an i686 guest on a x86_64 host (includeos style)?
Set `$UKVM_CC` when invoking `ukvm-configure` if you want to cross-compile.
|
|
Thanks mato. About the naming of About point 1., the elf structures might have to be changed to 32 bits (if 32 bits): |
|
Hello Mato: On the topic of naming, I'm wondering whether it would make sense to rename ukvm to something more generic, as it is no longer KVM-specific. It sounds a little confusing to have a ukvm monitor for different backends, such as Hypervisor.Framework, libvmmapi and others. Seeing as it's a monitor for unikernels, perhaps something like |
|
On 2017-03-23 18:19, Domagoj Stolfa wrote:
On the topic of naming, I'm wondering whether it would make sense to
rename ukvm to something more generic, as it is no longer
KVM-specific.
Yes. could you open a separate issue so that this discussion stays
focused on the technical multi-{arch,backend} work and start collecting
suggestions?
Note that any renaming of ukvm will happen separately once the
multi-{arch,backend} work has landed on master, so it's some time away.
Also, renaming ukvm has an impact on downstreams (MirageOS) and whatever
name chosen should take into account use in identifiers (library names,
symbol names, MirageOS target, ...).
(You probably want to copy the above paragraph the issue you create)
|
That is part of a separate high-level discussion we need to have about "when/how things get built"; let's do that after this has landed. For the purposes of this change I'm assuming that a normal jane user of ukvm-configure gets a ukvm-bin suitable for the host they're building on and nothing more. UKVM_CC is there mainly for CI purposes right now.
Yes. This should be possible with judicious use of #defines and fixing the loader to use the elf.h types not its own. |
|
Sounds good @mato, we can leave the renaming for later. BTW, based on https://github.com/Weichen81/solo5/blob/arm64/ukvm/arm64/ukvm-setup.c#L174, it seems that setting the guest memory might also be arch/platform specific. Although what's done there for ARM could be done for all architectures. |
I saw that. Depending on arch-specific requirements we might be able to get away with a simpler arrangement where we carve out the MMIO "slot" (if any) at the top of memory rather than below the guest image. This would mean we wouldn't have to play games with as many multiple slots on the backend side. Also, the top-level interface to how you get your memory will change a bit to accomodate bhyve/vmmapi, give me a couple more days to get that in and we'll go from there. |
|
Sorry for replying later. I have read the changes. And I think the host side refactor can satisfy most of what we have done for ARM64. As @ricarkol commented above, setting the guest memory for ARM64 is very different from x86_64. Can we introduce a file like ukvm_mem_{arch}.c to make memory setting become arch/platform specific?
@mato Yes, in current stage, Solo5 is running in kernel mode, we can't simply make it become arch independent. We still need do some changes to make Solo5 support multi-arch. While I was doing ARM64 |
@mato Actually, we prefer to carve out the MMIO below the guest image. It's easier for us to build page table for guest. In my current changes, I placed the MMIO at the beginning is trying to make ARM and x86_64 have the same IO addresses. But MMIO access alignment limit break my attempt. I would move MMIO slot to the end, if this will make the refactor more easier. |
|
On Monday, 27.03.2017 at 00:49, Weichen81 wrote:
Sorry for replying later. I have read the changes. And I think the host side refactor can satisfy most of what we have done for ARM64. As @ricarkol commented above, setting the guest memory for ARM64 is very different from x86_64. Can we introduce a file like ukvm_mem_{arch}.c to make memory setting become arch/platform specific?
If by "setting the guest memory" you mean setting up the guest VCPU
structures and state (page tables, equivalent of x86 descriptor tables),
then the place to put that will be `ukvm_cpu_{arch}.[ch]`. The current
refactoring already does that for x86_64. The idea is that the processor
specific code that _can_ be trivially abstracted out goes in those files,
but is implemented in a way that's agnostic to the hypervisor backend (in
other words, those files should not depend on anything in `struct hv`).
This will hopefully become clearer once I get the vmmapi backend working.
@mato Yes, in current stage, Solo5 is running in kernel mode, we can't simply make it become arch independent. We still need do some changes to make Solo5 support multi-arch. While I was doing ARM64
porting, I have some simple but maybe not very normative changes. Can we coordinate base on these changes?
Yes, but I need to do the vmmapi step first.
|
|
On Monday, 27.03.2017 at 00:58, Weichen81 wrote:
@mato Actually, we prefer to carve out the MMIO below the guest image. It's easier for us to build page table for guest. In my current changes, I placed the MMIO at the beginning is trying to make ARM and x86_64 have to same IO addresses. But MMIO access alignment limit break my attempt. I would move MMIO slot to the end, if this will make the refactor more easier.
I don't know yet. Ideally I'd like to arrive at a situation where we do not
use a sparse memory map so that the host can trivially access guest memory
as `hv->mem + gpa`. This avoids the need for complicated address
translations in the monitor.
|
API change:
struct hv *ukvm_hv_init(size_t mem_size).
The caller may use (hv->mem) and (hv->mem_size) after calling this
function.
Rationale:
FreeBSD's vmm requires mapping guest memory from the vmm device (as
opposed to anonymous memory). Also, we may want to tweak mem_size based
on {arch,backend}-specific requirements so this gives us a way to do it.
We do however lose the option of mmap()ing arbitrary objects into guest
memory, too bad.
Change behaviour of KVM/x86_64 code to initialise VCPU state from scratch rather than using values returned by KVM_GET_SREGS as a base. This allows us to unify initial VCPU state values with those used by other backends. General improvements to the CPU abstractions used, also in preparation for other backends: - rename struct x86_seg to x86_sreg and change the structure to define an x86_64 shadow register, since that's what it is. - update callers to match the above change. - cleanups and more documentation in ukvm_cpu_x86_64.h.
|
I've pushed a couple of changes in preparation for the FreeBSD vmm backend (b6826b9 and 682411f). Also, I've got a work in progress branch with the vmm backend which you can find here. (@djwillia: This will be useful for you to internalize the new abstractions). Note that unlike this PR, the |
To enable multi-backend we need a clock that does not depend on any backend-specific device. This change replaces the use of pvclock by the ukvm target with: - a simplified TSC-only implementation of tscclock - a new hypercall to get wall clock time from the monitor (UKVM_HYPERCALL_WALLTIME). - a new arch-dependent section in struct ukvm_boot_info, used to pass the (x86) TSC frequency to the guest. Note that the monitor will now refuse to run on a host with an unstable TSC, see the implementation notes in kernel/ukvm/tscclock.c for an explanation. Related changes also part of this commit: - Add a simple test that the clock(s) work(s), in tests/test_time. The upper limit on how much is "too much" to sleep may need to be tweaked (or the test result ignored) on Travis. - hypercall_poll() now blocks all signals except SIGINT and SIGTERM rather than restarting the ppoll() call which was incorrectly assuming (*ts) would be modified by ppoll().
Thanks for the heads-up. I rebased the port once again and everything looks fine after some minor changes. |
Add a FreeBSD vmm(4) (a.k.a. bhyve) backend to ukvm. The "gdb" module is not currently supported and is stubbed out when built on FreeBSD. Additional changes: - ukvm-configure has been rewritten to generate a less convoluted Makefile.ukvm, including correct dependencies. I've also added user-facing documentation about its invocation. - the code to ensure that the tap interface is pre-existing in tap_attach() has been simplified and ported to FreeBSD. - the upper sleep limit in test_time has been raised somewhat to account for behaviour on my (oldish) FreeBSD test hardware.
|
e67a007 adds the FreeBSD To test this with MirageOS you will need the following pins: The @hannesm @djwillia Given that the build will require GNU make for the foreseeable future, are you ok with my renaming all our makefiles to This will ensure that *BSD users who have not read the instructions will be presented with a less confusing error message ("No Makefile found" as opposed to "half a screen of obscure syntax errors"). The downside is that |
|
@mato in opam you can use the (this should say |
|
On 03/04/2017 12:38, Martin Lucina wrote:
e67a007 adds the FreeBSD `vmm(4)` backend and associated changes. Note that the net code added to support the backend is around 500 LOC.
nice!
@hannesm @djwillia Given that the build will require GNU make for the foreseeable future, are you ok with my renaming all our makefiles to `GNUmakefile<SUFFIX>` as part of this PR?
I appreciate this change!
The downside is that `mirage` master will need to a) check if we're using the old scheme (`Makefile.ukvm`) or the new scheme (`GNUmakefile.ukvm`) and b) somehow (via `opam`?) figure out if it should call out to `make` or `gmake` on the current system.
Well, once solo5 gets another release, mirage can raise its lower bound
constraint of solo5-kernel-*, and than will always have the new scheme.
|
|
On 03/04/2017 12:54, Thomas Gazagnaire wrote:
@mato in opam you can use `make` (without double quotes) to get the translation to `gmake` in BSD. You can see that by writing:
```
$ opam config var make
make
```
(this should say `gmake` on *BSD systems)
I'm not a fan of introducing callouts to opam from within mirage/mirage
generated code. Can we instead e.g. emit a build rule into the opam
file, and leave the building to opam (I'm sure we can find a sensible
workflow which includes putting the build results somewhere reasonable).
|
|
it looks good to me, some remarks: above in the issue you nicely wrote down synopsises of each file in the repository, it'd imho be great to include this in each .c and .h file a short synopsis. about GNU make vs BSD make: I guess there is no plan to compile VMM unikernels with a linux/MacOSX toolchain, neither KVM with a BSD/MacOSX toolchain, neither uhvf with a linux/BSD toolchain, or is there? if not, maybe instead of requiring a GNU make we can go into the burden of having a solo5-compile-time decision (by providing a different |
|
On Monday, 03.04.2017 at 09:06, Hannes Mehnert wrote:
it looks good to me, some remarks: above in the issue you nicely wrote down synopsises of each file in the repository, it'd imho be great to include this in each .c and .h file a short synopsis.
Good idea, I'll do that.
about GNU make vs BSD make: I guess there is no plan to compile VMM unikernels with a linux/MacOSX toolchain, neither KVM with a BSD/MacOSX toolchain, neither uhvf with a linux/BSD toolchain, or is there?
That's unclear at the moment and I'd like to keep our options open.
[... different configure.sh for ukvm, generating different Makefiles ...] the GNU features used (afaics) are conditional (for passing some `-static` flag to ld, and `addprefix` (did I miss something?)!?
It's more than that, the rest of the Solo5 build system (mainly the tests) rely on including `Makefile.ukvm` from parent `Makefiles`, which is also the way things were originally done in Mirage before the unikernel build got changed to not depend on `make`. In order to support this mode of operation, `Makefile.ukvm` needs to use `%`-style rules which are not available in BSD make.
(Notwithstanding the above, I actually removed the use of `addprefix` and conditionals in the generated `Makefile.ukvm` as part introducing the VMM backend.)
To be honest, I _really_ don't want to go down this rabbit hole any deeper than necessary. There's nothing wrong with GNU make, and given that the rest of the Solo5 build system already depends on it I don't see enough benefit in making a special case for `Makefile.ukvm`.
To summarize:
1. We stay with GNU make for now as the costs of maintaining multiple build systems for `ukvm` outweigh the benefits.
2. I'll rename all instances of `Makefile.*` to `GNUmakefile.*` for the benefit of not confusing users on *BSD as part of this change.
3. MirageOS will need to figure out some way to invoke the correct `make` for `ukvm` targets. @hannesm Can we take that discussion to a separate issue once this change lands?
|
|
On 04/04/2017 13:22, Martin Lucina wrote:
1. We stay with GNU make for now as the costs of maintaining multiple build systems for `ukvm` outweigh the benefits.
2. I'll rename all instances of `Makefile.*` to `GNUmakefile.*` for the benefit of not confusing users on *BSD as part of this change.
3. MirageOS will need to figure out some way to invoke the correct `make` for `ukvm` targets. @hannesm Can we take that discussion to a separate issue once this change lands?
sounds good to me
|
|
Status update / next steps (and recap of yesterday's discussion with @djwillia on Slack):
|
This namespace is reserved by C99, drop the __ for preprocessor symbols (mainly include guards).
This is done as a merge rather than rebase since others are tracking this branch. Fix conflicts with 6a900ae (formatting changes only).
This is for the benefit of *BSD users so that they get a meaningful error message rather than a screen full of obscure syntax errors. (See discussion in Solo5#171)
This is a temporary solution to avoid having to teach MirageOS about "which make to use". It is not expected to work for including Makefile.ukvm from other makefiles.
|
I've pushed the change to rename top-level makefiles only to Per the plan above, and yesterday's release of 0.2.2 from current |
This header should be self-contained, so only use assert if it is defined.
|
(For the record, I'm not going to update any end-user documentation related to the new backend support until the rest of this plan is complete and ideally after #172 is resolved). |
This change refactors the ukvm monitor to support multiple processor architectures and hypervisor backends. It also includes preparatory work to allow for using monitor code as a library in the future.
Changes visible to ukvm guests
ukvm_guest.h.ukvm.hhas been re-purposed for monitor-side use.ukvm_do_hypercall(int UKVM_HYPERCALL_nnn, void *arg).Internal changes
General conventions
ukvm_gpa_tto represent guest physical addresses.GUEST_CHECKED_PADDR()has been rewritten appropriately asUKVM_CHECKED_GPA_P(), the macro is used only to provide__FILE__and__LINE__.Monitor source files have been reorganised as follows:
ukvm.h: Monitor APIs. Arch-independent monitor code does not need to include any other header files.ukvm_main.c: Main program. Arch and backend independent.ukvm_core.c: Core functions. Maintains tables of modules, hypercall handlers and vmexit handlers. Implements core hypercalls (polling and console) which are always present. Arch and backend independent.ukvm_elf.c: ELF loader. Arch and backend independent, will be extended internally to support future archs as this is just a set of#ifdefs aroundhdr.e_machine.ukvm_cpu_{arch}.[ch]: Common arch-dependent code for backend implementations.ukvm_hv_{backend}.[ch]: Arch-independent part of backend implementation.ukvm_hv_{backend}_{arch}.[ch]: Arch-dependent part of backend implementation.ukvm_module_*.c: Module implementations.Build system changes
ukvm-configurenow accepts aUKVM_CCenvironment variable specifying the compiler to use. This is intended to support cross-compilation in the future.ukvm/Makefilewhich builds aukvm-binwith all possible modules configured.Use as a library
The code has been cleaned up to only export symbols prefixed with "ukvm_". Currently all monitor APIs use the "fail fast" principle for error handling. For production use as a library this will need to be changed to report (but not handle) errors to the top-level caller; this will be done as a separate change later.
Porting to a new backend
ukvm_hv_init(),ukvm_hv_vcpu_init()andukvm_hv_vcpu_loop().ukvm-configure.Rationale
#ifdefto select arch-specific code.ukvm_core_register_hypercall().ukvm_core_register_vmexit().Next steps
@mato:
I don't expect to merge this until at least the steps above have been done.
@djwillia @ricarkol:
For now, please review the interfaces and abstractions provided. I wouldn't bother with the diff as such as it's way too large -- sorry about that. I tried to do things in smaller steps but this change touches basically "everything" in ukvm so that was not feasible.
Porting
uhvfto the new order will be possible once I have step 2 above done.@Kensan: As above, this should give you most of what you need for your muen port.
@Weichen81: Please monitor this PR, but don't change what you're doing yet.
/cc #167 #150 #151 #157