Skip to content

ukvm: Refactor monitor to support multi-{arch, backend}#171

Merged
mato merged 11 commits intoSolo5:masterfrom
mato:multifactor
Apr 5, 2017
Merged

ukvm: Refactor monitor to support multi-{arch, backend}#171
mato merged 11 commits intoSolo5:masterfrom
mato:multifactor

Conversation

@mato
Copy link
Copy Markdown
Member

@mato mato commented Mar 22, 2017

This change refactors the ukvm monitor to support multiple processor architectures and hypervisor backends. It also includes preparatory work to allow for using monitor code as a library in the future.

Changes visible to ukvm guests

  1. Guest-side APIs are now defined in ukvm_guest.h. ukvm.h has been re-purposed for monitor-side use.
  2. A new, architecture-independent API for invoking hypercalls is introduced: ukvm_do_hypercall(int UKVM_HYPERCALL_nnn, void *arg).

Internal changes

General conventions

  1. Monitor code should use ukvm_gpa_t to represent guest physical addresses.
  2. Use of macros is discouraged as they are not type-safe. GUEST_CHECKED_PADDR() has been rewritten appropriately as UKVM_CHECKED_GPA_P(), the macro is used only to provide __FILE__ and __LINE__.

Monitor source files have been reorganised as follows:

ukvm.h: Monitor APIs. Arch-independent monitor code does not need to include any other header files.
ukvm_main.c: Main program. Arch and backend independent.
ukvm_core.c: Core functions. Maintains tables of modules, hypercall handlers and vmexit handlers. Implements core hypercalls (polling and console) which are always present. Arch and backend independent.
ukvm_elf.c: ELF loader. Arch and backend independent, will be extended internally to support future archs as this is just a set of #ifdefs around hdr.e_machine.
ukvm_cpu_{arch}.[ch]: Common arch-dependent code for backend implementations.
ukvm_hv_{backend}.[ch]: Arch-independent part of backend implementation.
ukvm_hv_{backend}_{arch}.[ch]: Arch-dependent part of backend implementation.
ukvm_module_*.c: Module implementations.

Build system changes

  • ukvm-configure now accepts a UKVM_CC environment variable specifying the compiler to use. This is intended to support cross-compilation in the future.
  • To ensure that all monitor code and modules build, there is now an ukvm/Makefile which builds a ukvm-bin with all possible modules configured.

Use as a library

The code has been cleaned up to only export symbols prefixed with "ukvm_". Currently all monitor APIs use the "fail fast" principle for error handling. For production use as a library this will need to be changed to report (but not handle) errors to the top-level caller; this will be done as a separate change later.

Porting to a new backend

  1. Implement ukvm_hv_init(), ukvm_hv_vcpu_init() and ukvm_hv_vcpu_loop().
  2. Add the build-time backend selection code to ukvm-configure.

Rationale

  1. Implement the minimal amount of abstraction to unblock multi-{arch,backend} support, preferring code clarity and simplicity over introducing another layer of abstraction. Notably, I have chosen not to provide a backend-independent abstraction for vmexit handlers or VCPU state as the only consumer of this abstraction would be the "gdb" module.
  2. Minimise the use of #ifdef to select arch-specific code.
  3. Hypercall implementations must be arch and backend agnostic. This is provided by ukvm_core_register_hypercall().
  4. It must still be possible to handle any vmexits from a module, but if you do that then you get to deal with the multi-{arch, backend} details yourself. This is what "gdb" does via ukvm_core_register_vmexit().

Next steps

@mato:

  1. Answer all your questions and fill in what's missing.
  2. Implement a bhyve (vmmapi) backend for ukvm to validate that these abstractions work. This will also require solving the "what to do about (lack of) pvclock" issue (cc @hannesm @dstolfa).
  3. Coordinate with @Weichen81 and complete the other (much smaller) part of this change which is multi-arch for the Solo5 kernel so that the guest side of the ARM port has somewhere to land.

I don't expect to merge this until at least the steps above have been done.

@djwillia @ricarkol:

For now, please review the interfaces and abstractions provided. I wouldn't bother with the diff as such as it's way too large -- sorry about that. I tried to do things in smaller steps but this change touches basically "everything" in ukvm so that was not feasible.

Porting uhvf to the new order will be possible once I have step 2 above done.

@Kensan: As above, this should give you most of what you need for your muen port.

@Weichen81: Please monitor this PR, but don't change what you're doing yet.

/cc #167 #150 #151 #157

This change refactors the ukvm monitor to support multiple processor
architectures and hypervisor backends. It also includes preparatory work to
allow for using monitor code as a library in the future.

Changes visible to ukvm guests:

1. Guest-side APIs are now defined in ukvm_guest.h. ukvm.h has been
repurposed for monitor-side use.
2. A new, architecture-independent API for invoking hypercalls is
introduced: ukvm_do_hypercall(int UKVM_HYPERCALL_nnn, void *arg).

Internal changes:

General conventions:

1. Monitor code should use ukvm_gpa_t to represent guest physical
addresses.
2. Use of macros is discouraged as they are not type-safe.
GUEST_CHECKED_PADDR() has been rewritten appropriately as
UKVM_CHECKED_GPA_P(), the macro is used only to provide __FILE__ and
__LINE__.

Monitor source files have been reorganised as follows:

ukvm.h: Monitor APIs. Arch-independent monitor code does not need to
include any other header files.
ukvm_main.c: Main program. Arch and backend independent.
ukvm_core.c: Core functions. Maintains tables of modules, hypercall
    handlers and vmexit handlers. Implements core hypercalls (polling and
    console) which are always present. Arch and backend independent.
ukvm_elf.c: ELF loader. Arch and backend independent, will be extended
    internally to support future archs as this is just a set of #ifdefs
    around hdr.e_machine.
ukvm_cpu_{arch}.[ch]: Common arch-dependent code for backend
    implementations.
ukvm_hv_{backend}.[ch]: Arch-independent part of backend implementation.
ukvm_hv_{backend}_{arch}.[ch]: Arch-dependent part of backend
    implementation.
ukvm_module_*.c: Module implementations.

Build system changes:

ukvm-configure now accepts a UKVM_CC environment variable specifying the
compiler to use. This is intended to support cross-compilation in the
future.

To ensure that all monitor code and modules build, there is now an
ukvm/Makefile which builds a ukvm-bin with all possible modules configured.

Use as a library:

The code has been cleaned up to only export symbols prefixed with "ukvm_".
Currently all monitor APIs use the "fail fast" principle for error
handling.  For production use as a library this will need to be changed to
report (but *not* handle) errors to the top-level caller; this will be done
as a separate change later.
Copy link
Copy Markdown
Collaborator

@ricarkol ricarkol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mato, looks good, just a couple of questions:

  1. ukvm_elf_load should be different depending on the architecture, or at least the checks like ukvm_elf.c:124 should change. Maybe this should be abstracted as well?
  2. ukvm-bin and the unikernel xxx.ukvm will be named the same irrespective of the architecutre and platform. Wouldn't it be better to generate something like ukvm-bin-x86_64-linux and hello.x86_64-linux.ukvm?
  3. Cross-compilation or scenarios like running an i686 guest on a x86_64 host (includeos style) are performed by picking a specific UKVM_CC right?. I really think that naming the binary based on the arch might avoid surprises.

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 23, 2017 via email

@ricarkol
Copy link
Copy Markdown
Collaborator

ricarkol commented Mar 23, 2017

Thanks mato. About the naming of ukvm-bin, I was mainly thinking about usability. It would be like having all my gcc versions called gcc and all overwriting each other every time I build a new cross compiler.

About point 1., the elf structures might have to be changed to 32 bits (if 32 bits):

    Elf32_Off ph_off;
    Elf32_Half ph_entsz;
    Elf32_Half ph_cnt;
    Elf32_Half ph_i;
    Elf32_Phdr *phdr = NULL;
    Elf32_Ehdr hdr;

@dstolfa
Copy link
Copy Markdown

dstolfa commented Mar 23, 2017

Hello Mato:

On the topic of naming, I'm wondering whether it would make sense to rename ukvm to something more generic, as it is no longer KVM-specific. It sounds a little confusing to have a ukvm monitor for different backends, such as Hypervisor.Framework, libvmmapi and others. Seeing as it's a monitor for unikernels, perhaps something like umon or ukmon would be suitable?

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 23, 2017 via email

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 23, 2017

@ricarkol:

Thanks mato. About the naming of ukvm-bin, I was mainly thinking about usability. It would be like having all my gcc versions called gcc and all overwriting each other every time I build a new cross compiler.

That is part of a separate high-level discussion we need to have about "when/how things get built"; let's do that after this has landed. For the purposes of this change I'm assuming that a normal jane user of ukvm-configure gets a ukvm-bin suitable for the host they're building on and nothing more. UKVM_CC is there mainly for CI purposes right now.

About point 1., the elf structures might have to be changed to 32 bits (if 32 bits):

Yes. This should be possible with judicious use of #defines and fixing the loader to use the elf.h types not its own.

@ricarkol
Copy link
Copy Markdown
Collaborator

Sounds good @mato, we can leave the renaming for later.

BTW, based on https://github.com/Weichen81/solo5/blob/arm64/ukvm/arm64/ukvm-setup.c#L174, it seems that setting the guest memory might also be arch/platform specific. Although what's done there for ARM could be done for all architectures.

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 27, 2017

BTW, based on https://github.com/Weichen81/solo5/blob/arm64/ukvm/arm64/ukvm-setup.c#L174, it seems that setting the guest memory might also be arch/platform specific. Although what's done there for ARM could be done for all architectures.

I saw that. Depending on arch-specific requirements we might be able to get away with a simpler arrangement where we carve out the MMIO "slot" (if any) at the top of memory rather than below the guest image. This would mean we wouldn't have to play games with as many multiple slots on the backend side.

Also, the top-level interface to how you get your memory will change a bit to accomodate bhyve/vmmapi, give me a couple more days to get that in and we'll go from there.

@Weichen81
Copy link
Copy Markdown
Contributor

Sorry for replying later. I have read the changes. And I think the host side refactor can satisfy most of what we have done for ARM64. As @ricarkol commented above, setting the guest memory for ARM64 is very different from x86_64. Can we introduce a file like ukvm_mem_{arch}.c to make memory setting become arch/platform specific?

Coordinate with @Weichen81 and complete the other (much smaller) part of this change which is multi-arch for the Solo5 kernel so that the guest side of the ARM port has somewhere to land.

@mato Yes, in current stage, Solo5 is running in kernel mode, we can't simply make it become arch independent. We still need do some changes to make Solo5 support multi-arch. While I was doing ARM64
porting, I have some simple but maybe not very normative changes. Can we coordinate base on these changes?

@Weichen81
Copy link
Copy Markdown
Contributor

Weichen81 commented Mar 27, 2017

arrangement where we carve out the MMIO "slot" (if any) at the top of memory rather than below the guest image. This would mean we wouldn't have to play games with as many multiple slots on the backend side

@mato Actually, we prefer to carve out the MMIO below the guest image. It's easier for us to build page table for guest. In my current changes, I placed the MMIO at the beginning is trying to make ARM and x86_64 have the same IO addresses. But MMIO access alignment limit break my attempt. I would move MMIO slot to the end, if this will make the refactor more easier.

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 27, 2017 via email

@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 27, 2017 via email

mato added 2 commits March 28, 2017 13:04
API change:

struct hv *ukvm_hv_init(size_t mem_size).

The caller may use (hv->mem) and (hv->mem_size) after calling this
function.

Rationale:

FreeBSD's vmm requires mapping guest memory from the vmm device (as
opposed to anonymous memory). Also, we may want to tweak mem_size based
on {arch,backend}-specific requirements so this gives us a way to do it.
We do however lose the option of mmap()ing arbitrary objects into guest
memory, too bad.
Change behaviour of KVM/x86_64 code to initialise VCPU state from
scratch rather than using values returned by KVM_GET_SREGS as a base.
This allows us to unify initial VCPU state values with those used by
other backends.

General improvements to the CPU abstractions used, also in preparation
for other backends:

- rename struct x86_seg to x86_sreg and change the structure to define
  an x86_64 shadow register, since that's what it is.
- update callers to match the above change.
- cleanups and more documentation in ukvm_cpu_x86_64.h.
@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 28, 2017

I've pushed a couple of changes in preparation for the FreeBSD vmm backend (b6826b9 and 682411f). Also, I've got a work in progress branch with the vmm backend which you can find here. (@djwillia: This will be useful for you to internalize the new abstractions).

Note that unlike this PR, the fbsdvmm branch will get rebased occasionally, so beware.

@Kensan
Copy link
Copy Markdown
Contributor

Kensan commented Mar 29, 2017

@Kensan: As above, this should give you most of what you need for your muen port.

I rebased the Muen port on the multifactor branch and the changes look fine from my side.

To enable multi-backend we need a clock that does not depend on any
backend-specific device. This change replaces the use of pvclock by the
ukvm target with:

- a simplified TSC-only implementation of tscclock
- a new hypercall to get wall clock time from the monitor
  (UKVM_HYPERCALL_WALLTIME).
- a new arch-dependent section in struct ukvm_boot_info, used to pass
  the (x86) TSC frequency to the guest.

Note that the monitor will now refuse to run on a host with an unstable
TSC, see the implementation notes in kernel/ukvm/tscclock.c for an
explanation.

Related changes also part of this commit:

- Add a simple test that the clock(s) work(s), in tests/test_time. The
  upper limit on how much is "too much" to sleep may need to be tweaked
  (or the test result ignored) on Travis.
- hypercall_poll() now blocks all signals except SIGINT and SIGTERM
  rather than restarting the ppoll() call which was incorrectly
  assuming (*ts) would be modified by ppoll().
@mato
Copy link
Copy Markdown
Member Author

mato commented Mar 30, 2017

See e3da88b for an elegant solution to (lack of) pvclock. @Kensan this will affect the Muen port as it touches the guest side of things.

@Kensan
Copy link
Copy Markdown
Contributor

Kensan commented Mar 30, 2017

@Kensan this will affect the Muen port as it touches the guest side of things.

Thanks for the heads-up. I rebased the port once again and everything looks fine after some minor changes.

Add a FreeBSD vmm(4) (a.k.a. bhyve) backend to ukvm.

The "gdb" module is not currently supported and is stubbed out when built
on FreeBSD.

Additional changes:
- ukvm-configure has been rewritten to generate a less convoluted
  Makefile.ukvm, including correct dependencies. I've also added
  user-facing documentation about its invocation.
- the code to ensure that the tap interface is pre-existing in
  tap_attach() has been simplified and ported to FreeBSD.
- the upper sleep limit in test_time has been raised somewhat to account
for behaviour on my (oldish) FreeBSD test hardware.
@mato
Copy link
Copy Markdown
Member Author

mato commented Apr 3, 2017

e67a007 adds the FreeBSD vmm(4) backend and associated changes. Note that the net code added to support the backend is around 500 LOC.

To test this with MirageOS you will need the following pins:

opam pin add solo5-kernel-ukvm git://github.com/mato/solo5#multifactor
opam pin add mirage git://github.com/mato/mirage#temp-freebsd

The mirage change is required due to the ukvm build system being dependent on GNU make. This will not change -- I spent several hours on Friday looking for a solution -- with the way the build system is structured (allows including of Makefile.ukvm from parent Makefiles) it is not feasible to have a non-GNU Makefile.

@hannesm @djwillia Given that the build will require GNU make for the foreseeable future, are you ok with my renaming all our makefiles to GNUmakefile<SUFFIX> as part of this PR?

This will ensure that *BSD users who have not read the instructions will be presented with a less confusing error message ("No Makefile found" as opposed to "half a screen of obscure syntax errors").

The downside is that mirage master will need to a) check if we're using the old scheme (Makefile.ukvm) or the new scheme (GNUmakefile.ukvm) and b) somehow (via opam?) figure out if it should call out to make or gmake on the current system.

@samoht
Copy link
Copy Markdown

samoht commented Apr 3, 2017

@mato in opam you can use the make variable (without double quotes) to get the translation to gmake in BSD. You can see that by writing:

$ opam config var make
make

(this should say gmake on *BSD systems)

@hannesm
Copy link
Copy Markdown
Contributor

hannesm commented Apr 3, 2017 via email

@hannesm
Copy link
Copy Markdown
Contributor

hannesm commented Apr 3, 2017 via email

Copy link
Copy Markdown
Collaborator

@ricarkol ricarkol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@hannesm
Copy link
Copy Markdown
Contributor

hannesm commented Apr 3, 2017

it looks good to me, some remarks: above in the issue you nicely wrote down synopsises of each file in the repository, it'd imho be great to include this in each .c and .h file a short synopsis.

about GNU make vs BSD make: I guess there is no plan to compile VMM unikernels with a linux/MacOSX toolchain, neither KVM with a BSD/MacOSX toolchain, neither uhvf with a linux/BSD toolchain, or is there?

if not, maybe instead of requiring a GNU make we can go into the burden of having a solo5-compile-time decision (by providing a different ukvm-configure.sh) which Makefile to generate (GNU or BSD style)!? This would be a burden on solo5 (maintaining multiple Makefile flavours), but afaics we don't have other platform-dependent code in mirage (yet?). Another potential solution is to encode into the pkg-config file the make vs gmake, but this will be painful for the future as well. the GNU features used (afaics) are conditional (for passing some -static flag to ld, and addprefix (did I miss something?)!?

@mato
Copy link
Copy Markdown
Member Author

mato commented Apr 4, 2017 via email

@hannesm
Copy link
Copy Markdown
Contributor

hannesm commented Apr 4, 2017 via email

@mato
Copy link
Copy Markdown
Member Author

mato commented Apr 4, 2017

Status update / next steps (and recap of yesterday's discussion with @djwillia on Slack):

  1. I will cut an interim release of the current Solo5 master, so that existing users get the fixes there (especially Replace kernel/lib.c with musl-derived implementations #166) before we start breaking things.
  2. After addressing the remainder of the comments here I will merge this PR.
  3. I will do the guest-(kernel-)side multi-arch refactoring in a separate change. Once that lands we will be able to upstream @Weichen81's work on ARM64 KVM support.
  4. During/once all of the above are done we can work on re-syncing Mirage with the changes.

mato added 2 commits April 4, 2017 16:09
This namespace is reserved by C99, drop the __ for preprocessor symbols
(mainly include guards).
mato added 3 commits April 5, 2017 10:33
This is done as a merge rather than rebase since others are tracking
this branch.

Fix conflicts with 6a900ae (formatting changes only).
This is for the benefit of *BSD users so that they get a meaningful
error message rather than a screen full of obscure syntax errors. (See
discussion in Solo5#171)
This is a temporary solution to avoid having to teach MirageOS about
"which make to use". It is not expected to work for including
Makefile.ukvm from other makefiles.
@mato
Copy link
Copy Markdown
Member Author

mato commented Apr 5, 2017

I've pushed the change to rename top-level makefiles only to GNUmakefile (fc4f953) and a temporary "hack" to allow Makefile.ukvm invocation with BSD make (27d53c1) which removes the immediate need for Mirage to know "which make to use". We should now be good to go on FreeBSD without needing to pin mirage.

Per the plan above, and yesterday's release of 0.2.2 from current master, I think this is OK to merge.

This header should be self-contained, so only use assert if it is
defined.
@mato
Copy link
Copy Markdown
Member Author

mato commented Apr 5, 2017

(For the record, I'm not going to update any end-user documentation related to the new backend support until the rest of this plan is complete and ideally after #172 is resolved).

@mato mato merged commit d057836 into Solo5:master Apr 5, 2017
@mato mato deleted the multifactor branch May 11, 2017 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants