This repository was archived by the owner on May 12, 2021. It is now read-only.

[WIP] api: an early sketch for kata libary api#32

Closed
laijs wants to merge 2 commits into kata-containers:master from laijs:libary-api

Conversation

@laijs
Contributor

@laijs laijs commented Feb 14, 2018

An API set intended to bridge CRI (frakti, hyperd) and OCI (kata-runtime) with Kata.

Signed-off-by: Lai Jiangshan jiangshanlai@gmail.com

@laijs laijs changed the title [WIP] api: a quick sketch for kata libary api [WIP] api: an early sketch for kata libary api Feb 14, 2018
api/agent.go Outdated

type Agent interface {
Config() AgentConfig
Realese() (json.RawMessage, error)

typo: Release().

api/agent.go Outdated
type Agent interface {
Config() AgentConfig
Realese() (json.RawMessage, error)
Destory()

typo: Destroy().

api/agent.go Outdated
PauseSync() error
Unpause() error

APIVersion() (uint32, error)

How about making this use semantic versioning (https://semver.org/)? I think that would be more flexible for clients:

APIVersion() (string, error)
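A string version lets clients gate features by comparing components numerically. A minimal sketch, assuming plain "major.minor.patch" strings with exactly three components (the helper name is illustrative, not from the PR; pre-release precedence rules from semver.org are omitted for brevity):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares two "major.minor.patch" strings numerically
// and returns -1, 0, or 1. It assumes both inputs have three components.
func compareVersions(a, b string) int {
	as := strings.SplitN(a, ".", 3)
	bs := strings.SplitN(b, ".", 3)
	for i := 0; i < 3; i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			if ai < bi {
				return -1
			}
			return 1
		}
	}
	return 0
}

func main() {
	// A numeric comparison correctly orders 1.2 before 1.10,
	// which a plain string comparison would get wrong.
	fmt.Println(compareVersions("1.2.0", "1.10.0")) // prints -1
}
```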

api/agent.go Outdated
DestroySandbox() error
UpdateRoute(r []Route) error
UpdateInterface(ifs []Interface) error
//OnlineCpuMem() error

Remove or uncomment?

api/network.go Outdated
Bridge string
Ip string
Mac string
Mtu uint64

I'd prefer IP, MAC and MTU here as they are all abbreviations.

api/provision.go Outdated

// add cpu/memory/storage/network and other devices.
// if the vm is not started, Device might be added on boot instead of hutpluging
AddDeivce(Device) (kata.Device, error)

AddDevice()

api/provision.go Outdated
// the resulted device as normal block device (without dirver field).
AddStorage(Storage) (Storage, error)
RemoveStorage(Storage) error
AddNic(NetowrkDevice) (Interface, error)

AddNIC(NetworkDevice)?

api/provision.go Outdated
AddStorage(Storage) (Storage, error)
RemoveStorage(Storage) error
AddNic(NetowrkDevice) (Interface, error)
RemoveNic(NetowrkDevice) error

RemoveNIC(NetworkDevice)?

api/provision.go Outdated
Build(VMConfig, AgentConfig, QosConfig) (VM, error)
BuildFromReleased(json.RawMessage) (VM, error)
BuildFromSnapshot(Snapshot, VMConfig, AgentConfig, QosConfig) (VM, error)
Cappabilities() BuilderCappabilities

Capabilities() BuilderCapabilities.

type ContainerConfig struct {
Id string

UGI *UserGroupInfo

Maybe UsersAndGroups would make this a little clearer.

@jodh-intel

Thanks for putting this together @laijs. Just a few initial typo corrections mostly.

@sameo

sameo commented Feb 20, 2018

@laijs If we go with proposal #33, this PR will have to be redefined, as the current virtcontainers API has a lot of overlap with what you're proposing here.

@sameo

sameo commented Feb 21, 2018

@laijs @gnawux The virtcontainers 1.0 API is now documented here:

https://github.com/containers/virtcontainers/blob/master/documentation/api/1.0/api.md

@sameo sameo mentioned this pull request Feb 22, 2018
@bergwolf
Member

@sameo I thought that if the virtcontainers API has a lot of overlap with the API in this PR, it would be quite easy for the virtcontainers framework to support the API defined here. Why do you say it has to be redefined?

Member

@bergwolf bergwolf left a comment


@laijs what about the persist API and VM Associate? They are needed if we want to reconnect to a running VM.


HyperstartCtlPath, HyperstartStreamPath string
KataPath string
KataVsockCid, KataVsockPort uint32
Member


These also need documentation, especially KataPath, which is not a self-explanatory name.

@sameo

sameo commented Feb 23, 2018

@bergwolf

I thought that if the virtcontainers API has a lot of overlap with the API in this PR, it would be quite easy for the virtcontainers framework to support the API defined here. Why do you say it has to be redefined?

I think there's some confusion here. There is indeed a lot of overlap between what's proposed here and the current virtcontainers API. So are you proposing that we implement the kata containers runtime API on top of the virtcontainers API, or would the (potentially modified) virtcontainers API become the runtime API? Given the overlap between the two, I think the first option would mostly be redundant.

@bergwolf
Member

@sameo I certainly do not want to see two sets of APIs in the kata runtime. I think the goal of the API discussion here is to have a unified API to support both kata command line and CRI native implementations.

There is already a lot of overlap between @laijs's PR and the current virtcontainers API. So please comment on the gaps you have in mind (both design-wise and implementation-wise), and we can work together to improve the API definition to a state that satisfies all stakeholders. When we all agree that the API is good enough, we should be able to implement it with the virtcontainers framework and use it to support both use cases.

@sameo

sameo commented Feb 23, 2018

@bergwolf I'll do an analysis of what's missing from the current virtcontainers API to implement something that would satisfy this PR, at least as I understand it.


@sameo sameo left a comment


Compared to the virtcontainers API, this PR is more imperative. It exposes more structures and interfaces to the caller, while virtcontainers provides a declarative-only API.

Assuming this PR describes what Frakti would need to consume the Kata Containers runtime API, I wonder how useful exporting the agent, networking, and persistence is. Keeping those as internal interfaces would already get us closer in terms of convergence.

// proxies, shim should be implemented via AgentBuilder (and Agent.CreateContainer() Agent.ExecProcess())
type AgentBuilder interface {
Build(AgentConfig) Agent
}

For the agent API, virtcontainers is restricting the runtime API to the agent configuration structure:

// HyperConfig is a structure storing information needed for
// hyperstart agent initialization.
type HyperConfig struct {
	SockCtlName string
	SockTtyName string
}

// KataAgentConfig is a structure storing information needed
// to reach the Kata Containers agent.
type KataAgentConfig struct {
	GRPCSocket string
}

The agent interface is internal as we believe it should not be implemented by the runtime, but should be an internal virtcontainers implementation. The runtime implementer should only specify which agent type and config it's going to be using. For the record, here is our internal agent interface:

// agent is the virtcontainers agent interface.
// Agents are running in the guest VM and handling
// communications between the host and guest.
type agent interface {
	// init is used to pass agent specific configuration to the agent implementation.
	// agent implementations also will typically start listening for agent events from
	// init().
	// After init() is called, agent implementations should be initialized and ready
	// to handle all other Agent interface methods.
	init(pod *Pod, config interface{}) error

	// capabilities should return a structure that specifies the capabilities
	// supported by the agent.
	capabilities() capabilities

	// createPod will tell the agent to perform necessary setup for a Pod.
	createPod(pod *Pod) error

	// exec will tell the agent to run a command in an already running container.
	exec(pod *Pod, c Container, cmd Cmd) (*Process, error)

	// startPod will tell the agent to start all containers related to the Pod.
	startPod(pod Pod) error

	// stopPod will tell the agent to stop all containers related to the Pod.
	stopPod(pod Pod) error

	// createContainer will tell the agent to create a container related to a Pod.
	createContainer(pod *Pod, c *Container) (*Process, error)

	// startContainer will tell the agent to start a container related to a Pod.
	startContainer(pod Pod, c *Container) error

	// stopContainer will tell the agent to stop a container related to a Pod.
	stopContainer(pod Pod, c Container) error

	// killContainer will tell the agent to send a signal to a
	// container related to a Pod. If all is true, all processes in
	// the container will be sent the signal.
	killContainer(pod Pod, c Container, signal syscall.Signal, all bool) error

	// processListContainer will list the processes running inside the container
	processListContainer(pod Pod, c Container, options ProcessListOptions) (ProcessList, error)
}

Contributor Author


The agent package and the sandbox package will be different packages. Therefore, Agent and its methods have to be exposed so that the sandbox package can use the agent package.

ID() string
Add(InterfaceConfig) (InterfaceDeive, error)
Remove(InterfaceDeive) error
}

Here again the virtcontainers approach is to let the runtime API consumer configure the network through a relatively simple networking configuration structure:

// NetworkConfig is the network configuration related to a network.
type NetworkConfig struct {
	NetNSPath         string
	NumInterfaces     int
	InterworkingModel NetInterworkingModel
}

const (
	// NetXConnectDefaultModel Ask to use DefaultNetInterworkingModel
	NetXConnectDefaultModel NetInterworkingModel = iota

	// NetXConnectBridgedModel uses a linux bridge to interconnect
	// the container interface to the VM. This is the
	// safe default that works for most cases except
	// macvlan and ipvlan
	NetXConnectBridgedModel

	// NetXConnectMacVtapModel can be used when the Container network
	// interface can be bridged using macvtap
	NetXConnectMacVtapModel

	// NetXConnectEnlightenedModel can be used when the Network plugins
	// are enlightened to create VM native interfaces
	// when requested by the runtime
	// This will be used for vethtap, macvtap, ipvtap
	NetXConnectEnlightenedModel

	// NetXConnectInvalidModel is the last item to check valid values by IsValid()
	NetXConnectInvalidModel
)

and again the network interface is internal. We believe the runtime should only specify which internetworking model it wants to use, and let virtcontainers handle the rest. After all, neither OCI nor CRI includes a networking specification, and the whole pod/container networking setup is offloaded outside the actual container runtime.
For the record, our internal network interface is defined as:

// network is the virtcontainers network interface.
// Container network plugins are used to setup virtual network
// between VM netns and the host network physical interface.
type network interface {
	// init initializes the network, setting a new network namespace.
	init(config NetworkConfig) (string, bool, error)

	// run runs a callback function in a specified network namespace.
	run(networkNSPath string, cb func() error) error

	// add adds all needed interfaces inside the network namespace.
	add(pod Pod, config NetworkConfig, netNsPath string, netNsCreated bool) (NetworkNamespace, error)

	// remove unbridges and deletes TAP interfaces. It also removes virtual network
	// interfaces and deletes the network namespace.
	remove(pod Pod, networkNS NetworkNamespace) error
}

ListSandboxs()[]SandboxStatus
ListContainers()[]ContainerStatus
SandboxOfContainer(string)string
}

As @laijs pointed out, the equivalent for this in virtcontainers is the resourceStorage interface:

// resourceStorage is the virtcontainers resources (configuration, state, etc...)
// storage interface.
// The default resource storage implementation is filesystem.
type resourceStorage interface {
	// Create all resources for a pod
	createAllResources(pod Pod) error

	// Resources URIs functions return both the URI
	// for the actual resource and the URI base.
	containerURI(podID, containerID string, resource podResource) (string, string, error)
	podURI(podID string, resource podResource) (string, string, error)

	// Pod resources
	storePodResource(podID string, resource podResource, data interface{}) error
	deletePodResources(podID string, resources []podResource) error
	fetchPodConfig(podID string) (PodConfig, error)
	fetchPodState(podID string) (State, error)
	fetchPodNetwork(podID string) (NetworkNamespace, error)
	storePodNetwork(podID string, networkNS NetworkNamespace) error

	// Hypervisor resources
	fetchHypervisorState(podID string, state interface{}) error
	storeHypervisorState(podID string, state interface{}) error

	// Agent resources
	fetchAgentState(podID string, state interface{}) error
	storeAgentState(podID string, state interface{}) error

	// Container resources
	storeContainerResource(podID, containerID string, resource podResource, data interface{}) error
	deleteContainerResources(podID, containerID string, resources []podResource) error
	fetchContainerConfig(podID, containerID string) (ContainerConfig, error)
	fetchContainerState(podID, containerID string) (State, error)
	fetchContainerProcess(podID, containerID string) (Process, error)
	storeContainerProcess(podID, containerID string, process Process) error
	fetchContainerMounts(podID, containerID string) ([]Mount, error)
	storeContainerMounts(podID, containerID string, mounts []Mount) error
	fetchContainerDevices(podID, containerID string) ([]Device, error)
	storeContainerDevices(podID, containerID string, devices []Device) error
}

that is defined under the (I agree, misnamed) filesystem.go. The filesystem structure implements that interface.

I agree part of this API should be exported, but here again not down to the persistence/resourceStorage interface level; rather, down to the persistence/resourceStorage type. In other words, allow the runtime API caller to select which persistence backend it wants to use and to configure it.
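That selection pattern, exporting a backend type selector while keeping the storage interface internal, might look like the following sketch. All names here (newResourceStorage, the root path, the stub method) are hypothetical illustrations, not virtcontainers' actual code:

```go
package main

import "fmt"

// ResourceStorageType is the exported selector; the storage
// interface itself stays internal to the library.
type ResourceStorageType string

const (
	// FilesystemStorage selects the filesystem-backed implementation.
	FilesystemStorage ResourceStorageType = "filesystem"
)

// resourceStorage is internal; API callers never see it directly.
type resourceStorage interface {
	storePodResource(podID string, data []byte) error
}

// filesystem is an illustrative stub of the default backend.
type filesystem struct{ root string }

func (f *filesystem) storePodResource(podID string, data []byte) error {
	// a real implementation would persist data under f.root/podID
	return nil
}

// newResourceStorage maps the exported selector to an internal
// implementation, so callers pick a backend without seeing its interface.
func newResourceStorage(t ResourceStorageType) (resourceStorage, error) {
	switch t {
	case FilesystemStorage:
		// root path is a made-up example
		return &filesystem{root: "/var/lib/virtcontainers/pods"}, nil
	default:
		return nil, fmt.Errorf("unknown resource storage type %q", t)
	}
}

func main() {
	rs, err := newResourceStorage(FilesystemStorage)
	fmt.Println(err == nil, rs != nil) // prints true true
}
```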


// to be filled
}


A combination of those virtcontainers structures are equivalent to the above one:

// Resources describes VM resources configuration.
type Resources struct {
	// VCPUs is the number of available virtual CPUs.
	VCPUs uint

	// Memory is the amount of available memory in MiB.
	Memory uint
}

// HypervisorConfig is the hypervisor configuration.
type HypervisorConfig struct {
	// KernelPath is the guest kernel host path.
	KernelPath string

	// ImagePath is the guest image host path.
	ImagePath string

	// FirmwarePath is the bios host path
	FirmwarePath string

	// MachineAccelerators are machine specific accelerators
	MachineAccelerators string

	// HypervisorPath is the hypervisor executable host path.
	HypervisorPath string

	// DisableBlockDeviceUse disallows a block device from being used.
	DisableBlockDeviceUse bool

	// KernelParams are additional guest kernel parameters.
	KernelParams []Param

	// HypervisorParams are additional hypervisor parameters.
	HypervisorParams []Param

	// HypervisorMachineType specifies the type of machine being
	// emulated.
	HypervisorMachineType string

	// Debug changes the default hypervisor and kernel parameters to
	// enable debug output where available.
	Debug bool

	// DefaultVCPUs specifies default number of vCPUs for the VM.
	// Pod configuration VMConfig.VCPUs overwrites this.
	DefaultVCPUs uint32

	// DefaultMem specifies default memory size in MiB for the VM.
	// Pod configuration VMConfig.Memory overwrites this.
	DefaultMemSz uint32

	// DefaultBridges specifies default number of bridges for the VM.
	// Bridges can be used to hot plug devices
	DefaultBridges uint32

	// MemPrealloc specifies if the memory should be pre-allocated
	MemPrealloc bool

	// HugePages specifies if the memory should be pre-allocated from huge pages
	HugePages bool

	// Realtime Used to enable/disable realtime
	Realtime bool

	// Mlock is used to control memory locking when Realtime is enabled
	// Realtime=true and Mlock=false, allows for swapping out of VM memory
	// enabling higher density
	Mlock bool

	// DisableNestingChecks is used to override customizations performed
	// when running on top of another VMM.
	DisableNestingChecks bool

	// customAssets is a map of assets.
	// Each value in that map takes precedence over the configured assets.
	// For example, if there is a value for the "kernel" key in this map,
	// it will be used for the pod's kernel path instead of KernelPath.
	customAssets map[assetType]*asset
}

OutboundPeak string

// to be filled
}

virtcontainers does not have this kind of setting; that is a potential gap. Could you please explain where it is used and what it's for?

Member


Network and disk speed limits are necessary in a cloud service deployment.

Path string
ReadOnly bool
}


virtcontainer's Mount:

// Mount describes a container mount.
type Mount struct {
	Source      string
	Destination string

	// Type specifies the type of filesystem to mount.
	Type string

	// Options list all the mount options of the filesystem.
	Options []string

	// HostPath used to store host side bind mount path
	HostPath string

	// ReadOnly specifies if the mount should be read only or not
	ReadOnly bool
}

Member


The idea is that Storage is a pod level resource and Mount in ContainerConfig can reference it. With such separation, the kata runtime can implement the AddStorage() API.


@bergwolf could you elaborate on how Mount would reference a Storage? Who decides that a container's mount is referring to a specific storage definition?

Member


It is referenced by letting OCI.Mount.Source == pb.Storage.Mountpoint (or a subdirectory). The kata runtime API caller needs to craft a proper OCI.Mount.Source to establish the mapping. It is a bit more awkward than the original runv API, but the change is needed because we want to use libcontainer in the kata agent.
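The referencing rule described here can be sketched as a lookup: a container mount is backed by the pod-level storage whose mountpoint equals the mount source or is a parent directory of it. The types and helper below are illustrative stand-ins, not the PR's actual definitions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Storage is a pod-level resource (illustrative stand-in for the PR's type).
type Storage struct {
	Driver     string
	Mountpoint string
}

// storageForMount returns the Storage backing an OCI mount source:
// the source must equal the storage mountpoint or be a subdirectory of it.
func storageForMount(source string, storages []Storage) (Storage, bool) {
	src := filepath.Clean(source)
	for _, s := range storages {
		mp := filepath.Clean(s.Mountpoint)
		if src == mp || strings.HasPrefix(src, mp+string(filepath.Separator)) {
			return s, true
		}
	}
	return Storage{}, false
}

func main() {
	storages := []Storage{{Driver: "blk", Mountpoint: "/run/kata/storage/vol1"}}
	// A mount whose source is a subdirectory of a storage mountpoint
	// resolves to that storage.
	s, ok := storageForMount("/run/kata/storage/vol1/data", storages)
	fmt.Println(ok, s.Driver) // prints true blk
}
```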


This does not need to be specified at the API level, this is very OCI specific. The kata agent implementation in virtcontainers kata_agent.go can take care of translating whatever is expected by the Kata agent itself.

Member


@sboeuf we want to pass the OCI spec around through the API to better help the kata runtime communicate with libcontainer. The OCI spec field in the agent's API is there for the same purpose -- so that we do not convert from the OCI spec to a kata spec and then back to the OCI spec. Although we do need to make necessary modifications, most of the OCI spec gets through untouched.

OTOH, the main purpose of having sandbox.AddStorage() is to support explicit hotplug and let the upper layer prepare container resources (NICs/volumes/rootfs etc.) in parallel. If we eventually decide not to pass the OCI spec around, we still need a Mount field in the kata runtime's container spec to properly reference Storage.


The idea is to describe a volume (or a mount) in a generic way, so that it can be used not only by the kata agent implementation.
But I am not saying we're not saving the OCI spec, we actually do that using annotations.
I understand that you want to parallelize container resource preparation, but I guess this could be done from virtcontainers if it were implemented the right way, don't you think?

Member


The difference is that the container rootfs is prepared outside of the runtime library. By letting the upper layer initiate the preparation in parallel, we can save hundreds of milliseconds when starting a container (the actual savings depend on which storage is used, though).

If we hide these steps in the sandbox/container creation flow, as in the current virtcontainers API, we can do nothing but wait while the container rootfs is prepared.

I can live with the OCI spec annotation, which is really not different from passing it through the API. IMHO passing it through the API gives better type checking. Just my two cents though.

api/sandbox.go Outdated

Containers map[string]Container
}


virtcontainers does not export the Pod structure directly to the top level API, but the API caller should use the PodConfig structure instead:

// PodConfig is a Pod configuration.
type PodConfig struct {
	ID string

	Hostname string

	// Field specific to OCI specs, needed to setup all the hooks
	Hooks Hooks

	// VMConfig is the VM configuration to set for this pod.
	VMConfig Resources

	HypervisorType   HypervisorType
	HypervisorConfig HypervisorConfig

	AgentType   AgentType
	AgentConfig interface{}

	ProxyType   ProxyType
	ProxyConfig ProxyConfig

	ShimType   ShimType
	ShimConfig interface{}

	NetworkModel  NetworkModel
	NetworkConfig NetworkConfig

	// Volumes is a list of shared volumes between the host and the Pod.
	Volumes []Volume

	// Containers describe the list of containers within a Pod.
	// This list can be empty and populated by adding containers
	// to the Pod a posteriori.
	Containers []ContainerConfig

	// Annotations keys must be unique strings and must be name-spaced
	// with e.g. reverse domain notation (org.clearlinux.key).
	Annotations map[string]string
}


OciProcess ocispecs.Process
}


I think the virtcontainers equivalent for this one is Cmd:

// Cmd represents a command to execute in a running container.
type Cmd struct {
	Args    []string
	Envs    []EnvVar
	WorkDir string

	// Note that these fields *MUST* remain as strings.
	//
	// The reason being that we want runtimes to be able to support CLI
	// operations like "exec --user=". That option allows the
	// specification of a user (either as a string username or a numeric
	// UID), and may optionally also include a group (groupame or GID).
	//
	// Since this type is the interface to allow the runtime to specify
	// the user and group the workload can run as, these user and group
	// fields cannot be encoded as integer values since that would imply
	// the runtime itself would need to perform a UID/GID lookup on the
	// user-specified username/groupname. But that isn't practically
	// possible given that to do so would require the runtime to access
	// the image to allow it to interrogate the appropriate databases to
	// convert the username/groupnames to UID/GID values.
	//
	// Note that this argument applies solely to the _runtime_ supporting
	// a "--user=" option when running in a "standalone mode" - there is
	// no issue when the runtime is called by a container manager since
	// all the user and group mapping is handled by the container manager
	// and specified to the runtime in terms of UID/GID's in the
	// configuration file generated by the container manager.
	User                string
	PrimaryGroup        string
	SupplementaryGroups []string

	Interactive     bool
	Console         string
	Detach          bool
	NoNewPrivileges bool
	Capabilities    LinuxCapabilities
}

func (sandbox *Sandbox) Resume() error {
}
func (sandbox *Sandbox) Save(path string) error {
}

The virtcontainers container and pod APIs implement a large part of the above API:

CreatePod

// CreatePod is the virtcontainers pod creation entry point.
// CreatePod creates a pod and its containers. It does not start them.
func CreatePod(podConfig PodConfig) (VCPod, error)

DeletePod

// DeletePod is the virtcontainers pod deletion entry point.
// DeletePod will stop an already running container and then delete it.
func DeletePod(podID string) (VCPod, error)

StartPod

// StartPod is the virtcontainers pod starting entry point.
// StartPod will talk to the given hypervisor to start an existing
// pod and all its containers.
func StartPod(podID string) (VCPod, error)

StopPod

// StopPod is the virtcontainers pod stopping entry point.
// StopPod will talk to the given agent to stop an existing pod
// and destroy all containers within that pod.
func StopPod(podID string) (VCPod, error)

RunPod

// RunPod is the virtcontainers pod running entry point.
// RunPod creates a pod and its containers and then it starts them.
func RunPod(podConfig PodConfig) (VCPod, error)

ListPod

// ListPod is the virtcontainers pod listing entry point.
func ListPod() ([]PodStatus, error)

StatusPod

// StatusPod is the virtcontainers pod status entry point.
func StatusPod(podID string) (PodStatus, error)

PausePod

// PausePod is the virtcontainers pausing entry point which pauses an
// already running pod.
func PausePod(podID string) (VCPod, error)

ResumePod

// ResumePod is the virtcontainers resuming entry point which resumes
// (or unpauses) and already paused pod.
func ResumePod(podID string) (VCPod, error)

CreateContainer

// CreateContainer is the virtcontainers container creation entry point.
// CreateContainer creates a container on a given pod.
func CreateContainer(podID string, containerConfig ContainerConfig) (VCPod, VCContainer, error)

DeleteContainer

// DeleteContainer is the virtcontainers container deletion entry point.
// DeleteContainer deletes a Container from a Pod. If the container is running,
// it needs to be stopped first.
func DeleteContainer(podID, containerID string) (VCContainer, error)

StartContainer

// StartContainer is the virtcontainers container starting entry point.
// StartContainer starts an already created container.
func StartContainer(podID, containerID string) (VCContainer, error)

StopContainer

// StopContainer is the virtcontainers container stopping entry point.
// StopContainer stops an already running container.
func StopContainer(podID, containerID string) (VCContainer, error)

EnterContainer

// EnterContainer is the virtcontainers container command execution entry point.
// EnterContainer enters an already running container and runs a given command.
func EnterContainer(podID, containerID string, cmd Cmd) (VCPod, VCContainer, *Process, error)

StatusContainer

// StatusContainer is the virtcontainers container status entry point.
// StatusContainer returns a detailed container status.
func StatusContainer(podID, containerID string) (ContainerStatus, error)

KillContainer

// KillContainer is the virtcontainers entry point to send a signal
// to a container running inside a pod. If all is true, all processes in
// the container will be sent the signal.
func KillContainer(podID, containerID string, signal syscall.Signal, all bool) error

ProcessListContainer

// ProcessListContainer is the virtcontainers entry point to list
// processes running inside a container
func ProcessListContainer(podID, containerID string, options ProcessListOptions) (ProcessList, error)

Contributor Author

@laijs laijs Mar 5, 2018


@sameo These APIs are just wrappers over the Sandbox & Persistence APIs, and they are too specific to the OCI runtime command line. What we need is just to expose the Sandbox & Persistence APIs. (The Sandbox APIs are the APIs that take a Sandbox as the receiver, not a sandbox name.)
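The distinction being drawn, methods on a live Sandbox value versus free functions keyed by a pod ID, can be illustrated with a toy sketch (all names and types here are hypothetical, not the PR's or virtcontainers' actual definitions):

```go
package main

import "fmt"

// Sandbox is a live handle; the receiver-style API calls methods on it.
type Sandbox struct {
	ID     string
	Paused bool
}

// Pause is receiver-based: the caller already holds the sandbox,
// so no name lookup is needed.
func (s *Sandbox) Pause() error {
	s.Paused = true
	return nil
}

// registry simulates the state that ID-keyed wrappers must consult.
var registry = map[string]*Sandbox{}

// PausePod is the wrapper style: every call re-resolves the sandbox
// from its ID before acting on it, which suits a CLI that starts
// from scratch on each invocation.
func PausePod(podID string) error {
	s, ok := registry[podID]
	if !ok {
		return fmt.Errorf("sandbox %q not found", podID)
	}
	return s.Pause()
}

func main() {
	s := &Sandbox{ID: "sb-1"}
	registry[s.ID] = s

	// Wrapper style pays a lookup on every call; a long-running
	// CRI server could simply keep the handle and call s.Pause().
	err := PausePod("sb-1")
	fmt.Println(s.Paused, err) // prints true <nil>
}
```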

Source string
Fstype string
MoutOption []string
}

This Storage is partly covered by virtcontainers' Mount, but some fields are missing, and the name might be misleading. A few questions:

  • In which context is this structure used?
  • How is it different from a generic container Mount?
  • What is the Driver field about? Do you have an example to share?

@gnawux
Member

gnawux commented Feb 23, 2018

Compared to the virtcontainers API, this PR is more imperative. It exposes more structures and interfaces to the caller, while virtcontainers provides a declarative-only API.

@sameo when talking about "imperative" and "declarative", I assume you are talking about style or flavor rather than which is better or worse.

Firstly, the two concepts are relative, there is no absolute declarative/imperative in most of the cases. Actually, comparing to kubernetes APIs, which is a typical declarative API set, CRI is a more "imperative" interface -- for example, having launch PodSandbox, it will create containers in sandbox independently, which implies the devices may be added to a sandbox after the RunPodSandbox operation.

In fact, higher-level APIs tend to be more declarative, which is application friendly and easier for operating a big application. On the other hand, lower-level APIs tend to be less declarative, which is more efficient and closer to the nature of how the devices work.

[Will add more explanation tomorrow... sorry it's too late for me]

@sameo

sameo commented Feb 25, 2018

@gnawux

In fact, the higher level APIs tend to be more declarative, which is application friendly and easier for operating a big application. On the other hand, the lower level APIs tend to be not so declarative, which is more efficient and close to the nature of how the devices work.

Yes, definitely agree with that. I did not mean that one is better than the other. However I do believe that our lower level APIs should/could be internal and not external. The smaller the external API, the better (for security, UX, and design reasons).

An API set intended to bridge CRI (frakti, hyperd) and OCI (kata-runtime) with Kata.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
@gnawux
Member

gnawux commented Feb 26, 2018

@sameo

However I do believe that our lower level APIs should/could be internal and not external. The smaller the external API, the better (for security, UX, and design reasons).

Yes, I understand your point, and I agree we should keep a small set of external APIs for applications. Here are some thoughts on why we put some low-level interfaces here; it would be appreciated if you could figure out a better way to handle them:

The Persistence and Sandbox are the top-level interfaces for customer applications.

  • Sandbox: there are three interfaces behind the Sandbox
    • VM gives the control of hypervisor to sandbox;
    • Agent gives the control of containers/processes via the agent in sandbox;
    • Network (multiple)
  • Persistence: a resource storage engine that stores the persistent data of sandboxes and can be used to reconnect to a running Sandbox.

A Sandbox is created with the CreateSandbox API, which takes a SandboxBuilder together with sandbox configurations. The SandboxBuilder includes a VMBuilder for creating the VM, an AgentBuilder for creating the Agent, and the Persistence for saving the Sandbox.

The Kata lib provides pre-defined builders for different cases/drivers:

  • VMBuilder may support different drivers for VMs, as well as factories.
    • Customer applications can also have its own implementations of VM via VMBuilder.
  • AgentBuilder may generate Agents which
    • Connects agent with external proxy and shim processes;
    • Connects agent directly with gRPC API over vsock or yamux.

Applications could provide their own implementations of, or wrappers for, the builders, which provides flexibility for different use cases.

Customer applications may implement their own Persistence for storing sandbox info in a local filesystem, a local database, or somewhere else.
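The builder composition described above can be sketched in Go roughly as follows. All interface and type names here are hypothetical stand-ins for the proposed API, with trivial in-memory fakes for the demo:

```go
package main

import (
	"errors"
	"fmt"
)

// VM and Agent stand in for the real driver interfaces.
type VM interface{ Start() error }
type Agent interface{ CreateContainer(id string) error }

type Sandbox struct {
	ID    string
	vm    VM
	agent Agent
}

// Pluggable builders: applications may supply their own implementations.
type VMBuilder interface{ BuildVM() (VM, error) }
type AgentBuilder interface{ BuildAgent() (Agent, error) }

// Persistence stores sandbox state so a running sandbox can be reconnected.
type Persistence interface {
	Save(s *Sandbox) error
	Load(id string) (*Sandbox, error)
}

// SandboxBuilder bundles the three pieces handed to CreateSandbox.
type SandboxBuilder struct {
	VM      VMBuilder
	Agent   AgentBuilder
	Persist Persistence
}

func CreateSandbox(id string, b SandboxBuilder) (*Sandbox, error) {
	vm, err := b.VM.BuildVM()
	if err != nil {
		return nil, err
	}
	agent, err := b.Agent.BuildAgent()
	if err != nil {
		return nil, err
	}
	s := &Sandbox{ID: id, vm: vm, agent: agent}
	return s, b.Persist.Save(s)
}

// Trivial stand-in implementations for the demo below.
type fakeVM struct{}

func (fakeVM) Start() error { return nil }

type fakeVMBuilder struct{}

func (fakeVMBuilder) BuildVM() (VM, error) { return fakeVM{}, nil }

type fakeAgent struct{}

func (fakeAgent) CreateContainer(id string) error { return nil }

type fakeAgentBuilder struct{}

func (fakeAgentBuilder) BuildAgent() (Agent, error) { return fakeAgent{}, nil }

type memStore struct{ m map[string]*Sandbox }

func (ms *memStore) Save(s *Sandbox) error { ms.m[s.ID] = s; return nil }
func (ms *memStore) Load(id string) (*Sandbox, error) {
	if s, ok := ms.m[id]; ok {
		return s, nil
	}
	return nil, errors.New("not found: " + id)
}

func main() {
	store := &memStore{m: map[string]*Sandbox{}}
	b := SandboxBuilder{VM: fakeVMBuilder{}, Agent: fakeAgentBuilder{}, Persist: store}
	s, err := CreateSandbox("pod-1", b)
	fmt.Println(s.ID, err == nil)
}
```

Swapping `memStore` for a database-backed implementation changes nothing in `CreateSandbox`, which is the flexibility the comment argues for.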

@gnawux
Member

gnawux commented Feb 26, 2018

Here is the detailed document for how the proposed APIs work: https://docs.google.com/document/d/1BJeIFWykgmPIr45GRPNoawPPMr-3ekTIpJwU-g5QsHs/edit?usp=sharing

@jessfraz

jessfraz commented Mar 2, 2018

I think I would prefer modifying the virtcontainers API with whatever extra it needs. This seems a bit over the top with regard to what is needed; I would prefer something minimal, and then see what else is necessary from there.


@sboeuf sboeuf left a comment


I am not gonna comment about the content of each field of every structure introduced here, but the form instead.
Virtcontainers is designed to be a simple entry point with a very simple API related to the Pod abstraction. Given that, it has been designed so that you fill the appropriate configuration structures and it takes care of what needs to be done regarding those configs.
This PR exports every detail by allowing external callers to deal directly with the agent, for instance, which is a bad idea IMO because those callers will have to understand the implementation details of virtcontainers in order to call the exported functions in the right sequence. This will also move more code handling into each consumer of this library, meaning more duplication between consumers (kata-runtime, cc-runtime, frakti, and so on). It gives more control to each consumer, and if, for instance, they need to solve the same issues, they might do so in different ways without contributing back to the library.
When we've been talking about modifying things for this API, I was expecting that if, for instance, there is a network case which is not currently handled, you would extend the network configuration structure so that you can implement what you need inside virtcontainers.

@gnawux
Member

gnawux commented Mar 3, 2018

@jessfraz The API is based on runV API (the Sandbox part) and with some low-level interfaces (which also exist in virtcontainers, and the debate is if we need to make them public here).

Obviously we are not a project starting from scratch. With so many existing use cases from each runtime, we proposed a bigger API set covering everything we considered, and we'd like to hear your and others' comments on

  • are these requirements valid, and
  • if some of them are valid, are there any better designs for the requirements.

I understand @sameo is seeking a minimal API set and doesn't want to expose low-level interfaces like Agent, VM, etc. I want to hide them as well. Having discussed with @laijs, I think we'd better put all of these out and collect what you think about the requirements behind the interfaces, as we state in #32 (comment) (and in the document: https://docs.google.com/document/d/1BJeIFWykgmPIr45GRPNoawPPMr-3ekTIpJwU-g5QsHs/edit?usp=sharing).

What are the low-level interfaces we put here:

  • VM gives the control of hypervisor to sandbox;
  • Agent gives the control of containers/processes via the agent in sandbox;
  • Persistence: a resource storage engine that stores the persistent data of sandboxes and can be used to reconnect to a running Sandbox.

Why we make the interfaces public though we hate to publish the detail:

  • The application could either select the builders provided by Kata, or provide its own implementations of, or wrappers for, the builders, which provides flexibility for different use cases.
    • Customer applications can also have their own implementations of VM via VMBuilder. VMBuilder may support different drivers for VMs, as well as factories.
    • AgentBuilder may generate Agents which
      • Connects agent with external proxy and shim processes;
      • Connects agent directly with gRPC API over vsock or yamux.
  • Customer applications may implement their own Persistence for storing sandbox info in local filesystem, local database, or somewhere else.

@sameo

sameo commented Mar 5, 2018

@gnawux

What are the low-level interfaces we put here:

VM gives the control of hypervisor to sandbox;

One of the key points of virtcontainers is to abstract this from the caller. If the VM lifecycle is handled outside of virtcontainers, we leave much of the complexity in the caller's hands. That's not what we want imho. We want to make the caller's life simple, and provide them with a container/sandbox API that's easy to use and close to what they're already familiar with. We want to make hardware-virtualized containers just containers and hide the VM management away from them.

Agent gives the control of containers/processes via the agent in sandbox;

If virtcontainers is missing containers management APIs, I'd be happy to consider extending the API. But we need to know which ones they are.
Providing a complete, low level agent API is basically saying that we can't define a proper container level API and are passing that up to the caller. Eventually we let the caller do a myriad of things that we don't control or understand, and that we can't validate at all.

Persistence: is a resource storage engine, which could store the persistence data of sandboxes, and could be used for reconnecting to a running Sandbox.

So I see something a little concerning with this kind of API: we're basically letting the caller define its own persistent data storage format, while in my opinion the storage format is part of the API. It may make sense to let callers implement their own backend driver, but that should live at the transport/IO level: ReadData, WriteData, etc. virtcontainers should maintain the storage format and layout to be able to move from one backend to the other transparently.

That being said, do we have practical (i.e. existing) examples of a need for using anything else than a POSIX filesystem? Do we realistically want to use something like etcd or an SQL database to store virtcontainers data? I usually prefer to extend an API to support a real life use case rather than extending it up front and having to maintain it in the long term.
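The transport/IO-level split suggested here can be sketched as follows. This is a hedged Go sketch with hypothetical names: the library owns the serialization format (JSON here), while backends only move opaque bytes via ReadData/WriteData:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageBackend is the transport/IO-level driver a caller may replace.
type StorageBackend interface {
	WriteData(key string, data []byte) error
	ReadData(key string) ([]byte, error)
}

// memBackend is a trivial in-memory backend; an etcd or SQL backend
// would satisfy the same interface without ever parsing the payload.
type memBackend struct{ m map[string][]byte }

func (b *memBackend) WriteData(k string, d []byte) error { b.m[k] = d; return nil }
func (b *memBackend) ReadData(k string) ([]byte, error)  { return b.m[k], nil }

// sandboxState is the format the library controls; backends never see it.
type sandboxState struct {
	ID    string `json:"id"`
	State string `json:"state"`
}

func saveSandbox(b StorageBackend, s sandboxState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	return b.WriteData("sandbox/"+s.ID, data)
}

func loadSandbox(b StorageBackend, id string) (sandboxState, error) {
	var s sandboxState
	data, err := b.ReadData("sandbox/" + id)
	if err != nil {
		return s, err
	}
	return s, json.Unmarshal(data, &s)
}

func main() {
	b := &memBackend{m: map[string][]byte{}}
	saveSandbox(b, sandboxState{ID: "pod-1", State: "running"})
	s, _ := loadSandbox(b, "pod-1")
	fmt.Println(s.ID, s.State)
}
```

Because the format lives above the backend interface, the library can migrate data from one backend to another without the callers noticing, which is the portability argument made in the comment.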

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
@laijs
Contributor Author

laijs commented Mar 7, 2018

@ALL The methods in the agent interface have been changed to be private.

It can be reviewed in the second commit:
f491d42

@sboeuf

sboeuf commented Mar 7, 2018

@laijs thanks for the change.
More globally, I think we might miss things in virtcontainers such as being able to add a new network interface on an existing pod, and there is no pod or container lifecycle event related to such a thing. For this kind of case, I would expect the virtcontainers API to be extended such as:

// api.go
...

func AddNetwork(podID, interfaceName string) error {
        ...
}

...

This example shows that you could try to highlight what is really missing by extending the API, instead of trying to redefine everything.

@bergwolf
Member

bergwolf commented Mar 8, 2018

@sameo

That being said, do we have practical (i.e. existing) examples of a need for using anything else than a POSIX filesystem? Do we realistically want to use something like etcd or an SQL database to store virtcontainers data? I usually prefer to extend an API to support a real life use case rather than extending it up front and having to maintain it in the long term.

It is quite practical to let the runtime library callers decide where/how to store the persistent data. For one thing, a database provides much better ACID support than a filesystem, which is important in a production environment. For another, the persistent data might be part of a larger transaction that spans beyond the knowledge of the kata runtime. And yes, runv has been providing such an API since the early days, and frakti/hyperd are using databases to store the persistent data.

@WeiZhang555
Member

I would suggest that we provide a plugin framework to make everything that has many choices pluggable, so that for something like persistence we offer choices such as a database and a raw file, or something like an etcd plugin.

@gnawux
Member

gnawux commented Mar 8, 2018

@WeiZhang555 yes, I agree. The proposed low-level interfaces are designed for plugins. It is better to explicitly split the plugin interfaces from the customer-facing interfaces.

@laijs laijs mentioned this pull request Mar 8, 2018
Source string
Fstype string
MoutOptions []string
}

@sameo sameo Mar 9, 2018


So Storage represents e.g. an RBD or ceph volume attached to a VM/Pod? It's a pod resource that can be then mounted in one or more containers? Is that a correct description?
And so sandbox.AddStorage() is the part that attaches the storage to the VM/Pod, without mounting it in any container? Is it mounted somewhere in the pod mountns?

Member

Yes, your description is correct. Just to be clear, Storage is not limited to RBD. sandbox.AddStorage() is supposed to call the hypervisor hotplug code. It won't mount the storage in any container because sandbox.AddStorage() can be called before sandbox.AddContainer thus there might be no containers in the sandbox at all.


So do you see any reason why it could not be handled as part of the sandbox/pod creation step? Or is addStorage potentially being called outside of the pod/sandbox creation step?

Member

I'm not sure if I understand your question. In CRI, sandbox creation does not include all container specs. sandbox.AddStorage is supposed to be called when handling the CRI CreateContainer() request.


Right, so CRI CreateContainer() will eventually call into virtcontainers' CreateContainer() or pod.CreateContainer(): Can we not export sandbox.AddStorage() and make it an internal call of virtcontainers CreateContainer() or pod.CreateContainer()?

Member

As I mentioned above in reply to @sboeuf, exporting these hotplug APIs will allow the upper layer to prepare rootfs/volumes/NICs in parallel, which can save us hundreds of milliseconds when creating a new container.

Another reason not to reject AddStorage is that we will need to export AddNic anyway to support CNI, even in the kata runtime CLI case, assuming @WeiZhang555 still wants a kata-cli interface subcommand similar to the runv interface he implemented in runv. IOW, the hotplug APIs are not only useful for native CRI support; they are useful for kata-cli as well.
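The parallelism argument can be illustrated with a small Go sketch. Here addStorage/addNic are hypothetical stand-ins for the exported hotplug APIs, and the sleeps model device-attach latency:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// addStorage and addNic stand in for the exported hotplug APIs; the
// sleeps model hypervisor/agent round-trip latency.
func addStorage(volume string) error { time.Sleep(10 * time.Millisecond); return nil }
func addNic(name string) error       { time.Sleep(10 * time.Millisecond); return nil }

// prepareInParallel attaches a volume and a NIC concurrently, the way a
// CRI caller could overlap device preparation before CreateContainer.
func prepareInParallel() error {
	var wg sync.WaitGroup
	errs := make(chan error, 2)

	wg.Add(2)
	go func() { defer wg.Done(); errs <- addStorage("vol-1") }()
	go func() { defer wg.Done(); errs <- addNic("eth0") }()
	wg.Wait()
	close(errs)

	// Return the first hotplug error, if any.
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := prepareInParallel(); err != nil {
		panic(err)
	}
	fmt.Println("devices hotplugged in parallel")
}
```

With serial calls the latencies add up; with concurrent hotplug they overlap, which is where the claimed hundreds of milliseconds would come from.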

@egernst
Member

egernst commented Mar 29, 2018

@WeiZhang555
Copy link
Copy Markdown
Member

I'm closing this because it's not needed any more per @egernst. Feel free to reopen it if I'm wrong. @laijs @bergwolf

@WeiZhang555 WeiZhang555 closed this Apr 4, 2018
@amshinde amshinde removed the review label Apr 4, 2018
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
This is the first commit related to the vendoring of all dependencies.
This will prevent any breakage in case a dependency breaks
backward compatibility.

Fixes kata-containers#32

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
10 participants