This repository was archived by the owner on May 12, 2021. It is now read-only.

[WIP] api: an early sketch for kata libary api#32

Closed
laijs wants to merge 2 commits into kata-containers:master from laijs:libary-api

Conversation

@laijs
Contributor

@laijs laijs commented Feb 14, 2018

An API set intended to bridge CRI (frakti, hyperd) and OCI (kata-runtime) with Kata.

Signed-off-by: Lai Jiangshan jiangshanlai@gmail.com

@laijs laijs changed the title [WIP] api: a quick sketch for kata libary api [WIP] api: an early sketch for kata libary api Feb 14, 2018
api/agent.go Outdated

type Agent interface {
Config() AgentConfig
Realese() (json.RawMessage, error)

typo: Release().

api/agent.go Outdated
type Agent interface {
Config() AgentConfig
Realese() (json.RawMessage, error)
Destory()

typo: Destroy().

api/agent.go Outdated
PauseSync() error
Unpause() error

APIVersion() (uint32, error)

How about making this use semantic versioning (https://semver.org/)? I think that would be more flexible for clients:

APIVersion() (string, error)
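A string version lets clients gate features by comparing components numerically. A minimal sketch, assuming plain "major.minor.patch" strings with exactly three components (the helper name is illustrative, not from the PR; pre-release precedence rules from semver.org are omitted for brevity):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares two "major.minor.patch" strings numerically
// and returns -1, 0, or 1. It assumes both inputs have three components.
func compareVersions(a, b string) int {
	as := strings.SplitN(a, ".", 3)
	bs := strings.SplitN(b, ".", 3)
	for i := 0; i < 3; i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			if ai < bi {
				return -1
			}
			return 1
		}
	}
	return 0
}

func main() {
	// A numeric comparison correctly orders 1.2 before 1.10,
	// which a plain string comparison would get wrong.
	fmt.Println(compareVersions("1.2.0", "1.10.0")) // prints -1
}
```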

api/agent.go Outdated
DestroySandbox() error
UpdateRoute(r []Route) error
UpdateInterface(ifs []Interface) error
//OnlineCpuMem() error

Remove or uncomment?

api/network.go Outdated
Bridge string
Ip string
Mac string
Mtu uint64

I'd prefer IP, MAC and MTU here as they are all abbreviations.

api/provision.go Outdated

// add cpu/memory/storage/network and other devices.
// if the vm is not started, Device might be added on boot instead of hutpluging
AddDeivce(Device) (kata.Device, error)

AddDevice()

api/provision.go Outdated
// the resulted device as normal block device (without dirver field).
AddStorage(Storage) (Storage, error)
RemoveStorage(Storage) error
AddNic(NetowrkDevice) (Interface, error)

AddNIC(NetworkDevice)?

api/provision.go Outdated
AddStorage(Storage) (Storage, error)
RemoveStorage(Storage) error
AddNic(NetowrkDevice) (Interface, error)
RemoveNic(NetowrkDevice) error

RemoveNIC(NetworkDevice)?

api/provision.go Outdated
Build(VMConfig, AgentConfig, QosConfig) (VM, error)
BuildFromReleased(json.RawMessage) (VM, error)
BuildFromSnapshot(Snapshot, VMConfig, AgentConfig, QosConfig) (VM, error)
Cappabilities() BuilderCappabilities

Capabilities() BuilderCapabilities.

type ContainerConfig struct {
Id string

UGI *UserGroupInfo

Maybe UsersAndGroups would make this a little clearer.

@jodh-intel

Thanks for putting this together @laijs. Just a few initial typo corrections mostly.

@sameo

sameo commented Feb 20, 2018

@laijs If we go with proposal #33, this PR will have to be redefined, as the current virtcontainers API has a lot of overlap with what you're proposing here.

@sameo

sameo commented Feb 21, 2018

@laijs @gnawux The virtcontainers 1.0 API is now documented here:

https://github.com/containers/virtcontainers/blob/master/documentation/api/1.0/api.md

@sameo sameo mentioned this pull request Feb 22, 2018
@bergwolf
Member

@sameo I thought that if the virtcontainers API has a lot of overlap with the API in this PR, it would be quite easy for the virtcontainers framework to support the API defined here. Why do you say it has to be redefined?

Member

@bergwolf bergwolf left a comment


@laijs what about the persist API and VM Associate? They are needed if we want to reconnect to a running VM.


HyperstartCtlPath, HyperstartStreamPath string
KataPath string
KataVsockCid, KataVsockPort uint32
Member


These also need documentation, especially KataPath, which is not a self-explanatory name.

@sameo

sameo commented Feb 23, 2018

@bergwolf

I thought that if the virtcontainers API has a lot of overlap with the API in this PR, it would be quite easy for the virtcontainers framework to support the API defined here. Why do you say it has to be redefined?

I think there's some confusion here. There is indeed a lot of overlap between what's proposed here and the current virtcontainers API. So are you proposing that we implement the kata containers runtime API on top of the virtcontainers API, or would the (potentially modified) virtcontainers API become the runtime API? Given the overlap between the two, I think the first option would mostly be redundant.

@bergwolf
Member

@sameo I certainly do not want to see two sets of APIs in the kata runtime. I think the goal of the API discussion here is to have a unified API to support both kata command line and CRI native implementations.

There is already a lot of overlap between @laijs's PR and the current virtcontainers API. So please comment on the gaps you have in mind (both design-wise and implementation-wise), and we can work together to improve the API definition to a state that satisfies all stakeholders. When we all agree that the API is good enough, we should be able to implement it with the virtcontainers framework and use it to support both use cases.

@sameo

sameo commented Feb 23, 2018

@bergwolf I'll do an analysis of what's missing from the current virtcontainers API to implement something that would satisfy this PR, at least as I understand it.


@sameo sameo left a comment


Compared to the virtcontainers API, this PR is more imperative. It exposes more structures and interfaces to the caller, while virtcontainers provides a declarative-only API.

Assuming this PR describes what Frakti would need to consume the Kata Containers runtime API, I wonder how useful exporting the agent, networking, and persistence is. Keeping those as internal interfaces would already get us closer in terms of convergence.

// proxies, shim should be implemented via AgentBuilder (and Agent.CreateContainer() Agent.ExecProcess())
type AgentBuilder interface {
Build(AgentConfig) Agent
}

For the agent API, virtcontainers is restricting the runtime API to the agent configuration structure:

// HyperConfig is a structure storing information needed for
// hyperstart agent initialization.
type HyperConfig struct {
	SockCtlName string
	SockTtyName string
}

// KataAgentConfig is a structure storing information needed
// to reach the Kata Containers agent.
type KataAgentConfig struct {
	GRPCSocket string
}

The agent interface is internal as we believe it should not be implemented by the runtime, but should be an internal virtcontainers implementation. The runtime implementer should only specify which agent type and config it's going to be using. For the record, here is our internal agent interface:

// agent is the virtcontainers agent interface.
// Agents are running in the guest VM and handling
// communications between the host and guest.
type agent interface {
	// init is used to pass agent specific configuration to the agent implementation.
	// agent implementations also will typically start listening for agent events from
	// init().
	// After init() is called, agent implementations should be initialized and ready
	// to handle all other Agent interface methods.
	init(pod *Pod, config interface{}) error

	// capabilities should return a structure that specifies the capabilities
	// supported by the agent.
	capabilities() capabilities

	// createPod will tell the agent to perform necessary setup for a Pod.
	createPod(pod *Pod) error

	// exec will tell the agent to run a command in an already running container.
	exec(pod *Pod, c Container, cmd Cmd) (*Process, error)

	// startPod will tell the agent to start all containers related to the Pod.
	startPod(pod Pod) error

	// stopPod will tell the agent to stop all containers related to the Pod.
	stopPod(pod Pod) error

	// createContainer will tell the agent to create a container related to a Pod.
	createContainer(pod *Pod, c *Container) (*Process, error)

	// startContainer will tell the agent to start a container related to a Pod.
	startContainer(pod Pod, c *Container) error

	// stopContainer will tell the agent to stop a container related to a Pod.
	stopContainer(pod Pod, c Container) error

	// killContainer will tell the agent to send a signal to a
	// container related to a Pod. If all is true, all processes in
	// the container will be sent the signal.
	killContainer(pod Pod, c Container, signal syscall.Signal, all bool) error

	// processListContainer will list the processes running inside the container
	processListContainer(pod Pod, c Container, options ProcessListOptions) (ProcessList, error)
}

Contributor Author


The agent package and the sandbox package will be different packages. Therefore, Agent and its methods have to be exposed so that the sandbox package can use the agent package.

ID() string
Add(InterfaceConfig) (InterfaceDeive, error)
Remove(InterfaceDeive) error
}

Here again the virtcontainers approach is to let the runtime API consumer configure the network through a relatively simple networking configuration structure:

// NetworkConfig is the network configuration related to a network.
type NetworkConfig struct {
	NetNSPath         string
	NumInterfaces     int
	InterworkingModel NetInterworkingModel
}

const (
	// NetXConnectDefaultModel Ask to use DefaultNetInterworkingModel
	NetXConnectDefaultModel NetInterworkingModel = iota

	// NetXConnectBridgedModel uses a linux bridge to interconnect
	// the container interface to the VM. This is the
	// safe default that works for most cases except
	// macvlan and ipvlan
	NetXConnectBridgedModel

	// NetXConnectMacVtapModel can be used when the Container network
	// interface can be bridged using macvtap
	NetXConnectMacVtapModel

	// NetXConnectEnlightenedModel can be used when the Network plugins
	// are enlightened to create VM native interfaces
	// when requested by the runtime
	// This will be used for vethtap, macvtap, ipvtap
	NetXConnectEnlightenedModel

	// NetXConnectInvalidModel is the last item to check valid values by IsValid()
	NetXConnectInvalidModel
)

and again the network interface is internal. We believe the runtime should only specify which internetworking model it wants to use, and let virtcontainers handle the rest. After all, neither OCI nor CRI includes a networking specification, and the whole pod/container networking setup is offloaded outside the actual container runtime.
For the record, our internal network interface is defined as:

// network is the virtcontainers network interface.
// Container network plugins are used to setup virtual network
// between VM netns and the host network physical interface.
type network interface {
	// init initializes the network, setting a new network namespace.
	init(config NetworkConfig) (string, bool, error)

	// run runs a callback function in a specified network namespace.
	run(networkNSPath string, cb func() error) error

	// add adds all needed interfaces inside the network namespace.
	add(pod Pod, config NetworkConfig, netNsPath string, netNsCreated bool) (NetworkNamespace, error)

	// remove unbridges and deletes TAP interfaces. It also removes virtual network
	// interfaces and deletes the network namespace.
	remove(pod Pod, networkNS NetworkNamespace) error
}

ListSandboxs()[]SandboxStatus
ListContainers()[]ContainerStatus
SandboxOfContainer(string)string
}

As @laijs pointed out, the equivalent for this in virtcontainers is the resourceStorage interface:

// resourceStorage is the virtcontainers resources (configuration, state, etc...)
// storage interface.
// The default resource storage implementation is filesystem.
type resourceStorage interface {
	// Create all resources for a pod
	createAllResources(pod Pod) error

	// Resources URIs functions return both the URI
	// for the actual resource and the URI base.
	containerURI(podID, containerID string, resource podResource) (string, string, error)
	podURI(podID string, resource podResource) (string, string, error)

	// Pod resources
	storePodResource(podID string, resource podResource, data interface{}) error
	deletePodResources(podID string, resources []podResource) error
	fetchPodConfig(podID string) (PodConfig, error)
	fetchPodState(podID string) (State, error)
	fetchPodNetwork(podID string) (NetworkNamespace, error)
	storePodNetwork(podID string, networkNS NetworkNamespace) error

	// Hypervisor resources
	fetchHypervisorState(podID string, state interface{}) error
	storeHypervisorState(podID string, state interface{}) error

	// Agent resources
	fetchAgentState(podID string, state interface{}) error
	storeAgentState(podID string, state interface{}) error

	// Container resources
	storeContainerResource(podID, containerID string, resource podResource, data interface{}) error
	deleteContainerResources(podID, containerID string, resources []podResource) error
	fetchContainerConfig(podID, containerID string) (ContainerConfig, error)
	fetchContainerState(podID, containerID string) (State, error)
	fetchContainerProcess(podID, containerID string) (Process, error)
	storeContainerProcess(podID, containerID string, process Process) error
	fetchContainerMounts(podID, containerID string) ([]Mount, error)
	storeContainerMounts(podID, containerID string, mounts []Mount) error
	fetchContainerDevices(podID, containerID string) ([]Device, error)
	storeContainerDevices(podID, containerID string, devices []Device) error
}

that is defined under the (I agree, misnamed) filesystem.go. The filesystem structure implements that interface.

I agree part of this API should be exported, but here again not down to the persistence/resourceStorage interface level; rather, down to the persistence/resourceStorage type. In other words, allow the runtime API caller to select which persistence backend it wants to use and to configure it.
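That selection pattern, exporting a backend type selector while keeping the storage interface internal, might look like the following sketch. All names here (newResourceStorage, the root path, the stub method) are hypothetical illustrations, not virtcontainers' actual code:

```go
package main

import "fmt"

// ResourceStorageType is the exported selector; the storage
// interface itself stays internal to the library.
type ResourceStorageType string

const (
	// FilesystemStorage selects the filesystem-backed implementation.
	FilesystemStorage ResourceStorageType = "filesystem"
)

// resourceStorage is internal; API callers never see it directly.
type resourceStorage interface {
	storePodResource(podID string, data []byte) error
}

// filesystem is an illustrative stub of the default backend.
type filesystem struct{ root string }

func (f *filesystem) storePodResource(podID string, data []byte) error {
	// a real implementation would persist data under f.root/podID
	return nil
}

// newResourceStorage maps the exported selector to an internal
// implementation, so callers pick a backend without seeing its interface.
func newResourceStorage(t ResourceStorageType) (resourceStorage, error) {
	switch t {
	case FilesystemStorage:
		// root path is a made-up example
		return &filesystem{root: "/var/lib/virtcontainers/pods"}, nil
	default:
		return nil, fmt.Errorf("unknown resource storage type %q", t)
	}
}

func main() {
	rs, err := newResourceStorage(FilesystemStorage)
	fmt.Println(err == nil, rs != nil) // prints true true
}
```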


// to be filled
}


A combination of those virtcontainers structures are equivalent to the above one:

// Resources describes VM resources configuration.
type Resources struct {
	// VCPUs is the number of available virtual CPUs.
	VCPUs uint

	// Memory is the amount of available memory in MiB.
	Memory uint
}

// HypervisorConfig is the hypervisor configuration.
type HypervisorConfig struct {
	// KernelPath is the guest kernel host path.
	KernelPath string

	// ImagePath is the guest image host path.
	ImagePath string

	// FirmwarePath is the bios host path
	FirmwarePath string

	// MachineAccelerators are machine specific accelerators
	MachineAccelerators string

	// HypervisorPath is the hypervisor executable host path.
	HypervisorPath string

	// DisableBlockDeviceUse disallows a block device from being used.
	DisableBlockDeviceUse bool

	// KernelParams are additional guest kernel parameters.
	KernelParams []Param

	// HypervisorParams are additional hypervisor parameters.
	HypervisorParams []Param

	// HypervisorMachineType specifies the type of machine being
	// emulated.
	HypervisorMachineType string

	// Debug changes the default hypervisor and kernel parameters to
	// enable debug output where available.
	Debug bool

	// DefaultVCPUs specifies default number of vCPUs for the VM.
	// Pod configuration VMConfig.VCPUs overwrites this.
	DefaultVCPUs uint32

	// DefaultMem specifies default memory size in MiB for the VM.
	// Pod configuration VMConfig.Memory overwrites this.
	DefaultMemSz uint32

	// DefaultBridges specifies default number of bridges for the VM.
	// Bridges can be used to hot plug devices
	DefaultBridges uint32

	// MemPrealloc specifies if the memory should be pre-allocated
	MemPrealloc bool

	// HugePages specifies if the memory should be pre-allocated from huge pages
	HugePages bool

	// Realtime Used to enable/disable realtime
	Realtime bool

	// Mlock is used to control memory locking when Realtime is enabled
	// Realtime=true and Mlock=false, allows for swapping out of VM memory
	// enabling higher density
	Mlock bool

	// DisableNestingChecks is used to override customizations performed
	// when running on top of another VMM.
	DisableNestingChecks bool

	// customAssets is a map of assets.
	// Each value in that map takes precedence over the configured assets.
	// For example, if there is a value for the "kernel" key in this map,
	// it will be used for the pod's kernel path instead of KernelPath.
	customAssets map[assetType]*asset
}

OutboundPeak string

// to be filled
}

virtcontainers does not have this kind of setting; that is a potential gap. Could you please explain where it is used and what it's for?

Member


Network and disk speed limits are necessary in a cloud service deployment.

Path string
ReadOnly bool
}


virtcontainer's Mount:

// Mount describes a container mount.
type Mount struct {
	Source      string
	Destination string

	// Type specifies the type of filesystem to mount.
	Type string

	// Options list all the mount options of the filesystem.
	Options []string

	// HostPath used to store host side bind mount path
	HostPath string

	// ReadOnly specifies if the mount should be read only or not
	ReadOnly bool
}

Member


The idea is that Storage is a pod level resource and Mount in ContainerConfig can reference it. With such separation, the kata runtime can implement the AddStorage() API.


@bergwolf could you elaborate on how Mount would reference a Storage? Who decides that a container's mount is referring to a specific storage definition?

Member


It is referenced by letting OCI.Mount.Source == pb.Storage.Mountpoint (or a subdirectory). The kata runtime API caller needs to craft a proper OCI.Mount.Source to establish the mapping. It is a bit more awkward than the original runv API, but the change is needed because we want to use libcontainer in the kata agent.
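The referencing rule described here can be sketched as a lookup: a container mount is backed by the pod-level storage whose mountpoint equals the mount source or is a parent directory of it. The types and helper below are illustrative stand-ins, not the PR's actual definitions:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Storage is a pod-level resource (illustrative stand-in for the PR's type).
type Storage struct {
	Driver     string
	Mountpoint string
}

// storageForMount returns the Storage backing an OCI mount source:
// the source must equal the storage mountpoint or be a subdirectory of it.
func storageForMount(source string, storages []Storage) (Storage, bool) {
	src := filepath.Clean(source)
	for _, s := range storages {
		mp := filepath.Clean(s.Mountpoint)
		if src == mp || strings.HasPrefix(src, mp+string(filepath.Separator)) {
			return s, true
		}
	}
	return Storage{}, false
}

func main() {
	storages := []Storage{{Driver: "blk", Mountpoint: "/run/kata/storage/vol1"}}
	// A mount whose source is a subdirectory of a storage mountpoint
	// resolves to that storage.
	s, ok := storageForMount("/run/kata/storage/vol1/data", storages)
	fmt.Println(ok, s.Driver) // prints true blk
}
```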


This does not need to be specified at the API level, this is very OCI specific. The kata agent implementation in virtcontainers kata_agent.go can take care of translating whatever is expected by the Kata agent itself.

Member


@sboeuf we want to pass the OCI spec around through the API to better help the kata runtime communicate with libcontainer. The OCI spec field in the agent's API is there for the same purpose -- so that we do not convert from the OCI spec to a kata spec and then back to the OCI spec. Although we do need to make necessary modifications, most of the OCI spec gets through untouched.

OTOH, the main purpose of having sandbox.AddStorage() is to support explicit hotplug and let the upper layer prepare container resources (NICs/volumes/rootfs etc.) in parallel. If we eventually decide not to pass the OCI spec around, we still need a Mount field in the kata runtime's container spec to properly reference Storage.


The idea is to describe a volume (or a mount) in a generic way, so that it can be used not only by the kata agent implementation.
But I am not saying we're not saving the OCI spec, we actually do that using annotations.
I understand that you want to parallelize container resource preparation, but I guess this could be done from virtcontainers if it were implemented the right way, don't you think?

Member


The difference is that the container rootfs is prepared outside of the runtime library. By letting the upper layer initiate the preparation in parallel, we can save hundreds of milliseconds when starting a container (the actual savings depend on which storage is used, though).

If we hide these steps in the sandbox/container creation flow, as in the current virtcontainers API, we can do nothing but wait while the container rootfs is prepared.

I can live with the OCI spec annotation, which is really not different from passing it through the API. IMHO passing it through the API gives better type checking. Just my two cents though.

api/sandbox.go Outdated

Containers map[string]Container
}


virtcontainers does not export the Pod structure directly to the top level API, but the API caller should use the PodConfig structure instead:

// PodConfig is a Pod configuration.
type PodConfig struct {
	ID string

	Hostname string

	// Field specific to OCI specs, needed to setup all the hooks
	Hooks Hooks

	// VMConfig is the VM configuration to set for this pod.
	VMConfig Resources

	HypervisorType   HypervisorType
	HypervisorConfig HypervisorConfig

	AgentType   AgentType
	AgentConfig interface{}

	ProxyType   ProxyType
	ProxyConfig ProxyConfig

	ShimType   ShimType
	ShimConfig interface{}

	NetworkModel  NetworkModel
	NetworkConfig NetworkConfig

	// Volumes is a list of shared volumes between the host and the Pod.
	Volumes []Volume

	// Containers describe the list of containers within a Pod.
	// This list can be empty and populated by adding containers
	// to the Pod a posteriori.
	Containers []ContainerConfig

	// Annotations keys must be unique strings and must be name-spaced
	// with e.g. reverse domain notation (org.clearlinux.key).
	Annotations map[string]string
}


OciProcess ocispecs.Process
}


I think the virtcontainers equivalent for this one is Cmd:

// Cmd represents a command to execute in a running container.
type Cmd struct {
	Args    []string
	Envs    []EnvVar
	WorkDir string

	// Note that these fields *MUST* remain as strings.
	//
	// The reason being that we want runtimes to be able to support CLI
	// operations like "exec --user=". That option allows the
	// specification of a user (either as a string username or a numeric
	// UID), and may optionally also include a group (groupame or GID).
	//
	// Since this type is the interface to allow the runtime to specify
	// the user and group the workload can run as, these user and group
	// fields cannot be encoded as integer values since that would imply
	// the runtime itself would need to perform a UID/GID lookup on the
	// user-specified username/groupname. But that isn't practically
	// possible given that to do so would require the runtime to access
	// the image to allow it to interrogate the appropriate databases to
	// convert the username/groupnames to UID/GID values.
	//
	// Note that this argument applies solely to the _runtime_ supporting
	// a "--user=" option when running in a "standalone mode" - there is
	// no issue when the runtime is called by a container manager since
	// all the user and group mapping is handled by the container manager
	// and specified to the runtime in terms of UID/GID's in the
	// configuration file generated by the container manager.
	User                string
	PrimaryGroup        string
	SupplementaryGroups []string

	Interactive     bool
	Console         string
	Detach          bool
	NoNewPrivileges bool
	Capabilities    LinuxCapabilities
}

func (sandbox *Sandbox) Resume() error {
}
func (sandbox *Sandbox) Save(path string) error {
}

The virtcontainers container and pod APIs implement a large part of the above API:

CreatePod

// CreatePod is the virtcontainers pod creation entry point.
// CreatePod creates a pod and its containers. It does not start them.
func CreatePod(podConfig PodConfig) (VCPod, error)

DeletePod

// DeletePod is the virtcontainers pod deletion entry point.
// DeletePod will stop an already running container and then delete it.
func DeletePod(podID string) (VCPod, error)

StartPod

// StartPod is the virtcontainers pod starting entry point.
// StartPod will talk to the given hypervisor to start an existing
// pod and all its containers.
func StartPod(podID string) (VCPod, error)

StopPod

// StopPod is the virtcontainers pod stopping entry point.
// StopPod will talk to the given agent to stop an existing pod
// and destroy all containers within that pod.
func StopPod(podID string) (VCPod, error)

RunPod

// RunPod is the virtcontainers pod running entry point.
// RunPod creates a pod and its containers and then it starts them.
func RunPod(podConfig PodConfig) (VCPod, error)

ListPod

// ListPod is the virtcontainers pod listing entry point.
func ListPod() ([]PodStatus, error)

StatusPod

// StatusPod is the virtcontainers pod status entry point.
func StatusPod(podID string) (PodStatus, error)

PausePod

// PausePod is the virtcontainers pausing entry point which pauses an
// already running pod.
func PausePod(podID string) (VCPod, error)

ResumePod

// ResumePod is the virtcontainers resuming entry point which resumes
// (or unpauses) and already paused pod.
func ResumePod(podID string) (VCPod, error)

CreateContainer

// CreateContainer is the virtcontainers container creation entry point.
// CreateContainer creates a container on a given pod.
func CreateContainer(podID string, containerConfig ContainerConfig) (VCPod, VCContainer, error)

DeleteContainer

// DeleteContainer is the virtcontainers container deletion entry point.
// DeleteContainer deletes a Container from a Pod. If the container is running,
// it needs to be stopped first.
func DeleteContainer(podID, containerID string) (VCContainer, error)

StartContainer

// StartContainer is the virtcontainers container starting entry point.
// StartContainer starts an already created container.
func StartContainer(podID, containerID string) (VCContainer, error)

StopContainer

// StopContainer is the virtcontainers container stopping entry point.
// StopContainer stops an already running container.
func StopContainer(podID, containerID string) (VCContainer, error)

EnterContainer

// EnterContainer is the virtcontainers container command execution entry point.
// EnterContainer enters an already running container and runs a given command.
func EnterContainer(podID, containerID string, cmd Cmd) (VCPod, VCContainer, *Process, error)

StatusContainer

// StatusContainer is the virtcontainers container status entry point.
// StatusContainer returns a detailed container status.
func StatusContainer(podID, containerID string) (ContainerStatus, error)

KillContainer

// KillContainer is the virtcontainers entry point to send a signal
// to a container running inside a pod. If all is true, all processes in
// the container will be sent the signal.
func KillContainer(podID, containerID string, signal syscall.Signal, all bool) error

ProcessListContainer

// ProcessListContainer is the virtcontainers entry point to list
// processes running inside a container
func ProcessListContainer(podID, containerID string, options ProcessListOptions) (ProcessList, error)

Contributor Author

@laijs laijs Mar 5, 2018


@sameo These APIs are just wrappers over the Sandbox & Persistence APIs, and they are too specific to the OCI runtime command line. What we need is just to expose the Sandbox & Persistence APIs. (The Sandbox APIs are the APIs that take a Sandbox as the receiver, not a sandbox name.)
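The distinction being drawn, methods on a live Sandbox value versus free functions keyed by a pod ID, can be illustrated with a toy sketch (all names and types here are hypothetical, not the PR's or virtcontainers' actual definitions):

```go
package main

import "fmt"

// Sandbox is a live handle; the receiver-style API calls methods on it.
type Sandbox struct {
	ID     string
	Paused bool
}

// Pause is receiver-based: the caller already holds the sandbox,
// so no name lookup is needed.
func (s *Sandbox) Pause() error {
	s.Paused = true
	return nil
}

// registry simulates the state that ID-keyed wrappers must consult.
var registry = map[string]*Sandbox{}

// PausePod is the wrapper style: every call re-resolves the sandbox
// from its ID before acting on it, which suits a CLI that starts
// from scratch on each invocation.
func PausePod(podID string) error {
	s, ok := registry[podID]
	if !ok {
		return fmt.Errorf("sandbox %q not found", podID)
	}
	return s.Pause()
}

func main() {
	s := &Sandbox{ID: "sb-1"}
	registry[s.ID] = s

	// Wrapper style pays a lookup on every call; a long-running
	// CRI server could simply keep the handle and call s.Pause().
	err := PausePod("sb-1")
	fmt.Println(s.Paused, err) // prints true <nil>
}
```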

Source string
Fstype string
MoutOption []string
}

This Storage is partly covered by virtcontainers' Mount, but some fields are missing, and the name might be misleading. A few questions:

  • In which context is this structure used?
  • How is it different from a generic container Mount?
  • What is the Driver field about? Do you have an example to share?

@gnawux
Member

gnawux commented Feb 23, 2018

Compared to the virtcontainers API, this PR is more imperative. It exposes more structures and interfaces to the caller, while virtcontainers provides a declarative-only API.

@sameo when talking about "imperative" and "declarative", I assume you are talking about style or flavor rather than which is better or worse.

Firstly, the two concepts are relative, there is no absolute declarative/imperative in most of the cases. Actually, comparing to kubernetes APIs, which is a typical declarative API set, CRI is a more "imperative" interface -- for example, having launch PodSandbox, it will create containers in sandbox independently, which implies the devices may be added to a sandbox after the RunPodSandbox operation.

In fact, higher-level APIs tend to be more declarative, which is application friendly and easier for operating a big application. On the other hand, lower-level APIs tend to be less declarative, which is more efficient and closer to the nature of how the devices work.

[Will add more explanation tomorrow... sorry it's too late for me]

@sameo

sameo commented Feb 25, 2018

@gnawux

In fact, the higher level APIs tend to be more declarative, which is application friendly and easier for operating a big application. On the other hand, the lower level APIs tend to be not so declarative, which is more efficient and close to the nature of how the devices work.

Yes, definitely agree with that. I did not mean that one is better than the other. However I do believe that our lower level APIs should/could be internal and not external. The smaller the external API, the better (for security, UX, and design reasons).

An API set intended to bridge CRI (frakti, hyperd) and OCI (kata-runtime) with Kata.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
@gnawux
Member

gnawux commented Feb 26, 2018

@sameo

However I do believe that our lower level APIs should/could be internal and not external. The smaller the external API, the better (for security, UX, and design reasons).

Yes, I understand your point, and I agree we should keep a small set of external APIs for applications. Here are some thoughts on why we put some low-level interfaces here; it would be appreciated if you could figure out a better way to handle them:

The Persistence and Sandbox are the top-level interfaces for customer applications.

  • Sandbox: there are three interfaces behind the Sandbox
    • VM gives the control of hypervisor to sandbox;
    • Agent gives the control of containers/processes via the agent in sandbox;
    • Network (multiple)
  • Persistence: a resource storage engine that stores the persistent data of sandboxes and can be used to reconnect to a running Sandbox.

A Sandbox is created with the CreateSandbox API, which takes a SandboxBuilder together with sandbox configurations. The SandboxBuilder includes a VMBuilder for creating the VM, an AgentBuilder for creating the Agent, and the Persistence for saving the Sandbox.

The Kata lib provides pre-defined builders for different cases/drivers:

  • VMBuilder may support different drivers for VMs, as well as factories.
    • Customer applications can also have its own implementations of VM via VMBuilder.
  • AgentBuilder may generate Agents which
    • Connects agent with external proxy and shim processes;
    • Connects agent directly with gRPC API over vsock or yamux.

Applications could provide their own implementations of, or wrappers for, the builders, which provides flexibility for different use cases.

Customer applications may implement their own Persistence for storing sandbox info in a local filesystem, a local database, or somewhere else.
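The builder composition described above can be sketched in Go roughly as follows. All interface and type names here are hypothetical stand-ins for the proposed API, with trivial in-memory fakes for the demo:

```go
package main

import (
	"errors"
	"fmt"
)

// VM and Agent stand in for the real driver interfaces.
type VM interface{ Start() error }
type Agent interface{ CreateContainer(id string) error }

type Sandbox struct {
	ID    string
	vm    VM
	agent Agent
}

// Pluggable builders: applications may supply their own implementations.
type VMBuilder interface{ BuildVM() (VM, error) }
type AgentBuilder interface{ BuildAgent() (Agent, error) }

// Persistence stores sandbox state so a running sandbox can be reconnected.
type Persistence interface {
	Save(s *Sandbox) error
	Load(id string) (*Sandbox, error)
}

// SandboxBuilder bundles the three pieces handed to CreateSandbox.
type SandboxBuilder struct {
	VM      VMBuilder
	Agent   AgentBuilder
	Persist Persistence
}

func CreateSandbox(id string, b SandboxBuilder) (*Sandbox, error) {
	vm, err := b.VM.BuildVM()
	if err != nil {
		return nil, err
	}
	agent, err := b.Agent.BuildAgent()
	if err != nil {
		return nil, err
	}
	s := &Sandbox{ID: id, vm: vm, agent: agent}
	return s, b.Persist.Save(s)
}

// Trivial stand-in implementations for the demo below.
type fakeVM struct{}

func (fakeVM) Start() error { return nil }

type fakeVMBuilder struct{}

func (fakeVMBuilder) BuildVM() (VM, error) { return fakeVM{}, nil }

type fakeAgent struct{}

func (fakeAgent) CreateContainer(id string) error { return nil }

type fakeAgentBuilder struct{}

func (fakeAgentBuilder) BuildAgent() (Agent, error) { return fakeAgent{}, nil }

type memStore struct{ m map[string]*Sandbox }

func (ms *memStore) Save(s *Sandbox) error { ms.m[s.ID] = s; return nil }
func (ms *memStore) Load(id string) (*Sandbox, error) {
	if s, ok := ms.m[id]; ok {
		return s, nil
	}
	return nil, errors.New("not found: " + id)
}

func main() {
	store := &memStore{m: map[string]*Sandbox{}}
	b := SandboxBuilder{VM: fakeVMBuilder{}, Agent: fakeAgentBuilder{}, Persist: store}
	s, err := CreateSandbox("pod-1", b)
	fmt.Println(s.ID, err == nil)
}
```

Swapping `memStore` for a database-backed implementation changes nothing in `CreateSandbox`, which is the flexibility the comment argues for.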

@gnawux
Member

gnawux commented Feb 26, 2018

Here is the detailed document for how the proposed APIs work: https://docs.google.com/document/d/1BJeIFWykgmPIr45GRPNoawPPMr-3ekTIpJwU-g5QsHs/edit?usp=sharing

@jessfraz

jessfraz commented Mar 2, 2018

I think I would prefer modifying the virtcontainers API with whatever extra it needs. This seems a bit over the top with regard to what is needed; I would prefer something minimal, and then see what else is necessary from there.


@sboeuf sboeuf left a comment


I am not gonna comment about the content of each field of every structure introduced here, but the form instead.
Virtcontainers is designed to be a simple entry point with a very simple API related to the Pod abstraction. Given that, it has been designed so that you fill the appropriate configuration structures and it takes care of what needs to be done regarding those configs.
This PR exports every detail by allowing external callers to deal directly with the agent, for instance, which is a bad idea IMO because those callers will have to understand the implementation details of virtcontainers in order to call the exported functions in the right sequence. This will also move more code handling into each consumer of this library, meaning more duplication between consumers (kata-runtime, cc-runtime, frakti, and so on). It gives more control to each consumer, and if, for instance, they need to solve the same issues, they might do so in different ways without contributing back to the library.
When we've been talking about modifying things for this API, I was expecting that if, for instance, there is a network case which is not currently handled, you would extend the network configuration structure so that you can implement what you need inside virtcontainers.

@gnawux
Member

gnawux commented Mar 3, 2018

@jessfraz The API is based on runV API (the Sandbox part) and with some low-level interfaces (which also exist in virtcontainers, and the debate is if we need to make them public here).

Obviously we are not a project starting from scratch. With so many existing use cases from each runtime, we proposed a bigger API set covering everything we considered, and we'd like to hear your and others' comments on

  • are these requirements valid, and
  • if some of them are valid, are there any better designs for the requirements.

I understand @sameo is seeking a minimal API set and doesn't want to expose low-level interfaces like Agent, VM, etc. I want to hide them as well. Having discussed with @laijs, I think we'd better put all of these out and collect what you think about the requirements behind the interfaces, as we state in #32 (comment) (and in the document: https://docs.google.com/document/d/1BJeIFWykgmPIr45GRPNoawPPMr-3ekTIpJwU-g5QsHs/edit?usp=sharing).

What are the low-level interfaces we put here:

  • VM gives the control of hypervisor to sandbox;
  • Agent gives the control of containers/processes via the agent in sandbox;
  • Persistence: a resource storage engine that stores the persistent data of sandboxes and can be used to reconnect to a running Sandbox.

Why we make the interfaces public though we hate to publish the detail:

  • The application could either select the builders provided by Kata, or provide its own implementations of, or wrappers for, the builders, which provides flexibility for different use cases.
    • Customer applications can also have their own implementations of VM via VMBuilder. VMBuilder may support different drivers for VMs, as well as factories.
    • AgentBuilder may generate Agents which
      • Connects agent with external proxy and shim processes;
      • Connects agent directly with gRPC API over vsock or yamux.
  • Customer applications may implement their own Persistence for storing sandbox info in local filesystem, local database, or somewhere else.

@sameo

sameo commented Mar 5, 2018

@gnawux

What are the low-level interfaces we put here:

VM gives the control of hypervisor to sandbox;

One of the key points of virtcontainers is to abstract this from the caller. If the VM lifecycle is handled outside of virtcontainers, we leave much of the complexity in the caller's hands. That's not what we want imho. We want to make the caller's life simple, and provide them with a container/sandbox API that's easy to use and close to what they're already familiar with. We want to make hardware-virtualized containers just containers and hide the VM management away from them.

Agent gives the control of containers/processes via the agent in sandbox;

If virtcontainers is missing containers management APIs, I'd be happy to consider extending the API. But we need to know which ones they are.
Providing a complete, low level agent API is basically saying that we can't define a proper container level API and are passing that up to the caller. Eventually we let the caller do a myriad of things that we don't control or understand, and that we can't validate at all.

Persistence: is a resource storage engine, which could store the persistence data of sandboxes, and could be used for reconnecting to a running Sandbox.

So I see something a little concerning with this kind of API: we're basically letting the caller define its own persistent data storage format, while in my opinion the storage format is part of the API. It may make sense to let callers implement their own backend driver, but that should live at the transport/IO level: ReadData, WriteData, etc. virtcontainers should maintain the storage format and layout to be able to move from one backend to the other transparently.

That being said, do we have practical (i.e. existing) examples of a need for using anything else than a POSIX filesystem? Do we realistically want to use something like etcd or an SQL database to store virtcontainers data? I usually prefer to extend an API to support a real life use case rather than extending it up front and having to maintain it in the long term.
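The transport/IO-level split suggested here can be sketched as follows. This is a hedged Go sketch with hypothetical names: the library owns the serialization format (JSON here), while backends only move opaque bytes via ReadData/WriteData:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageBackend is the transport/IO-level driver a caller may replace.
type StorageBackend interface {
	WriteData(key string, data []byte) error
	ReadData(key string) ([]byte, error)
}

// memBackend is a trivial in-memory backend; an etcd or SQL backend
// would satisfy the same interface without ever parsing the payload.
type memBackend struct{ m map[string][]byte }

func (b *memBackend) WriteData(k string, d []byte) error { b.m[k] = d; return nil }
func (b *memBackend) ReadData(k string) ([]byte, error)  { return b.m[k], nil }

// sandboxState is the format the library controls; backends never see it.
type sandboxState struct {
	ID    string `json:"id"`
	State string `json:"state"`
}

func saveSandbox(b StorageBackend, s sandboxState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	return b.WriteData("sandbox/"+s.ID, data)
}

func loadSandbox(b StorageBackend, id string) (sandboxState, error) {
	var s sandboxState
	data, err := b.ReadData("sandbox/" + id)
	if err != nil {
		return s, err
	}
	return s, json.Unmarshal(data, &s)
}

func main() {
	b := &memBackend{m: map[string][]byte{}}
	saveSandbox(b, sandboxState{ID: "pod-1", State: "running"})
	s, _ := loadSandbox(b, "pod-1")
	fmt.Println(s.ID, s.State)
}
```

Because the format lives above the backend interface, the library can migrate data from one backend to another without the callers noticing, which is the portability argument made in the comment.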

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
@laijs
Contributor Author

laijs commented Mar 7, 2018

@ALL The methods in the agent interface have been changed to be private.

It can be reviewed in the second commit:
f491d42

@sboeuf

sboeuf commented Mar 7, 2018

@laijs thanks for the change.
More globally, I think we might miss things in virtcontainers such as being able to add a new network interface on an existing pod, and there is no pod or container lifecycle event related to such a thing. For this kind of case, I would expect the virtcontainers API to be extended such as:

// api.go
...

func AddNetwork(podID, interfaceName string) error {
        ...
}

...

This example shows that you could try to highlight what is really missing by extending the API, instead of trying to redefine everything.

@bergwolf
Member

bergwolf commented Mar 8, 2018

@sameo

That being said, do we have practical (i.e. existing) examples of a need for using anything else than a POSIX filesystem? Do we realistically want to use something like etcd or an SQL database to store virtcontainers data? I usually prefer to extend an API to support a real life use case rather than extending it up front and having to maintain it in the long term.

It is quite practical to let the runtime library callers decide where/how to store the persistent data. For one thing, a database provides much better ACID support than a filesystem, which is important in a production environment. For another, the persistent data might be part of a larger transaction that spans beyond the knowledge of the kata runtime. And yes, runv has been providing such an API since the early days, and frakti/hyperd are using databases to store the persistent data.

@WeiZhang555
Member

I would suggest that we provide a plugin framework to make everything that has many choices pluggable, so that for something like persistence we offer choices such as a database and a raw file, or something like an etcd plugin.

@gnawux
Member

gnawux commented Mar 8, 2018

@WeiZhang555 yes, I agree. The proposed low-level interfaces are designed for plugins. It is better to explicitly split the plugin interfaces from the customer-facing interfaces.

@laijs laijs mentioned this pull request Mar 8, 2018
Source string
Fstype string
MoutOptions []string
}

@sameo sameo Mar 9, 2018


So Storage represents e.g. an RBD or ceph volume attached to a VM/Pod? It's a pod resource that can be then mounted in one or more containers? Is that a correct description?
And so sandbox.AddStorage() is the part that attaches the storage to the VM/Pod, without mounting it in any container? Is it mounted somewhere in the pod mountns?

Member

Yes, your description is correct. Just to be clear, Storage is not limited to RBD. sandbox.AddStorage() is supposed to call the hypervisor hotplug code. It won't mount the storage in any container because sandbox.AddStorage() can be called before sandbox.AddContainer thus there might be no containers in the sandbox at all.


So do you see any reason why it could not be handled as part of the sandbox/pod creation step? Or is addStorage potentially being called outside of the pod/sandbox creation step?

Member

I'm not sure if I understand your question. In CRI, sandbox creation does not include all container specs. sandbox.AddStorage is supposed to be called when handling the CRI CreateContainer() request.


Right, so CRI CreateContainer() will eventually call into virtcontainers' CreateContainer() or pod.CreateContainer(): Can we not export sandbox.AddStorage() and make it an internal call of virtcontainers CreateContainer() or pod.CreateContainer()?

Member

As I mentioned above in reply to @sboeuf, exporting these hotplug APIs will allow the upper layer to prepare rootfs/volumes/NICs in parallel, which can save us hundreds of milliseconds when creating a new container.

Another reason not to reject AddStorage is that we will need to export AddNic anyway to support CNI, even in the kata runtime CLI case, assuming @WeiZhang555 still wants a kata-cli interface subcommand similar to the runv interface he implemented in runv. IOW, the hotplug APIs are not only useful for native CRI support; they are useful for kata-cli as well.
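The parallelism argument can be illustrated with a small Go sketch. Here addStorage/addNic are hypothetical stand-ins for the exported hotplug APIs, and the sleeps model device-attach latency:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// addStorage and addNic stand in for the exported hotplug APIs; the
// sleeps model hypervisor/agent round-trip latency.
func addStorage(volume string) error { time.Sleep(10 * time.Millisecond); return nil }
func addNic(name string) error       { time.Sleep(10 * time.Millisecond); return nil }

// prepareInParallel attaches a volume and a NIC concurrently, the way a
// CRI caller could overlap device preparation before CreateContainer.
func prepareInParallel() error {
	var wg sync.WaitGroup
	errs := make(chan error, 2)

	wg.Add(2)
	go func() { defer wg.Done(); errs <- addStorage("vol-1") }()
	go func() { defer wg.Done(); errs <- addNic("eth0") }()
	wg.Wait()
	close(errs)

	// Return the first hotplug error, if any.
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := prepareInParallel(); err != nil {
		panic(err)
	}
	fmt.Println("devices hotplugged in parallel")
}
```

With serial calls the latencies add up; with concurrent hotplug they overlap, which is where the claimed hundreds of milliseconds would come from.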

@egernst
Member

egernst commented Mar 29, 2018

@WeiZhang555
Copy link
Copy Markdown
Member

I'm closing this because it's not needed any more per @egernst. Feel free to reopen it if I'm wrong. @laijs @bergwolf

@WeiZhang555 WeiZhang555 closed this Apr 4, 2018
@amshinde amshinde removed the review label Apr 4, 2018
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
This is the first commit related to the vendoring of all dependencies.
This will prevent any breakage in case a dependency breaks
backward compatibility.

Fixes kata-containers#32

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
10 participants