[WIP] api: an early sketch for kata libary api#32
[WIP] api: an early sketch for kata libary api#32laijs wants to merge 2 commits intokata-containers:masterfrom
Conversation
api/agent.go
Outdated
|
|
||
| type Agent interface { | ||
| Config() AgentConfig | ||
| Realese() (json.RawMessage, error) |
api/agent.go
Outdated
| type Agent interface { | ||
| Config() AgentConfig | ||
| Realese() (json.RawMessage, error) | ||
| Destory() |
api/agent.go
Outdated
| PauseSync() error | ||
| Unpause() error | ||
|
|
||
| APIVersion() (uint32, error) |
There was a problem hiding this comment.
How about making this use semantic versioning (https://semver.org/) as that would be more flexible for clients I think:
APIVersion() (string, error)
api/agent.go
Outdated
| DestroySandbox() error | ||
| UpdateRoute(r []Route) error | ||
| UpdateInterface(ifs []Interface) error | ||
| //OnlineCpuMem() error |
api/network.go
Outdated
| Bridge string | ||
| Ip string | ||
| Mac string | ||
| Mtu uint64 |
There was a problem hiding this comment.
I'd prefer IP, MAC and MTU here as they are all abbreviations.
api/provision.go
Outdated
|
|
||
| // add cpu/memory/storage/network and other devices. | ||
| // if the vm is not started, Device might be added on boot instead of hutpluging | ||
| AddDeivce(Device) (kata.Device, error) |
api/provision.go
Outdated
| // the resulted device as normal block device (without dirver field). | ||
| AddStorage(Storage) (Storage, error) | ||
| RemoveStorage(Storage) error | ||
| AddNic(NetowrkDevice) (Interface, error) |
api/provision.go
Outdated
| AddStorage(Storage) (Storage, error) | ||
| RemoveStorage(Storage) error | ||
| AddNic(NetowrkDevice) (Interface, error) | ||
| RemoveNic(NetowrkDevice) error |
api/provision.go
Outdated
| Build(VMConfig, AgentConfig, QosConfig) (VM, error) | ||
| BuildFromReleased(json.RawMessage) (VM, error) | ||
| BuildFromSnapshot(Snapshot, VMConfig, AgentConfig, QosConfig) (VM, error) | ||
| Cappabilities() BuilderCappabilities |
There was a problem hiding this comment.
Capabilities() BuilderCapabilities.
| type ContainerConfig struct { | ||
| Id string | ||
|
|
||
| UGI *UserGroupInfo |
There was a problem hiding this comment.
Maybe UsersAndGroups would make this a little clearer.
|
Thanks for putting this together @laijs. Just a few initial typo corrections mostly. |
|
@laijs @gnawux The virtcontainers 1.0 API is now documented here: https://github.com/containers/virtcontainers/blob/master/documentation/api/1.0/api.md |
|
@sameo I thought if the virtcontainers API has a lot of overlap with the API in this PR, it will be quite easy for the virtcontainers framework to support the API defined here. Why do you say it has to be redefined? |
|
|
||
| HyperstartCtlPath, HyperstartStreamPath string | ||
| KataPath string | ||
| KataVsockCid, KataVsockPort uint32 |
There was a problem hiding this comment.
These also need documentation especially KataPath which is not a self-explaining name.
I think there's some confusion here. There is indeed a lot of overlap between what's proposed here and the current virtcontainers API. So are you proposing that we implement the kata containers runtime API on top of the virtcontainers API, or would the (potentially modified) virtcontainers API become the runtime API? Given the overlap between the 2, I think the first option would mostly be redundant. |
|
@sameo I certainly do not want to see two sets of APIs in the kata runtime. I think the goal of the API discussion here is to have a unified API to support both kata command line and CRI native implementations. There are already many overlaps between @laijs's PR and the current virtconatiners API. So please comment on what the gaps you have in mind (both design-wise and implementation-wise) and we can work together to improve the API definition to a state that all stakeholders are satisfied. When we all agree that the API is good enough, we should be able to implement the API with virtcontainers framework and use it to support both use cases. |
|
@bergwolf I'll do an analysis of what's currently missing from the current virtcontainers API to implement something that would satisfy this PR, at least from what I understand from it. |
sameo
left a comment
There was a problem hiding this comment.
Compared to the virtcontainers API, this PR is more imperative. It exposes more structures and interfaces to the caller while virtcontainers provides a declarative only API.
Assuming this PR describes what Frakti would need to consume the Kata Containers runtime API, I wonder how useful exporting the agent, networking, persistence is? Keeping those as internal interfaces would already get us closer in terms of convergence.
| // proxies, shim should be implemented via AgentBuilder (and Agent.CreateContainer() Agent.ExecProcess()) | ||
| type AgentBuilder interface { | ||
| Build(AgentConfig) Agent | ||
| } |
There was a problem hiding this comment.
For the agent API, virtcontainers is restricting the runtime API to the agent configuration structure:
// HyperConfig is a structure storing information needed for
// hyperstart agent initialization.
type HyperConfig struct {
SockCtlName string
SockTtyName string
}
// KataAgentConfig is a structure storing information needed
// to reach the Kata Containers agent.
type KataAgentConfig struct {
GRPCSocket string
}The agent interface is internal as we believe it should not be implemented by the runtime, but should be an internal virtcontainers implementation. The runtime implementer should only specify which agent type and config it's going to be using. For the record, here is our internal agent interface:
// agent is the virtcontainers agent interface.
// Agents are running in the guest VM and handling
// communications between the host and guest.
type agent interface {
// init is used to pass agent specific configuration to the agent implementation.
// agent implementations also will typically start listening for agent events from
// init().
// After init() is called, agent implementations should be initialized and ready
// to handle all other Agent interface methods.
init(pod *Pod, config interface{}) error
// capabilities should return a structure that specifies the capabilities
// supported by the agent.
capabilities() capabilities
// createPod will tell the agent to perform necessary setup for a Pod.
createPod(pod *Pod) error
// exec will tell the agent to run a command in an already running container.
exec(pod *Pod, c Container, cmd Cmd) (*Process, error)
// startPod will tell the agent to start all containers related to the Pod.
startPod(pod Pod) error
// stopPod will tell the agent to stop all containers related to the Pod.
stopPod(pod Pod) error
// createContainer will tell the agent to create a container related to a Pod.
createContainer(pod *Pod, c *Container) (*Process, error)
// startContainer will tell the agent to start a container related to a Pod.
startContainer(pod Pod, c *Container) error
// stopContainer will tell the agent to stop a container related to a Pod.
stopContainer(pod Pod, c Container) error
// killContainer will tell the agent to send a signal to a
// container related to a Pod. If all is true, all processes in
// the container will be sent the signal.
killContainer(pod Pod, c Container, signal syscall.Signal, all bool) error
// processListContainer will list the processes running inside the container
processListContainer(pod Pod, c Container, options ProcessListOptions) (ProcessList, error)
}There was a problem hiding this comment.
The agent package and the sandbox package will be different packages. Therefore, Agent and its Methods have to be exposed so that the sandbox package can use the agent package.
| ID() string | ||
| Add(InterfaceConfig) (InterfaceDeive, error) | ||
| Remove(InterfaceDeive) error | ||
| } |
There was a problem hiding this comment.
Here again the virtcontainers approach is to let the runtime API consumer configure the network through a relatively simple networking configuration structure:
// NetworkConfig is the network configuration related to a network.
type NetworkConfig struct {
NetNSPath string
NumInterfaces int
InterworkingModel NetInterworkingModel
}
const (
// NetXConnectDefaultModel Ask to use DefaultNetInterworkingModel
NetXConnectDefaultModel NetInterworkingModel = iota
// NetXConnectBridgedModel uses a linux bridge to interconnect
// the container interface to the VM. This is the
// safe default that works for most cases except
// macvlan and ipvlan
NetXConnectBridgedModel
// NetXConnectMacVtapModel can be used when the Container network
// interface can be bridged using macvtap
NetXConnectMacVtapModel
// NetXConnectEnlightenedModel can be used when the Network plugins
// are enlightened to create VM native interfaces
// when requested by the runtime
// This will be used for vethtap, macvtap, ipvtap
NetXConnectEnlightenedModel
// NetXConnectInvalidModel is the last item to check valid values by IsValid()
NetXConnectInvalidModel
)and again the network interface is internal. We believe the runtime should only specify which inter networking model it wants to use, and let virtcontainers handle that. After all both OCI and CRI do not include networking specification and the whole pod/container networking setup is offloaded outside the actual container runtime.
For the record, our internal network interface is defined as:
// network is the virtcontainers network interface.
// Container network plugins are used to setup virtual network
// between VM netns and the host network physical interface.
type network interface {
// init initializes the network, setting a new network namespace.
init(config NetworkConfig) (string, bool, error)
// run runs a callback function in a specified network namespace.
run(networkNSPath string, cb func() error) error
// add adds all needed interfaces inside the network namespace.
add(pod Pod, config NetworkConfig, netNsPath string, netNsCreated bool) (NetworkNamespace, error)
// remove unbridges and deletes TAP interfaces. It also removes virtual network
// interfaces and deletes the network namespace.
remove(pod Pod, networkNS NetworkNamespace) error
}| ListSandboxs()[]SandboxStatus | ||
| ListContainers()[]ContainerStatus | ||
| SandboxOfContainer(string)string | ||
| } |
There was a problem hiding this comment.
As @laijs pointed out, the equivalent for this in virtcontainers is the resourceStorage interface:
// resourceStorage is the virtcontainers resources (configuration, state, etc...)
// storage interface.
// The default resource storage implementation is filesystem.
type resourceStorage interface {
// Create all resources for a pod
createAllResources(pod Pod) error
// Resources URIs functions return both the URI
// for the actual resource and the URI base.
containerURI(podID, containerID string, resource podResource) (string, string, error)
podURI(podID string, resource podResource) (string, string, error)
// Pod resources
storePodResource(podID string, resource podResource, data interface{}) error
deletePodResources(podID string, resources []podResource) error
fetchPodConfig(podID string) (PodConfig, error)
fetchPodState(podID string) (State, error)
fetchPodNetwork(podID string) (NetworkNamespace, error)
storePodNetwork(podID string, networkNS NetworkNamespace) error
// Hypervisor resources
fetchHypervisorState(podID string, state interface{}) error
storeHypervisorState(podID string, state interface{}) error
// Agent resources
fetchAgentState(podID string, state interface{}) error
storeAgentState(podID string, state interface{}) error
// Container resources
storeContainerResource(podID, containerID string, resource podResource, data interface{}) error
deleteContainerResources(podID, containerID string, resources []podResource) error
fetchContainerConfig(podID, containerID string) (ContainerConfig, error)
fetchContainerState(podID, containerID string) (State, error)
fetchContainerProcess(podID, containerID string) (Process, error)
storeContainerProcess(podID, containerID string, process Process) error
fetchContainerMounts(podID, containerID string) ([]Mount, error)
storeContainerMounts(podID, containerID string, mounts []Mount) error
fetchContainerDevices(podID, containerID string) ([]Device, error)
storeContainerDevices(podID, containerID string, devices []Device) error
}that is defined under the (I agree, misnamed) filesystem.go. The filesystem structure implements that interface.
I agree part of this API should be exported, but here again not down to the persistence/resourceStorage interface level, but to the persistence/resourceStorage type instead. In other words, allowing the runtime API caller to select which persistence backend it wants to use and configure it.
|
|
||
| // to be filled | ||
| } | ||
|
|
There was a problem hiding this comment.
A combination of those virtcontainers structures are equivalent to the above one:
// Resources describes VM resources configuration.
type Resources struct {
// VCPUs is the number of available virtual CPUs.
VCPUs uint
// Memory is the amount of available memory in MiB.
Memory uint
}
// HypervisorConfig is the hypervisor configuration.
type HypervisorConfig struct {
// KernelPath is the guest kernel host path.
KernelPath string
// ImagePath is the guest image host path.
ImagePath string
// FirmwarePath is the bios host path
FirmwarePath string
// MachineAccelerators are machine specific accelerators
MachineAccelerators string
// HypervisorPath is the hypervisor executable host path.
HypervisorPath string
// DisableBlockDeviceUse disallows a block device from being used.
DisableBlockDeviceUse bool
// KernelParams are additional guest kernel parameters.
KernelParams []Param
// HypervisorParams are additional hypervisor parameters.
HypervisorParams []Param
// HypervisorMachineType specifies the type of machine being
// emulated.
HypervisorMachineType string
// Debug changes the default hypervisor and kernel parameters to
// enable debug output where available.
Debug bool
// DefaultVCPUs specifies default number of vCPUs for the VM.
// Pod configuration VMConfig.VCPUs overwrites this.
DefaultVCPUs uint32
// DefaultMem specifies default memory size in MiB for the VM.
// Pod configuration VMConfig.Memory overwrites this.
DefaultMemSz uint32
// DefaultBridges specifies default number of bridges for the VM.
// Bridges can be used to hot plug devices
DefaultBridges uint32
// MemPrealloc specifies if the memory should be pre-allocated
MemPrealloc bool
// HugePages specifies if the memory should be pre-allocated from huge pages
HugePages bool
// Realtime Used to enable/disable realtime
Realtime bool
// Mlock is used to control memory locking when Realtime is enabled
// Realtime=true and Mlock=false, allows for swapping out of VM memory
// enabling higher density
Mlock bool
// DisableNestingChecks is used to override customizations performed
// when running on top of another VMM.
DisableNestingChecks bool
// customAssets is a map of assets.
// Each value in that map takes precedence over the configured assets.
// For example, if there is a value for the "kernel" key in this map,
// it will be used for the pod's kernel path instead of KernelPath.
customAssets map[assetType]*asset
}| OutboundPeak string | ||
|
|
||
| // to be filled | ||
| } |
There was a problem hiding this comment.
virtcontainers does not have this kind of setting, that is a potential gap. Could you please explain where that is used and what it's for?
There was a problem hiding this comment.
network and disk speed limits are necessary in a cloud service deployment.
| Path string | ||
| ReadOnly bool | ||
| } | ||
|
|
There was a problem hiding this comment.
virtcontainer's Mount:
// Mount describes a container mount.
type Mount struct {
Source string
Destination string
// Type specifies the type of filesystem to mount.
Type string
// Options list all the mount options of the filesystem.
Options []string
// HostPath used to store host side bind mount path
HostPath string
// ReadOnly specifies if the mount should be read only or not
ReadOnly bool
}There was a problem hiding this comment.
The idea is that Storage is a pod level resource and Mount in ContainerConfig can reference it. With such separation, the kata runtime can implement the AddStorage() API.
There was a problem hiding this comment.
@bergwolf could you elaborate on how Mount would reference a Storage? Who decides that a container's mount is referring to a specific storage definition?
There was a problem hiding this comment.
It is referenced by letting OCI.Mount.Source == pb.Storage.Mountpoint (or a subdirectory). The kata runtime API caller needs to craft proper OCI.Mount.Source to establish the mapping. It is a bit awkward than the original runv API but the change is needed because we want to use libcontainer in kata agent.
There was a problem hiding this comment.
This does not need to be specified at the API level, this is very OCI specific. The kata agent implementation in virtcontainers kata_agent.go can take care of translating whatever is expected by the Kata agent itself.
There was a problem hiding this comment.
@sboeuf we want to pass the OCI spec around through the API to better help kata runtime communicate with libcontainer. The OCI spec field in the agent's API is there for the same purpose -- so that we do not convert from OCI spec to kata spec and then back the OCI spec. Although we do need to make necessary modifications, the most of the OCI spec gets through untouched.
OTOH, the main purpose to have sandbox.AddStorage() is to support explicit hotplug and let upper layer prepare container resources (nics/volumes/rootfs etc.) in parallel. If we eventually decide not to pass the OCI spec around, we still need an Mount field in kata runtime's container spec to properly reference Storage.
There was a problem hiding this comment.
The idea is to describe a volume (or a mount) in a generic way, so that it can be used not only by the kata agent implementation.
But I am not saying we're not saving the OCI spec, we actually do that using annotations.
I understand that you want to parallelize container resources preparation but I guess this could be done from virtcontainers if it was implemented the right way, don't you think ?
There was a problem hiding this comment.
The difference is that the container rootfs is prepared outside of the runtime library. By letting upper layer initiate the preparation in parallel, we can save hundreds of milliseconds when starting a container (actual time of saving depends on which storage is used though).
If we hide them in sandbox/container creation flow as in current virtcontainers API, we can do nothing but waiting when preparing the container rootfs.
I can live with OCI spec annotation that is really not different from passing through API. IMHO it gives better type checking to pass it through API. Just my two cents though.
api/sandbox.go
Outdated
|
|
||
| Containers map[string]Container | ||
| } | ||
|
|
There was a problem hiding this comment.
virtcontainers does not export the Pod structure directly to the top level API, but the API caller should use the PodConfig structure instead:
// PodConfig is a Pod configuration.
type PodConfig struct {
ID string
Hostname string
// Field specific to OCI specs, needed to setup all the hooks
Hooks Hooks
// VMConfig is the VM configuration to set for this pod.
VMConfig Resources
HypervisorType HypervisorType
HypervisorConfig HypervisorConfig
AgentType AgentType
AgentConfig interface{}
ProxyType ProxyType
ProxyConfig ProxyConfig
ShimType ShimType
ShimConfig interface{}
NetworkModel NetworkModel
NetworkConfig NetworkConfig
// Volumes is a list of shared volumes between the host and the Pod.
Volumes []Volume
// Containers describe the list of containers within a Pod.
// This list can be empty and populated by adding containers
// to the Pod a posteriori.
Containers []ContainerConfig
// Annotations keys must be unique strings and must be name-spaced
// with e.g. reverse domain notation (org.clearlinux.key).
Annotations map[string]string
}|
|
||
| OciProcess ocispecs.Process | ||
| } | ||
|
|
There was a problem hiding this comment.
I think the virtcontainers equivalent for this one is Cmd:
// Cmd represents a command to execute in a running container.
type Cmd struct {
Args []string
Envs []EnvVar
WorkDir string
// Note that these fields *MUST* remain as strings.
//
// The reason being that we want runtimes to be able to support CLI
// operations like "exec --user=". That option allows the
// specification of a user (either as a string username or a numeric
// UID), and may optionally also include a group (groupame or GID).
//
// Since this type is the interface to allow the runtime to specify
// the user and group the workload can run as, these user and group
// fields cannot be encoded as integer values since that would imply
// the runtime itself would need to perform a UID/GID lookup on the
// user-specified username/groupname. But that isn't practically
// possible given that to do so would require the runtime to access
// the image to allow it to interrogate the appropriate databases to
// convert the username/groupnames to UID/GID values.
//
// Note that this argument applies solely to the _runtime_ supporting
// a "--user=" option when running in a "standalone mode" - there is
// no issue when the runtime is called by a container manager since
// all the user and group mapping is handled by the container manager
// and specified to the runtime in terms of UID/GID's in the
// configuration file generated by the container manager.
User string
PrimaryGroup string
SupplementaryGroups []string
Interactive bool
Console string
Detach bool
NoNewPrivileges bool
Capabilities LinuxCapabilities
}| func (sandbox *Sandbox) Resume() error { | ||
| } | ||
| func (sandbox *Sandbox) Save(path string) error { | ||
| } |
There was a problem hiding this comment.
The virtcontainers containers and pod API implement a large part of the above API:
CreatePod
// CreatePod is the virtcontainers pod creation entry point.
// CreatePod creates a pod and its containers. It does not start them.
func CreatePod(podConfig PodConfig) (VCPod, error)DeletePod
// DeletePod is the virtcontainers pod deletion entry point.
// DeletePod will stop an already running container and then delete it.
func DeletePod(podID string) (VCPod, error)StartPod
// StartPod is the virtcontainers pod starting entry point.
// StartPod will talk to the given hypervisor to start an existing
// pod and all its containers.
func StartPod(podID string) (VCPod, error)StopPod
// StopPod is the virtcontainers pod stopping entry point.
// StopPod will talk to the given agent to stop an existing pod
// and destroy all containers within that pod.
func StopPod(podID string) (VCPod, error)RunPod
// RunPod is the virtcontainers pod running entry point.
// RunPod creates a pod and its containers and then it starts them.
func RunPod(podConfig PodConfig) (VCPod, error)ListPod
// ListPod is the virtcontainers pod listing entry point.
func ListPod() ([]PodStatus, error)StatusPod
// StatusPod is the virtcontainers pod status entry point.
func StatusPod(podID string) (PodStatus, error)PausePod
// PausePod is the virtcontainers pausing entry point which pauses an
// already running pod.
func PausePod(podID string) (VCPod, error)ResumePod
// ResumePod is the virtcontainers resuming entry point which resumes
// (or unpauses) and already paused pod.
func ResumePod(podID string) (VCPod, error)CreateContainer
// CreateContainer is the virtcontainers container creation entry point.
// CreateContainer creates a container on a given pod.
func CreateContainer(podID string, containerConfig ContainerConfig) (VCPod, VCContainer, error)DeleteContainer
// DeleteContainer is the virtcontainers container deletion entry point.
// DeleteContainer deletes a Container from a Pod. If the container is running,
// it needs to be stopped first.
func DeleteContainer(podID, containerID string) (VCContainer, error)StartContainer
// StartContainer is the virtcontainers container starting entry point.
// StartContainer starts an already created container.
func StartContainer(podID, containerID string) (VCContainer, error)StopContainer
// StopContainer is the virtcontainers container stopping entry point.
// StopContainer stops an already running container.
func StopContainer(podID, containerID string) (VCContainer, error)EnterContainer
// EnterContainer is the virtcontainers container command execution entry point.
// EnterContainer enters an already running container and runs a given command.
func EnterContainer(podID, containerID string, cmd Cmd) (VCPod, VCContainer, *Process, error)StatusContainer
// StatusContainer is the virtcontainers container status entry point.
// StatusContainer returns a detailed container status.
func StatusContainer(podID, containerID string) (ContainerStatus, error)KillContainer
// KillContainer is the virtcontainers entry point to send a signal
// to a container running inside a pod. If all is true, all processes in
// the container will be sent the signal.
func KillContainer(podID, containerID string, signal syscall.Signal, all bool) errorProcessListContainer
// ProcessListContainer is the virtcontainers entry point to list
// processes running inside a container
func ProcessListContainer(podID, containerID string, options ProcessListOptions) (ProcessList, error)There was a problem hiding this comment.
@sameo These APIs are just wrappers over Sandbox & Persistence APIs, there are too much oci-runtime-commandline specific. What we need is just to expose the Sandbox & Persistence APIs. (The Sandbox APIs are the APIs that take Sandbox as the receiver, not sandbox name).
| Source string | ||
| Fstype string | ||
| MoutOption []string | ||
| } |
There was a problem hiding this comment.
This Storage is partly covered by virtcontainers Mount but some fields are missing, and the name might be misleading. A few questions:
- In which context is this structure used?
- How is it different than a generic container Mount
- What is the
Driverfield about? Do you have one example to share?
@sameo when talking about "imperative" and "declarative", I assume you are talking about style or flavor rather than which is better or worse. Firstly, the two concepts are relative, there is no absolute declarative/imperative in most of the cases. Actually, comparing to kubernetes APIs, which is a typical declarative API set, CRI is a more "imperative" interface -- for example, having launch In fact, the higher level APIs tend to be more declarative, which is application friendly and easier for operating a big application. On the other hand, the lower level APIs tend to be not so declarative, which is more efficient and close to the nature of how the devices work. [Will add more explanation tomorrow... sorry it's too late for me] |
Yes, definitely agree with that. I did not mean that one is better than the other. However I do believe that our lower level APIs should/could be internal and not external. The smaller the external API, the better (for security, UX, and design reasons). |
A api set that intends for bridging cri(fratki,hyperd) and oci(kata-runtime) with kata. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Yes, I understand what you think about, and I agree we should keep a small set external APIs for applications. Here we have some thoughts on why we put some low-level interfaces here, and it is appreciated if you could figure out a better way for them: The
A Sandbox is created with The Kata lib provides pre-defined builders for difference cases/drivers:
The application could provide their own implementation or wrapper for the builders, which provide the flexibility for the different use case. Customer applications may implement their own Persistence for storing sandbox info in local filesystem, local database, or somewhere else. |
|
Here is the detailed document for how the proposed APIs work: https://docs.google.com/document/d/1BJeIFWykgmPIr45GRPNoawPPMr-3ekTIpJwU-g5QsHs/edit?usp=sharing |
|
I think I would prefer modifying the virtcontainers API with whatever extra it needs. It seems like this is a bit over the top with regards to what it needs I would prefer minimal and then see what else is necessary from there. |
sboeuf
left a comment
There was a problem hiding this comment.
I am not gonna comment about the content of each field of every structure introduced here but the form instead.
Virtcontainers is designed to be a simple entry point with a very simple API related to the Pod abstraction. Given that, it has been designed so that you fill the appropriate configuration structures and it will take care of what needs to be done regarding those configs.
This PR exports every details by allowing external caller to deal directly with the agent for instance, which is a bad idea IMO because this caller will have to understand the implementation details of virtcontainers in order to call the exported functions in the right sequence. And this will move more code handling into each consumer of this library, meaning more duplication between different consumers (kata-runtime, cc-runtime, frakti, and so on). This leads to more control being given to each consumer, and if for instance they need to solve the same issues, they might do that in different ways without contributing to the library.
When we've been talking about modifying things for this API, I was more expecting that for instance there is a case for network which is not currently handled and you would extend the network configuration structure so that you can implement what you need inside virtcontainers.
|
@jessfraz The API is based on runV API (the It is obvious we are not a project from the scratch. With so much existing use cases from each runtime, we put a bigger API set for taking all we considered, and we'd like to learn your and other's comments on
I understand @sameo are seeking a minimal API set and don't want to expose the low-level interfaces like What are the low-level interfaces we put here:
Why we make the interfaces public though we hate to publish the detail:
|
One of the key point of virtcontainers is to abstract this from the caller. If the VM lifecycle is handled outside of virtcontainers, we leave much of the complexity in the caller's hand. That's not what we want imho. We want to make the caller's life simple, and provide them with a container/sandbox API that's easy to use and close to what they're already familiar with. We want to make hardware virtualized containers just containers and hide the VM management away from them.
If virtcontainers is missing containers management APIs, I'd be happy to consider extending the API. But we need to know which ones they are.
So I see something a little concerning with this kind if API: We're basically letting the caller define its own persistent data storage format while in my opinion, the storage format is a part of the API. It may make sense to let callers implement their own backend driver but that should live at the transport/IO level: That being said, do we have practical (i.e. existing) examples of a need for using anything else than a POSIX filesystem? Do we realistically want to use something like etcd or an SQL database to store virtcontainers data? I usually prefer to extend an API to support a real life use case rather than extending it up front and having to maintain it in the long term. |
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
|
@laijs thanks for the change. // api.go
...
func AddNetwork(podID, interfaceName string) error {
...
}
...This example shows that you could try highlight what is really missing by extending the API, instead of trying to redefine everything. |
It is quit practical to let the runtime library callers decide where/how to store the persistent data. For one thing, a database provides much better ACID support than a filesystem which is important in a production environment. For another thing, the persistent data might be part of a larger transaction that spawns beyond the knowledge of the kata runtime. And yes, runv has been providing such API since the early days and frakti/hyperd are using databases to store the persistent data. |
|
I would suggest that we can provide a plugin framework to let everything with many choices pluggable, so that for something like persistence, we provide some choices like database and raw file, or something like an etcd plugin |
|
@WeiZhang555 yes, I agree. The proposed low-level interfaces are designed for plugin. It is better to explicitly split plugin interfaces from the customer-facing interface. |
| Source string | ||
| Fstype string | ||
| MoutOptions []string | ||
| } |
There was a problem hiding this comment.
So Storage represents e.g. an RBD or ceph volume attached to a VM/Pod? It's a pod resource that can be then mounted in one or more containers? Is that a correct description?
And so sandbox.AddStorage() is the part that attaches the storage to the VM/Pod, without mounting it in any container? Is it mounted somewhere in the pod mountns ?
There was a problem hiding this comment.
Yes, your description is correct. Just to be clear, Storage is not limited to RBD. sandbox.AddStorage() is supposed to call the hypervisor hotplug code. It won't mount the storage in any container because sandbox.AddStorage() can be called before sandbox.AddContainer thus there might be no containers in the sandbox at all.
There was a problem hiding this comment.
So do you see any reason why it could not be handled as part of the sandbox/pod creation step? Or is addStorage potentially being called outside of the pod/sandbox creation step?
There was a problem hiding this comment.
I'm not sure if I understand your question. In CRI, sandbox creation does not include all container specs. sandbox.AddStorage is supposed to be called when handling the CRI CreateContainer() request.
There was a problem hiding this comment.
Right, so CRI CreateContainer() will eventually call into virtcontainers' CreateContainer() or pod.CreateContainer(): Can we not export sandbox.AddStorage() and make it an internal call of virtcontainers CreateContainer() or pod.CreateContainer()?
There was a problem hiding this comment.
As I mentioned above to reply to @sboeuf, exporting these hotplug APIs will allow upper layer to prepare rootfs/volumes/nics in parallel that can save us hundreds of milliseconds when creating a new container.
Another reason to not reject AddStorage is that we will need to export AddNic anyway to support CNI even in the kata runtime CLI case, assuming @WeiZhang555 still wants the kata-cli interface subcommand similar to runv interface he implemented in runv. IOW, the hotplug APIs are not only useful for CRI native support. They are useful for kata-cli as well.
|
With the merge of https://github.com/kata-containers/documentation/blob/master/design/kata-api-design.md, can this be closed, @laijs @bergwolf ? |
This is the first commit related to the vendoring of all dependencies. This will prevent from any breakage in case a dependency would break the backward compatibility. Fixes kata-containers#32 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
A api set that intends for bridging cri(fratki,hyperd) and
oci(kata-runtime) with kata.
Signed-off-by: Lai Jiangshan jiangshanlai@gmail.com