agent: Add support for ephemeral volumes #236
Conversation
Corresponding PR in runtime.
mount.go
Outdated
```go
absSource, err = filepath.EvalSymlinks(source)
if err != nil {
	return grpcStatus.Errorf(codes.Internal, "Could not resolve symlink for source %v", source)
if fsType == typeTmpFs {
```
Since we have more than a couple of cases for the fsType now, I wonder if it'd warrant turning this into a switch case? WDYT?

Yes, a switch case makes more sense here.
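The suggested refactor might look roughly like this; the constant values and the helper are assumptions for illustration, based only on the type names quoted elsewhere in this thread:

```go
package main

import "fmt"

// fsType constants as named elsewhere in this review; the string values
// here are assumptions.
const (
	type9pFs  = "9p"
	typeTmpFs = "tmpfs"
)

// describeMount dispatches on fsType with a switch instead of chained ifs,
// which scales better as more filesystem types are added.
func describeMount(fsType string) string {
	switch fsType {
	case typeTmpFs:
		return "mount tmpfs inside the guest"
	case type9pFs:
		return "share from the host via 9p"
	default:
		return "unhandled fsType: " + fsType
	}
}

func main() {
	fmt.Println(describeMount(typeTmpFs))
	fmt.Println(describeMount(type9pFs))
}
```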
grpc.go
Outdated
```go
if len(splitSourceSlice) > 1 {
	k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]
	if k8sStorageType == "emphemeral" {
		// Only mount tmpfs once on the host, and reuse the mount destination
```
Just my own ignorance likely -- can you clarify "mount tmpfs once on the host"? The agent is running in the guest.

Maybe we shouldn't mount tmpfs on the host, and instead mount it directly in the guest VM.

@egernst Sorry, I will update that comment in the code to avoid the confusion. It should read: "Only mount tmpfs once inside the VM, and reuse the mount destination".
Ephemeral volumes live and die with the pod. They are created and mounted on the host by kubelet and then bind-mounted into every container in the pod that wants to use the volume. In the case of Kata, the pod is our VM sandbox, so the volume has to be confined within the VM context. This tmpfs has to be created inside the VM, hence you see the code to create tmpfs in the agent. This way we increase the isolation Kata provides to the containers: the containers that want to share data using ephemeral volumes won't have to leak their data to the host, because the backing volume stays within VM memory.
@jshachm By the time control comes to Kata, kubelet has already provisioned tmpfs on the host. We can choose to ignore that and provision our own tmpfs inside the guest, which is what this patch is achieving.
jodh-intel
left a comment
Can you create some tests for this change (either unit tests here, or maybe raise an issue in https://github.com/kata-containers/tests for a different type of test)?
grpc.go
Outdated
```go
splitSourceSlice := strings.Split(ociMnt.Source, "/")
if len(splitSourceSlice) > 1 {
	k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]
	if k8sStorageType == "emphemeral" {
```
- Is this a reliable test? This appears to be looking for the word "emphemeral" in the `ociMnt.Source` path (and that "magic tag" might change at some point).
- This block uses deep indentation. You could save one level by restructuring slightly:

```go
if k8sStorageType != "emphemeral" {
	continue
}
...
```
@jodh-intel It is very reliable to check for the k8sStorageType == "emphemeral" condition. The string emphemeral is introduced by this patch: https://github.com/kata-containers/runtime/pull/307/files#diff-8a56f5a1fa5971cc43e650ffa24f1e2bR376

That patch in the runtime loops over the given OCI mounts to look for the mounts used by k8s for ephemeral storage. It then replaces those mount paths on the host with mount paths in the guest by adding "/ephemeral" to the path. So the existence of that string is very much in our control. Right now, without this patch, an ephemeral volume is passed to the guest via 9p.
e.g. A typical ephemeral volume path looks like /var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume

But this resides on the host. We want that volume to be provisioned inside our VM. We could keep the same path, or we can make it simpler, since we have total control over what happens inside the sandbox VM. So I replace that original source path with the much simpler "/ephemeral". Inside the VM, the above path becomes /ephemeral/cache-volume
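The detection described here (looking at the second-to-last path component) can be sketched as follows. The helper name is made up for illustration, and note the quoted diff spells the marker string "emphemeral"; it is written "ephemeral" below to match the "/ephemeral" guest path in the example:

```go
package main

import (
	"fmt"
	"strings"
)

// isEphemeralMount mirrors the check under review: split the mount source
// path and inspect its second-to-last component.
func isEphemeralMount(source string) bool {
	parts := strings.Split(source, "/")
	n := len(parts) // computed once, per the review suggestion
	if n <= 1 {
		return false
	}
	return parts[n-2] == "ephemeral"
}

func main() {
	fmt.Println(isEphemeralMount("/ephemeral/cache-volume"))
	fmt.Println(isEphemeralMount("/var/lib/kubelet/pods/x/volumes/kubernetes.io~empty-dir/cache-volume"))
}
```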
@jodh-intel I will add some unit test cases.
grpc.go
Outdated
```go
for idx, ociMnt := range ociSpec.Mounts {
	splitSourceSlice := strings.Split(ociMnt.Source, "/")
	if len(splitSourceSlice) > 1 {
		k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]
```
You could create a variable for len(splitSourceSlice) as you've scanned the string twice here.
Instead of checking k8sStorageType, we should add an ephemeral storage driver type and let ociSpec.Mounts just reference it: let req.Storage carry the new driver type, and let ociSpec.Mounts[i].Mount reference it. The main change is in addStorages, where we create the tmpfs mountpoint.
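The proposed restructuring could be sketched roughly like this; the struct is a stand-in for the pb.Storage message and all names except the new "ephemeral" driver type are assumptions:

```go
package main

import "fmt"

// Storage is a simplified stand-in for the agent's pb.Storage message.
type Storage struct {
	Driver     string
	MountPoint string
}

const (
	driver9pType        = "9p"
	driverEphemeralType = "ephemeral" // the new driver type being proposed
)

// handleStorage dispatches on an explicit driver type instead of parsing
// the mount source path.
func handleStorage(s Storage) (string, error) {
	switch s.Driver {
	case driverEphemeralType:
		// here the agent would MkdirAll the mount point and mount tmpfs on it
		return s.MountPoint, nil
	case driver9pType:
		// here the agent would handle the 9p shared mount
		return s.MountPoint, nil
	default:
		return "", fmt.Errorf("unknown storage driver %q", s.Driver)
	}
}

func main() {
	mp, err := handleStorage(Storage{Driver: driverEphemeralType, MountPoint: "/ephemeral"})
	fmt.Println(mp, err)
}
```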
@bergwolf Sounds good, I will give it a shot and update the PR.
@harche just to clarify about ephemeral volumes: do you expect anything to be present on the host at a location like the one above? I ask because if this is basically an empty volume, then great, a simple volume backed by tmpfs on the guest will do the trick. Just want to make sure we're on the same page!
@sboeuf There isn't any data initially. Ephemeral volumes are used only for sharing data between the containers of the same pod. So the volume is empty to begin with, unless a container from the respective pod writes something into it.
@harche okay thanks for clarification ;)
@sboeuf @bergwolf @jodh-intel I have updated this PR as well as the corresponding PR in the runtime.
@harche Thanks for the change.
caoruidong
left a comment
LGTM. Once CI is green, it can be merged.
mount.go
Outdated
```go
	_, err = commonStorageHandler(storage)
	return "", err
} else {
```
You can delete this else to make CI happy.
Codecov Report
```
@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
+ Coverage   43.42%   43.84%   +0.42%
==========================================
  Files          14       14
  Lines        2091     2178      +87
==========================================
+ Hits          908      955      +47
- Misses       1070     1103      +33
- Partials      113      120       +7
```
Continue to review full report at Codecov.
Thanks @harche! lgtm
sboeuf
left a comment
One comment!
Looks good to me otherwise.
mount.go
Outdated
```go
func ephemeralStorageHandler(storage pb.Storage, s *sandbox) (string, error) {
	if _, err := os.Stat(storage.MountPoint); os.IsNotExist(err) {
		os.MkdirAll(storage.MountPoint, os.ModePerm)
```
@harche You do not even need to check if the directory exists before calling MkdirAll. From the docs, "If path is already a directory, MkdirAll does nothing and returns nil."
@amshinde Well I think you need to check this because in case it does exist, you don't need to call into commonStorageHandler().
And why are we not calling commonStorageHandler() then? It may happen that the directory exists, but we skip mounting altogether.
@amshinde The tmpfs mount point has to be mounted on the guest only once. Then the rest of the containers will bind mount that directory.

e.g. If all the containers in the pod need an ephemeral volume, they will bind mount, say, a /sharedMount directory on the VM. But before that, this /sharedMount has to be mounted on the VM backed by tmpfs (only once).
The above code is tested with a k8s yaml with 3 containers sharing the mount point.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
  labels:
    app: myapp
spec:
  containers:
  # Application Container
  - name: myapp-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: app_image:latest
    command: ['sh', '-c', 'cat /cache/test && sleep 3600']
  initContainers:
  # Copy Data container
  - name: data-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: data_image:latest
    command: ['sh', '-c', 'cp /test /cache/']
  # Append Data container
  - name: init-append-data
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: test_image:latest
    command: ['sh', '-c', 'echo test >> /cache/test']
  volumes:
  - name: cache-volume
    emptyDir:
      medium: "Memory"
```
@amshinde Mounting is intentionally skipped if the dir exists, because otherwise every single container ends up pointing to a separate tmpfs volume and they don't end up sharing the volume. So only once (the first time) you have to make sure the dir exists, and then mount it on the guest backed by tmpfs. The corresponding entries in every container's OCI spec already point to bind mounting that dir.
@harche Thanks for the explanation, my question was based on the assumption that you would be adding storage just once for the first container.
Adding dnm label until the mkdir check is added.
```go
if fsType != type9pFs {
	var err error
...
	var err error
```
nit: you don't need to declare err here, since you use a new instance in the inner scope.
mount.go
Outdated
```go
var err error
switch fsType {
case typeTmpFs, type9pFs:
	if err := createDestinationDir(destination); err != nil {
```
@harche Are you not already creating the mount destination directory in the "ephemeral" storage handler?
Yes, I will update the PR. Thanks.
@harche CI is failing, can you take a look so that this can be merged?
@harche please fix the unit tests.
Ephemeral volumes need to be mounted inside the VM and not passed as 9pfs mounts. Fixes: kata-containers#235 Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>