fix corrupted mount issue when driver daemonset restarted #117

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:master from andyzhangx:corrupted-mount
Feb 26, 2020

Conversation

@andyzhangx andyzhangx commented Feb 26, 2020

Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR, together with an upstream Kubernetes PR (kubernetes/kubernetes#88569), fixes the corrupted mount issue that occurs when a FUSE-based CSI driver daemonset is restarted on a node.
After the daemonset restarts, the original blobfuse mount is broken. This PR handles the broken mount in both NodeStage and NodePublish:

  • detect the broken mount path
  • unmount the broken mount path
  • remount the mount path

Which issue(s) this PR fixes:

Fixes #115

Special notes for your reviewer:
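The main fix is in the ensureMountPoint func:

func (d *Driver) ensureMountPoint(target string) error {
	notMnt, err := d.mounter.IsLikelyNotMountPoint(target)
	if err != nil && !os.IsNotExist(err) {
		// The check failed for a reason other than a missing path. If the
		// directory is a corrupted mount, mark it as mounted so the rest of
		// the function can clean it up; otherwise return the error.
		if IsCorruptedDir(target) {
			notMnt = false
			klog.Warningf("detected corrupted mount for targetPath [%s]", target)
		} else {
			return err
		}
	}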
...

Main code execution logic for this PR:

I0226 03:37:10.191769       1 utils.go:112] GRPC call: /csi.v1.Node/NodeStageVolume
I0226 03:37:10.191800       1 utils.go:113] GRPC request: volume_id:"andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:"skuName" value:"Standard_LRS" > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1582552362270-8081-blobfuse.csi.azure.com" >
I0226 03:37:10.191810       1 nodeserver.go:105] NodeStageVolume: called with args {VolumeId:andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 PublishContext:map[] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER >  Secrets:map[] VolumeContext:map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
W0226 03:37:10.191901       1 nodeserver.go:241] detected corrupted mount for targetPath [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount]
W0226 03:37:10.191917       1 nodeserver.go:255] ReadDir /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount failed with open /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount: transport endpoint is not connected, unmount this directory
I0226 03:37:10.191925       1 mount_linux.go:209] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount
I0226 03:37:10.674579       1 nodeserver.go:141] target /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount
fstype ext4

volumeId andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338
context map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com]
mountflags []
mountOptions [--use-https=true]
args /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount --tmp-path=/mnt/andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 --container-name=pvc-0433847e-03fd-422f-b053-5534510eb338 --use-https=true

I0226 03:37:10.813063       1 utils.go:113] GRPC request: volume_id:"andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount" target_path:"/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:"skuName" value:"Standard_LRS" > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1582552362270-8081-blobfuse.csi.azure.com" >
I0226 03:37:10.813074       1 nodeserver.go:39] NodePublishVolume: called with args {VolumeId:andy-1180alpha5#fuse9e6bc1063ad742c8a12#pvc-0433847e-03fd-422f-b053-5534510eb338 PublishContext:map[] StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount TargetPath:/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount VolumeCapability:mount:<fs_type:"ext4" > access_mode:<mode:MULTI_NODE_MULTI_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1582552362270-8081-blobfuse.csi.azure.com] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
W0226 03:37:10.813124       1 nodeserver.go:241] detected corrupted mount for targetPath [/var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount]
W0226 03:37:10.813142       1 nodeserver.go:255] ReadDir /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount failed with open /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount: transport endpoint is not connected, unmount this directory
I0226 03:37:10.813152       1 mount_linux.go:209] Unmounting /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount
I0226 03:37:10.819437       1 nodeserver.go:69] NodePublishVolume: mounting /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount at /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount with mountOptions: [bind]
I0226 03:37:10.819461       1 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount])
I0226 03:37:10.821022       1 mount_linux.go:142] Mounting cmd (mount) with arguments ([-o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount])
I0226 03:37:10.824341       1 nodeserver.go:76] NodePublishVolume: mount /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0433847e-03fd-422f-b053-5534510eb338/globalmount at /var/lib/kubelet/pods/8a2a3fdd-f52c-460b-b270-d7cc00ae7ed5/volumes/kubernetes.io~csi/pvc-0433847e-03fd-422f-b053-5534510eb338/mount successfully

Release note:

fix corrupted mount issue when driver daemonset restarted

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 26, 2020
@andyzhangx andyzhangx changed the title fix corrupted mount issue when driver deamonset restarted fix corrupted mount issue when driver daemonset restarted Feb 26, 2020
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 26, 2020
@andyzhangx andyzhangx requested a review from ZeroMagic February 26, 2020 03:56
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2020
Comment thread Makefile
Comment thread deploy/example/blobfuse-deployment.yaml Outdated
fix: corrupt mount path

doc: add two blobfuse deployments

test: fix comment

revert utils/mount dependency
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 26, 2020
@ZeroMagic

Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2020
@k8s-ci-robot k8s-ci-robot merged commit 0030109 into kubernetes-sigs:master Feb 26, 2020
@hhstu

hhstu commented Dec 22, 2023


@andyzhangx Does this support subpath?

@andyzhangx

Member Author

@andyzhangx Does this support subpath?

@hhstu the mount path won't be broken now since we are using blobfuse-proxy by default: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/deploy/blobfuse-proxy

@hhstu

hhstu commented Dec 23, 2023


@andyzhangx OK, thank you.


Development

Successfully merging this pull request may close these issues.

restart csi-blobfuse-node daemonset would make current blobfuse mount unavailable

4 participants