Skip to content

Fix fd leak of shim log#3266

Merged
crosbymichael merged 1 commit intocontainerd:masterfrom
darfux:fix_shim_log_fd_leak
May 9, 2019
Merged

Fix fd leak of shim log#3266
crosbymichael merged 1 commit intocontainerd:masterfrom
darfux:fix_shim_log_fd_leak

Conversation

@darfux
Copy link
Contributor

@darfux darfux commented May 8, 2019

Open shim log with the flag O_RDWR will cause the Read() block forever even if the pipe has been closed on the shim side. Then the io.Copy() would never return and lead to a fd leak.

It happens on both shim v1 and v2 but may only have effect on shim v2. Because when using shim v1 , the fd leak is on containerd-shim which will exit when the runtime shim exit. But under shim v2 the shim log is opened by the containerd process so the fd won't be closed until containerd exit.

Run a pod using shim v2 and lsof on containerd:

COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF       NODE NAME
container 51419 root   12u  FIFO                8,8      0t0  351013697 /run/containerd/io.containerd.runtime.v2.task/k8s.io/cd7e64d216e7376e51463f3d9d20571ae2ae9eb009b13a315193bdf9b7133389/log
container 51419 root   13u  FIFO                8,8      0t0  351013697 /run/containerd/io.containerd.runtime.v2.task/k8s.io/cd7e64d216e7376e51463f3d9d20571ae2ae9eb009b13a315193bdf9b7133389/log

Then remove the pod, lsof again:

COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF       NODE NAME
container 51419 root   12u  FIFO                8,8      0t0  351013697 /run/containerd/io.containerd.runtime.v2.task/k8s.io/.cd7e64d216e7376e51463f3d9d20571ae2ae9eb009b13a315193bdf9b7133389/log (deleted)
container 51419 root   13u  FIFO                8,8      0t0  351013697 /run/containerd/io.containerd.runtime.v2.task/k8s.io/.cd7e64d216e7376e51463f3d9d20571ae2ae9eb009b13a315193bdf9b7133389/log (deleted)

Signed-off-by: Li Yuxuan liyuxuan04@baidu.com

@codecov-io
Copy link

codecov-io commented May 8, 2019

Codecov Report

Merging #3266 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3266   +/-   ##
=======================================
  Coverage   44.56%   44.56%           
=======================================
  Files         113      113           
  Lines       12194    12194           
=======================================
  Hits         5434     5434           
  Misses       5928     5928           
  Partials      832      832
Flag Coverage Δ
#linux 48.54% <ø> (ø) ⬆️
#windows 39.8% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d68b593...cf6e008. Read the comment docs.

@Ace-Tang
Copy link
Contributor

Ace-Tang commented May 8, 2019

Looks good to me

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cpuguy83
Copy link
Member

cpuguy83 commented May 8, 2019

I feel like these are rdwr to prevent broken pipe errors... but this is the shim log itself, so I'm not positive on that.

@fuweid
Copy link
Member

fuweid commented May 9, 2019

Hi @cpuguy83 , if we use rdwr for shim log, we need to close the pipe during cleanup shim. I think the change might be easier.

@darfux
Copy link
Contributor Author

darfux commented May 9, 2019

@fuweid I think @cpuguy83 may talk about the condition under shim v1. The code below will redirect shim output to the opened shim log fd so we can not simply set it to read only. I'll re-push the commit to only patch the function of shim v2 at first and see if there's a better way under shim v1 :)

cmd, err := newCommand(binary, daemonAddress, debug, config, f, stdoutLog, stderrLog)

@darfux darfux force-pushed the fix_shim_log_fd_leak branch from 7be6024 to 84a92ab Compare May 9, 2019 10:20
@fuweid
Copy link
Member

fuweid commented May 9, 2019

@fuweid I think @cpuguy83 may talk about the condition under shim v1. The code below will redirect shim output to the opened shim log fd so we can not simply set it to read only. I'll re-push the commit to only patch the function of shim v2 at first and see if there's a better way under shim v1 :)

I was thinking the v1 change has been covered by test case. Thanks @darfux

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM(again)

@darfux darfux changed the title Fix fd leak of shim log Fix fd leak of shim v2 log May 9, 2019
@darfux darfux force-pushed the fix_shim_log_fd_leak branch from 84a92ab to dea7391 Compare May 9, 2019 11:57
@darfux
Copy link
Contributor Author

darfux commented May 9, 2019

@fuweid I think @cpuguy83 may talk about the condition under shim v1. The code below will redirect shim output to the opened shim log fd so we can not simply set it to read only. I'll re-push the commit to only patch the function of shim v2 at first and see if there's a better way under shim v1 :)

I was thinking the v1 change has been covered by test case. Thanks @darfux

@fuweid As you said we couldn't believe the test case always🤣. After debugging I find that shim v1 has fd leak as well due to a typo. And the test case doesn't get correct info from lsof so it passed. I have fixed both of them in the new push.

@darfux darfux changed the title Fix fd leak of shim v2 log Fix fd leak of shim log May 9, 2019
@darfux darfux force-pushed the fix_shim_log_fd_leak branch from dea7391 to 6be24a5 Compare May 9, 2019 12:08
Open shim v2 log with the flag `O_RDWR` will cause the `Read()` block
forever even if the pipe has been closed on the shim side. Then the
`io.Copy()` would never return and lead to a fd leak.
Fix typo when closing shim v1 log which causes the `stdouLog` leak.
Update `numPipes` function in test case to get the opened FIFO
correctly.

Signed-off-by: Li Yuxuan <liyuxuan04@baidu.com>
@darfux darfux force-pushed the fix_shim_log_fd_leak branch from 6be24a5 to cf6e008 Compare May 9, 2019 12:22
Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; great catch on the typo in v1 shim! @crosbymichael might want to comment on the lsof; could there be different versions of lsof that have changed the output?

@crosbymichael
Copy link
Member

LGTM

@crosbymichael crosbymichael merged commit 0485499 into containerd:master May 9, 2019
@darfux darfux deleted the fix_shim_log_fd_leak branch May 9, 2019 15:43
@estesp
Copy link
Member

estesp commented May 10, 2019

I just added cherry-pick labels to this PR; I assume this is a pretty important bug fix that should go back to release/1.1 and release/1.2. If for any reason I'm not correct, feel free to remove the labels.

@fuweid
Copy link
Member

fuweid commented May 10, 2019

@estesp agree to cherry-pick.

I have created one for 1.2.x. But I am not sure it can pass the CI. PTAL

#3273

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants