Skip to content

Conversation

@kolyshkin
Copy link
Contributor

The TestExecInTTY test case is sometimes failing like this:

execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\r\n 7 root 0:00 ps\r\n"

or this:

execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\n 7 root 0:00 ps\n"

(this is easy to repro with go test -run TestExecInTTY -count 1000).

This is caused by a race between

  • an Init() (in this case it is is (*linuxSetnsInit.Init(), but
    (*linuxStandardInit).Init() is no different in this regard),
    which creates a pty pair, sends pty master to runc, and execs
    the container process,

and

  • a parent runc process, which receives the pty master fd and calls
    ClearONLCR() on it.

One way of fixing it would be to add a synchronization mechanism
between these two, so Init() won't exec the process until the parent
sets the flag. This seems excessive, though, as we can just move
the ClearONLCR() call to Init(), putting it right after console.NewPty().

Note that bug was only seen in the TestExecInTTY test case, but
from looking at the code it seems like it can also happen during
runc run or runc exec.

While at it:

  • simplify/improve the test case;
  • make it repeat 300 times for better chances to catch the bug.

Fixes: #2425


// Repeat to increase chances to catch a race; see
// https://github.com/opencontainers/runc/issues/2425.
for i := 0; i < 300; i++ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, this commit is totally optional -- I am not sure myself I want it, since the bug is fixed and it takes about 5s on my laptop for this test now.

The TestExecInTTY test case is sometimes failing like this:

> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\r\n 7 root 0:00 ps\r\n"

or this:

> execin_test.go:332: unexpected carriage-return in output "PID USER TIME COMMAND\r\n 1 root 0:00 cat\n 7 root 0:00 ps\n"

(this is easy to repro with `go test -run TestExecInTTY -count 1000`).

This is caused by a race between

 - an Init() (in this case it is is (*linuxSetnsInit.Init(), but
   (*linuxStandardInit).Init() is no different in this regard),
   which creates a pty pair, sends pty master to runc, and execs
   the container process,

and

 - a parent runc process, which receives the pty master fd and calls
   ClearONLCR() on it.

One way of fixing it would be to add a synchronization mechanism
between these two, so Init() won't exec the process until the parent
sets the flag. This seems excessive, though, as we can just move
the ClearONLCR() call to Init(), putting it right after console.NewPty().

Note that bug only happens in the TestExecInTTY test case, but
from looking at the code it seems like it can happen in runc run
or runc exec, too.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
There's no need to check err for nil.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Simplify the tty code by using 1 goroutine instead of 2.

Improve error reporting by wrapping the errors.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is to increase the chance to hit the recently fixed race.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

@cyphar PTAL

@kolyshkin
Copy link
Contributor Author

@AkihiroSuda @mrunalp PTAL

This fixes a (now) frequent test failure which I am tired of seeing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestExecInTTY: execin_test.go:351: unexpected carriage-return in output

3 participants