Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Aug 17, 2017

A short high-level description of this patch set:
Patch 1: update Dockerfiles to use Debian Stretch as a base
Patch 2: disable devmapper graph driver for static build
Patches 3-5: various CI fixes
Patches 6-7: improve error reporting for devmapper graph driver
Patches 8-10: fix 'make all' on some platforms (unrelated but here for the sake of CI)

I believe this is now ready for inclusion. Here is the description of all patches:

Update Dockerfiles to use Debian stretch

The main gain here is that they all use exactly the same distro; previously
arm64 was using Ubuntu Xenial because Debian jessie was too old.

Does not seem that we can change any of the downloaded dependencies still,
as eg libseccomp is still not the version we are using.

devmapper gd: disable for static build

Static build with devmapper is impossible now since libudev is required
and no static version of libudev is available (as static libraries are
not supported by systemd which udev is part of).

This should not hurt anyone as "[t]he primary user of static builds
is the Editions, and docker in docker via the containers, and none
of those use device mapper".

Also, since the need for static libdevmapper is gone, there is no need
to self-compile libdevmapper -- let's use the one from Debian Stretch.

TestRunSeccompProfileAllow32Bit: fix

Since the update to Debian Stretch, this test fails. The reason is dynamic
binary, which requires i386 ld.so for loading (and apparently it is no longer
installed by default):

root@09d4b17:/go/src/github.com/docker/docker# file exit32-test
exit32-test: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=a0d3d6cb59788453b983f65f8dc6ac52920147b6, stripped
root@09d4b17:/go/src/github.com/docker/docker# ls -l /lib/ld-linux.so.2
ls: cannot access '/lib/ld-linux.so.2': No such file or directory

To fix, just add -static.

Interestingly, ldd can'f figure it out.

root@a324f8e:/go/src/github.com/docker/docker# ldd exit32-test
not a dynamic executable

Other tools (e.g. objdump) also show it's a dynamic binary.

While at it, remove the extra "id" argument (a copy-paste error I
guess).

devmapper: show dmesg if mount fails

If mount fails, the reason might be right there in the kernel log ring buffer.
Let's include it in the error message, it might be of great help.

Dockerfiles: fix test-docker-py

Presumably after switch to debian-stretch as a base, the following
errors happens in Jenkins:

10:48:03 ---> Making bundle: test-docker-py (in
bundles/17.06.0-dev/test-docker-py)
10:48:03 ---> Making bundle: .integration-daemon-start (in
bundles/17.06.0-dev/test-docker-py)
10:48:03 Using test binary docker
10:48:03 # DOCKER_EXPERIMENTAL is set: starting daemon with experimental
features enabled!
10:48:03 /etc/init.d/apparmor: 130: /etc/init.d/apparmor:
systemd-detect-virt: not found
10:48:03 Starting AppArmor profiles:Warning from stdin (line 1):
/sbin/apparmor_parser: cannot use or update cache, disable, or
force-complain via stdin
10:48:03 Warning failed to create cache: (null)
10:48:03 .
10:48:03 INFO: Waiting for daemon to start...
10:48:03 Starting dockerd
10:48:05 .
10:48:06 Traceback (most recent call last):
10:48:06 File
"/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 320, in
_importconftest
10:48:06 mod = conftestpath.pyimport()
10:48:06 File
"/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 662, in
pyimport
10:48:06 import(modname)
10:48:06 File "/docker-py/tests/integration/conftest.py", line 6, in

10:48:06 import docker.errors
10:48:06 File "/docker-py/docker/init.py", line 2, in
10:48:06 from .api import APIClient
10:48:06 File "/docker-py/docker/api/init.py", line 2, in
10:48:06 from .client import APIClient
10:48:06 File "/docker-py/docker/api/client.py", line 6, in
10:48:06 import requests
10:48:06 ImportError: No module named requests
10:48:06 ERROR: could not load /docker-py/tests/integration/conftest.py
10:48:06

and

00:38:55 File "/docker-py/docker/transport/ssladapter.py", line 21, in

00:38:55 from backports.ssl_match_hostname import match_hostname
00:38:55 ImportError: No module named backports.ssl_match_hostname
00:38:55 ERROR: could not load /docker-py/tests/integration/conftest.py

To fix, install the missing python modules.

devmapper: tell why xfs is not supported

Instead of providing a generic message listing all possible reasons
why xfs is not available on the system, let's be specific.

devmapper: don't create too new xfs

Since the update to Debian Stretch, devmapper unit test fails. One
reason is, the combination of somewhat old (less than 3.16) kernel and
relatively new xfsprogs leads to creating a filesystem which is not supported
by the kernel:

[12206.467518] XFS (dm-1): Superblock has unknown read-only compatible features (0x1) enabled.
[12206.472046] XFS (dm-1): Attempted to mount read-only compatible filesystem read-write.
Filesystem can only be safely mounted read only.
[12206.472079] XFS (dm-1): SB validate failed with error 22.

Ideally, that would be automatically and implicitly handled by xfsprogs.
In real life, we have to take care about it here. Sigh.

overlay gd: fix build for 32-bit ARM

This commit reverts a hunk of commit 2f5f0af ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).

The exact error on armhf is this:

19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix

Fix test-docker-py on some arches

When running 'make all' on armhf, I got this:

---> Making bundle: .integration-daemon-start (in bundles/17.06.0-dev/test-docker-py)
Using test binary docker
INFO: Waiting for daemon to start...
Starting dockerd
.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line
320, in _importconftest
mod = conftestpath.pyimport()
File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line
662, in pyimport
import(modname)
File "/docker-py/tests/integration/conftest.py", line 6, in
import docker.errors
File "/docker-py/docker/init.py", line 2, in
from .api import APIClient
File "/docker-py/docker/api/init.py", line 2, in
from .client import APIClient
File "/docker-py/docker/api/client.py", line 11, in
from .build import BuildApiMixin
File "/docker-py/docker/api/build.py", line 6, in
from .. import auth
File "/docker-py/docker/auth.py", line 6, in
import dockerpycreds
ImportError: No module named dockerpycreds
ERROR: could not load /docker-py/tests/integration/conftest.py

The fix for this was already provided by commit 0ec8f56 and
commit c7c9235, but for some reason it did not made its way
to Dockerfiles for all architectures.

While at it, remove excessive comments.

TestLogsFollowSlowStdoutConsumer: fix for slow ARM

We run our CI on Scaleway C1 machine, which is pretty slow,
including I/O. This test was failing on it, as it tried to
write 100000 lines of log very fast, and the loggerCloseTimeout
(defined and used in container/monitor.go) prevents the
daemon to finish writing it within this time frame,

Reducing the size to 150000 characters (75000 lines) should
help avoiding hitting it, without compromising the test case
itself.

Alternatively, we could have increased the timeout further. It was
originally set to 1s (commit b6a4267) and later increased 10x
(commit c0391bf). Please let me know if you want me to go that way.

@kolyshkin kolyshkin changed the title [WIP] try using libeudev [WIP] disable devmapper for static_build Aug 18, 2017
@kolyshkin
Copy link
Contributor Author

This is weird, I got repeated test failure with DockerSuite.TestRunSeccompProfileAllow32Bit on both experimental and janky jobs. Here it is:

10:45:11 
10:45:11 ----------------------------------------------------------------------
10:45:11 FAIL: docker_cli_run_unix_test.go:1057: DockerSuite.TestRunSeccompProfileAllow32Bit
10:45:11 
10:45:11 docker_cli_run_unix_test.go:1061:
10:45:11     icmd.RunCommand(dockerBinary, "run", "syscall-test", "exit32-test", "id").Assert(c, icmd.Success)
10:45:11 /go/src/github.com/docker/docker/pkg/testutil/cmd/command.go:64:
10:45:11     t.Fatalf("at %s:%d - %s", filepath.Base(file), line, err.Error())
10:45:11 ... Error: at docker_cli_run_unix_test.go:1061 - 
10:45:11 Command:  /usr/local/bin/docker run syscall-test exit32-test id
10:45:11 ExitCode: 1
10:45:11 Error:    exit status 1
10:45:11 Stdout:   
10:45:11 Stderr:   standard_init_linux.go:187: exec user process caused "no such file or directory"
10:45:11 
10:45:11 
10:45:11 Failures:
10:45:11 ExitCode was 1 expected 0
10:45:11 Expected no error
10:45:11 
10:45:11 
10:45:11 
10:45:11 ----------------------------------------------------------------------

@justincormack
Copy link
Contributor

Thats weird. I really can't see how that could be affecting it at all...

@tophj-ibm
Copy link
Contributor

hmm that's the error it throws when trying to run another architecture's image

@kolyshkin
Copy link
Contributor Author

OK, latest CI failures are caused by github.com issues, such as
14:13:37 stderr: fatal: unable to access 'https://github.com/moby/moby.git/': Failed to connect to github.com port 443: Connection timed out

@kolyshkin
Copy link
Contributor Author

OK, I have fixed the 32-bit vs seccomp issue by the last patch

15:50:08 PASS: docker_cli_run_unix_test.go:1057: DockerSuite.TestRunSeccompProfileAllow32Bit 1.963s

...but am having a devmapper unit tests failures, i guess, similar to ones that @justincormack saw, and have no idea how to address it :(

@kolyshkin
Copy link
Contributor Author

Looks like the issue is caused by devmapper gd thinking that udev sync is not working, and therefore refusing to run. I have checked that the libudev library being used is indeed built with udev_sync support.

Currently trying to test with udev_sync check disabled (i.e. an error becomes a warning).

@kolyshkin kolyshkin force-pushed the libeudev branch 4 times, most recently from 5a73227 to 8ae3f5a Compare August 22, 2017 08:19
@kolyshkin kolyshkin requested a review from tianon as a code owner August 22, 2017 08:19
@kolyshkin kolyshkin force-pushed the libeudev branch 2 times, most recently from 94d64bb to 9415259 Compare August 22, 2017 13:07
@kolyshkin
Copy link
Contributor Author

Debug in last patch (dmesg | tail -5) shown some xfs-related errors in devmapper.

13:23:10 --- FAIL: TestDevmapperCreateEmpty (0.07s)
13:23:10 	assertions.go:226: 
                          
	Error Trace:	graphtest_unix.go:98
13:23:10 		
	            	
			devmapper_test.go:74
13:23:10 		
	Error:      	Received unexpected error:
13:23:10 		
	            	devmapper: Error mounting '/dev/mapper/docker-0:41-2200527-empty' on '/tmp/docker-graphtest-485442700/devicemapper/mnt/empty': invalid argument
13:23:10 		
	            	               Use of these features in this kernel is at your own risk!
13:23:10 		
	            	[12206.467518] XFS (dm-1): Superblock has unknown read-only compatible features (0x1) enabled.
13:23:10 		
	            	[12206.472046] XFS (dm-1): Attempted to mount read-only compatible filesystem read-write.
13:23:10 		
	            	               Filesystem can only be safely mounted read only.
13:23:10 		
	            	[12206.472079] XFS (dm-1): SB validate failed with error 22.

@kolyshkin kolyshkin force-pushed the libeudev branch 7 times, most recently from 2bb81db to 42a4a33 Compare August 24, 2017 10:23
@kolyshkin kolyshkin changed the title [WIP] disable devmapper for static_build [WIP] update Dockerfiles to use Debian Stretch Aug 24, 2017
@tophj-ibm
Copy link
Contributor

I restarted the experimental job and it's here https://jenkins.dockerproject.org/job/Docker-PRs-experimental/36482/ Not sure why it didn't trigger the icon, but it should update once it's done.

@kolyshkin
Copy link
Contributor Author

ARM failure of the day is this:

15:20:08 /usr/local/go/pkg/tool/linux_arm/link: running gcc failed: fork/exec /usr/bin/gcc: cannot allocate memory

This is Scaleway C1 with 2GB of RAM, and sometimes it's not enough. One way to mitigate would be to configure some swap (say 1G), as currently there's none

@yongtang
Copy link
Member

@kolyshkin need a rebase.

@kolyshkin
Copy link
Contributor Author

@kolyshkin need a rebase

I know (it was my patch that broke it).

justincormack and others added 10 commits September 17, 2017 22:02
The main gain here is that they all use exactly the same distro; previously
arm64 was using Ubuntu Xenial because Debian jessie was too old.

Does not seem that we can change any of the downloaded dependencies still,
as eg libseccomp is still not the version we are using.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Static build with devmapper is impossible now since libudev is required
and no static version of libudev is available (as static libraries are
not supported by systemd which udev is part of).

This should not hurt anyone as "[t]he primary user of static builds
is the Editions, and docker in docker via the containers, and none
of those use device mapper".

Also, since the need for static libdevmapper is gone, there is no need
to self-compile libdevmapper -- let's use the one from Debian Stretch.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since the update to Debian Stretch, this test fails. The reason is dynamic
binary, which requires i386 ld.so for loading (and apparently it is no longer
installed by default):

> root@09d4b173c3dc:/go/src/github.com/docker/docker# file exit32-test
> exit32-test: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=a0d3d6cb59788453b983f65f8dc6ac52920147b6, stripped
> root@09d4b173c3dc:/go/src/github.com/docker/docker# ls -l /lib/ld-linux.so.2
> ls: cannot access '/lib/ld-linux.so.2': No such file or directory

To fix, just add -static.

Interestingly, ldd can'f figure it out.

> root@a324f8edfcaa:/go/src/github.com/docker/docker# ldd exit32-test
>	not a dynamic executable

Other tools (e.g. objdump) also show it's a dynamic binary.

While at it, remove the extra "id" argument (a copy-paste error I
guess).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Presumably after switch to debian-stretch as a base, the following
errors happens in Jenkins:

10:48:03 ---> Making bundle: test-docker-py (in
bundles/17.06.0-dev/test-docker-py)
10:48:03 ---> Making bundle: .integration-daemon-start (in
bundles/17.06.0-dev/test-docker-py)
10:48:03 Using test binary docker
10:48:03 # DOCKER_EXPERIMENTAL is set: starting daemon with experimental
features enabled!
10:48:03 /etc/init.d/apparmor: 130: /etc/init.d/apparmor:
systemd-detect-virt: not found
10:48:03 Starting AppArmor profiles:Warning from stdin (line 1):
/sbin/apparmor_parser: cannot use or update cache, disable, or
force-complain via stdin
10:48:03 Warning failed to create cache: (null)
10:48:03 .
10:48:03 INFO: Waiting for daemon to start...
10:48:03 Starting dockerd
10:48:05 .
10:48:06 Traceback (most recent call last):
10:48:06   File
"/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 320, in
_importconftest
10:48:06     mod = conftestpath.pyimport()
10:48:06   File
"/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 662, in
pyimport
10:48:06     __import__(modname)
10:48:06   File "/docker-py/tests/integration/conftest.py", line 6, in
<module>
10:48:06     import docker.errors
10:48:06   File "/docker-py/docker/__init__.py", line 2, in <module>
10:48:06     from .api import APIClient
10:48:06   File "/docker-py/docker/api/__init__.py", line 2, in <module>
10:48:06     from .client import APIClient
10:48:06   File "/docker-py/docker/api/client.py", line 6, in <module>
10:48:06     import requests
10:48:06 ImportError: No module named requests
10:48:06 ERROR: could not load /docker-py/tests/integration/conftest.py
10:48:06

and

00:38:55   File "/docker-py/docker/transport/ssladapter.py", line 21, in
<module>
00:38:55     from backports.ssl_match_hostname import match_hostname
00:38:55 ImportError: No module named backports.ssl_match_hostname
00:38:55 ERROR: could not load /docker-py/tests/integration/conftest.py

To fix, install the missing python modules.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since the update to Debian Stretch, devmapper unit test fails. One
reason is, the combination of somewhat old (less than 3.16) kernel and
relatively new xfsprogs leads to creating a filesystem which is not supported
by the kernel:

> [12206.467518] XFS (dm-1): Superblock has unknown read-only compatible features (0x1) enabled.
> [12206.472046] XFS (dm-1): Attempted to mount read-only compatible filesystem read-write.
> Filesystem can only be safely mounted read only.
> [12206.472079] XFS (dm-1): SB validate failed with error 22.

Ideally, that would be automatically and implicitly handled by xfsprogs.
In real life, we have to take care about it here. Sigh.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If mount fails, the reason might be right there in the kernel log ring buffer.
Let's include it in the error message, it might be of great help.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of providing a generic message listing all possible reasons
why xfs is not available on the system, let's be specific.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When running 'make all' on armhf, I got this:

> ---> Making bundle: .integration-daemon-start (in bundles/17.06.0-dev/test-docker-py)
> Using test binary docker
> INFO: Waiting for daemon to start...
> Starting dockerd
> .
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line
> 320, in _importconftest
>     mod = conftestpath.pyimport()
>   File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line
> 662, in pyimport
>     __import__(modname)
>   File "/docker-py/tests/integration/conftest.py", line 6, in <module>
>     import docker.errors
>   File "/docker-py/docker/__init__.py", line 2, in <module>
>     from .api import APIClient
>   File "/docker-py/docker/api/__init__.py", line 2, in <module>
>     from .client import APIClient
>   File "/docker-py/docker/api/client.py", line 11, in <module>
>     from .build import BuildApiMixin
>   File "/docker-py/docker/api/build.py", line 6, in <module>
>     from .. import auth
>   File "/docker-py/docker/auth.py", line 6, in <module>
>     import dockerpycreds
> ImportError: No module named dockerpycreds
> ERROR: could not load /docker-py/tests/integration/conftest.py

The fix for this was already provided by commit 0ec8f56 and
commit c7c9235, but for some reason it did not made its way
to Dockerfiles for all architectures.

While at it, remove excessive comments.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This commit reverts a hunk of commit 2f5f0af ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).

The exact error on armhf is this:

	19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
	19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
	19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
	19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
	19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
We run our CI on Scaleway C1 machine, which is pretty slow,
including I/O. This test was failing on it, as it tried to
write 100000 lines of log very fast, and the loggerCloseTimeout
(defined and used in container/monitor.go) prevents the
daemon to finish writing it within this time frame,

Reducing the size to 150000 characters (75000 lines) should
help avoiding hitting it, without compromising the test case
itself.

Alternatively, we could have increased the timeout further. It was
originally set to 1s (commit b6a4267) and later increased 10x
(commit c0391bf). Please let me know if you want me to go that way.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@vieux
Copy link
Contributor

vieux commented Sep 18, 2017

LGTM

To avoid a zombie apocalypse, use cmd.Wait() to properly finish
the processes we spawn by Start().

Found while investigating DockerSuite.TestLogsFollowSlowStdoutConsumer
failure on ARM (see
moby#34550 (comment)).

[v2: don't expect no error from Wait() when process is killed]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@yongtang yongtang merged commit cfdac12 into moby:master Sep 19, 2017
andrewhsu pushed a commit to docker-archive/docker-ce that referenced this pull request Sep 19, 2017
To avoid a zombie apocalypse, use cmd.Wait() to properly finish
the processes we spawn by Start().

Found while investigating DockerSuite.TestLogsFollowSlowStdoutConsumer
failure on ARM (see
moby/moby#34550 (comment)).

[v2: don't expect no error from Wait() when process is killed]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 14f0a18
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.