This repository was archived by the owner on May 12, 2021. It is now read-only.

[DNM] tests: trigger CI for kernel fragments#1588

Closed
ganeshmaharaj wants to merge 1 commit into kata-containers:master from ganeshmaharaj:test-kernel-frag-ci

Conversation

@ganeshmaharaj
Contributor

Depends-on: github.com/kata-containers/packaging#314
Fixes: #123141234

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>

@devimc

devimc commented Apr 25, 2019

/test

@egernst
Member

egernst commented Apr 26, 2019

/test

@grahamwhaley
Contributor

/test
as the frag PR has been updated to hopefully pass the memory limitation tests now.

@grahamwhaley
Contributor

@ganeshmaharaj noted, on the 16.04 CI, it failed to construct the frag config, although it tried. I've no idea how/why that would happen...hmm.

00:04:29.983 # configuration written to /home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/packaging/kernel/configs/fragments/x86_64/.config
00:04:29.983 #
00:04:29.983 Value requested for CONFIG_MEMCG_SWAP not in final .config
00:04:29.983 Requested value:  CONFIG_MEMCG_SWAP=y
00:04:29.983 Actual value:     
00:04:29.983 
00:04:29.983 Value requested for CONFIG_MEMCG_SWAP_ENABLED not in final .config
00:04:29.983 Requested value:  CONFIG_MEMCG_SWAP_ENABLED=y
00:04:29.983 Actual value:     
00:04:29.983 
00:04:29.983 Value requested for CONFIG_ARCH_MEMORY_PROBE not in final .config
00:04:29.983 Requested value:  CONFIG_ARCH_MEMORY_PROBE=n
00:04:29.983 Actual value:     # CONFIG_ARCH_MEMORY_PROBE is not set
00:04:29.983 INFO: Generated config file can be found in /home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/packaging/kernel/configs/fragments/x86_64/.config
00:04:29.984 ERROR: Failed to construct requested .config file
00:04:29.984 ERROR: failed to find default config 
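The failure mode in the log can be sketched as a post-merge sanity check: after the fragments are merged into a final .config, each requested `CONFIG_*` value is verified against what actually landed. The sketch below is illustrative only (function and parameter names are invented here, and this is not the actual kata-containers packaging script); it merely reproduces the kind of strict comparison the log output implies, including treating the `# CONFIG_FOO is not set` form as a mismatch against an explicit `CONFIG_FOO=n` request, as the log does.

```python
def check_fragment(fragment_lines, config_lines):
    """Return mismatch reports for requested values missing from the final .config."""
    final = {line.strip() for line in config_lines}
    problems = []
    for raw in fragment_lines:
        req = raw.strip()
        if not req.startswith("CONFIG_") or "=" not in req:
            continue  # skip comments and blank lines in the fragment
        if req in final:
            continue  # the requested value survived the merge
        symbol = req.split("=", 1)[0]
        # Find whatever the merged .config actually says about this symbol,
        # including the disabled "# CONFIG_FOO is not set" form (which this
        # strict check, like the log above, still reports as a mismatch).
        actual = ""
        for cfg in final:
            if cfg.startswith(symbol + "=") or cfg.startswith("# " + symbol + " "):
                actual = cfg
                break
        problems.append(
            f"Value requested for {symbol} not in final .config\n"
            f"Requested value:  {req}\n"
            f"Actual value:     {actual}"
        )
    return problems
```

Under this reading, the 16.04 failure means the two MEMCG symbols were silently dropped during the merge (no line at all in the final .config), while ARCH_MEMORY_PROBE was present but only in its "is not set" spelling.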

@grahamwhaley
Contributor

@ganeshmaharaj just for ref, the 18.04 CI passed here. hmmm.

@devimc

devimc commented Apr 29, 2019

/retest
kata-containers/packaging#314 was updated

@grahamwhaley
Contributor

/test
There is a whole mix of things in the CI here - some CRI-O fails, some memory hotplug fails, and some CIs perfectly happy. I'm just going to spin the wheel here again whilst I try some tests out locally as well.

@grahamwhaley
Contributor

@ganeshmaharaj - your commit (not the merge message, the actual commit) still has a dep in it for packaging PR 461 - I think that is messing up the fragment depends merge. I suspect you need to either drop that dep or change it to 314, and remove it from the merge request - iyswim?
Let's get that updated and re-spin the CI ...

@ganeshmaharaj
Contributor Author

/retest

@grahamwhaley
Contributor

Heh heh, metrics CI failed, in a good way - looks like maybe we shrank the memory footprint?

00:17:02.454 Report Summary:
00:17:02.454 +-----+----------------------+-------+-------+--------+-------+-------+--------+-------+------+-----+
00:17:02.454 | P/F |         NAME         |  FLR  | MEAN  |  CEIL  |  GAP  |  MIN  |  MAX   |  RNG  | COV  | ITS |
00:17:02.517 +-----+----------------------+-------+-------+--------+-------+-------+--------+-------+------+-----+
00:17:02.517 | P   | boot-times           | 95.0% | 97.1% | 105.0% | 10.0% | 88.7% | 108.5% | 22.3% | 4.3% |  20 |
00:17:02.517 | *F* | memory-footprint     | 95.0% | 92.6% | 105.0% | 10.0% | 92.6% | 92.6%  | 0.0%  | 0.0% |   1 |
00:17:02.517 | *F* | memory-footprint-ksm | 95.0% | 90.2% | 105.0% | 10.0% | 90.2% | 90.2%  | 0.0%  | 0.0% |   1 |
00:17:02.517 +-----+----------------------+-------+-------+--------+-------+-------+--------+-------+------+-----+
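The P/F column in that table can be read as a simple bounds check: a test passes when its mean (as a percentage of the stored baseline) falls between the floor (FLR) and ceiling (CEIL). This is a guess at the checker's logic, not the actual metrics CI code:

```python
def check_result(mean_pct, floor_pct, ceiling_pct):
    """Pass when the measured mean (as % of baseline) sits within bounds."""
    return floor_pct <= mean_pct <= ceiling_pct
```

Which matches the table: boot-times at 97.1% is inside [95%, 105%] and passes, while memory-footprint at 92.6% falls below the 95% floor and is flagged `*F*` - a "failure" only in the sense of being outside the expected band, here because the footprint shrank.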

I'll go check on nemu and vsocks CIs, just in case that is the frags upsetting them...

@grahamwhaley
Contributor

vsocks CI failed due to the current openshift image move change that we know about.
nemu had some sort of test timeout, so I've re-nudged a build on that one, and we'll see if it is repeatable.

@ganeshmaharaj
Contributor Author

/retest

@ganeshmaharaj
Contributor Author

ganeshmaharaj commented May 6, 2019

The new set of failures is as follows.

Other than the two below, almost all others are failing because of low entropy. The new kernel disables most devices that we do not need; does that contribute to the low entropy level?

jenkins-ci-ARM-ubuntu-18-04


16:55:20 Install Qemu
17:00:09 package github.com/qemu/qemu: no Go files in /home/jenkins/workspace/kata-containers-runtime-ARM-18.04-PR/go/src/github.com/qemu/qemu
17:00:09 can't load package: package github.com/kata-containers/packaging: no Go files in /home/jenkins/workspace/kata-containers-runtime-ARM-18.04-PR/go/src/github.com/kata-containers/packaging
17:00:09 ~/workspace/kata-containers-runtime-ARM-18.04-PR/go/src/github.com/qemu/qemu ~/workspace/kata-containers-runtime-ARM-18.04-PR/go/src/github.com/kata-containers/tests
17:00:11 Already on 'master'
17:00:11 Your branch is up to date with 'origin/master'.
17:00:11 error: The following untracked working tree files would be overwritten by checkout:
17:00:11 	slirp/COPYRIGHT
17:00:11 Please move or remove them before you switch branches.
17:00:11 Aborting
17:00:12 Build step 'Execute shell' marked build as failure

jenkins-ci-centos-7-4-q-35

=== RUN   TestHostNetworkingRequested
--- FAIL: TestHostNetworkingRequested (0.01s)
    assertions.go:239:
	Error Trace:	network_test.go:99
	Error:      	Should be true

@grahamwhaley
Contributor

/test
let's see if we get past the entropy now that I've dropped RANDOM_TRUST from the frag PR for now...

@grahamwhaley
Contributor

fc failed

I think it hit a potentially spurious grpc error, so I'll spin it:

Stderr: docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.

vsocks failed

Lots of 'terminated' items - going to respin

testing: warning: no tests to run
PASS
./hack/test-utils.sh: line 53:  7874 Terminated              keepalive "sudo PATH=${PATH} ${ROOT}/_output/containerd ${CONTAINERD_FLAGS}" ${RESTART_WAIT_PERIOD} &> ${report_dir}/containerd.log

rhel7 failed

I think this is maybe bogus - a leftover artifact from a jenkins job being renamed (maybe to the vsocks one??).

@grahamwhaley
Contributor

a new fc ci fail I've not seen before:

• Failure [32.980 seconds]
check yamux IO timeout
/tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:45
  pause, wait and unpause a container
  /tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:67
    check yamux IO connection
    /tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:68
      should keep alive [It]
      /tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:69

      Expected
          <int>: 0
      to equal
          <int>: 125

      /tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:71
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Summarizing 1 Failure:

[Fail] check yamux IO timeout pause, wait and unpause a container check yamux IO connection [It] should keep alive 
/tmp/jenkins/workspace/kata-containers-runtime-centos-7-4-PR-firecracker/go/src/github.com/kata-containers/tests/integration/docker/pause_test.go:71

Ran 1 of 249 Specs in 171.115 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 248 Skipped --- FAIL: TestIntegration (171.15s)
FAIL

going to spin it again....

@grahamwhaley
Contributor

ah, looks like I need to fix the config file merge conflict on the frags PR... will do that now, and re-fire everything, again....

@grahamwhaley
Contributor

/test

@grahamwhaley
Contributor

/test
rebased (CIs here were looking good though)

@ganeshmaharaj force-pushed the test-kernel-frag-ci branch 2 times, most recently from d514fba to 73bff0e on May 13, 2019 at 15:47
@ganeshmaharaj
Contributor Author

/retest

Depends-on: github.com/kata-containers/packaging#314
Fixes: #123141234

Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
@ganeshmaharaj force-pushed the test-kernel-frag-ci branch from 73bff0e to e36a500 on May 13, 2019 at 16:57
@ganeshmaharaj
Contributor Author

/retest
Now the sha has been updated and all the bits should work and test the kernel. Just ran a single test build on ubuntu and everything was fine. Hence this trigger.

@grahamwhaley
Contributor

Hmm, we got an fc CI fail, but it is a different fail from before. They look similar only because the tests end up reporting 'exit code 125', which is a pretty generic error for 'the container failed to run' (we see the grpc error a little way above).
Slightly worrying: if we look at the fc CI build history (http://jenkins.katacontainers.io/job/kata-containers-runtime-centos-7-4-PR-firecracker/), it actually looks fairly stable - the other recent fails I looked at were valid (bad code format fails etc.). Given this seems to be a variable/sporadic fail, it could be hard to narrow down... let me see if I can make it happen locally at all.

@amshinde
Member

amshinde commented Jun 3, 2019

@ganeshmaharaj Do you still need this PR?

@grahamwhaley
Contributor

Yes, it is the only way to fully test the parallel fragment PR over in the packaging repo (as the packaging repo does not run the full test suite).

@ganeshmaharaj
Contributor Author

@amshinde as @grahamwhaley mentioned, this is our conduit to test the kernel fragment changes. Once we knock out other things from the pipeline, we should be able to get back onto this and start fixing it. Would be nice to leave this here for now.

@egernst added the do-not-merge (PR has problems or depends on another) label on Jun 11, 2019
@egernst
Member

egernst commented Jun 11, 2019

/retest

@grahamwhaley
Contributor

no need to retest @egernst - the PR is stuck waiting for somebody to have time to reproduce and diagnose why the firecracker CI seems to be less stable when using the kernel fragments.
Also, would be nice to test and get SRIOV support into the fragments as well.

@egernst
Member

egernst commented Jun 12, 2019

ok, slapping needs-help on the original PR then.

@ganeshmaharaj
Contributor Author

Dropping this patch in favour of #1900

@ganeshmaharaj deleted the test-kernel-frag-ci branch on July 22, 2019 at 14:28