Merged
Conversation
For KV store tests, we execute cilium-agent as a standalone binary, in which case logs are printed to stdout. For the Consul KV store, however, we never append stdout to system logs, so logs for that test are not preserved. The Etcd test was fixed in 8864c72, but the Consul test was apparently overlooked. Fixes: f0741a6 ("Tests: Ginkgo test framework") Signed-off-by: Paul Chaignon <paul@cilium.io>
As for most tests, at the end of RuntimeKVStoreTest, we validate the logs
don't contain any worrisome messages with:
vm.ValidateNoErrorsOnLogs(CurrentGinkgoTestDescription().Duration)
CurrentGinkgoTestDescription().Duration includes the execution of all
ginkgo.BeforeEach [1]. In RuntimeKVStoreTest, one of our
ginkgo.BeforeEach stops the Cilium systemd service (because we run
cilium-agent as a standalone binary in the test itself). Stopping Cilium
can result in worrisome messages in the logs e.g., if the compilation of
BPF programs is terminated abruptly. This in turn makes the tests fail
once in a while.
To fix this, we can replace CurrentGinkgoTestDescription().Duration with
our own "time counter" that doesn't include any of the
ginkgo.BeforeEach executions.
To validate this fix, I ran the whole RuntimeKVStoreTest with this
change 60 times locally and 60 times in the CI (#12419). The tests
passed all 120 times. Before applying the fix, the Consul test would
fail ~1/30 times, both locally and in CI.
1 - https://github.com/onsi/ginkgo/blob/9c254cb251dc962dc20ca91d0279c870095cfcf9/internal/spec/spec.go#L132-L134
Fixes: #11895
Fixes: 5185789 ("Test: Checks for deadlocks panics in logs per each test.")
Related: #12419
Signed-off-by: Paul Chaignon <paul@cilium.io>
Member
Author
|
retest-runtime |
christarazi
approved these changes
Jul 10, 2020
Member
christarazi
left a comment
There was a problem hiding this comment.
Well done, hell of a find!
Member
Author
|
Runtime tests are passing and this pull request only modifies I have set a reminder to backport this in a week, once we have seen it green in master for a long enough time. |
Member
Author
|
The KVStore tests did not fail again during the past week, so I think we can now backport. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
RuntimeKVStoreTesttests (both Consul and Etcd) have been failing for a while because ourValidateNoErrorsInLogscheck sometimes finds errors in the logs. This pull request fixes it.The first commit fixes the Consul test for which no Cilium logs were registered.
Wait... what?! Then how is
ValidateNoErrorsInLogsfailing if we don't even print the logs??Second commit's message:
I am not marking this for backport just yet. I'd like to see it run a couple time successfully in master before. It wouldn't be the first time we erroneously think we fixed a flake.
Fixes: #11895
Fixes: #3211
Fixes: #1733