Coverage testing and docker flow fixes by oschaaf · Pull Request #358 · envoyproxy/nighthawk

oschaaf · 2020-06-12T22:31:44Z

After some digging into the stability of our coverage testing, I am creating this PR to test CI
with some fixes and enhancements, and to cover a few more lines in the exception paths of our
CLI tooling. There seems to be something racy going on where sometimes the coverage originating from our python integration tests is not counted.

Eventually I learned that things might stabilise when we switch the toolchain to clang 10. We follow Envoy's lead on that, so I am stopping further efforts on this for now.

Description of the changes in this PR:

- Increase timeouts of ASAN/TSAN runs
- Fix envoy_build_sha.sh, which apparently broke in one of the Envoy updates
- Cover some of the outer exception handling paths for Nighthawk's CLIs
- Integration test for nighthawk_output_transform
- Add coverage for code lines where it seemed trivial to do so (e.g. NullStatistic)
- Fix run_nighthawk_bazel_coverage.sh to work on my machine
- Stricter bash options in some scripts, plus the changes needed to make that work

Prerequisites

Needs "integration-tests: tweaks to stabilize sanitizer runs" (#357) to go in first.

Leaving some notes/learnings here:

- There seems to be something racy going on where sometimes the coverage originating from our python integration tests is not counted. This is relatively infrequent in CI.
  - On my dev machine this happens consistently. This might be a toolchain/version issue.
  - On my dev machine, using the docker flow, I run into difficulties: the docker container exits when running the full test suite without leaving a trace of why. Bash traps in the container don't fire, and there is no useful logging from docker. Eventually I was able to get past bazel fetching its deps, and then build just the python integration tests. As long as only a single test is involved, things seem to work and the container gets to finish. It's remarkable that CI doesn't have this problem. Coverage consistently looks as expected in this scenario.
- The link step is super memory hungry. Separating that step might be a CI perf opportunity. Coverage is the slowest CI job we have.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Experiment: see if adding back the tests brings back counting coverage for some of the lines in the .h files that went missing. Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

oschaaf · 2020-06-15T18:02:10Z

Now that #357 is in, readying this one up for review. This should further stabilize CI.

mum4k

Thank you very much for improving the stability, this has been hitting us on multiple PRs. As an overall comment - can we try to enumerate the fixes we are making here? It makes it a bit hard to review the PR and relate the various changes to intentions.

mum4k · 2020-06-17T20:44:50Z

test/integration/test_grpc_service.py

  assertEqual(counters["requestsource.internal.upstream_rq_200"], 4)
+
+
+def run_service_with_args(args):


Can we add documentation for this helper? E.g.

def run_service_with_args(args):
  """Executes the Nighthawk service with the provided arguments.

  Args:
    args: A string, the command line arguments for the service, e.g. "--foo --bar".

  Returns:
    A tuple in the form (exit_code, output), where exit_code is the code the Nighthawk
    service terminated with and the output is its standard output.
  """

Comment about adding documentation applies to all new non-test public functions. We could consider making some of them private instead.

I added this comment, slightly modified, to run_service_with_args, and then made this function and the others that reuse it private.
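For illustration, a helper with the contract documented above could look roughly like the sketch below. This is not the code from the PR: the subprocess-based implementation, the `_run_binary_with_args` name, the `binary_path` parameter, and the choice to split the arguments on whitespace are all assumptions made for the example.

```python
import subprocess
import sys


def _run_binary_with_args(binary_path, args):
  """Runs a binary with the given arguments (hypothetical helper, not the PR's code).

  Args:
    binary_path: A string, the path to the executable to run.
    args: A string, the command line arguments, e.g. "--foo --bar".

  Returns:
    A tuple in the form (exit_code, output), where exit_code is the code the process
    terminated with and output is its standard output decoded as text.
  """
  # Assumption: the arguments contain no quoted whitespace, so a plain split() suffices.
  completed = subprocess.run([binary_path] + args.split(),
                             stdout=subprocess.PIPE,
                             universal_newlines=True)
  return completed.returncode, completed.stdout


# Usage, with the Python interpreter standing in for a real CLI binary.
exit_code, output = _run_binary_with_args(sys.executable, "--version")
```

The leading underscore follows the outcome of this thread: helpers that are only used inside one test module are kept private.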

mum4k · 2020-06-17T20:47:23Z

test/integration/test_grpc_service.py

+
+def test_grpc_service_help():
+  (exit_code, output) = run_service_with_args("--help")
+  assert (exit_code == 0)


Since these are tests, we should likely use self.assertTrue instead of assert. I think assert just terminates the execution while self.assertFoo correctly records the test failure using the test framework.

Well, we're using pytest, so I think this ought to be OK. Quoting from the docs [1]:

Due to pytest’s detailed assertion introspection, only plain assert statements are used.

Posting the output of a sample failure (introduced on purpose):

def test_output_transform_help():
  (exit_code, output) = _run_output_transform_with_args("--help")
> assert (exit_code == 1)
E assert 0 == 1
E   -0
E   +1

exit_code = 0
output = ('WARNING: Perftools heap leak checker is active -- Performance may suffer\n'
 '\n'
 'USAGE: \n'
 .... <snip> ...

[1] https://docs.pytest.org/en/stable/

Having said that, we do have some helpers in utility.py, like assertEqual. I switched to using those, for consistency with the rest of the test code.
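To make the two styles concrete, here is a small sketch. The `assertEqual` below is a hypothetical stand-in, not the helper from the repository, and the check functions and `failure_message` are likewise made up for the example. Either style raises AssertionError on a mismatch, which pytest records as a test failure.

```python
def assertEqual(actual, expected):
  """Hypothetical stand-in for a helper like the one in the test utility module."""
  assert actual == expected, "{!r} != {!r}".format(actual, expected)


def check_exit_code_plain(exit_code):
  # Plain assert: when run under pytest, the statement is rewritten so that a
  # failure report shows both operands.
  assert exit_code == 0


def check_exit_code_helper(exit_code):
  # Helper style: the failure message is built explicitly by the helper.
  assertEqual(exit_code, 0)


def failure_message(check, exit_code):
  """Returns the AssertionError message raised by check(exit_code), or None."""
  try:
    check(exit_code)
  except AssertionError as e:
    return str(e)
  return None
```

Calling `failure_message(check_exit_code_helper, 1)` yields the helper's explicit message, while the plain variant relies on pytest to produce a readable report like the sample output above.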

mum4k · 2020-06-17T20:49:47Z

test/integration/test_output_transform.py

+#!/usr/bin/env python3
+import pytest
+
+from test.integration.utility import *


Can we import specific modules or packages explicitly? It helps when debugging.

Amended this in some places. We may want to track the rest as tech debt, because this occurs in other places as well. I filed #371 for that.

oschaaf · 2020-06-18T15:15:21Z

Thank you very much for improving the stability, this has been hitting us on multiple PRs. As an overall comment - can we try to enumerate the fixes we are making here? It makes it a bit hard to review the PR and relate the various changes to intentions.

Ah yes, sure: I updated the PR description to list what I did here. Let me know if that looks better.

I will address the other comments later today.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

oschaaf · 2020-06-18T19:04:51Z

@mum4k thanks, I left some remarks, and also pushed f0596d4 which hopefully addresses your comments.

mum4k · 2020-06-18T21:08:21Z

test/integration/test_grpc_service.py


 from test.integration.integration_test_fixtures import http_test_server_fixture
-from test.integration.utility import *
+from test.integration.utility import (isSanitizerRun, assertEqual, assertIn, assertGreaterEqual,


Looks like this now imports functions from the utility module. What we really should be importing is the utility module itself.

from test.integration import utility

And then calls will look like:
utility.assertEqual

Ideally utility would have a better, more descriptive name, but we can address that separately. The Python style we are following internally suggests importing only packages or modules, so that it is easy to tell where a function is defined. If the reader just sees "assertEqual", they have to do some digging.

That makes sense. In the benchmark PR I noticed pep8 had trouble with the * imports too.
Changed what was relevant to this PR in 66f6585.
Filed #371 earlier to track doing a pass over all the other code, to change the rest.

Thank you for filing the issue.
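A minimal sketch of the difference, using the standard library's `posixpath` module as a stand-in because `test.integration.utility` is only importable inside the repository. The variable names are made up for the example.

```python
# Style discouraged in the review: the origin of `join` is not visible at the call site.
from posixpath import join

# Style suggested in the review: import the module and qualify each call.
import posixpath

unqualified = join("test", "integration", "utility.py")
qualified = posixpath.join("test", "integration", "utility.py")
```

Both calls do the same thing, but only the second tells the reader where `join` is defined without scrolling back to the import block.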

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

oschaaf added 21 commits June 11, 2020 21:04

ci-sanitizer-runs: deflake a couple of tests

a8b53b1

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Remove stale comment

3993422

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

slow down test_grpc_service_happy_flow

2da7cbb

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

fix expectation in _do_tls_configuration_test

ef3ffda

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

minimize diff

0d24955

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Minimize diff pt II

53b2b59

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Back out accidentally left in change

071b941

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Fix format issue

ede3a9d

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

quick test

7280a32

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Tweaks to coverage

3e191ac

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

docker flow fixes

f6d7d99

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Coverage: separate build and coverage execution steps

7c5856f

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Coverage: NUM_CPUS 3->5

8700f80

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Back out increased parallelism

512b52e

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

coverage: restore ipv4 only test option

6b96503

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Add some essential cli / coverage tests

3a23c52

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Add deps to py_test

f389a3f

Experiment: see if adding back the tests brings back counting coverage for some of the lines in the .h files that went missing. Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Increase test_timeout for asan runs

2375d6d

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Tweak sanitizer test timeouts

fe61b40

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Fix test line, bump coverage threshold

2b6c0e3

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Merge remote-tracking branch 'upstream/master' into coverage

b2ff059

oschaaf marked this pull request as ready for review June 15, 2020 18:02

oschaaf added the P1 and waiting-for-review labels Jun 15, 2020

oschaaf mentioned this pull request Jun 15, 2020

Coverage flakes #362

Closed

mum4k self-requested a review June 17, 2020 20:40

mum4k self-assigned this Jun 17, 2020

mum4k requested changes Jun 17, 2020

mum4k removed the waiting-for-review label Jun 17, 2020

mum4k added the waiting-for-changes label Jun 17, 2020

Review feedback

f0596d4

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

oschaaf added the waiting-for-review label and removed the waiting-for-changes label Jun 18, 2020

mum4k reviewed Jun 18, 2020

mum4k added the waiting-for-changes label and removed the waiting-for-review label Jun 18, 2020

oschaaf added 2 commits June 18, 2020 23:57

Merge remote-tracking branch 'upstream/master' into coverage

9b0c236

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

Review feedback: change imports

66f6585

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

oschaaf added the waiting-for-review label and removed the waiting-for-changes label Jun 18, 2020

mum4k mentioned this pull request Jun 19, 2020

Replace occurrences of "importing * from .." in python code #371

Closed

mum4k approved these changes Jun 19, 2020

mum4k merged commit 41d340a into envoyproxy:master Jun 19, 2020
