cmd/swarm/swarm-smoke: improve smoke tests by nonsense · Pull Request #1337 · ethersphere/swarm

nonsense · 2019-04-11T09:19:46Z

This PR modifies the upload-and-sync smoke test so that it is more deterministic.

It includes:

Remove tuid (test unique identifier) - this is no longer needed as every job on Kubernetes has a unique identifier and we can easily filter on Kibana based on it.
Remove trackTimeout - it was used for an internal API that we don't need a timeout for.
Add only-upload option, which just uploads a file and doesn't try to download it - useful when you want to check where chunks are synced.
Add a bunch of measurements, so that we know how many chunks are synced across a deployment for a given upload.
Use waitToSync() instead of stupid time.Sleep() to determine if syncing is complete. Also add APIs to Swarm that we need for this functionality.
Remove option to run fetch from all nodes - we don't really run smoke tests with simultaneous fetch from all nodes anymore due to caching on nodes. We might want to revisit this in the future, but for now we are always in --single mode.

acud

Minor comments, otherwise LGTM

cmd/swarm/swarm-smoke/main.go

cmd/swarm/swarm-smoke/upload_and_sync.go

swarm/network/kademlia.go

janos · 2019-04-11T10:15:51Z

cmd/swarm/swarm-smoke/upload_and_sync.go

+func trackChunks(testData []byte, submitMetrics bool) error {
 	addrs, err := getAllRefs(testData)
 	if err != nil {
+		log.Error("cannot get refs", "err", err.Error())


This error is already returned to the caller of this function. Do we need to log it here as well? It is up to the caller to decide what to do with it, and the error is already logged there.

If we don't log it here, it is sometimes difficult to know which RPC call actually failed. This is a problem with bigger files, when you hit the frame size limit and various other limitations.

We can add more descriptive error messages where this function is called and the error is logged. The same error should not appear in two log lines.

You do have the line number when this is logged, so you know which one actually emitted it.

The point is that whenever trackChunks fails (which doesn't have an effect on the smoke tests, but has an effect on your debugging efforts), you want to know exactly where and why it failed while reviewing the logs.

Bottom line - you are right that we might log the same error twice, but we don't really care about that level of detail, as long as we have enough information in the logs to determine what happened.

I would be wrong to say that I do not care about logging the same error twice. This is something that I consider a bad practice. If it is needed, then there is something wrong with our error propagation. I do not think that skipping this principle contributes to the code cleanup and better architecture that we want.

Logging is very important, but it should also be correct, as someone debugging can question why the same error happened twice, or why the error needs double logging. If you really feel that all existing log lines are required, then that needs to be explained with comments so that nobody removes some of them in the future, breaking your expectations.

removed the log line.

Anton, please, do not think that I want to make you do something like "remove double logging, to make it more difficult to find where this error is coming from" in your latest commit 476c30c.

I also want to have cleaner code, and for that we should agree on some principles. My opinion is that the function caller should decide whether to log the error or pass it annotated to its own caller, but not both, as that may result in double logging, which can be confusing. By skipping these kinds of things we will just end up with more confusion in the code. That is why I asked for at least a comment.

I would also like opinions from others @frncmx @justelad. If the general understanding is that this is a non-issue, I am happy to accept it.

@janos you are missing the forest for the trees. We have more than 10 comments about a single log line in the swarm-smoke binary, the output of which only developers read. We are spending way too much time on the wrong thing.

The lifetime of this code will probably be short, since we will want to add much more functionality once we get to the point where the actual bugs in the swarm are addressed. Whether we log something once or twice in this test doesn't matter so much.

I really don't want to continue to have a discussion about a single log line, or whether this function has the best possible design - it is serving its purpose for the time being, and if it stops doing that in the near future - we will remove it and write a new one. This is most probably what's going to happen anyway, because tracing syncing chunk by chunk scales only so much - once we start running tests with bigger files, this won't be feasible in its current form.

I'd much rather we focus on why we have hundreds of lines of dead code and unit tests along them in the codebase, rather than whether we log an error once or twice.

cmd/swarm/swarm-smoke/upload_and_sync.go

nonsense · 2019-04-11T11:12:41Z

@justelad addressed your comments.

acud

LGTM @nonsense; @janos I'm also aware that some of the logging in the PR is maybe not according to our convention, but I trust @nonsense to know where he needs these log lines in order to trace easily and to have a faster feedback loop in his debugging endeavors.

More scrutiny can be applied later after we iterate over this a few times and fix the bugs.

janos

LGTM 👍 and sorry for the discussion about coding standards; it should not be a part of this PR.

nonsense · 2019-04-11T15:16:09Z

@janos @justelad thanks for the reviews and for putting up with this ugly PR. Let's think about how we can improve the integration test suite (which swarm-smoke is part of), once we integrate the fixes to the bugs we are already aware of.

I agree with @janos that now that swarm-smoke is becoming more and more part of our formal stack to verify if Swarm is working and behaving as expected, we should begin to raise the quality standards for it.

nonsense · 2019-04-11T15:16:59Z

Merging to swarm-rather-stable, as the failures in Travis are independent of this PR.

cmd/swarm/swarm-smoke: improve smoke tests (#1337)

swarm/network: remove dead code (#1339)

swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)

nonsense added 2 commits April 11, 2019 11:14

cmd/swarm/swarm-smoke: improve smoke tests

223c824

swarm/api, swarm/network: APIs for smoke tests

8fd8f65

nonsense force-pushed the improved-smoke-test branch from 00d7e0c to 8fd8f65 Compare April 11, 2019 09:27

nonsense requested review from acud and janos April 11, 2019 09:28

nonsense added the ready for review label Apr 11, 2019

acud suggested changes Apr 11, 2019

janos reviewed Apr 11, 2019

nonsense added 3 commits April 11, 2019 13:07

remove Kademlia APIs

5067cc0

remove spammy log line

94fc3fc

typo

5b4d42e

nonsense added 2 commits April 11, 2019 14:25

fix kademlia hive string

4c4c1c5

remove double logging, to make it more difficult to find where this error is coming from

476c30c

acud approved these changes Apr 11, 2019

janos approved these changes Apr 11, 2019

nonsense merged commit 36c8d85 into ethersphere:swarm-rather-stable Apr 11, 2019

nonsense added a commit that referenced this pull request May 10, 2019

cmd/swarm/swarm-smoke: improve smoke tests (#1337)

7e9d668

3 participants
