Skip to content
This repository was archived by the owner on Aug 2, 2021. It is now read-only.

cmd/swarm/swarm-smoke: improve smoke tests#1337

Merged
nonsense merged 7 commits intoethersphere:swarm-rather-stablefrom
nonsense:improved-smoke-test
Apr 11, 2019
Merged

cmd/swarm/swarm-smoke: improve smoke tests#1337
nonsense merged 7 commits intoethersphere:swarm-rather-stablefrom
nonsense:improved-smoke-test

Conversation

@nonsense
Copy link
Copy Markdown
Contributor

@nonsense nonsense commented Apr 11, 2019

This PR is modifying the smoke test upload and sync so that it is more deterministic.

It includes:

  1. Remove tuid (test unique identifier) - this is no longer needed as every job on Kubernetes has a unique identifier and we can easily filter on Kibana based on it.
  2. Remove trackTimeout - it was used for an internal API that we don't need timeout for.
  3. Add only-upload option, which just uploads a file and doesn't try to download it - useful when you want to check where chunks are synced.
  4. Adds a bunch of measurements, so that we know how many chunks are synced across a deployment for a given upload.
  5. Uses waitToSync() instead of stupid time.Sleep() to determine if syncing is complete. Also adds APIs to Swarm that we need for this functionality.
  6. Removed option to run fetch from all nodes - we don't really run smoke tests with simulatenous fetch from all nodes anymore due to caching on nodes. We might want to revise this in the future, but for now we are always in --single mode.

Copy link
Copy Markdown
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise LGTM

func trackChunks(testData []byte, submitMetrics bool) error {
addrs, err := getAllRefs(testData)
if err != nil {
log.Error("cannot get refs", "err", err.Error())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is already passed to the caller of this function. Do we need to log it also, since it is to the caller to decide what to do with it and the error is already logged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't log it here, sometimes it is difficult to know which RPC call actually failed. This is a problem with bigger files, when you reach size of frame size and various limitations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add more descriptive error messages where this function is called and error logged. There should not be the same error in two log lines.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do have the line number when this is logged, so you know which one actually emitted it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that whenever trackChunks fails (which doesn't have an effect on the smoke tests, but has an effect on your debugging efforts), you want to know exactly where and why it failed while reviewing the logs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bottom line - you are right that we might log the same error twice, but we don't really care about that level of detail, as long as we have enough information in the logs to determine what happened.

I would be wrong if I say that I do not care about logging the same error twice. This is something that I consider a bad practice. If that is needed, then there is something wrong with our error propagation. I do not think that skipping on this principle contributes to the code cleanup and better architecture that we want to do.

Logging is very important, but it also should be correct, as someone can question why the same error happened twice during debugging. Or why the error needs double logging. If you really feel that all existing log lines are required, then that needs to be explained with comments to avoid that someone removes some of them in the future, breaking your expectations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the log line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anton, please, do not think that I want to make you do something like "remove double logging, to make it more difficult to find where this error is coming from" in your latest commit 476c30c.

I also want to have cleaner code and for that we should agree on some principles. My opinion is that the function caller should decide if it should log the error or pass it annotated to its caller. But, not both as it may result in double logging which can be confusing. By skipping on this types of things we will just end up with more confusion in the code. That is why I asked at least for the comment.

I would also like to opinions from others @frncmx @justelad. If the general understanding is that this is a non-issue I am happy to accept it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janos you are missing the forest for the trees. We have more than 10 comments about a single log line in the swarm-smoke binary, the output of which only developers read. We are spending way too much time on the wrong thing.

The lifetime of this code will probably be short, since we will want to add much more functionality once we get to the point where the actual bugs in the swarm are addressed. Whether we log something once or twice in this test doesn't matter so much.

I really don't want to continue to have a discussion about a single log line, or whether this function has the best possible design - it is serving its purpose for the time being, and if it stops doing that in the near future - we will remove it and write a new one. This is most probably what's going to happen anyway, because tracing syncing chunk by chunk scales only so much - once we start running tests with bigger files, this won't be feasible in its current form.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much rather we focus on why we have hundreds of lines of dead code and unit tests along them in the codebase, rather than whether we log an error once or twice.

@nonsense
Copy link
Copy Markdown
Contributor Author

@justelad addressed your comments.

Copy link
Copy Markdown
Contributor

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @nonsense; @janos I'm also aware that maybe some of the logging in the PR is maybe not according to our convention but I trust @nonsense to know where he needs these log-lines in order to trace easily and to allow him to have a faster feedback loop in his debugging endeavors.

More scrutiny can be applied later after we iterate over this a few times and fix the bugs.

Copy link
Copy Markdown
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 and sorry for discussion about coding standards, it should not be a part of this PR.

@nonsense
Copy link
Copy Markdown
Contributor Author

@janos @justelad thank for reviews and putting up with this ugly PR. Let's think how we can make the integration test suite (which swarm-smoke) is part of, once we integrate the fixes to the bugs we are already aware of.

I agree with @janos that now that swarm-smoke is becoming more and more part of our formal stack to verify if Swarm is working and behaving as expected, we should begin to raise the quality standards for it.

@nonsense
Copy link
Copy Markdown
Contributor Author

Merging to swarm-rather-stable as failure in Travis are independent of this PR.

@nonsense nonsense merged commit 36c8d85 into ethersphere:swarm-rather-stable Apr 11, 2019
nonsense added a commit that referenced this pull request May 10, 2019
cmd/swarm/swarm-smoke: improve smoke tests (#1337)

swarm/network: remove dead code (#1339)

swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
nonsense added a commit that referenced this pull request May 10, 2019
cmd/swarm/swarm-smoke: improve smoke tests (#1337)

swarm/network: remove dead code (#1339)

swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants