Skip to content

Conversation

@maflcko
Copy link
Member

@maflcko maflcko commented Apr 23, 2019

Send txs immediately to all peers, just like blocks are solved immediately on regtest.

Overall, the tests run three times as fast for me now, because they no longer wait for the poisson tx send to timeout.

This is only a regtest change and it is possible to disable it with -tx_relay_force_flush=0 should specific testing needs require it.

Alternative to:

@sipa
Copy link
Member

sipa commented Apr 23, 2019

I don't think we want to do this generically, as it will prevent testing the normal relay logic?

@maflcko
Copy link
Member Author

maflcko commented Apr 23, 2019

This preserves the normal relay logic except that the delay is always zero (for inbound and outbound peers). Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we can not create outbound connections.

@naumenkogs
Copy link
Contributor

Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we can not create outbound connections.

It's not fundamentally impossible though, is it? I think remember using part of some stuck PR (Luke's?) to test both directions.

Anyway, could you motivate this change a bit? I see some description in #15858, but it is closed now :)

@fanquake fanquake added the P2P label Apr 23, 2019
@maflcko
Copy link
Member Author

maflcko commented Apr 23, 2019

Added a commit to make it a regtest command line flag

@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15759 ([p2p] Add 2 outbound blocks-only connections by sdaftuar)
  • #14856 (net: remove more CConnman globals (theuni) by dongcarl)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@gmaxwell
Copy link
Contributor

I really don't like special casing the tests like this, the more we do this the more tests become useless just for show things-- and, of course, the special casing code itself can contribute to bugs too. Wouldn't it be better to run some of these tests in parallel so that their latency is less relevant?

@AM5800
Copy link

AM5800 commented Apr 24, 2019

@gmaxwell but if there are no tests for delay, what is the point of keeping delay in tests at all? If it breaks, no one will notice it anyway.

Also on:

Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we can not create outbound connections.

How about starting 3 nodes like this: node0 -> node1 -> node2 and then generating txs on node1.
One then can measure how long it took node0 and node2 to receive those transactions (by polling for mempool contents or even by logs).

@AM5800
Copy link

AM5800 commented Apr 24, 2019

By the way, @MarcoFalke, if your intent is to only speedup several tests in which p2p is not test subject - you can just whitelist peers in such tests.

@maflcko
Copy link
Member Author

maflcko commented Apr 24, 2019

I'd rather not whitelist peers in tests. Not only is it tedious to pass in the command line flag, but also it affects DoS scores and some other things, I believe.

@jnewbery
Copy link
Contributor

Concept ACK adding this as default in the functional tests.

I don't think we want to do this generically, as it will prevent testing the normal relay logic

I can't think of any cases in the functional tests where we do anything except submit a transaction and wait for it to propagate. Running with m_tx_relay_force_flush will achieve exactly the same except that wait will be much shorter.

Notably, I don't think there are any tests in the functional test suite that explicitly test the poisson distribution for tx relay. Really, that should be tested at the unit test level. We could add a single functional test with m_tx_relay_force_flush disabled if we want a higher-level test that simply tests that tx relay is not completely broken.

I really don't like special casing the tests like this

In general I agree, but the change here is so minor that I think it's ok. I'm much more concerned about other shortcomings in our functional test suite, such as not having peers outbound peers or peers on non-local network addresses.

Wouldn't it be better to run some of these tests in parallel

That's already the default

@sdaftuar
Copy link
Member

My instinct here is that our design goal should be to make the p2p connections in regtest be more like mainnet, and not less. We already have a problem that in regtest, we don't exercise our networking code enough (as mentioned in #14210 -- an issue I opened after I got the impression that many people were unaware of our code-coverage deficiences with respect to p2p behavior). So adding more special cases seems like it's going in the wrong direction.

On the other hand it would be great for the tests to run faster. My suggestion would be to adopt an approach more like #15858 was trying to do, where we leave bitcoind unchanged but choose to make certain tests opt-out of testing the p2p behavior. I think that also makes it easier to reason about what a test is testing, if you can see clearly in the test what parts of the code are being exercised (rather than implicitly relying on some non-standard behaviors we've opted bitcoind into in the test framework).

I only briefly looked at #15858 so far but I think it seemed to be going in the right direction. Having individual tests be modified one-by-one to opt into a faster behavior seems fine to me; reviewers can determine whether the code coverage change is logical or not based on what the test is doing and what the rest of the test suite already covers. The only caveat I'd add is that I think we should make it more explicit whether a call to sync_mempools() is exercising the p2p sync logic or not, to make the tests more readable and understandable.

As an aside, I think someone should just add a test that tests poisson delays -- that is currently possible in our test framework (inbounds can be tested via mininode, outbounds can be tested with multiple bitcoind's because there's no difference in our p2p logic between addnode peers and regular outbound peers, I believe).

So I'm a weak concept NACK on adding functionality to bitcoind that allows changing the relay delays; I'm a strong concept NACK on defaulting all our functional tests into special-cased behavior (if we were to go with this approach, I think we should only opt-in tests that we choose to exercise non-standard behavior on, rather than take a blanket approach).

Regarding parallelization, I think the point is that perhaps developers can immediately start saving their own time (and achieve the same benefits of this PR) by just increasing the number of simultaneous jobs they run with? I often run test_runner.py with the defaults myself, and this PR has reminded me that I could cut the time down significantly by adding more parallelization, so I will try to remember that going forward.

@maflcko
Copy link
Member Author

maflcko commented Apr 24, 2019

The response to "the tests run slow" shouldn't be "just run more of them in parallel". We might have users or developers that would like to run the tests on weak hardware (e.g. dual core cpu or hdd). Those constraints set a limit on how many tests can be run in parallel. Some of the tests might have random sleeps in them, but you can't rely on those because they are -- after all -- random.

It already takes about an hour to compile from scratch and run all tests on a dual core cpu with hdd. I don't see how compilation could be sped up any time soon (it would probably require getting rid of the boost hpp includes and the serialize.h includes, neither trivially possible). So having the tests run faster with a low risk setting like this seems like a nice way to kick the can down the road.

Because the flag can be set on the command line, it shouldn't hinder anyone from increasing the test coverage of our tx relay code.

I do see the argument that regtest should be as close to mainnet as possible, but there is also a reason to have some regtest values differ from mainnet, when they aid testing.

@jnewbery
Copy link
Contributor

jnewbery commented Apr 24, 2019

🙁 I much prefer this approach to #15858 and don't see why adding 30 lines of gnarly topo-sort code in the test framework is in any way preferable to adding the one-line change:

            bool fSendTrickle = pto->fWhitelisted;
            bool fSendTrickle = connman->m_tx_relay_force_flush || pto->fWhitelisted;

in SendMessages(). If the concern is whether the faster sync behaviour should be default of not, then it's equally possible to make either change default to on or off.

The only change in this PR is one of timing. The other PR completely changes how transactions are relayed between nodes on the test network, so in fact this PR is results in less difference in behaviour from a 'real' network. I can easily come up with scenarios where a regression in tx relay would be caught with this change, but hidden by the change in #15858 which basically removes p2p tx relay between nodes under test.

So my preference would be to use this PR, but default to off to satisfy the concerns of @sipa @gmaxwell and @sdaftuar. It can be explicitly enabled in the tests that @MarcoFalke picked out in 15881.

@gmaxwell
Copy link
Contributor

gmaxwell commented Apr 24, 2019

The response to "the tests run slow" shouldn't be "just run more of them in parallel"

Why not? At least to the extent that slowness is due to sleeps in the protocol design then running them in parallel is a perfectly reasonable response to it.

but if there are no tests for delay, what is the point of keeping delay in tests at all? If it breaks, no one will notice it anyway.

Because they are part of the system. The tests (for the most part) are system tests. Even if there is not a specific test for the delay behaviour in most possible ways that it could break, if it broke, it would break the test. Consider, if a rare crasher bug, say a divide by zero, were introduced in PoissonNextSend then currently the tests could eventually catch it even though there is no direct test-- but if that code was never run it could never be caught by the tests.

Also, having additional conditional behaviour in the codebase complicates the code -- opportunities for confusion, time wasted fixing bugs (or testing code) which is specific to the test environment. What happens when some bool sign sense gets flipped and the delay skipping gets turned on in mainnet? Or when there is a race in getdata handling that doesn't work without delays? Additional conditional test only code also frustrates efforts to improve the code by hitting high multiple-condition-decision branch coverage (because of dead production code in tests, and dead test code in coverage).

Three times faster is no joke. But the further we make the tested system less like the actual system the less useful the tests become. We could make them infinity faster by removing them completely...

@jnewbery 's one line version is a lot more attractive. And while I wouldn't seen any problem if some particularly tx-relay-serailization-delay sensitive tests used it, we really shouldn't be doing the bulk of our system testing that way simply because its different from production. We should be trying to minimize the difference to production... What we could do is identify tests where the p2p behaviour is irrelevant to the test itself and the source of a large delay and bypass it for just those tests. However, we should take care that doing so doesn't leave some part of the production system unused in the tests (e.g. we shouldn't do this to all tests that relay transactions...).

I feel like maybe we've lost the distinction between system and unit testing, and we're asking tests which are structurally system tests to work and perform like unit tests. In general I'm not the biggest fan of unit testing and prefer to achieve as much tests via system tests as possible, but when performance considerations are causing us to test increasingly more modified versions of the system it might make sense to split things up and stop trying to use the whole system when what we're trying to do is a unit test.

@kostyantyn
Copy link
Contributor

It looks that the main concern is that this PR introduces a new case condition and if we misapply it, some part of the code (concretely PoissonNextSend) might not be executed when it's expected to be called or another way around.

What if we instead of enabling/disabling the delay, we pass a parameter to the function (actually PoissonNextSend already accepts average_interval_seconds but seems it's better to introduce another one). Then we are sure that this code is executed, every operand is taken into account, we write unit test for values we use in testnet/mainnet, and we can set the desired parameter for regtest that will cancel out the delay.

@instagibbs
Copy link
Member

@kostyantyn yes I think if this kind of approach is to be taken at all, it should be parameterized and shortened by default.

@gmaxwell
Copy link
Contributor

@kostyantyn The "unit test" aspect is only part of it. In complex systems many interesting bugs emerge out of interactions of multiple parts. If all the tests significantly change the delays then conditions that are only reachable in the presence of delays won't get hit. Testing both with and without delays is probably better, but if only one is done it should be kept closer to actual use.

@maflcko maflcko force-pushed the 1904-txRelayRegtest branch from 44441db to 4fd2061 Compare April 26, 2019 19:45
@jonasschnelli
Copy link
Contributor

I tend to agree with @gmaxwell, @sdaftuar, @sipa.
Optimizing tests for performance by reducing test-authenticity seems suboptimal to me.

We might have users or developers that would like to run the tests on weak hardware (e.g. dual core cpu or hdd). Those constraints set a limit on how many tests can be run in parallel.

IMO the last important property a test should have is beeing fast (not saying its completely irrelevant).

@maflcko
Copy link
Member Author

maflcko commented Apr 30, 2019

Devs are less likely to run the tests (or a specific test) if it takes too long to run. We might as well remove slow tests if only one or two devs run them on high-end hardware. Long tests are excluded on travis and I doubt that a bunch of developers on different architectures pass in --extended.

Moreover, our unit or functional tests are horribly unsuitable to find actual issues with the p2p code in a scenario such as mainnet. The transaction workload is trivial and highly synthetic, not at all comparable to mainet.

I do see the argument of dead code paths when there is no delay by default, so I'll rework this pull to default to the mainnet setting unless a test opts in to the 0-delay. Testing both settings a bit might actually lead to more test coverage, as race conditions might be easier to trigger with a 0-delay.

@maflcko maflcko force-pushed the 1904-txRelayRegtest branch from 4fd2061 to 60d70a5 Compare April 30, 2019 19:50
@maflcko maflcko force-pushed the 1904-txRelayRegtest branch from 60d70a5 to fa96670 Compare April 30, 2019 20:01
@maflcko maflcko force-pushed the 1904-txRelayRegtest branch from fa96670 to fadb718 Compare April 30, 2019 20:02
@jnewbery
Copy link
Contributor

Devs are less likely to run the tests (or a specific test) if it takes too long to run.

Totally agree with this. I doubt anyone runs the dbcache test for example.

When rebasing a PR with many commits, I would like to run the entire test suite over every commit to make sure that no intermediate commits break anything. Having very long running tests precludes this.

our unit or functional tests are horribly unsuitable to find actual issues with the p2p code in a scenario such as mainnet. The transaction workload is trivial and highly synthetic, not at all comparable to mainet.

Again, very much agree with this. My view is that the functional tests test functionality. If we want to catch p2p issues we should build a stress test and a soak test environment, and also run a bunch of nodes on mainnet with heavy diags/logging/analytics/etc.

@maflcko maflcko closed this Apr 30, 2019
@maflcko maflcko deleted the 1904-txRelayRegtest branch April 30, 2019 22:55
@laanwj
Copy link
Member

laanwj commented May 1, 2019

Devs are less likely to run the tests (or a specific test) if it takes too long to run. We might as well remove slow tests if only one or two devs run them on high-end hardware. Long tests are excluded on travis and I doubt that a bunch of developers on different architectures pass in --extended.

IMO fast tests are very good. It encourages running the tests more often, makes Travis faster, and usually indicates the tests don't spend a lot of time on waiting and polling loops, or doing unnecessary stuff that doesn't increase test coverage and causes potential random errors (and ever higher parallelism increases this latter risk by affecting delays randomly).

So in this case: yes, normal transaction propagation should be tested, but also, this is only necessary in part of the tests that deal with testing the P2P network. In the rest it's allowable to make some shortcuts, like disabling or speeding up randomized behavior.

What I don't like (in this particular approach) is that bitcoind accumulates more test-specific behavior and functions, which make it harder to maintain and which (as @gmaxwell says) blurs the line between unit and functional tests, so I personally prefer #15858's appraoch. That's not a hard NACK on this though. I was just missing the context when seeing that PR closed.

@maflcko maflcko restored the 1904-txRelayRegtest branch May 1, 2019 17:38
@maflcko
Copy link
Member Author

maflcko commented May 1, 2019

I personally prefer #15858's appraoch

That other approach is going to bypass p2p tx relay completely. So is guaranteed to decrease test coverage.

This pull request is a one-line patch to net_processing (off by default) so that both scenarios are tested. This is increasing test coverage, as I explained above.

fanquake added a commit that referenced this pull request Nov 7, 2019
0e7c90e test: speed up wallet_avoidreuse.py (Jon Atack)
6d50b26 test: add logging to wallet_avoidreuse.py (Jon Atack)

Pull request description:

  Inspired by PRs #17340 and #15881.

  - add logging
  - pass -whitelist in `set_test_params` to speed up transaction relay

  `wallet_avoidreuse.py` is not intended to test P2P transaction relay/timing, so it should be fine to do this here. This reduces test run time variability and speeds up the test by 2-3 times on average.

  Test run times in seconds:

  - before: 20, 24, 22, 17, 27, 40, 30

  - after: 10, 10, 8, 9, 10, 7, 8

ACKs for top commit:
  MarcoFalke:
    ACK 0e7c90e 🐊
  fanquake:
    ACK 0e7c90e

Tree-SHA512: 6d954a0aaf402c9594201626b59d29263479059e68fa5155bb44ed973cd0c3347729dd78b78b4d5a2275e45da365dc1afb4cc7e3293dea33fcc2e3e83a39faf5
@maflcko maflcko deleted the 1904-txRelayRegtest branch November 26, 2021 11:16
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 19, 2022
0e7c90e test: speed up wallet_avoidreuse.py (Jon Atack)
6d50b26 test: add logging to wallet_avoidreuse.py (Jon Atack)

Pull request description:

  Inspired by PRs bitcoin#17340 and bitcoin#15881.

  - add logging
  - pass -whitelist in `set_test_params` to speed up transaction relay

  `wallet_avoidreuse.py` is not intended to test P2P transaction relay/timing, so it should be fine to do this here. This reduces test run time variability and speeds up the test by 2-3 times on average.

  Test run times in seconds:

  - before: 20, 24, 22, 17, 27, 40, 30

  - after: 10, 10, 8, 9, 10, 7, 8

ACKs for top commit:
  MarcoFalke:
    ACK 0e7c90e 🐊
  fanquake:
    ACK 0e7c90e

Tree-SHA512: 6d954a0aaf402c9594201626b59d29263479059e68fa5155bb44ed973cd0c3347729dd78b78b4d5a2275e45da365dc1afb4cc7e3293dea33fcc2e3e83a39faf5
@bitcoin bitcoin locked and limited conversation to collaborators Nov 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.