Fix telemetry RPC getting stuck if all nodes have bandwidth set to 0 #3643
What are the steps to reproduce this issue?
I think this bug is caused by this if statement, which removes the zero bandwidth values. If all bandwidth_cap values are set to 0 then Therefore, this should be reproducible by a unit test with 10 nodes having bandwidth set to 0. @fikumikudev are you willing to write such a unit test?
Also, I do not think this bug is fully fixed by the proposed change. The function
Yes, it looks like my fix is not complete.
I can think of two ways to fix this. The first is to rewrite
I think improving strip_outliers_and_sum to handle any number of elements is the right way to do it. |
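To make the suggestion concrete, here is a minimal sketch of an outlier-stripping sum that tolerates any number of elements, including none. It is not the node's actual code: the name `strip_outliers_and_sum_sketch`, its signature, and the choice to sum everything when there are too few samples to trim are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical helper (not the real strip_outliers_and_sum): drops up to
// `trim` of the smallest and `trim` of the largest samples, then sums the rest.
std::uint64_t strip_outliers_and_sum_sketch (std::multiset<std::uint64_t> const & samples, std::size_t trim)
{
	// Assumption: with too few samples to trim both ends, keep all of them.
	// This also covers the empty container, where begin () == end ().
	if (samples.size () <= 2 * trim)
	{
		trim = 0;
	}
	auto const first = std::next (samples.begin (), static_cast<std::ptrdiff_t> (trim));
	auto const last = std::prev (samples.end (), static_cast<std::ptrdiff_t> (trim));
	std::uint64_t sum = 0;
	for (auto it = first; it != last; ++it)
	{
		sum += *it;
	}
	return sum;
}
```

The key property is that the iterators are never moved outside the range [begin, end], whatever the container size. Returning 0 for small inputs instead of summing them would be an equally valid policy; which one fits depends on how the caller uses the result.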
bf67968 to af495ea
af495ea to 1aee398
@dsiganos I fixed the
Yep, I'll look at it now. I pushed a PR to your election scheduler RPC PR branch. Thank you for your contributions, they are great!
I encountered this bug while testing on a private network where all nodes have bandwidth set to 0. In that case the 'bandwidths' set is empty, and calling std::next() on it has undefined behavior (in practice, it gets stuck in an infinite loop).
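The failure mode can be sketched in isolation. The snippet below is an illustration only, not the code from this PR: the function names and the choice of a median are assumptions. It filters out zero values the way the description suggests, then checks for an empty container before advancing an iterator with std::next().

```cpp
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: keeps only the non-zero bandwidth values.
std::multiset<std::uint64_t> collect_nonzero_sketch (std::vector<std::uint64_t> const & caps)
{
	std::multiset<std::uint64_t> bandwidths;
	for (auto const cap : caps)
	{
		if (cap != 0)
		{
			bandwidths.insert (cap);
		}
	}
	return bandwidths;
}

// Hypothetical helper: returns the (upper) median non-zero value, or 0 if
// every input was 0.
std::uint64_t median_cap_or_zero_sketch (std::vector<std::uint64_t> const & caps)
{
	auto const bandwidths = collect_nonzero_sketch (caps);
	// Guard: advancing an iterator of an empty container is undefined behavior.
	if (bandwidths.empty ())
	{
		return 0;
	}
	// size () / 2 is always a valid index here because size () >= 1.
	return *std::next (bandwidths.begin (), static_cast<std::ptrdiff_t> (bandwidths.size () / 2));
}
```

Without the `empty ()` check, the all-zero input reaches std::next() with nothing to iterate over, which matches the situation described above.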