-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Issue
Running go test -v -run=TestRouterPaymentStateMachine/MP_terminal inside routing package produces a test flake.
--- FAIL: TestRouterPaymentStateMachine (5.99s)
--- FAIL: TestRouterPaymentStateMachine/MP_terminal (5.09s)
payment_lifecycle_test.go:1015: got no payment result
Cause
Inside the test MP terminal, we fire four shards. The first shard will fail with a terminal error, while the rest will fail with a temp error. In the end, we want to have channeldb.FailureReasonPaymentDetails to be returned, while it's not always the case.
Inside the function resumePayment(), each time we fire a shard, we call shardHandler.collectResultAsync(attempt) in the end, which, as the name suggests it's asynced. Inside the function, relevant to the context, it has two steps,
- Step one, fail the
HTLCAttempt. - Step two, fail the payment if decide not to retry. It will send the expected error to channel
p.shardErrors, block until the msg is received.
Back to the function resumePayment(), we can only get the expected error in the following lines,
- When we call
checkShards(), which reads the error from channelp.shardErrors. - When we hit
case state.terminate && state.numShardsInFlight == 0, where the payment is terminated and we have no shards in flight. - When we hit
case state.terminate || state.remainingAmt == 0, where either we are in the terminate state, or there are no more shards to fire. We would then callwaitForShard()to get the error fromp.shardErrors. - Finally here after we call
RequestRouteand it finds no path plus we have shards in flight. We would then callwaitForShard()to get the error fromp.shardErrors.
Now image the edge case. We fire shard#1, which should have failed the HTLCAttempt and the payment. However, inside the collectResultAsync, it only finished step one but never step two, so it didn't send an error to p.shardErrors yet, which means,
checkShards()will return nil.- the
HTLCAttemptis failed, we won't hit conditionstate.remainingAmt == 0. - the payment is not failed, hence we won't hit
case state.terminate.
What's left is the last choice, we will call RequestRoute again. And this is where the issue comes from, we can't do it now because the test requires a routeRelease step, otherwise the function will block and the test will fail.