test(sequencer-relayer): reinstate black box tests by Fraser999 · Pull Request #1033 · astriaorg/astria

Fraser999 · 2024-05-01T21:42:08Z

Summary

The black box tests were removed in a recent PR. This PR reinstates them using the new mock gRPC framework astria-grpc-mock.

Background

The tests would have needed heavy refactoring in #963, and we wanted to avoid adding a lot more code to that already-large PR. We decided to temporarily delete them and reinstate them using astria-grpc-mock to mock responses from the Celestia app.

Changes

The tests removed in #963 and a single test briefly added in #1001 then removed again have been restored. I also added a new test to check the relayer shuts down in a timely manner.

Previously the tests leaned heavily on counts of blobs received by the mock Celestia app. The current tests retain these checks, but also query the /status endpoint of the sequencer-relayer to confirm its state. There was also a single test previously which checked the state of the postsubmit.json file. I didn't think this was necessary given that we're querying the state in a more "black box" way now (via the http server), but I didn't remove the function to perform this check (TestSequencerRelayer::assert_state_files_are_as_expected) pending a decision on whether to reinstate that check or not inside the later_height_in_state_leads_to_expected_relay test.

As @SuperFluffy predicted, all the new protobuf packages added in #963 had to be added to the pbjson_build builder to generate the serde impls required for using them in astria-grpc-mock. This accounts for the vast majority of the new LoC in this PR.

I made a few small changes to the mock framework:

added Mock::up_to_n_times to avoid failures due to a single-use mock response being sent multiple times
added DynamicResponse to support constructing mock responses which can be passed the relevant gRPC request
changed MockGuard::wait_until_satisfied to take self by ref rather than value, since if it consumes self and wait_until_satisfied times out, the MockGuard panics in its Drop impl, meaning we don't get a good indication from e.g. a tokio::time::timeout as to what went wrong

Most of the new functionality lives in MockCelestiaAppServer and TestSequencerRelayer. For the former, all gRPCs except for BroadcastTx and GetTx are set up to always return valid-enough responses - i.e. these don't need to be mounted individually in every test case. In the case of MockCelestiaAppServer, the new functionality is a combination of support for mounting mock responses and functions with timeouts to query the /status, /healthz and /readyz http endpoints of the sequencer-relayer.

Testing

These changes are tests.

Closes #1008.

Fraser999 · 2024-05-01T21:49:00Z

crates/astria-sequencer-relayer/src/relayer/read.rs

+    let maybe_block = SequencerBlock::try_from_raw(block.into_inner())
+        .wrap_err("failed to parse raw proto block from grpc response");
+    state.set_sequencer_connected(maybe_block.is_ok());


This is the only change to production code, so worth highlighting :)

I think this is the correct behaviour, since if we return an error here, we exit the select! loop in Relayer::run and are hung until externally killed. While in that state, I think it makes sense to have the sequencer noted as disconnected so that the /healthz endpoint reports degraded health.

Are we hanging until externally killed or do we just shut down gracefully?

Hanging here. We could do self.shutdown_token.cancel(); just before that if we do want to actually shut down?

I see - yeah. Good catch. We should do that. It's outside the scope of this PR I'd say, but relayer knows that if the read task fails no progress will be made. So it should just shut everything down.

crates/astria-sequencer-relayer/src/relayer/read.rs

SuperFluffy

It's very nice to have these back. I also the additions like the custom timeout_ms that reports /status in case of failures (the contents of which we should probably expand on :-) ).

I believe some mocks that currently return guards should not do so (I am thinking of the ABCI info one and also the sequencer-block fetch; I might have missed one). In my mind, "intermittent" mocks should just be mounted without any extra requirements when and in what order they are picked up by relayer - only the final result (blocks posted to Celestia) should actually be awaited. This would leave the order and timing "inside" relayer opaque.

Also, I think the Celestia-App checks (e.g. how many blocks were received) could be replaced by mocks.

But I also realize that both requirements are not necessarily trivial and might be beyond the scope of this PR, which stays very close to the original hand-written mock. I think these would be good to cover in future PRs.

I think with this PR you can also remove astria-celestia-client and astria-celestia-mock?

tools/protobuf-compiler/src/main.rs

crates/astria-sequencer-relayer/tests/blackbox/helpers/mock_sequencer_server.rs

crates/astria-sequencer-relayer/tests/blackbox/main.rs

SuperFluffy · 2024-05-03T11:48:49Z

crates/astria-sequencer-relayer/tests/blackbox/main.rs

+    // Mount a bad block next, so the relayer will fail to fetch the block.
+    let _guard = sequencer_relayer.mount_abci_response(2).await;
+    let block_to_mount = SequencerBlockToMount::BadAtHeight(2);
+    let block_guard = sequencer_relayer


I think it's not really necessary to have this guard here - if we confirm /healthz before (which we do), then mount the abci + block responses, and finally observe 500 on /healthz that's enough. What do you think?

That would allow removing the guard returned by mount_sequencer_block_response

If we did the same for the abci one (i.e. not return a guard for it), I think that should work.

Actually, I take that back :) Without the scoped guards being awaited, the test exits before the mocks have registered that they've been consumed, so they fail the test.

I think I can get away with making the abci one not scoped, and have two methods for the sequencer block - one scoped and the other not.

Done in d64481e.

crates/astria-sequencer-relayer/tests/blackbox/helpers/mock_sequencer_server.rs

SuperFluffy · 2024-05-03T12:04:36Z

crates/astria-grpc-mock/src/mock.rs

    }

+    #[must_use = "a mock must be mounted on a server to be useful"]
+    pub fn up_to_n_times(mut self, n: u64) -> Self {


This is the original wiremock functionality that we hadn't yet ported over right?

Yup, indeed.

SuperFluffy · 2024-05-03T12:07:23Z

crates/astria-grpc-mock/src/mock_server.rs


 impl MockGuard {
-    pub async fn wait_until_satisfied(self) {
+    pub async fn wait_until_satisfied(&self) {


What makes it necessary to take this by reference? The timeout?

I actually thought of it as a feature that it gets dropped at the end of the scope, immediately returning the panic/report. But I suppose that wasn't useful and/or blocking some other functionality?

If we had e.g.

tokio::time::timeout(duration, guard.wait_until_satisfied()).expect("shouldn't timeout");

then what was happening if wait_until_satisfied exceeded the duration was:

tokio::time::timeout expired and cancelled the wait_until_satisfied future

wait_until_satisfied had consumed the MockGuard, so when it was cancelled, the guard was dropped

the Drop impl of the guard panicked due to unsatisfied condition

we never get as far as executing .expect("shouldn't timeout") since we've panicked

I found this unexpected initially - I was expecting to see "shouldn't timeout" in the panic message, and then the report about the unsatisfied mock. That's what happens now by taking &self in wait_until_satisfied.

SuperFluffy · 2024-05-03T12:11:03Z

crates/astria-grpc-mock/src/response.rs

    }
 }

+pub fn dynamic_response<I, O, F>(responder: F) -> DynamicResponse<I, O, F>


I wonder if we can implement e.g.:

impl<I, O, F> Respond for F, where F: Fn(&I) -> O, O: erased_serde::Serialize + prost::Name + Clone + 'static,

Maybe another indirection would be necessary like MakeDynamicResponse and then implemeent Respond for MakeDynamicResponse. Gonna think about how to do that.

Yeah - that was my initial approach, but I and O are unconstrained type parameters so it doesn't fly :)

I played with this for a while but the explicit dynamic_response function seemed like the path of least resistance. Maybe there's a cleaner way though?

SuperFluffy · 2024-05-03T12:11:42Z

crates/astria-sequencer-relayer/src/relayer/read.rs

+    let maybe_block = SequencerBlock::try_from_raw(block.into_inner())
+        .wrap_err("failed to parse raw proto block from grpc response");
+    state.set_sequencer_connected(maybe_block.is_ok());


Are we hanging until externally killed or do we just shut down gracefully?

crates/astria-sequencer-relayer/src/relayer/read.rs

Fraser999 · 2024-05-03T16:00:28Z

@SuperFluffy

I think with this PR you can also remove astria-celestia-client and astria-celestia-mock?

Done in c9a5841.

reinstate sequencer-relayer black box tests

70cc672

Fraser999 requested a review from a team as a code owner May 1, 2024 21:42

Fraser999 requested a review from SuperFluffy May 1, 2024 21:42

github-actions bot added proto pertaining to the Astria Protobuf spec sequencer-relayer pertaining to the astria-sequencer-relayer crate labels May 1, 2024

Fraser999 commented May 1, 2024

View reviewed changes

crates/astria-sequencer-relayer/src/relayer/read.rs Outdated Show resolved Hide resolved

Fraser999 and others added 3 commits May 1, 2024 22:52

Merge branch 'main' into fraser/restore-relayer-black-box-tests

726ec04

fix unused warning

165adf7

Merge branch 'main' into fraser/restore-relayer-black-box-tests

34b631a

SuperFluffy approved these changes May 3, 2024

View reviewed changes

Fraser999 added 5 commits May 3, 2024 14:01

tidied up list of protobuf packages in compiler tool

78fb6df

make block corruption in test guaranteed

c2aac68

avoid returning mock guards when they're not used

d64481e

return an error if sequencer provides wrong block

d198996

remove unused crates

c9a5841

Fraser999 requested a review from a team as a code owner May 3, 2024 15:57

Fraser999 requested a review from aajimal May 3, 2024 15:57

update dependency 'pest' to silence audit warning

e92dac7

Fraser999 added this pull request to the merge queue May 3, 2024

Merged via the queue into main with commit 83d23ab May 3, 2024

Fraser999 deleted the fraser/restore-relayer-black-box-tests branch May 3, 2024 16:11

Conversation

Fraser999 commented May 1, 2024

Summary

Background

Changes

Testing

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SuperFluffy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Fraser999 May 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Fraser999 commented May 3, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fraser999 May 3, 2024 •

edited

Loading