horizontal-scaling: message distributor service client #647

Merged
dubious90 merged 8 commits into envoyproxy:main from oschaaf:horizontal-scaling-distributor-client
Mar 19, 2021

Conversation

@oschaaf
Member

@oschaaf oschaaf commented Mar 15, 2021

  • Adds the wire skeleton of the api for a service which distributes
    messages to other load gen services & sinks.
  • Adds a gRPC client implementation + unit tests

Signed-off-by: Otto van der Schaaf ovanders@redhat.com

Otto van der Schaaf added 3 commits March 15, 2021 13:25
- Adds the wire skeleton of the api for a service which distributes
  messages to other load gen services & sinks.
- Adds a gRPC client implementation + unit tests

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
…distributor-client

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf marked this pull request as ready for review March 16, 2021 21:52
@oschaaf oschaaf added the waiting-for-review A PR waiting for a review. label Mar 16, 2021
@oschaaf oschaaf changed the title horizontal-scaling: message distributor horizontal-scaling: message distributor service client Mar 16, 2021
@dubious90
Contributor

@eric846 please review and assign back to me once done.

@dubious90 dubious90 requested a review from eric846 March 17, 2021 01:40
Otto van der Schaaf added 2 commits March 17, 2021 20:38
…distributor-client

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
eric846 previously approved these changes Mar 17, 2021
Contributor

@eric846 eric846 left a comment

LGTM

EXPECT_CALL(*mock_reader_writer, Read(_)).WillOnce(Return(true)).WillOnce(Return(false));
// Capture the Nighthawk request DistributedRequest sends on the channel.
EXPECT_CALL(*mock_reader_writer, Write(_, _))
    .WillOnce(::testing::DoAll(::testing::SaveArg<0>(&request), Return(true)));
Contributor

nit: don't need ::testing::

Member Author

Removed in e1d6f59

return mock_reader_writer;
});

::nighthawk::DistributedRequest distributed_request;
Contributor

nit: In the style guide I believe they want us to omit the leading :: except in using statements. There are also some occurrences of leading :: in other files.

Member Author

Didn't know that, and you're right, this crept into other files. Cleaned it up here in e1d6f59, filed #652 to track cleaning up the other places.

absl::StatusOr<DistributedResponse> distributed_response_or =
    client.DistributedRequest(mock_nighthawk_service_stub, distributed_request);
EXPECT_TRUE(distributed_response_or.ok());
ASSERT_TRUE(request.has_execution_request());
Contributor

With protos it would be safe to skip these ASSERTs and go right to the last EXPECT.

Contributor

(depending on how valuable it is to know which level was missing)

Member Author

Eliminated these lines in e1d6f59, leaning towards less code

@eric846 eric846 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Mar 17, 2021
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Mar 17, 2021
// turn delegate to one or more other services for actual handling.
message DistributedRequest {
  oneof distributed_request_type {
    client.ExecutionRequest execution_request = 1;
Contributor

I'm not sure why we'd need to distribute sink requests among multiple sinks. Can you help clarify that a little?

My confusion is twofold here:

  1. Don't sink requests just return values stored by previous execution requests? If DistributedResponse isn't even returning anything, why would you ever go through this medium?
  2. Isn't the purpose of the sink to be a single point of truth where you can determine the whole set of results from multiple execution responses across multiple nighthawks? If so, isn't supporting a distributed architecture for them possibly defeating their very purpose?

Member Author

  1. DistributedResponse is empty in this PR; its contents will be added in follow-up(s).
  2. When a DistributedRequest wraps a SinkRequest, the number of services must equal precisely one. I can see the confusion, but it doesn't seem to fit into the declarative validation model to annotate this as such.

Some thoughts:

Initially the idea was to have the distributor be a "generic" service: one that would accept arbitrary messages, forward them as-is to 1..n services within the cluster, and subsequently stream back the replies (wrapping each reply message to annotate which internal service it came from on the stream). Consensus was that it would be better to not attempt the generic approach and instead be specific, hence the explicit wrapper messages we use for distributed requests & replies.

With that, the possibility of addressing > 1 sinks through the distributor stands out, and I think that is what is causing confusion. I'd be up for iterating on suggestions to enhance the modelling here.

The services field is shared across the message types that can be distributed, but the constraints on this field vary across message types, and right now this "knowledge" is implicit in clients consuming the distributor service.
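To make that shape concrete, here is a hedged, illustrative sketch; field names and numbers beyond the `execution_request` field visible in this PR's diff are assumptions, not the actual .proto:

```proto
// Illustrative only -- not the exact .proto in this PR.
message DistributedRequest {
  // Assumed field: addresses of the services to forward to. Its size
  // constraint is implicit today: 1..n targets when wrapping an
  // ExecutionRequest, exactly one when wrapping a SinkRequest.
  repeated string services = 1;
  oneof distributed_request_type {
    client.ExecutionRequest execution_request = 2;
  }
}
```

That implicit size constraint is exactly the kind of knowledge that, as noted above, currently lives in clients rather than in declarative validation annotations.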

For clarity and ensuring we're all on the same page, here's a diagram of what this is converging to:

[diagram image]

Collaborator

Thank you for the explanation @oschaaf. I am assuming that when SinkRequest is sent, the client is trying to consume data stored in the sink service, rather than write them. With that I would like us to consider two suggestions that would remove this confusion.

  1. Does the client need to access the Sink service via the Distributor service, or are they chained just for convenience? Assuming the Distributor doesn't provide any critical functionality to the client - there is nothing wrong with dialing the Sink from the client directly. Would that work?
  2. If we do identify critical functionality that the Distributor needs to provide to the client when retrieving the results, I would suggest we make the approach more specific by defining a second RPC method solely for the purposes of getting the results from the Sink.

Member Author

It is a convenience thing. Option 1 would be low impact to move to; I will slim the api in this PR down accordingly.

Member Author

Done in 2206035

Member Author

Updated diagram:

[diagram image]

Contributor

Thanks for updating here Otto!

Member Author

Extended alternate/diagram that includes the adaptive load controller:

[diagram image]

In this flow the adaptive load controller runs behind nighthawk_service, which avoids the need to teach the adaptive load controller about sink and distributor services. (nighthawk_service will do that on its behalf).

import "validate/validate.proto";

// Perform an execution request through an intermediate service that will in
// turn delegate to one or more other services for actual handling.
Contributor

Can we clarify in this comment that we are duplicating one request across multiple services? (Distributing to my understanding could also mean that we are choosing one service from your list to send it to)

Member Author

Updated this in 4185e3c, let me know if it looks better

virtual ~NighthawkDistributorClient() = default;

/**
* @brief Propagate messages to one or more other services for handling.
Contributor

nit: I don't think we need brief here, because this is already only one line.

Member Author

Fixed in 4185e3c

*
* @param nighthawk_distributor_stub Used to open a channel to the distributor service.
* @param distributed_request Provide the message that the distributor service should propagate.
* @return absl::StatusOr<::nighthawk::DistributedResponse> Either a status indicating failure, or
Contributor

nit/optional: can we add a line break between the @params and @return? Ignore if this is unusual in nighthawk

Member Author

Leaving this as-is, because I checked, and we don't tend to do that in this repository.

// response messages.
service NighthawkDistributor {
  // Propagate the message wrapped in DistributedRequest to one or more other services for handling.
  rpc DistributedRequestStream(stream DistributedRequest) returns (stream DistributedResponse) {
Contributor

are these just streams because ExecutionRequest/ExecutionResponse are streams?

Member Author

For the response stream: when initiating a load test and requesting it to start on multiple load gen services, you'll get back a stream of replies from each instance.
For the request stream: a client can re-use the stream to subsequently initiate a load test and query results from the sink.
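As a hedged pseudocode sketch of that reuse (method and message names follow the quoted API; the control flow is illustrative, and the sink phase reflects the convenience routing discussed earlier in the thread):

```
// One long-lived bidirectional stream, reused across phases.
stream = distributor_stub.DistributedRequestStream()

// Phase 1: fan one load test out to n load generation services.
stream.Write(DistributedRequest{ execution_request })
while stream.Read(&response):   // replies arrive per targeted service
    collect(response)

// Phase 2: reuse the same stream to query results from the sink.
stream.Write(DistributedRequest{ sink_request })
while stream.Read(&response):
    collect(response)

stream.WritesDone(); stream.Finish()
```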

@dubious90 dubious90 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Mar 18, 2021
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Mar 18, 2021
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Mar 18, 2021
@oschaaf oschaaf added the waiting-for-review A PR waiting for a review. label Mar 18, 2021
@dubious90 dubious90 merged commit 211b3b5 into envoyproxy:main Mar 19, 2021

Labels

waiting-for-review A PR waiting for a review.

4 participants