Skip to content

horizontal-scaling: sink gRPC service + tests#663

Merged
dubious90 merged 7 commits intoenvoyproxy:mainfrom
oschaaf:horizontal-sclaing-sink-grpc-service
Apr 6, 2021
Merged

horizontal-scaling: sink gRPC service + tests#663
dubious90 merged 7 commits intoenvoyproxy:mainfrom
oschaaf:horizontal-sclaing-sink-grpc-service

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Mar 25, 2021

This wires in a lot of the gRPC Sink service functionality, but not all of it.
I trimmed it down somewhat so we can focus on wiring in most of the flesh of it
in a bite sized review.

Summarising statistics & handling of chunks of raw statistics data will
be added in one or more follow-ups to this.

Signed-off-by: Otto van der Schaaf ovanders@redhat.com

@oschaaf oschaaf force-pushed the horizontal-sclaing-sink-grpc-service branch from 214b269 to 4bb4a57 Compare March 26, 2021 09:34
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf force-pushed the horizontal-sclaing-sink-grpc-service branch from 4bb4a57 to 1aebb64 Compare March 26, 2021 12:55
@oschaaf oschaaf marked this pull request as ready for review March 26, 2021 14:57
@oschaaf oschaaf added the waiting-for-review A PR waiting for a review. label Mar 26, 2021
…sink-grpc-service

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@dubious90 dubious90 requested a review from eric846 March 29, 2021 15:53
@dubious90
Copy link
Copy Markdown
Contributor

@eric846 please review and assign back to me once done

while (request_reader->Read(&request)) {
ENVOY_LOG(info, "StoreExecutionResponseStream request {}", request.DebugString());
const nighthawk::client::ExecutionResponse& response_to_store = request.execution_response();
const auto status = sink_->StoreExecutionResultPiece(response_to_store);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be absl::Status?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

break;
}
}
grpc::Status grpc_status =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this absl::Status-to-grpc::Status conversion were a helper function (that also logged the status), it might allow you to return right out of the middle of the loop.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

absl::StatusOr<nighthawk::client::ExecutionResponse> response =
mergeExecutionResponses(request.execution_id(), responses);
status.Update(response.status());
if (status.ok()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loop feels more complicated than it is because of the indentation and running status variable. As soon as something goes wrong, we basically exit the function entirely, right? Could you just return immediately whenever a call returns non-ok and flatten the loop to one level of indentation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified it for a bit in e12139c, using the suggestion above to create a helper. Let me know what you think.

if (!merge_target.has_options()) {
// If no options are set, that means this is the first part of the merge.
// Set some properties that shouldbe equal amongst all Output instances.
*(merge_target.mutable_options()) = input_to_merge.options();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for ( ) around the left side I believe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

}
}
// Append all input results into our own results.
for (const auto& result : input_to_merge.results()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could auto just be nighthawk::client::Result?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

absl::StatusOr<std::vector<ExecutionResponse>> response_from_mock_sink =
std::vector<ExecutionResponse>{{}};
ExecutionResponse& response = response_from_mock_sink.value().at(0);
response.mutable_execution_id()->assign(kTestId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be .set_execution_id(kTestId) as on request_, or is there a distinction? Same question about all uses of .assign(), which I'm seeing for the first time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.assign() accepts the abs::string_view we're passing it, with .set_execution_id we will have to create a std::string because passing the string_view directly will not work. Personally I find this slightly nicer to look at, but if you feel strongly about it I can change this here (in this PR, but there are some other spots where this is done too).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I was afraid it would be string_view not working. No problem keeping .assign()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't have a problem here, but we aren't passing a string_view here, are we? we're just passing a const string.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, changed this in all the appropriate spots in 8492073

response.mutable_execution_id()->assign(kTestId);
response.mutable_output();
request_.set_execution_id(kTestId);
auto r = stub_->SinkRequestStream(&context_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this type is something like ClientReaderWriter<nighthawk::StoreExecutionRequest, nighthawk::StoreExecutionResponse> I think we still prefer to spell it out despite the verbosity. Also could you rename r to something more descriptive, possibly stream like https://grpc.io/docs/languages/cpp/basics/#streaming-rpcs?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

TEST(ResponseVectorHandling, NoResultsInOutputYieldsNone) {
ExecutionResponse result;
std::vector<ExecutionResponse> responses{result, result, result};
absl::StatusOr<ExecutionResponse> response = mergeExecutionResponses("", responses);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we label this /*execution_id=*/"" or just make the value "execution-id"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c


TEST(MergeOutputs, MergeDivergingOptionsInResultsFails) {
const std::string kTestId = "test-id";
std::vector<ExecutionResponse> responses;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responses and the variables response_1, response_2 seem unused in the mergeOutput tests. Could the tests just create nighthawk::client::Output variables directly?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, cleaned this up in e12139c

while (stream->Read(&request)) {
ENVOY_LOG(trace, "Inbound SinkRequest {}", request.DebugString());
absl::StatusOr<std::vector<nighthawk::client::ExecutionResponse>>
status_or_execution_responses = sink_->LoadExecutionResult(request.execution_id());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A style accepted elsewhere for limiting the verbosity of StatusOr:

  • omit status and _or from the variable name (compiler will protect against losing track of what is and isn't a StatusOr)
  • no need to unpack the value into a second variable
  • instead of .value(), use the overloaded * operator (if implemented)
  • instead of .value()., use the overloaded -> operator (if implemented)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e12139c

@eric846 eric846 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Mar 30, 2021
Otto van der Schaaf added 2 commits March 30, 2021 17:40
…sink-grpc-service

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Mar 30, 2021
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Mar 30, 2021

Thank you for the review @eric846, this is ready for another pass.

eric846
eric846 previously approved these changes Mar 30, 2021
Copy link
Copy Markdown
Contributor

@eric846 eric846 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
return grpc::Status::OK;
};

grpc::Status SinkServiceImpl::abslStatusToGrpcStatus(const absl::Status& status) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of being a private method on this class, can this just be a function in the anonymous namespace?

(unless i'm missing something, this function doesn't use anything on the object itself)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8492073

SinkServiceImpl::SinkServiceImpl(std::unique_ptr<Sink> sink) : sink_(std::move(sink)) {}

grpc::Status SinkServiceImpl::StoreExecutionResponseStream(
grpc::ServerContext*, grpc::ServerReader<nighthawk::StoreExecutionRequest>* request_reader,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a null pointer check for request_reader? (return invalid argument if it's nullptr)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a RELEASE_ASSERT here (and also in the other gRPC method): this gets generated AFAICT there's no way to hit this in tests.

nighthawk::StoreExecutionRequest request;
while (request_reader->Read(&request)) {
ENVOY_LOG(info, "StoreExecutionResponseStream request {}", request.DebugString());
const nighthawk::client::ExecutionResponse& response_to_store = request.execution_response();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can just inline this in the next line

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ca57043

const absl::Status status = sink_->StoreExecutionResultPiece(response_to_store);
if (!status.ok()) {
ENVOY_LOG(error, "StoreExecutionResponseStream failure: {}", status.ToString());
return grpc::Status(grpc::StatusCode::INTERNAL, status.ToString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this a good place for your abslStatusToGrpcStatus?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8492073

*merge_target.mutable_timestamp() = input_to_merge.timestamp();
*merge_target.mutable_version() = input_to_merge.version();
} else {
// Options used should not diverge for a executions under a single execution id.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/grammar: remove a

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8492073

}
// Append all input results into our own results.
for (const nighthawk::client::Result& result : input_to_merge.results()) {
merge_target.add_results()->MergeFrom(result);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to my knowledge we don't have code anywhere that alters result names based on which nighthawk they came from. As a result, I believe that if you're using three nighthawks, you'd wind up with three results with the name "global" and if those nighthawks each had two workers, you'd also have three of each "worker1" and "worker2" (or however they're named).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch, this looses some information unintentionally. Thank you for catching that!

I think we could improve here by having a new field in nighthawk::client::Result, where we can tag the originating (listening address of the) associated load gen service, so we can trace back where the result came from.. As it stands, we do know due to the way that this is structured that the results are distinct, but we do not know which service was responsible for what results. I'll propose something to address this.

// Append all input results into our own results.
for (const nighthawk::client::Result& result : input_to_merge.results()) {
merge_target.add_results()->MergeFrom(result);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the above, I'm concerned about this more architecturally. I would have thought the purpose of the full Execution Response would be to allow aggregation of all of the responses into a single cross-nighthawk global result object.

As it is, isn't this now a massive memory risk for the Sink service? Will it run out of memory holding onto all of these execution responses.

Without the ability to combine all results into a single global result, it seems to me that this would be safer if we instead provided a stream of execution responses back. That way, we wouldn't have to hold onto the memory for all of these different (potentially large) protos at the same time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we act on any of this, I want to get @mum4k 's eyes on it. Because he's been more involved with you on this design, and we don't want to expand scope at all.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the concern here; however in the current state there are no "raw" statistics being transferred,
just the summaries (one for each worker thead involved + a summary per process/load-gen-service).

The raw statistics have been what I was thinking about mostly when being concerned about memory.
That will be addressed transparantly to this service when we introduce it, but I can see that is hard to assess by just looking at this PR: I have tried to intentionally build up the functionality to keep review more fun and focussed.
We don't tag on an aggregated summary across all load gen services yet, and as such we don't have the raw stats being used yet.

Having said that: if we should anticipate volumes of results to an extent that this is a problem, then your concern is very valid, and my assumption here is off. In that case I'd be happy to iterate. I think that is solvable in various ways without too much effort.

Lastly, if we can, I do have a preference: I think that it is useful to propagate the information at this level by default. The summaries can sometimes be misleading -- at least the way we do them right now. A practical example I've seen in the wild: one node running a cpu hog (prometheus), significantly impacting compute capacity of a load generator running on the same node. It's not a silver bullet, but being able to look at all the details improves chances of catching hints of such things.

absl::StatusOr<std::vector<ExecutionResponse>> response_from_mock_sink =
std::vector<ExecutionResponse>{{}};
ExecutionResponse& response = response_from_mock_sink.value().at(0);
response.mutable_execution_id()->assign(kTestId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't have a problem here, but we aren't passing a string_view here, are we? we're just passing a const string.

@dubious90 dubious90 self-assigned this Apr 5, 2021
@dubious90 dubious90 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Apr 5, 2021
Otto van der Schaaf added 2 commits April 6, 2021 10:13
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
Signed-off-by: Otto van der Schaaf <ovanders@redhat.com>
@dubious90 dubious90 merged commit 228d91f into envoyproxy:main Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

waiting-for-changes A PR waiting for comments to be resolved and changes to be applied.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants