Skip to content

[Release 1.11.0][Core] avoid unnecessary work during event stats collection#22054

Merged
ericl merged 2 commits intomasterfrom
stats
Feb 3, 2022
Merged

[Release 1.11.0][Core] avoid unnecessary work during event stats collection#22054
ericl merged 2 commits intomasterfrom
stats

Conversation

@mwtian
Copy link
Copy Markdown
Member

@mwtian mwtian commented Feb 2, 2022

Why are these changes needed?

This PR avoids some unnecessary copying and branching when recording event stats. It improves / recovers ~10% of single_client_get_calls_Plasma_Store performance. On AWS EC2 m5.8xlarge,

  • single_client_get_calls_Plasma_Store current: ~5200/s
  • single_client_get_calls_Plasma_Store with PR: ~5800/s

When RAY_event_stats=0, single_client_get_calls_Plasma_Store can reach ~6800/s. If we want to optimize further, we can record data in opencensus only in intervals, or when the data are exported.

Related issue number

#22053

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mwtian mwtian changed the title [Release 1.11.0][Core] avoid unnecessary work for event stats [Release 1.11.0][Core] avoid unnecessary work during event stats collection Feb 2, 2022
Comment on lines +352 to +360
for (auto &[tag_key, tag_val] : tags) {
#ifndef NDEBUG
// In debug build, verify tag_val is printable.
for (auto &c : tag_val) {
RAY_CHECK(isprint(c)) << "Found unprintable character " << static_cast<int>(c)
<< " in " << tag_val;
}
combined_tags.emplace_back(TagKeyType::Register(t.first), std::move(t.second));
#endif // NDEBUG
combined_tags.emplace_back(TagKeyType::Register(tag_key), std::move(tag_val));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this code is very similar to the above code. Could we dedup them?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have chance to run this code in debug mode? I'm worried that it won't catch things if we always use NDEBUG.

Also if the message fail to send due to non-printable chars, do we get some error code that we can check fail?

Copy link
Copy Markdown
Member Author

@mwtian mwtian Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted the logic to a separate function.

This logic may not run for all possible input in debug mode, but I'm not sure if verification / applying fixes are useful in production mode. Unprintable char in tag value is usually a sign of misconfiguration or data corruption. Exporting these data seem to have low value. I doubt any dashboard will be built with the ? character in metric / dimension name. And recording these data can result in high memory usage in opencensus. Dropping the malformed data as in opencensus seems like the best approach in production. Later on if it turns out opencensus does not log the malformed data, we can have verification with error logging for 1 in 1000 CheckPrintableChar() calls.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw this loop costs 1.5~2% of the benchmark.

@ericl ericl merged commit 9ac3f68 into master Feb 3, 2022
@ericl ericl deleted the stats branch February 3, 2022 20:01
rkooo567 added a commit to rkooo567/ray that referenced this pull request Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants