Skip to content

[utility]: wrap try in an assertion to ensure that exception are only caught on main thread #15251

Merged
jmarantz merged 50 commits intoenvoyproxy:mainfrom
chaoqin-li1123:try_with_assert_macros
Mar 30, 2021
Merged

[utility]: wrap try in an assertion to ensure that exception are only caught on main thread #15251
jmarantz merged 50 commits intoenvoyproxy:mainfrom
chaoqin-li1123:try_with_assert_macros

Conversation

@chaoqin-li1123
Copy link
Copy Markdown
Contributor

@chaoqin-li1123 chaoqin-li1123 commented Mar 2, 2021

Commit Message: Define a macros that wrap try with an assertion that exceptions are only caught in main thread. Currently, envoy use c++ exception for error propagation. This PR is one of the steps to address #14320. The long term goal is to disallow raw try in envoy core code and eliminate c++ exception from data plane, which can improve exception safety. The try in the PR happen on main thread and can be wrapped in main thread assertion without breaking any existing test. In the following PR, raw try that can not be replaced by TRY_ASSERT_MAIN_THREAD will be removed from core codebase with other error propagation.
Additional Description:
Risk Level:
Testing: none
Docs Changes: none
Release Notes: none
Platform Specific Features: none
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

chaoqin-li1123 added 11 commits February 23, 2021 16:42
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

cc @mattklein123 for thoughts.

Also please ignore the mac CI timeout - we're looking into it!

@mattklein123 mattklein123 self-assigned this Mar 2, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level this makes sense and looks great to me. Are you going to add a linter also?

/wait

@chaoqin-li1123
Copy link
Copy Markdown
Contributor Author

At a high level this makes sense and looks great to me. Are you going to add a linter also?

/wait

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

@mattklein123
Copy link
Copy Markdown
Member

mattklein123 commented Mar 2, 2021

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

My concern is that we may play whackamole until then. What about a TRY_NEEDS_AUDIT macro and then we can lint for try now?

@chaoqin-li1123
Copy link
Copy Markdown
Contributor Author

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

My concern is that we may play whackamole until then. What about a TRY_NEEDS_AUDIT macro and then we can lint for try now?

Yes, that makes sense. I will add format checking now with this workaround.

@htuch
Copy link
Copy Markdown
Member

htuch commented Mar 3, 2021

+1 on linting in CI. Thanks @chaoqin-li1123 this looks great.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
chaoqin-li1123 added 5 commits March 3, 2021 15:39
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with some small comments.

/wait

try {
MessageUtil::jsonConvertValue(*proto, val);
} catch (EnvoyException& ex) {
TRY_ASSERT_MAIN_THREAD { MessageUtil::jsonConvertValue(*proto, val); }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this is on workers?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't break any test when I add the main thread assertion. maybe because of missing test coverage, I will take a look at the test coverage later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping please audit this. I'm 100% positive this happens on workers given the function params. (This feature should probably have an integration test anyway so maybe add that in a different PR?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will see whether some integration test can be added.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just 2 remaining nits from my perspective.

thanks for pushing this through!

chaoqin-li1123 added 3 commits March 25, 2021 05:23
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with small comments, thanks!

/wait


if (completed_) {
if (!cancelled_) {
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate. This can happen during normal operation for some of these I think. Doesn't this need some AUDIT statement or CI should fail? Same below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will update it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in RAW_TRY_ALLOWLIST in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.

Do we need to allow the callback to throw? Or can we just have the callbacks return an error?

I assume it isn't c_ares that forces us to handle exceptions here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is executed in main thread and in dns filter, I think it is fine to allow a filter to throw and catch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, comment updated.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
mattklein123
mattklein123 previously approved these changes Mar 26, 2021
@chaoqin-li1123
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15251 (comment) was created by @chaoqin-li1123.

see: more, trace.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 this PR will not close the Issue -- it's just the first step. Then we have to go through and fix all the audits. But getting the mechanism in place is a great stepping stone.

Matt -- you approved, but I wanted to double check you are you OK to merge? There were a couple of unanswered comment threads I think.

@mattklein123
Copy link
Copy Markdown
Member

I'm fine to merge as long as we have a good plan for all the follow ups. @jmarantz I will defer to you for final review/merge, thanks!

@chaoqin-li1123
Copy link
Copy Markdown
Contributor Author

There are only 4 TRY_NEEDS_AUDIT left, I think I can clean them up in the followup PRs.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of unresolved comments and then we can merge.


if (completed_) {
if (!cancelled_) {
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in RAW_TRY_ALLOWLIST in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.

Do we need to allow the callback to throw? Or can we just have the callbacks return an error?

I assume it isn't c_ares that forces us to handle exceptions here?

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15251 (comment) was created by @chaoqin-li1123.

see: more, trace.

@jmarantz jmarantz merged commit cfbb1a8 into envoyproxy:main Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants