config: ensure Config::Utility::getAndCheckTransportVersion is called on main thread.#15548
config: ensure Config::Utility::getAndCheckTransportVersion is called on main thread.#15548htuch merged 10 commits intoenvoyproxy:mainfrom
Conversation
… on main thread. During the v2 -> v3 migration, some uses of this were added inside filter factory lambdas and could occur on worker threads, rejecting config on the data plane. Relevant to envoyproxy#10943 Risk level: Low Testing: New assertion in method caused existing tests to fail, fix addresses these. Also manually audited via grep. Signed-off-by: Harvey Tuch <htuch@google.com>
|
/retest |
|
Retrying Azure Pipelines: |
|
@htuch I think this needs a main merge, there is a conflict. /wait |
Signed-off-by: Harvey Tuch <htuch@google.com>
|
@jmarantz |
|
Which assert? I didn't see any obvious assert logs when I clicked through
that link.
My first guess it that something in the test is allowing a SymbolTable
while some of the stats are still being referenced.
…On Sun, Mar 21, 2021 at 10:03 PM htuch ***@***.***> wrote:
@jmarantz <https://github.com/jmarantz> filter_fuzz_test keeps flaking on
a symbol table ASSERT in linux_x64 gcc, see
https://dev.azure.com/cncf/envoy/_build/results?buildId=69835&view=logs&j=8bf29878-a4cc-50f7-4e84-2255e6fd4065&t=3267cd16-a387-57e0-0ef4-d7a4d1c6c089.
This seems totally unrelated to this change, but keeps popping up when I
kick CI. Any thoughts?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#15548 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO2IPNVBUDPYTYSG5HQSVLTE2QODANCNFSM4ZLUHHXA>
.
|
|
Also there are several filter_fuzz_test targets in the hierarchy. Which one
is failing for you?
On Sun, Mar 21, 2021 at 10:39 PM Joshua Marantz ***@***.***>
wrote:
… Which assert? I didn't see any obvious assert logs when I clicked through
that link.
My first guess it that something in the test is allowing a SymbolTable
while some of the stats are still being referenced.
On Sun, Mar 21, 2021 at 10:03 PM htuch ***@***.***> wrote:
> @jmarantz <https://github.com/jmarantz> filter_fuzz_test keeps flaking
on
> a symbol table ASSERT in linux_x64 gcc, see
>
https://dev.azure.com/cncf/envoy/_build/results?buildId=69835&view=logs&j=8bf29878-a4cc-50f7-4e84-2255e6fd4065&t=3267cd16-a387-57e0-0ef4-d7a4d1c6c089
.
> This seems totally unrelated to this change, but keeps popping up when I
> kick CI. Any thoughts?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#15548 (comment)>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AAO2IPNVBUDPYTYSG5HQSVLTE2QODANCNFSM4ZLUHHXA
>
> .
>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#15548 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO2IPNAAVAGYOOG7ETZUNDTE2UXRANCNFSM4ZLUHHXA>
.
|
|
That assert is just saying that there are outstanding symbols when the symbol table was destructed. The stack-trace tells us that the symbol-table was being destructed from the factory-context. Is it possible that that this test was flaking with or without this PR's changes? |
Signed-off-by: Harvey Tuch <htuch@google.com>
|
Is it possible that the fuzz test is holding onto a reference to a stat in a thread that persists pass the destruction of the factories? Looking over the PR, I'm wondering what the lifetime of the LoggerCache is. Does that hold onto a Stats::Scope? Nothing obvious in this PR indicates to me that the lifetime of that has changed, but it looks like the thing that's been touched that might have an interesting issue with its lifetime in tests. |
|
The only test that I can believe might be affected is the rate limit HTTP filter, since this is an HTTP filter fuzz test. The change there seems innocuous. @asraa for an extra pair of eyes. I'll get it replicating and go deeper as next step. |
About this and lifetime, can you repro it locally? ACTUALLY (fortunately i procrastinated pressing comment)! maybe something is exiting early from what if you try adding a reset in the try/catch early return or adding it to the filter fuzzer destructor? |
|
@asraa why is https://github.com/envoyproxy/envoy/blob/main/test/extensions/filters/http/common/fuzz/filter_fuzz_test.cc#L53 setup for persistence? I think for the unit test style fuzzers (rather than e2e ones), we would be better off with non-persistence if possible to allow earlier detection of these issues. I can repro locally and digging further into this one, the issue is with ratelimit and test/extensions/filters/http/common/fuzz/filter_corpus/clusterfuzz-testcase-minimized-filter_fuzz_test-6484279454466048. |
|
Also that's a non-pod-static violation, fwiw.
…On Thu, Mar 25, 2021 at 10:25 AM htuch ***@***.***> wrote:
@asraa <https://github.com/asraa> why is
https://github.com/envoyproxy/envoy/blob/main/test/extensions/filters/http/common/fuzz/filter_fuzz_test.cc#L53
setup for persistence? I think for the unit test style fuzzers (rather than
e2e ones), we would be better off with non-persistence if possible to allow
earlier detection of these issues. I can repro locally and digging further
into this one.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#15548 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO2IPN7VHJ4RNMAMGODQ2TTFNBT3ANCNFSM4ZLUHHXA>
.
|
Signed-off-by: Harvey Tuch <htuch@google.com>
|
Well, the good news is I have a fix, the bad news is I don't understand why it works; see 1e1ac2a. Basically, |
I'm no expert but I think lambda captures are not deterministic (C++ sigh), so I think the way to make it deterministic is to pre-capture into a struct and then pass that. I know @lambdai has been fixing similar issues so might be worth doing that here? |
|
Still not sure how that relates though; even if non-deterministic, if we take the ref count, it should safely destruct presumably and decrement, if we don't take it then there should not be an issue. |
Honestly I think you are hitting a compiler bug. Given that lambda capture order is not deterministic I would probably just switch to a struct anyway and be done with it, but up to you. |
MIssed this but worth the comment: this class just contains mock objects that could persist (and state's meant to be reset at the end for this class) for speed reasons. |
During the v2 -> v3 migration, some uses of this were added inside filter factory lambdas and could
occur on worker threads, rejecting config on the data plane.
Relevant to #10943
Fixes #15083
Risk level: Low
Testing: New assertion in method caused existing tests to fail, fix addresses these. Also manually
audited via grep.
Signed-off-by: Harvey Tuch htuch@google.com