-
Notifications
You must be signed in to change notification settings - Fork 85
[BUG] Getting DeadlineTimeoutException error during security tests after upgrading to 3.0 #580
Description
What is the bug?
With the upgrade to 3.0 in #556, the dependency on org.apache.httpcomponents:httpclient and org.apache.httpcomponents:httpcore changed to org.apache.httpcomponents.client5:httpclient5 and org.apache.httpcomponents.core5:httpcore5 respectively (these are coming from OpenSearch core).
After this upgrade we're starting to see some of the tests fail during the security test CI workflow with the following error:
REPRODUCE WITH: ./gradlew ':notifications:integTest' --tests "org.opensearch.integtest.SecurityNotificationIT.test get slack notification config without get Notification permission" -Dtests.seed=C977BA49C8625070 -Dtests.security.manager=false -Dtests.locale=be -Dtests.timezone=Asia/Yakutsk -Druntime.java=17
org.opensearch.integtest.SecurityNotificationIT > test get slack notification config without get Notification permission FAILED
java.lang.RuntimeException: error while performing request
at __randomizedtesting.SeedInfo.seed([C977BA49C8625070:44430BFE7A23F02]:0)
at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:1045)
at org.opensearch.client.RestClient.performRequest(RestClient.java:341)
at org.opensearch.client.RestClient.performRequest(RestClient.java:329)
at org.opensearch.integtest.PluginRestTestCase.createUserRolesMapping(PluginRestTestCase.kt:185)
at org.opensearch.integtest.PluginRestTestCase.createUserWithCustomRole(PluginRestTestCase.kt:256)
at org.opensearch.integtest.SecurityNotificationIT.test get slack notification config without get Notification permission(SecurityNotificationIT.kt:304)
Caused by:
org.apache.hc.core5.util.DeadlineTimeoutException: Deadline: 292278994-08-17T16:12:55.807+0900, 9223370365877384600 MILLISECONDS overdue
at app//org.apache.hc.core5.util.DeadlineTimeoutException.from(DeadlineTimeoutException.java:49)
at app//org.apache.hc.core5.pool.StrictConnPool.lease(StrictConnPool.java:217)
at app//org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager$1.<init>(PoolingAsyncClientConnectionManager.java:231)
at app//org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager.lease(PoolingAsyncClientConnectionManager.java:227)
at app//org.apache.hc.client5.http.impl.async.InternalHttpAsyncExecRuntime.acquireEndpoint(InternalHttpAsyncExecRuntime.java:100)
at app//org.apache.hc.client5.http.impl.async.AsyncConnectExec.execute(AsyncConnectExec.java:135)
at app//org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at app//org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
at app//org.apache.hc.client5.http.impl.async.AsyncProtocolExec.internalExecute(AsyncProtocolExec.java:198)
at app//org.apache.hc.client5.http.impl.async.AsyncProtocolExec.execute(AsyncProtocolExec.java:158)
at app//org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at app//org.apache.hc.client5.http.impl.async.AsyncExecChainElement$1.proceed(AsyncExecChainElement.java:62)
at app//org.apache.hc.client5.http.impl.async.AsyncRedirectExec.internalExecute(AsyncRedirectExec.java:112)
at app//org.apache.hc.client5.http.impl.async.AsyncRedirectExec.execute(AsyncRedirectExec.java:278)
at app//org.apache.hc.client5.http.impl.async.AsyncExecChainElement.execute(AsyncExecChainElement.java:54)
at app//org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.executeImmediate(InternalAbstractHttpAsyncClient.java:368)
at app//org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient$1.sendRequest(InternalAbstractHttpAsyncClient.java:224)
at app//org.apache.hc.core5.http.nio.support.BasicRequestProducer.sendRequest(BasicRequestProducer.java:93)
at app//org.apache.hc.client5.http.impl.async.InternalAbstractHttpAsyncClient.doExecute(InternalAbstractHttpAsyncClient.java:180)
at app//org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:97)
at app//org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:107)
at app//org.opensearch.client.RestClient.performRequest(RestClient.java:337)
... 4 more
Not all of the tests failed in this example workflow that is being referenced:
75 tests completed, 6 failed
Tests with failures:
- org.opensearch.integtest.SecurityNotificationIT.test delete slack notification config with user that has get Notification permission
- org.opensearch.integtest.SecurityNotificationIT.test getChannelList should return only channels with get channel permission
- org.opensearch.integtest.SecurityNotificationIT.test get slack notification config without get Notification permission
- org.opensearch.integtest.SecurityNotificationIT.test get slack notification config with user that has get Notification permission
- org.opensearch.integtest.config.DeleteNotificationConfigIT.test Delete single absent notification config should fail
- org.opensearch.integtest.config.EmailNotificationConfigCrudIT.test Create email notification config without email_group IDs
Given the stacktrace, it seems that the client used to make the calls to the cluster during the integration tests needed to obtain a lease from a connection pool and that lease either took too long or could not be obtained. We should check with the OpenSearch core team if the changes to these httpcomponents related dependencies has changed anything with the connection pool resourcing/timeouts leading to these occasional errors. Strangely enough, this error was only seen during the security test runs in the CI. The regular integration tests did not run into this problem. Either it's flaky or the conditions of the Docker security tests make it more likely to happen.
How can one reproduce the bug?
Steps to reproduce the behavior:
Run the security integration test CI in a PR or locally run the security tests against a Docker image running OpenSearch 3.0 with security
What is the expected behavior?
The client is able to get the lease in time and perform the operation during the tests.