CI Link
https://gradle-enterprise.elastic.co/s/hixbz3k3fauom
Repro line
gradlew ":x-pack:plugin:esql:qa:server:multi-clusters:v9.0.7#newToOld" -Dtests.class="org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT" -Dtests.method="test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_of_integer}" -Dtests.seed=3DF7A01082630BF4 -Dtests.bwc=true -Dtests.locale=ann-Latn-NG -Dtests.timezone=Indian/Christmas -Druntime.java=24
Does it reproduce?
No
Applicable branches
main
Failure history
No response
Failure excerpt
MultiClusterSpecIT > test {csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_of_integer} FAILED
java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-8 [ACTIVE]
at __randomizedtesting.SeedInfo.seed([3DF7A01082630BF4:B5A39FCA2C9F660C]:0)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506)
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
at java.base/java.lang.Thread.run(Thread.java:1447)
REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:qa:server:multi-clusters:v9.0.7#newToOld" -Dtests.class="org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT" -Dtests.method="test {csv-spec:stats.MaxOfByte}" -Dtests.seed=3DF7A01082630BF4 -Dtests.bwc=true -Dtests.locale=ann-Latn-NG -Dtests.timezone=Indian/Christmas -Druntime.java=24
MultiClusterSpecIT > test {csv-spec:stats.MaxOfByte} FAILED
org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:45821], URI [/languages_lookup_non_unique_key], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [languages_lookup_non_unique_key/e2f-qB8aRhmgXqaEQFESnA] already exists","index_uuid":"e2f-qB8aRhmgXqaEQFESnA","index":"languages_lookup_non_unique_key"}],"type":"resource_already_exists_exception","reason":"index [languages_lookup_non_unique_key/e2f-qB8aRhmgXqaEQFESnA] already exists","index_uuid":"e2f-qB8aRhmgXqaEQFESnA","index":"languages_lookup_non_unique_key"},"status":400}
at __randomizedtesting.SeedInfo.seed([3DF7A01082630BF4:B5A39FCA2C9F660C]:0)
at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)
at app//org.elasticsearch.client.RestClient.access$1900(RestClient.java:109)
at app//org.elasticsearch.client.RestClient$1.completed(RestClient.java:401)
at app//org.elasticsearch.client.RestClient$1.completed(RestClient.java:397)
at app//org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
This type of failure started happening around Sept 10th with this failure, and it has kept failing in some CI runs since — I think most of them (at least the 3-4 I looked at) on intake/main/[version]/bwc-snapshots. Another example here.
I can't explain why it happens, but:
[2025-09-15T21:07:15,784][INFO ][o.e.x.e.c.MultiClusterSpecIT][test] [csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_of_integer] before test
...............
[2025-09-15T21:07:29,274][INFO ][o.e.c.m.MetadataCreateIndexService] [remote_cluster-1] [multi_column_joinable_lookup] creating index, cause [api], templates [], shards [1]/[1]
[2025-09-15T21:07:29,300][INFO ][o.e.c.m.MetadataCreateIndexService] [local_cluster-0] creating index [multi_column_joinable_lookup] in project [default], cause [api], templates [], shards [1]/[1]
[2025-09-15T21:07:29,592][INFO ][o.e.c.r.a.AllocationService] [local_cluster-0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[multi_column_joinable_lookup][0]]])." previous.health="YELLOW" reason="shards started [[multi_column_joinable_lookup][0]]"
[2025-09-15T21:07:29,623][INFO ][o.e.c.r.a.AllocationService] [remote_cluster-1] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[multi_column_joinable_lookup][0]]])." previous.health="YELLOW" reason="shards started [[multi_column_joinable_lookup][0]]"
[2025-09-15T21:08:29,603][INFO ][o.e.x.e.CsvTestsDataLoader][test] Data loading of [2918] bytes into [multi_column_joinable_lookup] OK
[2025-09-15T21:08:29,621][INFO ][o.e.c.m.MetadataCreateIndexService] [local_cluster-0] creating index [clientips] in project [default], cause [api], templates [], shards [1]/[1]
[2025-09-15T21:08:29,623][INFO ][o.e.c.m.MetadataCreateIndexService] [remote_cluster-1] [clientips] creating index, cause [api], templates [], shards [1]/[1]
[2025-09-15T21:08:29,779][INFO ][o.e.x.e.CsvTestsDataLoader][test] Data loading of [392] bytes into [clientips] OK
....
[2025-09-15T21:08:47,935][INFO ][o.e.x.e.EnrichPolicyRunner] [remote_cluster-0] Policy [heights_policy]: Policy execution complete
[2025-09-15T21:08:47,995][INFO ][o.e.x.e.EnrichPolicyRunner] [local_cluster-1] Policy [heights_policy]: Policy execution complete
[2025-09-15T21:09:48,102][INFO ][o.e.x.e.c.MultiClusterSpecIT][test] [csv-spec:k8s-timeseries-avg-over-time.Avg_over_time_of_integer] after test
There are two big gaps in the logs (21:07:29,623 - 21:08:29,603 and 21:08:47,995 - 21:09:48,102), each almost exactly 60 seconds long — which lines up with the 60,000 ms socket timeout in the first failure above.
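A quick sanity check of the gap durations, using the timestamps copied from the log excerpt above (this is just illustrative arithmetic, not part of the test suite):

```python
from datetime import datetime

# (start, end) timestamp pairs taken from the log excerpt above
pairs = [
    ("2025-09-15T21:07:29,623", "2025-09-15T21:08:29,603"),
    ("2025-09-15T21:08:47,995", "2025-09-15T21:09:48,102"),
]

def parse(ts: str) -> datetime:
    # Elasticsearch logs use a comma before the millisecond field
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S,%f")

gaps = [(parse(end) - parse(start)).total_seconds() for start, end in pairs]
print(gaps)  # [59.98, 60.107] — both within ~0.2 s of the 60 s client socket timeout
```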