Tests that leak tasks

Followup to #85683

My original approach on that PR was this change:

```diff
diff --git a/test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java b/test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
index e9ee022f79c..d5bde29e8d1 100644
--- a/test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
+++ b/test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java
@@ -401,6 +401,7 @@ public abstract class ESRestTestCase extends ESTestCase {
     public final void cleanUpCluster() throws Exception {
         if (preserveClusterUponCompletion() == false) {
             ensureNoInitializingShards();
+            waitForPendingTasks();
             wipeCluster();
             waitForClusterStateUpdatesToFinish();
             checkForUnexpectedlyRecreatedObjects();
@@ -439,6 +440,16 @@ public abstract class ESRestTestCase extends ESTestCase {
         return adminClient;
     }

+    /**
+     * Wait for outstanding tasks to complete. The default admin client is used to check the outstanding tasks and this is done using
+     * {@link ESTestCase#assertBusy(CheckedRunnable)} to give a chance to any outstanding tasks to complete.
+     *
+     * @throws Exception if an exception is thrown while checking the outstanding tasks
+     */
+    public static void waitForPendingTasks() throws Exception {
+        waitForPendingTasks(adminClient());
+    }
+
     /**
      * Wait for outstanding tasks to complete. The specified admin client is used to check the outstanding tasks and this is done using
      * {@link ESTestCase#assertBusy(CheckedRunnable)} to give a chance to any outstanding tasks to complete.
```

However, that resulted in [tons of test failures](https://github.com/elastic/elasticsearch/pull/85683#issuecomment-1087881450), which means that there are quite a few tests out there that are leaking tasks (or something).

Here's a sample of the worst offenders:

```
joegallo@galactic:~/Desktop $ cat failures\ from\ when\ task\ waiting\ was\ enabled.txt | grep -a3 "active tasks" | grep rollup_sensor | wc -l
     142
joegallo@galactic:~/Desktop $ cat failures\ from\ when\ task\ waiting\ was\ enabled.txt | grep -a3 "active tasks" | grep -a3 -m1 rollup_sensor
org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT > test {yaml=reference/rollup/apis/start-job/line_50} FAILED
    java.lang.AssertionError: 1 active tasks found:
    xpack/rollup/job[c]            jXhRlwO8SmSoysY8MJjyPw:1991 cluster:1                   persistent 1649093716219 17:35:16 32.8s       127.0.0.1 node-0 rollup_sensor
     expected:<0> but was:<1>
        at __randomizedtesting.SeedInfo.seed([ED1333AE1986C19E:65470C74B77AAC66]:0)
--
```

```
joegallo@galactic:~/Desktop $ cat failures\ from\ when\ task\ waiting\ was\ enabled.txt | grep -a3 "active tasks" | grep geoip-downloader | wc -l
      18
joegallo@galactic:~/Desktop $ cat failures\ from\ when\ task\ waiting\ was\ enabled.txt | grep -a3 "active tasks" | grep -a3 -m1 geoip-downloader

org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT > test {yaml=ingest_geoip/20_geoip_processor/Test geoip processor with different database file - GeoLite2-Country} FAILED
    java.lang.AssertionError: 1 active tasks found:
    geoip-downloader[c]            GRiCtkwGShORy7Z0-RsGRQ:16  cluster:1                  persistent 1649093712826 17:35:12 42.5s       127.0.0.1 yamlRestTest-0 id=geoip-downloader
     expected:<0> but was:<1>
        at __randomizedtesting.SeedInfo.seed([ED1333AE1986C19E:65470C74B77AAC66]:0)
--
```

While we could ignore these two tasks in the `waitForPendingTasks` code, I think it would be more interesting to look at preventing these kinds of leaks from happening -- it seems to me that tests are written to expect exclusive access to the cluster while they're running, and having tasks running in the background violates that. For the most part it seems this doesn't cause a problem, in that we run many thousands of tests a day and they mostly seem fine, but this does indeed cause problems sometimes (e.g. all the test failures that were closed by https://github.com/elastic/elasticsearch/pull/85683) and the kinds of problems it causes are hard to replicate and solve, so they're a pain.

/cc @nik9000 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests that leak tasks #85700

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Tests that leak tasks #85700

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions