Wait for pending ml tasks in docs tests #44123

Merged
davidkyle merged 4 commits into elastic:master from davidkyle:docs-tests-wait-for-pending
Jul 15, 2019

Member

@davidkyle davidkyle commented Jul 9, 2019

#43271 describes a problem where PUTting an ml job or data frame causes a notification document (saying something like "Job X created") to be written to the ml-notifications index. The write is asynchronous and can happen after the test has finished and the teardown that deletes indices has completed. When that happens the index is recreated and leaks into the next test.

This is a known issue; XPackRestIT handles it by waiting for pending tasks to complete. This change adds the same step to DocsClientYamlTestSuiteIT.

Unmutes the muted ml and data frame tests and closes #43271.

XPackRestIT also has logic to stop datafeeds and close jobs after each test. That isn't necessary here, as none of the docs tests start a job or data frame, but it may be required in the future.
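
For readers unfamiliar with the pattern, the "wait for pending tasks" step amounts to polling until the cluster reports no outstanding work, or giving up after a timeout. The following is only a minimal sketch of that idea and not the actual `ESRestTestCase.waitForPendingTasks` implementation: the class name `PendingTasksWait`, the method `waitForNoPendingTasks`, and the `Supplier` standing in for a call to the cluster are all hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;

public class PendingTasksWait {

    /**
     * Polls the (hypothetical) task source until it reports no pending tasks
     * or the timeout elapses. Returns true if the task list drained in time.
     */
    static boolean waitForNoPendingTasks(Supplier<List<String>> pendingTasks,
                                         long timeoutMillis,
                                         long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (pendingTasks.get().isEmpty()) {
                return true; // nothing left that could write after teardown
            }
            if (System.nanoTime() >= deadline) {
                return false; // caller should fail the test and report the tasks
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

In a test this would run in the `@After` step, before indices are deleted, so that a late asynchronous write cannot recreate an index once cleanup has finished.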

@elasticmachine
Collaborator

Pinging @elastic/es-docs

@davidkyle davidkyle requested a review from nik9000 July 10, 2019 14:35
@After
public void cleanup() throws Exception {
    if (isMachineLearningTest() || isDataFrameTest()) {
        ESRestTestCase.waitForPendingTasks(adminClient());

Member

I wonder how bad it'd be to do this after every test. I don't feel great about relying on stuff in the test name. It just feels a bit too magical.

Member Author

It is a little bit complicated because Rollups do the wait in the base ESRestTestCase.

Additionally, some tests leave tasks running. get-follow-info.asciidoc line 38 is a good example: it creates various CCR tasks, which will be waited on indefinitely unless the test teardown is run. Interestingly what appears to be happening is the @After method of this class is called before the test teardown.

Member

> Interestingly what appears to be happening is the @After method of this class is called before the test teardown

Weird!

I'm not a big fan of leaving things running in those tests either. Is there a way you could do something like the rollups here? It looks like it only cares about rollup style jobs. Does ml have something similar?

Member Author

Yeah, rollups filter the waiting tasks with taskName.startsWith("xpack/rollup/job") and we can do something similar with ml jobs, but the action causing the leakage in #43271 is indexing a document, not an ml task. Waiting for all tasks catches unexpected issues and actually helps when debugging tests that have failed due to leakage from a previous test. Experience from using this in XPackRestIT has shown that it is very valuable.

If I remove the if (isMachineLearningTest() || isDataFrameTest()) { check then the tests that fail with pending tasks are ccr and rollup. I'll look into what's happening there and maybe there is a way of removing the *if ml ...* conditional.
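
To make the trade-off concrete, here is a rough sketch comparing a prefix filter with waiting on every task. It is an illustration under assumptions, not code from either test class: `TaskFilterSketch` and `tasksToWaitFor` are hypothetical names, the task list is a plain list of strings, and the ml prefix used in the comment is invented for the example. Only the `"xpack/rollup/job"` prefix comes from the discussion above.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TaskFilterSketch {

    /** Returns the subset of running task names that the cleanup step would wait on. */
    static List<String> tasksToWaitFor(List<String> runningTasks, Predicate<String> filter) {
        return runningTasks.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of running tasks, including a plain index write.
        List<String> running = List.of("xpack/rollup/job[a]", "index-write[b]");

        // Prefix filter, in the style the rollup cleanup uses.
        System.out.println(tasksToWaitFor(running, name -> name.startsWith("xpack/rollup/job")));

        // A similar (hypothetical) ml prefix would not match the index write at all.
        System.out.println(tasksToWaitFor(running, name -> name.startsWith("xpack/ml/")));

        // Waiting on everything is the only variant that catches the index write.
        System.out.println(tasksToWaitFor(running, name -> true));
    }
}
```

The point of the sketch is that a feature-specific prefix cannot see the notification write, because that write is an ordinary indexing action and not a task owned by the feature.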

Member Author

I took a look at the Rollup and CCR tests. Unfortunately it is not possible to wait for pending tasks after every test because those tests require special handling. I cannot see a way to simplify the logic, and I think the current code is best as it is explicitly for the ml & data frame tests.

Also, as more xpack feature snippet testing is added I would expect more usages of the pattern, e.g. if (isSecurityTest()) { // security specific cleanup

Using the test name to determine if the test is an ml test is a valid use. XPackRestIT set the precedent some time ago and it has not caused problems there.
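
As an illustration of what a name-based check looks like, and of where it can miss, here is a small sketch. It is not the real `isMachineLearningTest()` or `isDataFrameTest()`: the class `TestNameCheckSketch`, the path markers `/ml/` and `/data-frames/`, and the example test names are all assumptions made for the example.

```java
public class TestNameCheckSketch {

    /** Hypothetical check: treat a test as an ml test if its path has an "ml" segment. */
    static boolean isMachineLearningTest(String testName) {
        return testName != null && testName.contains("/ml/");
    }

    /** Hypothetical check: same idea for data frame tests. */
    static boolean isDataFrameTest(String testName) {
        return testName != null && testName.contains("/data-frames/");
    }

    /** Mirrors the conditional under review: clean up only for matching test names. */
    static boolean needsPendingTaskWait(String testName) {
        return isMachineLearningTest(testName) || isDataFrameTest(testName);
    }

    public static void main(String[] args) {
        // A snippet under an ml directory is detected.
        System.out.println(needsPendingTaskWait("reference/ml/apis/put-job/line_10"));
        // A snippet elsewhere that still calls an ml API is not, so no wait happens.
        System.out.println(needsPendingTaskWait("reference/cat/some-page/line_20"));
    }
}
```

The second case is the failure mode raised in review: the check depends on where a snippet lives and not on what it does, so a test outside the expected directories skips the wait.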

Member

I'm really not a fan of looking at the test name. I know XPackRestIT does it and I think it is sneaky black magic that will cause tests to fail in very difficult ways to trace. One badly named test invoking ml will cause subsequent tests to fail. Sometimes. Randomly.

I'm ok with merging this, but I'd really like a follow up issue to remove it somehow. Because I'm 100% sure somebody is going to lose many hours to debugging errors caused by a funny named test one day.

Member

Can you detect a data frame test or ML test by looking at the public API somehow? Like by looking for jobs or something.....

@davidkyle davidkyle merged commit 4402cf3 into elastic:master Jul 15, 2019
@davidkyle davidkyle deleted the docs-tests-wait-for-pending branch July 15, 2019 10:58
davidkyle added a commit that referenced this pull request Jul 15, 2019
ML and Data Frame tests should wait for pending tasks
davidkyle added a commit that referenced this pull request Jul 15, 2019
ML and Data Frame tests should wait for pending tasks
@jpountz jpountz added the >test label Jul 15, 2019
Labels

>test Issues or PRs that are addressing/adding tests v7.3.0 v7.4.0 v8.0.0-alpha1

Development

Successfully merging this pull request may close these issues.

[CI] Internal data frame indices can cause unrelated docs tests to fail