Skip to content

Fix to actually throttle indexing under memory pressure#61768

Merged
henningandersen merged 6 commits intoelastic:masterfrom
nitin2goyal:throttle-fix
Oct 2, 2020
Merged

Fix to actually throttle indexing under memory pressure#61768
henningandersen merged 6 commits intoelastic:masterfrom
nitin2goyal:throttle-fix

Conversation

@nitin2goyal
Copy link
Copy Markdown
Contributor

@nitin2goyal nitin2goyal commented Sep 1, 2020

Fix to actually throttle indexing on getting activated. Looks like this got reverted in following change

3ad6d6e

[Edit on 09/08]
How did I identify this?

I was seeing increased load in production cluster for scheduled indexing and too many merge threads and so I started to look at code and identified this (bulk/write thread pool were choking on shard refresh lock but never got throttled).

@cla-checker-service
Copy link
Copy Markdown

cla-checker-service bot commented Sep 1, 2020

💚 CLA has been signed

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

cc @jpountz

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

❌ Author of the following commits did not sign a Contributor Agreement:
2fbe744

Please, read and sign the above mentioned agreement if you want to contribute to this project

I signed it.

@ywelsch ywelsch added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >bug labels Sep 1, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team. label Sep 1, 2020
@nitin2goyal
Copy link
Copy Markdown
Contributor Author

@ywelsch any insights on how can this be moved forward?

@ywelsch
Copy link
Copy Markdown
Contributor

ywelsch commented Sep 3, 2020

Thank you for your contribution. Can you provide some context as to how you discovered this bug? Note that this PR needs tests, to ensure no regressions will be reintroduced in the future again, as happened in 3ad6d6e.

Author of the following commits did not sign a Contributor Agreement:
I signed it.

The commit needs to carry the same e-mail address as the one you've provided when signing the CLA. The commit uses your sprinklr.com e-mail address, while the CLA uses the gmail.com based one.

Please also stop pinging folks here, there are no SLAs on code reviews (see also our contributing guidelines).

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

Thank you for your contribution. Can you provide some context as to how you discovered this bug? Note that this PR needs tests, to ensure no regressions will be reintroduced in the future again, as happened in 3ad6d6e.

Author of the following commits did not sign a Contributor Agreement:
I signed it.

The commit needs to carry the same e-mail address as the one you've provided when signing the CLA. The commit uses your sprinklr.com e-mail address, while the CLA uses the gmail.com based one.

Please also stop pinging folks here, there are no SLAs on code reviews (see also our contributing guidelines).

My apologies. I have updated the first comment on how did I identify this. Will work on test case and update. Thanks

@henningandersen
Copy link
Copy Markdown
Contributor

@nitin2goyal thanks for this contribution. It sounds like an important fix indeed. I would like to check in to hear if you are still intending to continue working on this?

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

@henningandersen yes, apologies for the delay. should be able to finish this in couple of days

Add test
@nitin2goyal nitin2goyal marked this pull request as ready for review September 23, 2020 11:45
CountDownLatch countDownLatch = new CountDownLatch(3);
Mockito.doAnswer((Answer<InternalEngine.IndexingStrategy>) invocation -> {
try {
Thread.sleep(sleepInMillis);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not a good design but couldn't think of a better way to ensure sequential indexing in case of throttling

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nitin2goyal thanks for working on the test. A truly nice test would take some refactoring, which I would hope to avoid. I think we can make do with:

Create your own Engine.Index object, overloading startTime(). In your own startTime(), validate that the throttle lock is held when it should be and is not when it should not be. You will need to add a package-private method to InternalEngine to help that validation. Your test could do something like:

assertTrue(engine.throtteLockIsHeldByCurrentThread());

and InternalEngine.throtteLockIsHeldByCurrentThread() should call throttle.throtteLockIsHeldByCurrentThread(), which in turn calls locksReference.isHeldByCurrentThread()`.

I hope this can work out, let me know if you run into issues.

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

Thanks for the suggestion @henningandersen . Have incorporated the same. Let me know if this looks good.

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nitin2goyal thanks for writing the test, looks great. Just two comments on comments 🙂 left.

@nitin2goyal
Copy link
Copy Markdown
Contributor Author

thanks for reviewing @henningandersen . Added comments

@henningandersen henningandersen self-requested a review October 2, 2020 07:48
@henningandersen
Copy link
Copy Markdown
Contributor

@elasticmachine test this please

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution @nitin2goyal

@henningandersen
Copy link
Copy Markdown
Contributor

@elasticmachine update branch

@henningandersen
Copy link
Copy Markdown
Contributor

@elasticmachine test this please

@henningandersen
Copy link
Copy Markdown
Contributor

@elasticmachine run elasticsearch-ci/packaging-sample-windows

@henningandersen henningandersen merged commit f5affcd into elastic:master Oct 2, 2020
henningandersen pushed a commit that referenced this pull request Oct 2, 2020
In #22721, the decision to throttle indexing was inadvertently flipped,
so that we until this commit throttle indexing during recovery but
never throttle user initiated indexing requests. This commit
fixes that to throttle user initiated indexing requests and never
throttle recovery requests.

Closes #61959
henningandersen pushed a commit that referenced this pull request Oct 2, 2020
In #22721, the decision to throttle indexing was inadvertently flipped,
so that we until this commit throttle indexing during recovery but
never throttle user initiated indexing requests. This commit
fixes that to throttle user initiated indexing requests and never
throttle recovery requests.

Closes #61959
@henningandersen henningandersen changed the title Fix to actually throttle indexing on getting activated Fix to actually throttle indexing under memory pressure Oct 21, 2020
@henningandersen
Copy link
Copy Markdown
Contributor

When indexing data into Elasticsearch, heap memory is used in lucene and Elasticsearch for buffering data. To keep the memory usage under control, Elasticsearch will move data to disk according to the indexing buffer settings. If moving data cannot keep up with the ongoing indexing, Elasticsearch will start throttling indexing. This throttling behavior was broken such that it never throttled during normal indexing and this has been fixed by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. release highlight Team:Distributed Meta label for distributed team. v7.9.3 v7.10.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants