Fix to actually throttle indexing under memory pressure#61768
Fix to actually throttle indexing under memory pressure#61768henningandersen merged 6 commits intoelastic:masterfrom
Conversation
|
💚 CLA has been signed |
|
cc @jpountz |
I signed it. |
|
Pinging @elastic/es-distributed (:Distributed/Engine) |
|
@ywelsch any insights on how can this be moved forward? |
|
Thank you for your contribution. Can you provide some context as to how you discovered this bug? Note that this PR needs tests, to ensure no regressions will be reintroduced in the future again, as happened in 3ad6d6e.
The commit needs to carry the same e-mail address as the one you've provided when signing the CLA. The commit uses your Please also stop pinging folks here, there are no SLAs on code reviews (see also our contributing guidelines). |
My apologies. I have updated the first comment on how did I identify this. Will work on test case and update. Thanks |
2fbe744 to
33b3c62
Compare
|
@nitin2goyal thanks for this contribution. It sounds like an important fix indeed. I would like to check in to hear if you are still intending to continue working on this? |
|
@henningandersen yes, apologies for the delay. should be able to finish this in couple of days |
9e3f0a1 to
93f6f16
Compare
| CountDownLatch countDownLatch = new CountDownLatch(3); | ||
| Mockito.doAnswer((Answer<InternalEngine.IndexingStrategy>) invocation -> { | ||
| try { | ||
| Thread.sleep(sleepInMillis); |
There was a problem hiding this comment.
I know it's not a good design but couldn't think of a better way to ensure sequential indexing in case of throttling
henningandersen
left a comment
There was a problem hiding this comment.
@nitin2goyal thanks for working on the test. A truly nice test would take some refactoring, which I would hope to avoid. I think we can make do with:
Create your own Engine.Index object, overloading startTime(). In your own startTime(), validate that the throttle lock is held when it should be and is not when it should not be. You will need to add a package-private method to InternalEngine to help that validation. Your test could do something like:
assertTrue(engine.throtteLockIsHeldByCurrentThread());
and InternalEngine.throtteLockIsHeldByCurrentThread() should call throttle.throtteLockIsHeldByCurrentThread(), which in turn calls locksReference.isHeldByCurrentThread()`.
I hope this can work out, let me know if you run into issues.
|
Thanks for the suggestion @henningandersen . Have incorporated the same. Let me know if this looks good. |
henningandersen
left a comment
There was a problem hiding this comment.
@nitin2goyal thanks for writing the test, looks great. Just two comments on comments 🙂 left.
server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
24e8528 to
05e91af
Compare
|
thanks for reviewing @henningandersen . Added comments |
|
@elasticmachine test this please |
henningandersen
left a comment
There was a problem hiding this comment.
LGTM. Thanks for your contribution @nitin2goyal
|
@elasticmachine update branch |
|
@elasticmachine test this please |
|
@elasticmachine run elasticsearch-ci/packaging-sample-windows |
In #22721, the decision to throttle indexing was inadvertently flipped, so that we until this commit throttle indexing during recovery but never throttle user initiated indexing requests. This commit fixes that to throttle user initiated indexing requests and never throttle recovery requests. Closes #61959
In #22721, the decision to throttle indexing was inadvertently flipped, so that we until this commit throttle indexing during recovery but never throttle user initiated indexing requests. This commit fixes that to throttle user initiated indexing requests and never throttle recovery requests. Closes #61959
|
When indexing data into Elasticsearch, heap memory is used in lucene and Elasticsearch for buffering data. To keep the memory usage under control, Elasticsearch will move data to disk according to the indexing buffer settings. If moving data cannot keep up with the ongoing indexing, Elasticsearch will start throttling indexing. This throttling behavior was broken such that it never throttled during normal indexing and this has been fixed by this PR. |
Fix to actually throttle indexing on getting activated. Looks like this got reverted in following change
3ad6d6e
[Edit on 09/08]
How did I identify this?
I was seeing increased load in production cluster for scheduled indexing and too many merge threads and so I started to look at code and identified this (bulk/write thread pool were choking on shard refresh lock but never got throttled).