Skip to content

[ML] fix possible race condition in frequent item sets aggregation#95860

Merged
hendrikmuhs merged 2 commits intoelastic:mainfrom
hendrikmuhs:frequent-items-#95681
May 5, 2023
Merged

[ML] fix possible race condition in frequent item sets aggregation#95860
hendrikmuhs merged 2 commits intoelastic:mainfrom
hendrikmuhs:frequent-items-#95681

Conversation

@hendrikmuhs
Copy link
Copy Markdown

Refactorings done in #94023. improved memory management but also made it possible to run into a race condition in DelegatingCircuitBreaker. This change makes DelegatingCircuitBreaker thread-safe.

This change is a quick-fix with low risk. Eventually DelegatingCircuitBreaker should be removed. Work on that is underway: #89437. Under that circumstances the change is reasonable to me. The added locking overhead shouldn't be a problem, re-allocations happen rarely and there is just the disconnect case where 2 threads potentially access the structure at the same time.

#94023 hasn't been part of a release, therefore I mark this as non-issue, although technically this is a bug.

fixes #95681

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label May 5, 2023
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Copy Markdown

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

consumerThread.start();
producerThread.start();
consumerThread.join();
producerThread.join();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough if the threads just finish?
Or should there be some assertion in the end checking the value of estimate bytes?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose of the test is not hitting this line:

I was able to re-produce the CI issue with -Dtests.iters=1000 - but I don't want to check-in a loop.

Still, I should document ^this, will do.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.8

hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request May 5, 2023
…lastic#95860)

Refactorings done in elastic#94023. improved memory management but also made it possible to run into a race condition in DelegatingCircuitBreaker. This change makes DelegatingCircuitBreaker thread-safe.

This change is a quick-fix with low risk. Eventually DelegatingCircuitBreaker should be removed. Work on that is underway: elastic#89437. Under that circumstances the change is reasonable to me. The added locking overhead shouldn't be a problem, re-allocations happen rarely and there is just the disconnect case where 2 threads potentially access the structure at the same time.

fixes elastic#95681
elasticsearchmachine pushed a commit that referenced this pull request May 5, 2023
…95860) (#95879)

Refactorings done in #94023. improved memory management but also made it possible to run into a race condition in DelegatingCircuitBreaker. This change makes DelegatingCircuitBreaker thread-safe.

This change is a quick-fix with low risk. Eventually DelegatingCircuitBreaker should be removed. Work on that is underway: #89437. Under that circumstances the change is reasonable to me. The added locking overhead shouldn't be a problem, re-allocations happen rarely and there is just the disconnect case where 2 threads potentially access the structure at the same time.

fixes #95681
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:ml Machine learning >non-issue Team:ML Meta label for the ML team v8.8.0 v8.9.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] MlWithSecurityIT test {yaml=ml/frequent_item_sets_agg/Test frequent item sets filter} failing

3 participants