Skip to content

Improve reusability of the invalidation counting cache wrapper#66310

Merged
ywangd merged 14 commits intoelastic:masterfrom
ywangd:cache-clearing-refactor-privilege-store
Apr 18, 2021
Merged

Improve reusability of the invalidation counting cache wrapper#66310
ywangd merged 14 commits intoelastic:masterfrom
ywangd:cache-clearing-refactor-privilege-store

Conversation

@ywangd
Copy link
Copy Markdown
Member

@ywangd ywangd commented Dec 15, 2020

This PR is a refactoring for InvalidationCountingCacheWrapper to improve its reusability beyond ApiKeyService. It is a follow-up for #59376.

The change includes extracting the logic of "minimizing chance of caching stale results" into a new class, which is now generic and not tied to any cache related operations. This in turn allows it to be reused by different caches, which often have subtle, important and anonying differences that make it impossible (or at least cubersome and painful) to have a generic solution. So this change of reducing the size of reusable code feels like the right move.

As an example of applying this new pattern, the caches for NativePrivilegeStore are now migrated and managed by the centrialied CacheInvalidatorRegistry. The clear privilege cache rest and transport actions are also updated to use the new implementation. I intentionally left the Request/Response classes untouched and plan to have a separate PR to address them because they would involve user visible deprecation changes. I also plan to migrate other security related caches to use this pattern if this PR is approved.

We had quite lively discussions around this topic with #59376, but the time was constraint due to release. Therefore this PR targets v7.12.0 and should give us enough time for discussions.

@ywangd ywangd added >refactoring :Security/Security Security issues without another label v8.0.0 v7.12.0 labels Dec 15, 2020
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Dec 15, 2020
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-security (Team:Security)

*
* @return true if the runnable is executed, other false.
*/
public boolean runIfCountMatches(Runnable runnable, long count) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to read if the count were the first parameter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to compareAndRun? I think this name helps with the intuition from "compare and swap".
"counter" is used too much so maybe drop it if there's a parameter with the name counter already.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runnable -> CheckedRunnable, to avoid worrying about the checked/unchecked exceptions at this low level.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your analogy of AtomicInteger makes sense. The name compareAndRun also sounds good and to the point. The count parameter naturally should become the first parameter after the method name change to follow the convention of compareAndSet. Thanks!

I decided to keep Runnable instead of changing it to CheckedRunnable. In fact, I started with CheckedRunnable, but I think it has a few disadvantages:

  1. If we don't handle the exception of CheckedRunnable, the method would have a signature like:
    public <E extends Exception> boolean compareAndRun(long count, CheckedRunnable<E> runnable) throws E
    It forces its caller to handle exceptions even when the Runnable may not throw any exception (as in the NativePrivilegesStore case)
  2. If we handle the exception inside the method and only bubble up RuntimeException, it will make the method easier to call because no exception handling is forced onto the caller. However, the method itself has no knowledge about he Runnable so it cannot perform meaningful handling other than maybe wrap it with a RuntimeException. It works, but makes it hard if the caller cares about the real exception and wants to handle it.

My understanding is that CheckedRunnable is better when possible exception types are limited and the method knows well how to handle (or ignore) them. Therefore, I prefer to just using Runnable. It works similar to above item 2, but giving caller full control of how exception should or should not be handled.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good to me. It's not going to be a big difference in practice anyway.

* An utility class that keeps a counter when executing runnables.
* It is designed to help minimizing the possibility of caching stable results in a {@link Cache}.
*/
public class CountingRunner {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we both agree that encapsulating a cache inside this class is the wrong abstraction (which is InvalidationCountingCacheWrapper), although I struggle to put it in words in a comment.

I still tend to have more ambitious hopes for this cache invalidation pattern, but I very much fear we can sunk in it and not achieve much.

Overall, I think abstraction here is good enough!
But its rich history is showing in the description and in the names. We should polish these.

I would think about this class as a special type of atomic integer, which has a fancy method "run if integer constant", such that the run block of code cannot overlap with changes (increments) to the integer.
It's hard to explain it's usage in the caching context, so I would steer away from it completely.
For example, we rely that changes to the underlying data (eg api key docs) are performed before calling increment and that we call getCount before we retrieve the underlying data, so that we ensure the sequence modify data -> increment -> getCount -> read data -> cache data .

My explicit suggestion is to rename to CounterCompare, or anythign else that inspires that the runnable and the counter cannot overlap.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion, if you think this is unlikely to be used outside of caching contexts, it might be useful to make it a static inner class of Cache.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thoughts and taking your advice onboard, I came up with the name LockingAtomicCounter. It is an unexciting name, but I think it conveys the major idea here: a counter with RWLock to ensure its value stays the same until the reading side finishes its task.

I think your analogy of AtomicInteger is a great one. In concept, compareAndSet and compareAndRun are very similiar. They are both executing some operation if the counter value matches the expectation. The only difference is that the operation of the former is a single setValue, but the later can have an arbitrary Runnable. This is also the reason that I went with the name of LockingAtomicCounter, which I hope makes its behaviour easier to understand.

I updated the comments accordingly.

As for the location of this class, I'd prefer to have it stay inside xpack security for the time being because:

  1. The usages that I can think of for now are security caches
  2. It in theory operates at a higher level than a single cache, e.g. ApiKeyDocCache is a wrapper for two vanila caches.
  3. The way it ends right now really does not say anything about caching anymore. So I have hope that it could still be useful elsewhere.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with the name LockingAtomicCounter

SGTM

Copy link
Copy Markdown
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this Yang!
I've left a couple of comments about naming and minor structure. I'm trying to push this abstraction in the direction of a "non-overlapping counter and runnable" and away from caches.
Looks good overall!

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Feb 5, 2021

@albertzaharovits Thanks for the review. As always, the feedbacks are very vaulable and help shaping this refactoring. I have updated the PR accordingly and it is ready for another look.

Copy link
Copy Markdown
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm pretty confident this is a pure refactoring.

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Apr 14, 2021

@tvernum As discussed during monthly, I am going to merge this PR since it has Albert's approval and is a pure refactoring.

@elasticmachine update branch

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Apr 14, 2021

@elasticmachine test this please

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Apr 14, 2021

@elasticmachine run elasticsearch-ci/2

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Apr 14, 2021

@elasticmachine update branch

@ywangd
Copy link
Copy Markdown
Member Author

ywangd commented Apr 14, 2021

@elasticmachine update branch

@ywangd ywangd merged commit d4b649c into elastic:master Apr 18, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Apr 19, 2021
This PR is a follow-up refactoring for elastic#59376. It replaces
InvalidationCountingCacheWrapper with a simpler and more resuable
LockingAtomicCounter.

The change includes extracting the logic of "minimizing chance of caching stale
results" into a new class, which is now generic and not tied to any cache
related operations. This in turn allows it to be reused by different caches,
which often have subtle, important and anonying differences that make it
impossible (or at least cubersome and painful) to have a generic solution. So
this change of reducing the size of reusable code is the right move.

As an example of applying this new pattern, the caches for NativePrivilegeStore
are now migrated and managed by the centrialied CacheInvalidatorRegistry. The
clear privilege cache rest and transport actions are also updated to use the
new implementation. The Request/Response classes are intentionally untouched
and plan to have a separate PR to address them because they would involve user
visible deprecation changes.

Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
ywangd added a commit that referenced this pull request Apr 19, 2021
This PR is a follow-up refactoring for #59376. It replaces
InvalidationCountingCacheWrapper with a simpler and more resuable
LockingAtomicCounter.

The change includes extracting the logic of "minimizing chance of caching stale
results" into a new class, which is now generic and not tied to any cache
related operations. This in turn allows it to be reused by different caches,
which often have subtle, important and anonying differences that make it
impossible (or at least cubersome and painful) to have a generic solution. So
this change of reducing the size of reusable code is the right move.

As an example of applying this new pattern, the caches for NativePrivilegeStore
are now migrated and managed by the centrialied CacheInvalidatorRegistry. The
clear privilege cache rest and transport actions are also updated to use the
new implementation. The Request/Response classes are intentionally untouched
and plan to have a separate PR to address them because they would involve user
visible deprecation changes.

Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Security/Security Security issues without another label Team:Security Meta label for security team v7.13.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants