Cache API key doc to reduce traffic to the security index#59376
Cache API key doc to reduce traffic to the security index#59376ywangd merged 45 commits intoelastic:masterfrom
Conversation
|
Please note this PR is built upon the yet-to-be-approved #58156. Therefeore many changes are actually not relevant to this PR itself. Here are a list of actual relevant changes:
|
|
@elasticmachine run elasticsearch-ci/1 |
|
I don't think it's worth pushing this in before FF. |
Thanks, I agree. The changes are still rough. I also thought about merging doc and auth caches but didn't do it mainly because of complexity and potential implications of complex changes. It requires more time to get right. |
8fa3869 to
32a7a31
Compare
32a7a31 to
7501b05
Compare
|
I spent some time look into how we could have a single cache for both API key doc and Auth hash. I came up with a version that works. However, it comes with high complexity because the two parts of data have different lifecycles. With the single cache approach, a cache item contains both content of an API key document and the Authentication hash. These two parts can vary independantly. That is, when the document needs to be refreshed for caching, the authentication hash can still be valid because it is based on user-provided credentials and only associated with the document ID, not its content. Similarly, when the authentication hash needs to be updated, the document can still be valid because unlike authentication hash it does not change based on user-provided credentials. So what we need is a cache where the cache item can be partially invalidated. This requirement is basically asking for two separate caches. It is possible to make it work with a single cache as mentioned earlier, but it leads to high complexity due to the diverging requirement plus how it could work with a distributed system. Therefore, even though I already have an implementation for it, I still prefer the approrach of separate caches. Honestly, I had fun caming up with my current implementaiton and trying to catch all edge cases. But I am concerned that it is difficult to explain how it works and maintain it a few years from now. After discussion with @tvernum, we decided to go with separate caches. There is no overlap between contents of the two caches. Hence we do not have concern in terms of duplication (memory consumption). |
This reverts commit df2a1d8.
...n/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityCacheRegistry.java
Outdated
Show resolved
Hide resolved
|
@tvernum After a few back and forth, here is the version that I am settled with. It touches the idea of consolidation for cache implementaitons, though with a much less ambitious approach. In addition to adding a cache for API key document, it also provides two enhancements:
The PR still needs a few more things:
But I'd appreciate your review on the approach before I polish it. |
|
@tvernum I updated the PR based on your suggestion with some name tweaks. The changes are basically to simplify the SecurityCache, which in fact is now replaced by The Could you please reivew the current approach? I'll proceeds with adding tests etc once we can agree on the overall approach. Thanks! |
| .setMaximumWeight(maximumWeight) | ||
| .build(); | ||
| this.roleDescriptorsBytesCache = CacheBuilder.<String, BytesReference>builder() | ||
| .setExpireAfterAccess(ttl) |
There was a problem hiding this comment.
I wonder about this.
It defaults to 24h, and we don 't have much in place to clean it up.
Should it really cache for that long? It's expiry after access, so 1h feels like it would be sufficient.
From my brief reading, once the doc cache expires (5m after last write) there will be no incoming references to these entries. They may get revived, but I don't think holding onto that memory for 24h is justified.
There was a problem hiding this comment.
I changed it to 1h. It was set to 24h because
- 24h is an existing setting, so felt easy to reuse. I understand technically it has different meaning. But the subtlety here feels irrelevant to end-users.
- Unlike API key document cache, the role descriptor cache, in theory, never "expires" in that they will never be outdated. Sure the entries may no longer be needed, but the association between a hash and its original content can never be invalid. Also many API keys may share the same role descriptors, for example, the
superuser(limiting) role. These made me feel it is plausible to cache the descriptors for longer. Because even when a Key is expired or invalidated or haven't checked in for a while, it is likey a new key will be replacing it and the new key will have the exact same role descriptors and thus reuse the cache entries.
With above being said, I don't think the benefit of caching for longer is anyway significant, since the main performance hit is the document retrieval itself. So I am OK to have it at a shorter duration.
...ecurity/src/main/java/org/elasticsearch/xpack/security/support/CacheInvalidatorRegistry.java
Outdated
Show resolved
Hide resolved
...ecurity/src/main/java/org/elasticsearch/xpack/security/support/CacheInvalidatorRegistry.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Tim Vernum <tim@adjective.org>
|
@albertzaharovits I updated this PR per your review and discussion. It is ready for another look. Thanks! |
tvernum
left a comment
There was a problem hiding this comment.
LGTM, but I have a few suggestions (some of which are strong suggestions).
client/rest-high-level/src/main/java/org/elasticsearch/client/SecurityRequestConverters.java
Outdated
Show resolved
Hide resolved
...rest-high-level/src/main/java/org/elasticsearch/client/security/ClearApiKeyCacheRequest.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/ApiKeyService.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/security/action/TransportClearSecurityCacheAction.java
Show resolved
Hide resolved
| super.writeTo(out); | ||
| out.writeString(cacheName); | ||
| out.writeOptionalStringArray(keys); | ||
| } |
There was a problem hiding this comment.
Ideally we would have serialization tests for every class that has a writeTo method.
...ty/src/test/java/org/elasticsearch/xpack/security/support/CacheInvalidatorRegistryTests.java
Outdated
Show resolved
Hide resolved
| final CacheInvalidator cacheInvalidator = cacheInvalidators.get(cacheName); | ||
| if (cacheInvalidator != null) { | ||
| cacheInvalidator.invalidate(keys); | ||
| } |
There was a problem hiding this comment.
The lack of an else seems like dangerous leniency. Why do we silently pretend to invalidate a cache that doesn't exist?
There was a problem hiding this comment.
This is a good point and there is no reason for this leniency. Add an else branch to throw exception and also relevant tests.
...est/java/org/elasticsearch/xpack/security/support/InvalidationCountingCacheWrapperTests.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Tim Vernum <tim@adjective.org>
albertzaharovits
left a comment
There was a problem hiding this comment.
LGTM Good efforts on this one!
) Getting the API key document form the security index is the most time consuing part of the API Key authentication flow (>60% if index is local and >90% if index is remote). This traffic is now avoided by caching added with this PR. Additionally, we add a cache invalidator registry so that clearing of different caches will be managed in a single place (requires follow-up PRs).
…63319) Getting the API key document form the security index is the most time consuing part of the API Key authentication flow (>60% if index is local and >90% if index is remote). This traffic is now avoided by caching added with this PR. Additionally, we add a cache invalidator registry so that clearing of different caches will be managed in a single place (requires follow-up PRs).
This PR is a follow-up refactoring for #59376. It replaces InvalidationCountingCacheWrapper with a simpler and more resuable LockingAtomicCounter. The change includes extracting the logic of "minimizing chance of caching stale results" into a new class, which is now generic and not tied to any cache related operations. This in turn allows it to be reused by different caches, which often have subtle, important and anonying differences that make it impossible (or at least cubersome and painful) to have a generic solution. So this change of reducing the size of reusable code is the right move. As an example of applying this new pattern, the caches for NativePrivilegeStore are now migrated and managed by the centrialied CacheInvalidatorRegistry. The clear privilege cache rest and transport actions are also updated to use the new implementation. The Request/Response classes are intentionally untouched and plan to have a separate PR to address them because they would involve user visible deprecation changes. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
This PR is a follow-up refactoring for elastic#59376. It replaces InvalidationCountingCacheWrapper with a simpler and more resuable LockingAtomicCounter. The change includes extracting the logic of "minimizing chance of caching stale results" into a new class, which is now generic and not tied to any cache related operations. This in turn allows it to be reused by different caches, which often have subtle, important and anonying differences that make it impossible (or at least cubersome and painful) to have a generic solution. So this change of reducing the size of reusable code is the right move. As an example of applying this new pattern, the caches for NativePrivilegeStore are now migrated and managed by the centrialied CacheInvalidatorRegistry. The clear privilege cache rest and transport actions are also updated to use the new implementation. The Request/Response classes are intentionally untouched and plan to have a separate PR to address them because they would involve user visible deprecation changes. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
This PR is a follow-up refactoring for #59376. It replaces InvalidationCountingCacheWrapper with a simpler and more resuable LockingAtomicCounter. The change includes extracting the logic of "minimizing chance of caching stale results" into a new class, which is now generic and not tied to any cache related operations. This in turn allows it to be reused by different caches, which often have subtle, important and anonying differences that make it impossible (or at least cubersome and painful) to have a generic solution. So this change of reducing the size of reusable code is the right move. As an example of applying this new pattern, the caches for NativePrivilegeStore are now migrated and managed by the centrialied CacheInvalidatorRegistry. The clear privilege cache rest and transport actions are also updated to use the new implementation. The Request/Response classes are intentionally untouched and plan to have a separate PR to address them because they would involve user visible deprecation changes. Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Getting the API key document form the security index is the most time consuing part of the authentication flow (>60% if index is local and >90% if index is remote). This traffic can be avoided with cache. Initially we also plan to avoid deserialising source of GetApiKeyDocument response if "primary term" and "sequence number" are unchanged. However, given the document fetching itself is dominant in execution time, it does not seem to be worthwhile to optimise for deserialisation. Hence, for uncached document, the cache will fetch the document and also simply deserialise it without further checking.
When a large number of API keys are in use, we expect most of them are created with exactly the same role-descriptors and limited-by-role-descriptors. It is wasteful to cache these identical role descriptors repeatitively for different API keys. Hence a two layer caches are put in place:
The lifecyces of the above two caches are mostly tied together and they need work together in a way to ensure least possibility of stale results. Hence a new
SecurityCachesclass is added and it implements all necessary handlings as discussed in #55836.@tvernum This is a quick attempt to implement a cache for API key document. It intentionally omits some parts, e.g. tests, configuration, to get a quick turnaround for proving the idea. It is easier to discuss when we have something concrete. We could try formalise it for v7.9 if feasible. But I am also OK to delay it if more careful evaluation is needed. Thanks!
Resolves: #53940