Return 400 error for GetUserPrivileges call with API keys#89333
Return 400 error for GetUserPrivileges call with API keys#89333elasticsearchmachine merged 7 commits intoelastic:mainfrom
Conversation
The GetUserPrivileges API returns a 500 error when it is called with an API key that has assigned role descriptors. This is because the underlying LimitedRole class that represents the API key's effective privileges does not support building a simple view of the privileges. This PR changes the code to return 400 error instead of 500 along with a better error message that suggests the GetApiKey API as an alternative. Relates: elastic#89058
|
Pinging @elastic/es-security (Team:Security) |
|
Hi @ywangd, I've created a changelog YAML for you. |
| "Cannot retrieve privileges for API keys with assigned role descriptors. " | ||
| + "Please use the Get API key information API https://ela.st/es-api-get-api-key", |
There was a problem hiding this comment.
@jakelandis I am not entirely clear about the current recommendation of adding HTTP links into error messages (or more general HTTP responses). My previous understanding is that we don't want to do it because the links may break. But I have noticed that links are being used today, e.g. in the health API. Could you please adivse whether it is OK to use such a short url here? If not, what would be our choices? Thanks!
There was a problem hiding this comment.
I don't have a strong opinion on this but I'm generally in favor of links in the error/response messages. It's helpful and potentially helps us cut down on lengthy failure messages.
There was a problem hiding this comment.
Both deprecation's and new health API use shortened URL links embedded in the code. The primary reason is that they will be viewed inside Kibana. I am a fan of using URL links in error messages (especially shortened links) but I don't think we have widely adopted them quite yet. While the shortening service itself is solid, some of the tooling and management of the shortened URLs are lacking. For example: #78273 and creating/updating these is a hodgepodge of custom scripts and manual efforts. We have also had some conversations around version specific links, but settled on just pointing to current version and we can figure that part out later.
I don't think we have an official stance, but IMO if it is only additive information to the error message I don't see any harm. If the docs move, they generally leave behind a re-direct or page to manually redirect and having the link is better than not. I imagine existing usages and increased usage will force a better eco-system around managing these. cc @masseyke (since he worked with these recently)
There was a problem hiding this comment.
Thanks for the context, Jake. The link you shared is super helpful! I like the proposal it puts forward. It's good to know that we are thinking ways to make things more robust.
For this particular usage here, my take is that it should be fine. Once the broader framework is ready, we just need to remember migrating it to the new system. The https://ela.st part is pretty easy to search upon. So I am not worried at all.
| try { | ||
| authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener); | ||
| } catch (UnsupportedOperationException e) { |
There was a problem hiding this comment.
I decided to use try-catch instead of pro-actively checking the Role class because:
- Whether LimitedRole can throw exception on building privileges response is an implementation detail. The LimitedRole class is in a separate module (
core) and in theory does not even need to be exposed to the security module. In fact, security module currently has no direct reference to it. - Checking the Role class does not really work with Custom Authorization engine by checking because it is theoretically possible for a custom engine to implement the
retrieveUserPrivilegescall in a different way.
There was a problem hiding this comment.
My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException thrown by retrieveUserPrivileges for an API key is because our implementation of retrieveUserPrivileges doesn't support this. UnsupportedOperationException is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException somewhere else throughout retrieveUserPrivileges).
I'm wondering if we should specialize the exception thrown in LimitedRole to a sublass of UnsupportedOperationException. This way we still leak the implementation but in a more robust, structured way.
Alternatively, we could throw the 400 API-key specific exception inside RBACEngine which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.
WDYT?
There was a problem hiding this comment.
I agree it's awkward. The problem is that not every API key should fail. Only those with assigned role descriptors will fail. Underlying those API keys have their roles represented as LimitedRole. But I don't want to expose LimiteRole to the security module because it should just rely on the Role interface. I pondered on this and didn't find a clear better solution:
- Throwing specific subclass of
UnsupportedOperationException(or just a specific exception) is slightly better. But it still leaks implementation. I also didn't want to add another class just for this error message. My thought was that the exception + checking authentication being an API key (authentication.isApiKey()inside the catch block) is sufficient to rule out accidental unexpected behaviour in future. - By "throwing 400 API key specific exception inside RBACEngine", I assume you mean actively checking whether the
Roleis aLimitedRoleand throw error accordingly? This would requireRBACEnginebe aware ofLimitedRolewhich I rather dislike as stated above. - If you mean use try-catch for
UnsupportedOperationExceptionin RBACEngine, this means putting the error message inside RBACEngine. I initially dislike this because I felt this error message, especially with the HTTP url should be a handle level thing. But I now noticed that similar messages are prepared at other service classes (mostly implementations of the newHealthIndicatorService). So maybe this is not too bad.
So overall, I think option 3 is my current preference. More concretely, catching UnsupportedOperationException in RBACEngine#getUserPrivileges and retrow it as a IllegalArgumentException with proper error message. Using IllegalArgumentException avoids having to set an explicit 400 status code which is something I also didn't like at service layer. Please let me know if this works for you.
There was a problem hiding this comment.
Exactly, I meant option 3 👍
I think that building a message that feels "handler-layer" inside RBACEngine#getUserPrivileges is a worthwhile trade-off here: we get to handle the issue close to the root cause and don't artificially impose limitations/interpretation on other implementations of AuthorizationEngine.
n1v0lg
left a comment
There was a problem hiding this comment.
A question around where and how we throw, before I review the rest of the PR.
Question inline, in response to your comment here.
| try { | ||
| authorizationService.retrieveUserPrivileges(subject, securityContext.getAuthorizationInfoFromContext(), listener); | ||
| } catch (UnsupportedOperationException e) { |
There was a problem hiding this comment.
My only reservation here is that we are leaking this implementation detail all the way up to the transport layer in an unstructured way: we are assuming that any UnsupportedOperationException thrown by retrieveUserPrivileges for an API key is because our implementation of retrieveUserPrivileges doesn't support this. UnsupportedOperationException is generic though so this could lead to unexpected behavior/incorrect error messages for custom implementations (or if we throw UnsupportedOperationException somewhere else throughout retrieveUserPrivileges).
I'm wondering if we should specialize the exception thrown in LimitedRole to a sublass of UnsupportedOperationException. This way we still leak the implementation but in a more robust, structured way.
Alternatively, we could throw the 400 API-key specific exception inside RBACEngine which would keep it closer to the root cause and not force our interpretation on exceptions thrown by custom implementations.
WDYT?
| when(role.application()).thenReturn(ApplicationPermission.NONE); | ||
| when(role.runAs()).thenReturn(RunAsPermission.NONE); | ||
|
|
||
| final UnsupportedOperationException unsupportedOperationException = mock(UnsupportedOperationException.class); |
There was a problem hiding this comment.
Nit: would just instantiate the exception here instead of mocking since we're not relying on any mocked behavior.
| try { | ||
| getUserPrivilegesResponse = buildUserPrivilegesResponseObject(role); | ||
| } catch (UnsupportedOperationException e) { | ||
| throw new IllegalArgumentException( |
There was a problem hiding this comment.
Nit: to stay consistent with the other exception above, should we pass this to onFailure instead?
There was a problem hiding this comment.
I was trying to be consistent with the existing behaviour, i.e. buildUserPrivilegesResponseObject just throws exceptions. But consistent with the caller method is also good. I'll make the change
|
Test failure is being fixed by https://github.com/elastic/elasticsearch/pull/89360/files#r946240748 |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
* upstream/main: (265 commits) Disable openid connect tests due to missing fixture (elastic#89478) Add periodic job for single processor node testing Updates to changelog processing after docs redesign (elastic#89463) Better support for multi cluster for run task (elastic#89442) Mute failing tests (elastic#89465) [ML] Performance improvements related to ECS Grok pattern usage (elastic#89424) Add source fallback support for date and date_nanos mapped types (elastic#89440) Reuse Info in lifecycle step (elastic#89419) feature: support metrics for multi value fields (elastic#88818) Upgrade OpenTelemetry API and remove workaround (elastic#89438) Remove LegacyClusterTaskResultActionListener (elastic#89459) Add YAML spec docs about matching errors (elastic#89370) Remove redundant cluster upgrade tests for auth tokens (elastic#89417) Return 400 error for GetUserPrivileges call with API keys (elastic#89333) User Profile - Detailed errors in hasPrivileges response (elastic#89224) Rollover min_* conditions docs and highlight (elastic#89434) REST tests for percentiles_bucket agg (elastic#88029) REST tests for cumulative pipeline aggs (elastic#88966) Clean-up file watcher keys. (elastic#89429) fix a typo in Security.java (elastic#89248) ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
The GetUserPrivileges API returns a 500 error when it is called with an
API key that has assigned role descriptors. This is because the
underlying LimitedRole class that represents the API key's effective
privileges does not support building a simple view of the privileges.
This PR changes the code to return 400 error instead of 500 along with a
better error message that suggests the GetApiKey API as an alternative.
Relates: #89058