
Add "maintenance" permission to the fleet-server service account #82125

Merged
aleksmaus merged 2 commits into elastic:master from aleksmaus:feature/fleet-server-token-maintenance-priv
Dec 30, 2021

Conversation

@aleksmaus
Contributor

@aleksmaus aleksmaus commented Dec 29, 2021

What is the problem this PR solves?

Eliminating username/password authentication from the fleet server and replacing it with the fleet service account token exposed an issue: code that relies on performing queries with refresh no longer works.

Surprisingly enough, _bulk API calls with refresh still succeed, which may itself be an issue, since it allows the refresh permission check to be circumvented with _bulk requests.

The fleet server needs to be able to perform some API calls with the refresh flag enabled.

This would be a prerequisite for PR elastic/fleet-server#1039

How does this PR solve the problem?

Adds the "maintenance" permission to the fleet-server service account in order to allow the fleet server to perform operations with refresh.
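For reviewers who want to confirm the change on a running cluster, the service account definition can be fetched with the Get service accounts API. The response fragment below is an illustrative sketch of the shape, not the verbatim descriptor; the index names, the other privileges, and the `elastic:changeme` credentials are assumptions for the example.

```shell
# On a live cluster the descriptor can be fetched with (placeholder creds/host):
#   curl -u elastic:changeme "http://localhost:9200/_security/service/elastic/fleet-server"
#
# Illustrative response fragment only -- names/privileges below are a sketch;
# after this PR the index privileges list should include "maintenance":
response='{"elastic/fleet-server":{"role_descriptor":{"indices":[{"names":[".fleet-*"],"privileges":["read","write","create_index","auto_configure","maintenance"]}]}}}'
echo "$response" | grep -o '"maintenance"'
```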

Related issues

@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Dec 29, 2021
@aleksmaus aleksmaus added the Team:Security Meta label for security team label Dec 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@aleksmaus aleksmaus force-pushed the feature/fleet-server-token-maintenance-priv branch from c9b4488 to b67de4b on December 29, 2021 18:30
@henningandersen henningandersen added the :Security/Security Security issues without another label label Dec 29, 2021
Contributor

@henningandersen henningandersen left a comment


LGTM.

@elasticsearchmachine
Collaborator

💚 Backport successful

Branch 8.0: successful

aleksmaus added a commit to aleksmaus/elasticsearch that referenced this pull request Dec 30, 2021
Add "maintenance" permission to the fleet-server service account (elastic#82125)

* Add "maintenance" permission to the fleet-server service account

* Fix tests
elasticsearchmachine pushed a commit that referenced this pull request Dec 30, 2021
Add "maintenance" permission to the fleet-server service account (#82125) (#82138)

* Add "maintenance" permission to the fleet-server service account

* Fix tests
@ruflin
Contributor

ruflin commented Jan 10, 2022

@aleksmaus Is this an issue that potentially also exists in 7.x when the service account is used?

@aleksmaus
Contributor Author

@aleksmaus Is this an issue that potentially also exists in 7.x when the service account is used?

This would affect the migration with the update-by-query call https://github.com/elastic/fleet-server/blob/130056a170a3ca0b3b8ce4df12741ac40ae70c91/internal/pkg/dl/migration.go#L77 if there were any .fleet-agents documents affected by that query; otherwise the refresh is a noop.

For the rest of the code the _bulk API is used, and Elasticsearch doesn't enforce the refresh permission check on that API.
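The _bulk loophole mentioned above can be sketched as follows. The document ID and field are examples, `$TOKEN` and the host are placeholders, and the actual curl call is shown commented out since it needs a live cluster:

```shell
# Build a two-line NDJSON body for a bulk update (IDs/fields are examples):
body=$(printf '%s\n%s\n' \
  '{"update":{"_id":"2a32b4a7-bca2-461b-a834-fcf161cb341d"}}' \
  '{"doc":{"last_checkin":"2022-01-10T00:00:00Z"}}')
echo "$body"
# _bulk accepts refresh=true even without the maintenance privilege:
#   curl -XPOST -H "Authorization: Bearer $TOKEN" \
#     -H "Content-Type: application/x-ndjson" \
#     "http://localhost:9200/.fleet-agents/_bulk?refresh=true" --data-binary "$body"
```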

@ruflin
Contributor

ruflin commented Jan 10, 2022

As this is a migration from 7.14 to 7.*, I assume it would also affect users going from 7.14 to 7.16/7.17. @scunningham Do you know what the side effect is if this migration doesn't go through for users?

@scunningham

scunningham commented Jan 10, 2022

It could break a very small number of customers that use the security solution starting with 7.14. The migration added the agent_id field to the document in a way that accommodated the transform.

The bigger issue is whether the change confers the privilege retroactively. If we add the perms with this PR to the service account and then depend on them later, will the perms be applied to a fleet service account that was created in 7.14? If not, then we have to regenerate the fleet service account somehow, or basically 1) break all customers before this PR, or 2) not depend on it and remove it. @ruflin any idea?

@aleksmaus
Contributor Author

aleksmaus commented Jan 10, 2022

I tested the upgrade from 7.14.2 to 7.16.2.
The fleet server exits with an error on the migration with refresh. Once the server is restarted it works fine because the agent records are updated; it's just the refresh part of the query that failed. Once the data is updated the query becomes a noop.
Deleting the token and creating a new one fixes the permissions.
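For reference, recreating the token amounts to a delete plus a create against the service account token API. The host, credentials, and token name below are placeholders; the block prints the commands rather than executing them, so they can be reviewed before being piped to `sh` against a live cluster:

```shell
ES="${ES:-http://elastic:changeme@localhost:9200}"
path="_security/service/elastic/fleet-server/credential/token/token1"
# Print the two calls; pipe the output to sh to run them for real:
echo "curl -XDELETE $ES/$path"
echo "curl -XPOST $ES/$path"
```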

@henningandersen @ywangd Is there any way we can update the existing token permissions under the hood and keep the same token value? what is the best place to implement the migration in case of permissions change?

@ywangd
Member

ywangd commented Jan 11, 2022

By "token", I assume you mean a service account token? If so, the token's permissions are updated automatically based on the node's version. I'd be surprised if the service token didn't get the latest permissions after the node is upgraded. Could you please provide more details on how to reproduce?

@ruflin
Contributor

ruflin commented Jan 11, 2022

@scunningham Based on the comment from @ywangd it sounds like the Elasticsearch version / internal definition of the fleet-server service account defines the permissions so any token always has the "current" permissions.

@aleksmaus For the migration, it is not great, especially that fleet-server exits. But as soon as it is started again, it is solved. This might actually explain some past user issues we have seen where fleet-server stopped but I had no explanation for it. Is it correct that for any user past 7.14 this will never happen?

@aleksmaus
Contributor Author

@aleksmaus For the migration, it is not great especially that fleet-server exit. But as soon as it is started again, it is solved. This might actually explain some past user issue we have seen where fleet-server stopped but I had no explanation for it. Is it correct that for any user past 7.14 this will never happen?

@ywangd I made a mistake in my testing while going between the upgrades 7.14.2 -> 7.16.2 and 7.14.2 -> 8.1.0.
I simplified the repro steps and went through the upgrade 7.14.2 -> 8.1.0. The token worked with the new permissions after the upgrade, and the update_by_query with refresh succeeded. So we can just backport this PR to 7.16.3 and 7.17 and it should fix the permissions for the fleet server token.

Here are the steps I used to verify that it works:

  1. Start VERSION=7.14.2-SNAPSHOT from docker.elastic.co/elasticsearch/elasticsearch.
  2. Create a token
    Request:
curl -XPOST http://elastic:changeme@localhost:9200/_security/service/elastic/fleet-server/credential/token/token1

Response:

{"created":true,"token":{"name":"token1","value":"AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuMTp6UDJma0lqTFNwQ1UxWEI5RDl1LXh3"}}
  3. Create the agent record (simplified) with that token
    Request:
curl -XPUT -H"Authorization: Bearer AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuMTp6UDJma0lqTFNwQ1UxWEI5RDl1LXh3" -H "Content-Type: application/json" http://localhost:9200/.fleet-agents/_doc/2a32b4a7-bca2-461b-a834-fcf161cb341d -d '{"id": "2a32b4a7-bca2-461b-a834-fcf161cb341d"}'

Response:

{"_index":".fleet-agents-7","_type":"_doc","_id":"2a32b4a7-bca2-461b-a834-fcf161cb341d","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
  4. Stop Elasticsearch 7.14.2

  5. Start Elasticsearch 8.1.0-SNAPSHOT on the same data.

  6. Update by query with refresh

Request:

curl -XPOST -H"Authorization: Bearer AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuMTp6UDJma0lqTFNwQ1UxWEI5RDl1LXh3" -H "Content-Type: application/json" "http://elasticsearch:9200/.fleet-agents/_update_by_query?conflicts=proceed&refresh=true"  -d'
{
  "script": {
    "source": "ctx._source.agent = [:];ctx._source.agent.id = ctx._id;",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must_not": [
        { "exists": {
          "field": "agent.id"
        }}
      ]
    }
  }
}'

Response:

{"took":78,"timed_out":false,"total":1,"updated":1,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}

You would get this error with 7.16.2 instead right now, for example:

{"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:admin/refresh] is unauthorized for user [elastic/fleet-server], this action is granted by the index privileges [maintenance,manage,all]"}],"type":"security_exception","reason":"action [indices:admin/refresh] is unauthorized for user [elastic/fleet-server], this action is granted by the index privileges [maintenance,manage,all]"},"status":403}
  7. Check that the agent record was updated and the new property agent.id was created:

Request:

curl -XGET -H"Authorization: Bearer AAEAAWVsYXN0aWMvZmxlZXQtc2VydmVyL3Rva2VuMTp6UDJma0lqTFNwQ1UxWEI5RDl1LXh3" -H "Content-Type: application/json" "http://elasticsearch:9200/.fleet-agents/_search" | jq

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".fleet-agents-7",
        "_id": "2a32b4a7-bca2-461b-a834-fcf161cb341d",
        "_score": 1,
        "_source": {
          "agent": {
            "id": "2a32b4a7-bca2-461b-a834-fcf161cb341d"
          },
          "id": "2a32b4a7-bca2-461b-a834-fcf161cb341d"
        }
      }
    ]
  }
}
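As an extra check beyond the steps above, the privilege can also be queried directly with the has-privileges API using the service token. The curl call is commented out (it needs a live cluster and the real `$TOKEN`), and the response below is an illustrative sketch of what a fixed cluster should return, not captured output:

```shell
# On a live cluster, with the service token (placeholder $TOKEN):
#   curl -XPOST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#     "http://localhost:9200/_security/user/_has_privileges" -d '
#   {"index":[{"names":[".fleet-agents"],"privileges":["maintenance"]}]}'
#
# Illustrative response shape for a cluster that includes this fix:
response='{"username":"elastic/fleet-server","has_all_requested":true,"index":{".fleet-agents":{"maintenance":true}}}'
echo "$response" | grep -o '"has_all_requested":true'
```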


Labels

>enhancement, external-contributor (Pull request authored by a developer outside the Elasticsearch team), :Security/Security (Security issues without another label), Team:Security (Meta label for security team), v7.16.3, v7.17.0, v8.0.0, v8.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants