Skip to content

xds-k8s: grant roles/iam.workloadIdentityUser automatically#26487

Merged
sergiitk merged 7 commits intogrpc:masterfrom
sergiitk:xds-k8s-grant-workload-identity
Jun 17, 2021
Merged

xds-k8s: grant roles/iam.workloadIdentityUser automatically#26487
sergiitk merged 7 commits intogrpc:masterfrom
sergiitk:xds-k8s-grant-workload-identity

Conversation

@sergiitk
Copy link
Copy Markdown
Member

@sergiitk sergiitk commented Jun 15, 2021

  1. Automatically grant roles/iam.workloadIdentityUser to workload identity member serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME] so kubernetes pods can use GCP service account
  2. Implement support for Service Account policy bindings via GCP IAM API, with features analogous to gcloud iam service-accounts add-iam-policy-binding and gcloud iam service-accounts remove-iam-policy-binding
  3. Describe how to setup dedicated GCP workload service account with minimal required privileges to run xds test client and server.
  4. Describe how to setup CI to allow test framework to grant roles/iam.workloadIdentityUser automatically (normally this requires project owner)

This is needed to run multiple instances of the driver, each restricted to its own kubernetes namespace. b/177354746


This change is Reviewable

@sergiitk sergiitk added the release notes: no Indicates if PR should not be in release notes label Jun 15, 2021
@sergiitk sergiitk requested a review from lidizheng June 15, 2021 22:06
Copy link
Copy Markdown
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! Great work on adding a new mechanism to interact with the GCP IAM service.

Do you have a Kokoro run link for this PR?

Reviewed 5 of 10 files at r1.
Reviewable status: 5 of 10 files reviewed, 8 unresolved discussions (waiting on @sergiitk)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

export ZONE="us-central1-a" 
# Dedicated GCP Service Account to use with workload identity.
export WORKLOAD_SERVICE_ACCOUNT="xds-k8s-interop-tests@${PROJECT_ID}.iam.gserviceaccount.com"

I like previous version which allows people to pick their K8s namespace name. Is there a reason we want a fixed SA name?


tools/run_tests/xds_k8s_test_driver/README.md, line 108 at r1 (raw file):

project, **you can skip this step**.  
If you're configuring the test framework to run on a CI: use `roles/owner`
account once to allow test framework to grant `roles/iam.workloadIdentityUser`.

Optional: we might also want to mention scope for VMs or GKE clusters. During my experiments, I once blocked by a VM unable to access most GCP API due to bad scope setting.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 71 at r1 (raw file):

    @classmethod
    def from_response(cls, response):

Type annotation for response?


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 98 at r1 (raw file):

    @classmethod
    def from_response(cls, response):

Type annotation for response?


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):

@dataclasses.dataclass(eq=False, frozen=True)

Just curious, why do we need eq=False?


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

        super().__init__(api_manager.iam('v1'), project)
        # Shortcut to projects/*/serviceAccounts/ endpoints
        self._service_accounts = self.api.projects().serviceAccounts()

Do we need to worry about this line to hang or take unexpectedly long time?


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 300 at r1 (raw file):

    @staticmethod
    def _replace_binding(policy, binding, new_binding):

nit: This can be a private function in this module. It's better to simplify the class logic, since they tend to grow in complexity naturally.


tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):

                gcp_iam=self.gcp_iam,
                gcp_service_account=self.gcp_service_account,
                service_account_name=self.service_account_name)

Is it possible to share a service account across multiple namespaces? In that way, we can save a few seconds granting and revoking identity users.

@lidizheng lidizheng self-assigned this Jun 15, 2021
Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for useful feedback! Kokoro run link isn't ready yet, I had to setup new SA on prod first. Will post it as soon as it passes.

Reviewable status: 5 of 10 files reviewed, 8 unresolved discussions (waiting on @lidizheng and @sergiitk)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

I like previous version which allows people to pick their K8s namespace name.

K8S_NAMESPACE was only used for gcloud compute firewall-rules create "${K8S_NAMESPACE}-allow-health-checks", which has no relation to $K8S_NAMESPACE. Don't remember why I even used it.

Is there a reason we want a fixed SA name?

I'll break it into

export WORKLOAD_SA_NAME="xds-k8s-interop-tests"
export WORKLOAD_SA_EMAIL="${WORKLOAD_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

Apparently gcloud iam service-accounts create can't accept full workload name (AKA email format).


tools/run_tests/xds_k8s_test_driver/README.md, line 108 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Optional: we might also want to mention scope for VMs or GKE clusters. During my experiments, I once blocked by a VM unable to access most GCP API due to bad scope setting.

AFAIK --scopes=cloud-platform only needed when Workload Identity is off.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Just curious, why do we need eq=False?

Good catch! It used to contain unhashable field (members was a list), and I forgot that I swapped it for a frozen set.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Do we need to worry about this line to hang or take unexpectedly long time?

Not on this line, at this point it's basically building object from a loaded spec.
However, the line above could:

super().__init__(api_manager.iam('v1'), project)

api_manager.iam('v1') performs a network call to a discovery URL, to load the spec. I don't think we should fix it in this PR, but it's a valid concern. I'll take a note for my GCP API refactoring.


tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Is it possible to share a service account across multiple namespaces? In that way, we can save a few seconds granting and revoking identity users.

It's possible, but here's why we're not doing it

  1. Namespace is a part of Workload Identity member name: serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME], so we'll still need to do grant/revoke at least once per different namespace
  2. It's better to treat test client and test server as completely separate apps. Real world example would be something like a web server talking to a payment server, and each one will separate service account (with different set of permissions). This is reflected in PSM Security setup, where kubernetes service account is a part of spiffe identity: spiffe://WORKLOAD_POOL/ns/NAMESPACE/sa/KUBERNETES_SERVICE_ACCOUNT, which becomes app identifier in the certificates (see https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md#22-path). And we want to test that different apps (with different owners) can talk to each other

Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 11 files reviewed, 7 unresolved discussions (waiting on @lidizheng)


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 71 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Type annotation for response?

Done.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 98 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Type annotation for response?

Done.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Good catch! It used to contain unhashable field (members was a list), and I forgot that I swapped it for a frozen set.

Done.

@sergiitk sergiitk requested a review from lidizheng June 17, 2021 20:06
Copy link
Copy Markdown
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks great. Thanks for the changes. I will approve once we confirmed that Kokoro would be happy with this PR.

Reviewed 1 of 5 files at r2.
Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng and @sergiitk)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

I like previous version which allows people to pick their K8s namespace name.

K8S_NAMESPACE was only used for gcloud compute firewall-rules create "${K8S_NAMESPACE}-allow-health-checks", which has no relation to $K8S_NAMESPACE. Don't remember why I even used it.

Is there a reason we want a fixed SA name?

I'll break it into

export WORKLOAD_SA_NAME="xds-k8s-interop-tests"
export WORKLOAD_SA_EMAIL="${WORKLOAD_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

Apparently gcloud iam service-accounts create can't accept full workload name (AKA email format).

Yep. I mean we could set:

export WORKLOAD_SA_NAME="${K8S_NAMESPACE}-workload-sa"

tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):

--project=grpc-testing
--network=default-vpc
--gcp_service_account=xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com

Are the IAM policy bindings already set up for this SA?


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):

    @functools.lru_cache(None)
    def iam(self, version):

Type annotations.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Not on this line, at this point it's basically building object from a loaded spec.
However, the line above could:

super().__init__(api_manager.iam('v1'), project)

api_manager.iam('v1') performs a network call to a discovery URL, to load the spec. I don't think we should fix it in this PR, but it's a valid concern. I'll take a note for my GCP API refactoring.

Then... should we worry about the api_manager.iam('v1') taking unexpectedly long time? I'm not sure when the pulling of IAM resources happens. But we may want to either fast-fast or retry with a reasonable deadline.


tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

It's possible, but here's why we're not doing it

  1. Namespace is a part of Workload Identity member name: serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME], so we'll still need to do grant/revoke at least once per different namespace
  2. It's better to treat test client and test server as completely separate apps. Real world example would be something like a web server talking to a payment server, and each one will separate service account (with different set of permissions). This is reflected in PSM Security setup, where kubernetes service account is a part of spiffe identity: spiffe://WORKLOAD_POOL/ns/NAMESPACE/sa/KUBERNETES_SERVICE_ACCOUNT, which becomes app identifier in the certificates (see https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md#22-path). And we want to test that different apps (with different owners) can talk to each other

Sounds good.

Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed! https://fusion2.corp.google.com/invocations/92ff9b67-0205-4f1f-a832-2f8b0963c0df

Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Yep. I mean we could set:

export WORKLOAD_SA_NAME="${K8S_NAMESPACE}-workload-sa"

Potentially, but I don't see how $K8S_NAMESPACE related to $WORKLOAD_SA_NAME.

Especially that we'll generate k8s namespace name on the flight soon, with including unique test run id. In that PR i'm planning to introduce --resource-prefix to disambiguate with --namespace (used both for prefixes and k8s namespaces).

Maybe adding $RESOURCE_PREFIX would make more sense? Something like

export RESOURCE_PREFIX="xds-k8s-interop-tests"
export CLUSTER_NAME="${RESOURCE_PREFIX}-cluster"
# ...
export WORKLOAD_SA_NAME="${RESOURCE_PREFIX}"
# ...
gcloud compute firewall-rules create "${RESOURCE_PREFIX}-allow-health-checks"

tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):
Yes, already running kokoro using it

I0617 13:35:44.306092 140060747343616 base_runner.py:152] Granting roles/iam.workloadIdentityUser to serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for GCP Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
I0617 13:35:44.749932 140060747343616 iam.py:280] Role roles/iam.workloadIdentityUser granted to member serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Type annotations.

Thanks. We probably should revive that mypy PR.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Then... should we worry about the api_manager.iam('v1') taking unexpectedly long time? I'm not sure when the pulling of IAM resources happens. But we may want to either fast-fast or retry with a reasonable deadline.

This doesn't pull IAM resources, just discovery doc (aka API spec). Looking at google-api-python-client, for this operation the defaults are 1 retry and timeout 60s.
So far it wasn't causing issues. Do you think moving this out of __init__, and lazy-loading on demand would be a good solution?

Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng)


tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Yes, already running kokoro using it

I0617 13:35:44.306092 140060747343616 base_runner.py:152] Granting roles/iam.workloadIdentityUser to serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for GCP Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
I0617 13:35:44.749932 140060747343616 iam.py:280] Role roles/iam.workloadIdentityUser granted to member serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com

Done.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Thanks. We probably should revive that mypy PR.

Done.

Copy link
Copy Markdown
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work! Don't forget to remove the DO-NOT-MERGE contents.

Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng and @sergiitk)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Potentially, but I don't see how $K8S_NAMESPACE related to $WORKLOAD_SA_NAME.

Especially that we'll generate k8s namespace name on the flight soon, with including unique test run id. In that PR i'm planning to introduce --resource-prefix to disambiguate with --namespace (used both for prefixes and k8s namespaces).

Maybe adding $RESOURCE_PREFIX would make more sense? Something like

export RESOURCE_PREFIX="xds-k8s-interop-tests"
export CLUSTER_NAME="${RESOURCE_PREFIX}-cluster"
# ...
export WORKLOAD_SA_NAME="${RESOURCE_PREFIX}"
# ...
gcloud compute firewall-rules create "${RESOURCE_PREFIX}-allow-health-checks"

Yep. The new variable sounds great.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

Done.

Yeah, it would help this situation. I can work on that when I got some free cycles.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

Previously, sergiitk (Sergii Tkachenko) wrote…

This doesn't pull IAM resources, just discovery doc (aka API spec). Looking at google-api-python-client, for this operation the defaults are 1 retry and timeout 60s.
So far it wasn't causing issues. Do you think moving this out of __init__, and lazy-loading on demand would be a good solution?

If there are retries already in place, I'm good.

Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng)


tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Yep. The new variable sounds great.

Done.


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

Yeah, it would help this situation. I can work on that when I got some free cycles.

Thank you!


tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):

Previously, lidizheng (Lidi Zheng) wrote…

If there are retries already in place, I'm good.

Done.

Copy link
Copy Markdown
Member Author

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the review! Already reverted them in 9b4c3d3.

Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng)

@sergiitk sergiitk merged commit 433c5ea into grpc:master Jun 17, 2021
@sergiitk sergiitk deleted the xds-k8s-grant-workload-identity branch June 17, 2021 23:38
sergiitk added a commit to sergiitk/grpc that referenced this pull request Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang/Python release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants