xds-k8s: grant roles/iam.workloadIdentityUser automatically#26487
xds-k8s: grant roles/iam.workloadIdentityUser automatically#26487sergiitk merged 7 commits intogrpc:masterfrom
Conversation
lidizheng
left a comment
There was a problem hiding this comment.
Thanks for doing this! Great work on adding a new mechanism to interact with the GCP IAM service.
Do you have a Kokoro run link for this PR?
Reviewed 5 of 10 files at r1.
Reviewable status: 5 of 10 files reviewed, 8 unresolved discussions (waiting on @sergiitk)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
export ZONE="us-central1-a" # Dedicated GCP Service Account to use with workload identity. export WORKLOAD_SERVICE_ACCOUNT="xds-k8s-interop-tests@${PROJECT_ID}.iam.gserviceaccount.com"
I like previous version which allows people to pick their K8s namespace name. Is there a reason we want a fixed SA name?
tools/run_tests/xds_k8s_test_driver/README.md, line 108 at r1 (raw file):
project, **you can skip this step**. If you're configuring the test framework to run on a CI: use `roles/owner` account once to allow test framework to grant `roles/iam.workloadIdentityUser`.
Optional: we might also want to mention scope for VMs or GKE clusters. During my experiments, I once blocked by a VM unable to access most GCP API due to bad scope setting.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 71 at r1 (raw file):
@classmethod def from_response(cls, response):
Type annotation for response?
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 98 at r1 (raw file):
@classmethod def from_response(cls, response):
Type annotation for response?
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):
@dataclasses.dataclass(eq=False, frozen=True)
Just curious, why do we need eq=False?
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
super().__init__(api_manager.iam('v1'), project) # Shortcut to projects/*/serviceAccounts/ endpoints self._service_accounts = self.api.projects().serviceAccounts()
Do we need to worry about this line to hang or take unexpectedly long time?
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 300 at r1 (raw file):
@staticmethod def _replace_binding(policy, binding, new_binding):
nit: This can be a private function in this module. It's better to simplify the class logic, since they tend to grow in complexity naturally.
tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):
gcp_iam=self.gcp_iam, gcp_service_account=self.gcp_service_account, service_account_name=self.service_account_name)
Is it possible to share a service account across multiple namespaces? In that way, we can save a few seconds granting and revoking identity users.
sergiitk
left a comment
There was a problem hiding this comment.
Thanks for useful feedback! Kokoro run link isn't ready yet, I had to setup new SA on prod first. Will post it as soon as it passes.
Reviewable status: 5 of 10 files reviewed, 8 unresolved discussions (waiting on @lidizheng and @sergiitk)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
I like previous version which allows people to pick their K8s namespace name.
K8S_NAMESPACE was only used for gcloud compute firewall-rules create "${K8S_NAMESPACE}-allow-health-checks", which has no relation to $K8S_NAMESPACE. Don't remember why I even used it.
Is there a reason we want a fixed SA name?
I'll break it into
export WORKLOAD_SA_NAME="xds-k8s-interop-tests"
export WORKLOAD_SA_EMAIL="${WORKLOAD_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"Apparently gcloud iam service-accounts create can't accept full workload name (AKA email format).
tools/run_tests/xds_k8s_test_driver/README.md, line 108 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Optional: we might also want to mention scope for VMs or GKE clusters. During my experiments, I once blocked by a VM unable to access most GCP API due to bad scope setting.
AFAIK --scopes=cloud-platform only needed when Workload Identity is off.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Just curious, why do we need
eq=False?
Good catch! It used to contain unhashable field (members was a list), and I forgot that I swapped it for a frozen set.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Do we need to worry about this line to hang or take unexpectedly long time?
Not on this line, at this point it's basically building object from a loaded spec.
However, the line above could:
super().__init__(api_manager.iam('v1'), project)
api_manager.iam('v1') performs a network call to a discovery URL, to load the spec. I don't think we should fix it in this PR, but it's a valid concern. I'll take a note for my GCP API refactoring.
tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Is it possible to share a service account across multiple namespaces? In that way, we can save a few seconds granting and revoking identity users.
It's possible, but here's why we're not doing it
- Namespace is a part of Workload Identity member name:
serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME], so we'll still need to do grant/revoke at least once per different namespace - It's better to treat test client and test server as completely separate apps. Real world example would be something like a web server talking to a payment server, and each one will separate service account (with different set of permissions). This is reflected in PSM Security setup, where kubernetes service account is a part of spiffe identity:
spiffe://WORKLOAD_POOL/ns/NAMESPACE/sa/KUBERNETES_SERVICE_ACCOUNT, which becomes app identifier in the certificates (see https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md#22-path). And we want to test that different apps (with different owners) can talk to each other
sergiitk
left a comment
There was a problem hiding this comment.
Reviewable status: 3 of 11 files reviewed, 7 unresolved discussions (waiting on @lidizheng)
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 71 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Type annotation for response?
Done.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 98 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Type annotation for response?
Done.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 105 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Good catch! It used to contain unhashable field (
memberswas a list), and I forgot that I swapped it for a frozen set.
Done.
lidizheng
left a comment
There was a problem hiding this comment.
This PR looks great. Thanks for the changes. I will approve once we confirmed that Kokoro would be happy with this PR.
Reviewed 1 of 5 files at r2.
Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng and @sergiitk)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
I like previous version which allows people to pick their K8s namespace name.
K8S_NAMESPACEwas only used forgcloud compute firewall-rules create "${K8S_NAMESPACE}-allow-health-checks", which has no relation to$K8S_NAMESPACE. Don't remember why I even used it.Is there a reason we want a fixed SA name?
I'll break it into
export WORKLOAD_SA_NAME="xds-k8s-interop-tests" export WORKLOAD_SA_EMAIL="${WORKLOAD_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"Apparently
gcloud iam service-accounts createcan't accept full workload name (AKA email format).
Yep. I mean we could set:
export WORKLOAD_SA_NAME="${K8S_NAMESPACE}-workload-sa"
tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):
--project=grpc-testing --network=default-vpc --gcp_service_account=xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
Are the IAM policy bindings already set up for this SA?
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):
@functools.lru_cache(None) def iam(self, version):
Type annotations.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Not on this line, at this point it's basically building object from a loaded spec.
However, the line above could:super().__init__(api_manager.iam('v1'), project)
api_manager.iam('v1')performs a network call to a discovery URL, to load the spec. I don't think we should fix it in this PR, but it's a valid concern. I'll take a note for my GCP API refactoring.
Then... should we worry about the api_manager.iam('v1') taking unexpectedly long time? I'm not sure when the pulling of IAM resources happens. But we may want to either fast-fast or retry with a reasonable deadline.
tools/run_tests/xds_k8s_test_driver/framework/test_app/client_app.py, line 320 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
It's possible, but here's why we're not doing it
- Namespace is a part of Workload Identity member name:
serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME], so we'll still need to do grant/revoke at least once per different namespace- It's better to treat test client and test server as completely separate apps. Real world example would be something like a web server talking to a payment server, and each one will separate service account (with different set of permissions). This is reflected in PSM Security setup, where kubernetes service account is a part of spiffe identity:
spiffe://WORKLOAD_POOL/ns/NAMESPACE/sa/KUBERNETES_SERVICE_ACCOUNT, which becomes app identifier in the certificates (see https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md#22-path). And we want to test that different apps (with different owners) can talk to each other
Sounds good.
sergiitk
left a comment
There was a problem hiding this comment.
Tests passed! https://fusion2.corp.google.com/invocations/92ff9b67-0205-4f1f-a832-2f8b0963c0df
Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Yep. I mean we could set:
export WORKLOAD_SA_NAME="${K8S_NAMESPACE}-workload-sa"
Potentially, but I don't see how $K8S_NAMESPACE related to $WORKLOAD_SA_NAME.
Especially that we'll generate k8s namespace name on the flight soon, with including unique test run id. In that PR i'm planning to introduce --resource-prefix to disambiguate with --namespace (used both for prefixes and k8s namespaces).
Maybe adding $RESOURCE_PREFIX would make more sense? Something like
export RESOURCE_PREFIX="xds-k8s-interop-tests"
export CLUSTER_NAME="${RESOURCE_PREFIX}-cluster"
# ...
export WORKLOAD_SA_NAME="${RESOURCE_PREFIX}"
# ...
gcloud compute firewall-rules create "${RESOURCE_PREFIX}-allow-health-checks"
tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):
Yes, already running kokoro using it
I0617 13:35:44.306092 140060747343616 base_runner.py:152] Granting roles/iam.workloadIdentityUser to serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for GCP Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
I0617 13:35:44.749932 140060747343616 iam.py:280] Role roles/iam.workloadIdentityUser granted to member serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Type annotations.
Thanks. We probably should revive that mypy PR.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Then... should we worry about the
api_manager.iam('v1')taking unexpectedly long time? I'm not sure when the pulling of IAM resources happens. But we may want to either fast-fast or retry with a reasonable deadline.
This doesn't pull IAM resources, just discovery doc (aka API spec). Looking at google-api-python-client, for this operation the defaults are 1 retry and timeout 60s.
So far it wasn't causing issues. Do you think moving this out of __init__, and lazy-loading on demand would be a good solution?
sergiitk
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @lidizheng)
tools/run_tests/xds_k8s_test_driver/config/grpc-testing.cfg, line 4 at r2 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Yes, already running kokoro using it
I0617 13:35:44.306092 140060747343616 base_runner.py:152] Granting roles/iam.workloadIdentityUser to serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for GCP Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
I0617 13:35:44.749932 140060747343616 iam.py:280] Role roles/iam.workloadIdentityUser granted to member serviceAccount:grpc-testing.svc.id.goog[sergii-xds-k8s-pr26487/psm-grpc-server] for Service Account xds-k8s-interop-tests@grpc-testing.iam.gserviceaccount.com
Done.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Thanks. We probably should revive that mypy PR.
Done.
… reverted" This reverts commit 9a49696.
lidizheng
left a comment
There was a problem hiding this comment.
LGTM. Great work! Don't forget to remove the DO-NOT-MERGE contents.
Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng and @sergiitk)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Potentially, but I don't see how
$K8S_NAMESPACErelated to$WORKLOAD_SA_NAME.Especially that we'll generate k8s namespace name on the flight soon, with including unique test run id. In that PR i'm planning to introduce
--resource-prefixto disambiguate with--namespace(used both for prefixes and k8s namespaces).Maybe adding
$RESOURCE_PREFIXwould make more sense? Something likeexport RESOURCE_PREFIX="xds-k8s-interop-tests" export CLUSTER_NAME="${RESOURCE_PREFIX}-cluster" # ... export WORKLOAD_SA_NAME="${RESOURCE_PREFIX}" # ... gcloud compute firewall-rules create "${RESOURCE_PREFIX}-allow-health-checks"
Yep. The new variable sounds great.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
Done.
Yeah, it would help this situation. I can work on that when I got some free cycles.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
Previously, sergiitk (Sergii Tkachenko) wrote…
This doesn't pull IAM resources, just discovery doc (aka API spec). Looking at google-api-python-client, for this operation the defaults are 1 retry and timeout 60s.
So far it wasn't causing issues. Do you think moving this out of__init__, and lazy-loading on demand would be a good solution?
If there are retries already in place, I'm good.
sergiitk
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng)
tools/run_tests/xds_k8s_test_driver/README.md, line 53 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Yep. The new variable sounds great.
Done.
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/api.py, line 143 at r2 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
Yeah, it would help this situation. I can work on that when I got some free cycles.
Thank you!
tools/run_tests/xds_k8s_test_driver/framework/infrastructure/gcp/iam.py, line 188 at r1 (raw file):
Previously, lidizheng (Lidi Zheng) wrote…
If there are retries already in place, I'm good.
Done.
sergiitk
left a comment
There was a problem hiding this comment.
Thank for the review! Already reverted them in 9b4c3d3.
Reviewable status: 4 of 11 files reviewed, 2 unresolved discussions (waiting on @lidizheng)
roles/iam.workloadIdentityUserto workload identity memberserviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME]so kubernetes pods can use GCP service accountgcloud iam service-accounts add-iam-policy-bindingandgcloud iam service-accounts remove-iam-policy-bindingroles/iam.workloadIdentityUserautomatically (normally this requires project owner)This is needed to run multiple instances of the driver, each restricted to its own kubernetes namespace. b/177354746
This change is