feat: re-implement e2e managed tests #5444

Merged
moolen merged 2 commits into main from mj-e2e-managed on Oct 15, 2025
moolen (Member) commented Oct 9, 2025

Test Runs

🟢 AWS Run
🟢 GCP Run
🟢 Azure Run

Re-implement E2E Managed Tests with Terraform Workspace Separation

This PR reinstates the long-neglected e2e managed tests for AWS, GCP, and Azure cloud providers.

Key Changes

1. Terraform Workspace Separation

For each cloud provider (AWS, GCP, Azure), the Terraform configuration is now split into two distinct workspaces:

  • infrastructure/: Provisions cloud infrastructure including:

    • Network resources (VPC, subnets, etc.)
    • Kubernetes clusters (EKS, GKE, AKS)
    • Workload Identity/IRSA configuration
    • Cloud Service accounts and IAM roles
  • kubernetes/: Provisions Kubernetes-level resources:

    • In-cluster configurations
    • Kubernetes service accounts
    • Workload identity bindings
    • Other Kubernetes resources

Why the separation? Terraform cannot reliably handle provider configurations that depend on resources created in the same run: the Kubernetes provider needs cluster credentials that don't exist until the cluster has been created. Splitting infrastructure provisioning from Kubernetes resource management avoids this chicken-and-egg dependency.
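
As a rough illustration of the hand-off between the two runs (the PR does not state how values are actually passed between workspaces, so this is an assumption): once the infrastructure workspace has been applied, `terraform output -json` prints each output as an object with a `value` field, and those values can then be given to the kubernetes workspace as plain inputs. The helper name `output_values` and the output name `cluster_endpoint` are hypothetical.

```python
import json


def output_values(terraform_output_json: str) -> dict:
    """Flatten the JSON printed by `terraform output -json` into name -> value.

    Each top-level key is an output name; its object carries the actual
    value under "value" (next to "sensitive" and "type").
    """
    raw = json.loads(terraform_output_json)
    return {name: entry["value"] for name, entry in raw.items()}


# Hypothetical output of the infrastructure workspace after apply.
sample = """
{
  "cluster_endpoint": {
    "sensitive": false,
    "type": "string",
    "value": "https://example.invalid"
  }
}
"""

# By the time the kubernetes workspace runs, these are already known values.
kubernetes_inputs = output_values(sample)
```

The point of the sketch is only the ordering: the second run consumes values that already exist, instead of values that are still unknown at plan time.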

2. Restored E2E Managed Test Capability

The managed e2e tests can now be triggered via the /ok-to-test-managed slash command:

/ok-to-test-managed provider=aws sha=<commit-sha> 
/ok-to-test-managed provider=gcp sha=<commit-sha> 
/ok-to-test-managed provider=azure sha=<commit-sha>

This enables testing of External Secrets Operator against real cloud provider infrastructure in a controlled, on-demand manner.
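
For readers unfamiliar with the command format, here is a minimal sketch of how such a comment could be parsed and validated. It is not the workflow's actual dispatch logic; the function name `parse_managed_command` and the rule that `sha` must be 7 to 40 hex characters are assumptions made for illustration.

```python
import re

COMMAND = "/ok-to-test-managed"
ALLOWED_PROVIDERS = {"aws", "gcp", "azure"}


def parse_managed_command(comment: str) -> dict[str, str]:
    """Parse '/ok-to-test-managed provider=<name> sha=<commit-sha>'."""
    parts = comment.strip().split()
    if not parts or parts[0] != COMMAND:
        raise ValueError(f"not a {COMMAND} command")

    args: dict[str, str] = {}
    for token in parts[1:]:
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed argument: {token!r}")
        args[key] = value

    if args.get("provider") not in ALLOWED_PROVIDERS:
        raise ValueError("provider must be one of aws, gcp, azure")
    # Assumption: accept abbreviated or full hex commit SHAs.
    if not re.fullmatch(r"[0-9a-f]{7,40}", args.get("sha", "")):
        raise ValueError("sha must be a 7-40 character hex commit SHA")
    return args
```

Requiring an explicit `sha` pins the run to the commit that was reviewed, rather than whatever the branch points at when the job starts.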

3. Workflow Simplification

  • Removed the composite action (.github/actions/e2e-managed/action.yml)
  • Consolidated all steps into the workflow file as separate jobs (.github/workflows/e2e-managed.yml)

4. Updated Makefile

Added new Terraform targets that handle the two-workspace pattern:

  • tf.apply.%: Applies the infrastructure workspace, then the Kubernetes workspace
  • tf.destroy.%: Destroys the Kubernetes workspace, then the infrastructure workspace (the reverse of apply)
  • tf.plan.%: Plans both workspaces
  • tf.fmt: Formats all Terraform files recursively
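
The ordering these targets implement can be sketched as follows. This is not the Makefile itself: the directory layout `terraform/<provider>/<workspace>` and the function name `terraform_steps` are assumptions; only `terraform -chdir=...` and `-auto-approve` are real Terraform CLI options.

```python
WORKSPACES = ("infrastructure", "kubernetes")


def terraform_steps(action: str, provider: str, base_dir: str = "terraform") -> list[list[str]]:
    """Return the terraform invocations for one provider, in execution order."""
    if action not in {"plan", "apply", "destroy"}:
        raise ValueError(f"unsupported action: {action}")

    # Destroy walks the workspaces in reverse, so Kubernetes resources are
    # removed while the cluster they live in still exists.
    order = tuple(reversed(WORKSPACES)) if action == "destroy" else WORKSPACES

    steps = []
    for workspace in order:
        cmd = ["terraform", f"-chdir={base_dir}/{provider}/{workspace}", action]
        if action != "plan":
            cmd.append("-auto-approve")  # non-interactive, as needed in CI
        steps.append(cmd)
    return steps
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` so that a failure in the first workspace stops the sequence before the second one starts.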

@github-actions github-actions bot added component/github-actions kind/feature Categorizes issue or PR as related to a new feature. size/l labels Oct 9, 2025
@moolen moolen force-pushed the mj-e2e-managed branch 4 times, most recently from f299d37 to aa18ba4 Compare October 9, 2025 22:08
Signed-off-by: Moritz Johner <beller.moritz@googlemail.com>
Comment on lines +15 to +21
# GCP variables
GCP_SERVICE_ACCOUNT_KEY: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
GCP_SM_SA_GKE_JSON: ${{ secrets.GCP_SM_SA_GKE_JSON }}
GCP_GKE_CLUSTER: e2e
TF_VAR_GCP_GKE_CLUSTER: e2e
GCP_FED_REGION: ${{ secrets.GCP_FED_REGION }}
TF_VAR_GCP_FED_REGION: ${{ secrets.GCP_FED_REGION }}
Member Author

This is quite the mess with secrets, env vars and tf vars. The problem is that some configuration is shared between the e2e/suites/provider test package and the Terraform code, e.g. the GKE_CLUSTER name, service account names, etc.

I chose to keep the current approach: I wasn't able to come up with something better at this stage, and there are already a lot of changes.
I have considered:

  • using naming conventions in both the tf and e2e test worlds, though that is very brittle and error-prone
  • having a .env config file somewhere, which makes things much worse, as it doesn't integrate well with GHA
  • running tf via Terraform Cloud and storing configuration and secrets there, then feeding the outputs into the e2e tests. This is a lot of effort, and we may run into compatibility issues with the e2e test suite: the non-managed tests run in the same suite, so they share the same env-variable interface as the managed tests. We must therefore keep the same interface for the GHA secrets used in both regular e2e tests and tf-based runs.

Open to ideas, or just leave it as-is for the time being.
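
To make the duplication concrete: Terraform reads an input variable `X` from an environment variable named `TF_VAR_X`, which is why each shared setting above appears twice. A minimal sketch of deriving the mirrored names from one definition (the helper `with_tf_vars` is hypothetical and not part of this PR):

```python
def with_tf_vars(shared: dict[str, str]) -> dict[str, str]:
    """Return an environment holding each setting under both names.

    The plain name is what the e2e test suite reads; the TF_VAR_-prefixed
    copy is what Terraform maps onto the input variable of the same name.
    """
    env = dict(shared)
    for name, value in shared.items():
        env[f"TF_VAR_{name}"] = value
    return env


# Value taken from the workflow snippet under review.
env = with_tf_vars({"GCP_GKE_CLUSTER": "e2e"})
```

This only removes the repetition; it does not change the env-variable interface shared by the managed and non-managed tests.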

Contributor

This is already amazing work Moritz!

@moolen moolen marked this pull request as ready for review October 9, 2025 23:00
Comment on lines +150 to +151
- name: Login to Docker
  uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef # v3.6.0
Contributor

What is this needed for with AWS? 🤔

Member Author

The e2e suite builds two images and pushes them to GHCR in this workflow. It then boots up an EKS/GKE/AKS cluster, which pulls these images.

Comment on lines +334 to +336
ARM_CLIENT_ID: "${{ secrets.TFC_AZURE_CLIENT_ID }}"
ARM_SUBSCRIPTION_ID: "${{ secrets.TFC_AZURE_SUBSCRIPTION_ID }}"
ARM_TENANT_ID: "${{ secrets.TFC_AZURE_TENANT_ID }}"
Contributor

How come these are needed only for Azure and not for AWS or GCP? 🤔

Member Author

The AWS and GCP login actions both export environment variables that Terraform picks up.
Azure doesn't 😞
I guess we could export the ARM_* env vars at the top level of the workflow, but the authentication bits need to be restructured or reviewed anyway, and that belongs in a separate PR.

We have to rely on static credentials (that's a part of what we test in our suite), but they shouldn't be lingering around as env vars in a workflow. They should be short-lived and bound to the lifecycle of the GHA run anyway. 🤷

@moolen moolen merged commit 49debe8 into main Oct 15, 2025
33 of 34 checks passed
@moolen moolen deleted the mj-e2e-managed branch October 15, 2025 04:40
SamuelMolling pushed a commit to SamuelMolling/external-secrets that referenced this pull request Oct 24, 2025
Signed-off-by: Moritz Johner <beller.moritz@googlemail.com>
Signed-off-by: Samuel Molling <samuelmolling@gmail.com>