
refactor(test): restructure integration tests with tiered CI, parallel jobs, and Finch cross-platform smoke tests#8666

Merged
roger-zhangg merged 42 commits into develop from test-tiering on Feb 20, 2026

@roger-zhangg
Member

roger-zhangg commented Feb 19, 2026

Which issue(s) does this change fix?

Addresses integration test CI performance and reliability issues. No specific issue number.

Why is this change necessary?

The integration test workflow was slow (over 2 hours), flaky due to cross-worker Docker interference, and lacked cross-platform validation. The container_runtime matrix (docker/finch/no-container) created too many combinations, and tests that required AWS credentials were mixed with local-only tests.

How does it address the issue?

1. Workflow restructuring (21 parallel matrix jobs)

  • Removed container_runtime matrix dimension (docker/finch/no-container)
  • Split large jobs into smaller parallel ones: build-x86-1/2, build-arm64, build-x86-container-1/2, build-arm64-container-1/2, terraform-build/start-api/invoke-start-lambda, package, deploy, sync-code/watch, local-invoke/start-api/start-lambda, other-and-e2e, cloud-based-tests, tier1-finch
  • Local-only jobs release test accounts early after ECR login

2. Credential-based test separation

  • Added a @pytest.mark.requires_credential marker for tests that need AWS credentials
  • Build/local jobs use -m "not requires_credential" to exclude these cloud tests
  • The cloud-based-tests job runs them via -m requires_credential
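
Because the build/local jobs select -m "not requires_credential" and the cloud-based-tests job selects -m requires_credential, the two expressions partition the suite: every collected test runs in exactly one kind of job. A stdlib-only sketch of that partition, assuming a hypothetical inventory that maps test names to marker sets (it models only these two expressions, not pytest's full -m grammar):

```python
from typing import Dict, FrozenSet, List

# Hypothetical inventory: test name -> marker names. The requires_credential
# entries and the tier1 entries reuse names mentioned in this PR.
TESTS: Dict[str, FrozenSet[str]] = {
    "TestBuildCommand_LayerBuilds": frozenset({"requires_credential"}),
    "TestSamPython36HelloWorldIntegrationImages": frozenset({"requires_credential"}),
    "test_tier1_python_build": frozenset({"tier1"}),
    "test_invoke_returncode_is_zero": frozenset({"tier1"}),
}


def select(tests: Dict[str, FrozenSet[str]], expr: str) -> List[str]:
    """Model only the two -m expressions used by the workflow."""
    if expr == "requires_credential":
        return [name for name, marks in tests.items() if "requires_credential" in marks]
    if expr == "not requires_credential":
        return [name for name, marks in tests.items() if "requires_credential" not in marks]
    raise ValueError(f"unsupported marker expression: {expr!r}")


local_jobs = select(TESTS, "not requires_credential")
cloud_job = select(TESTS, "requires_credential")

# Disjoint and exhaustive: no test is skipped and none runs in both job types.
assert set(local_jobs).isdisjoint(cloud_job)
assert set(local_jobs) | set(cloud_job) == set(TESTS)
```

In pytest itself, a custom marker like this is normally registered (for example in pytest.ini, or via config.addinivalue_line("markers", ...) in conftest.py) so that it does not trigger unknown-mark warnings.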

3. Docker container cleanup isolation

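  • Scoped container cleanup to per-test-class snapshots: each test class records the IDs of the containers that exist when it starts and removes only containers created afterwards
  • Added xdist_group markers (durable, docker_images, remote_layers, docker_watcher, lambda_layers) so that tests sharing ports or Docker resources are serialized on a single worker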
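
xdist_group markers only take effect under pytest-xdist's --dist loadgroup mode (which the commit log enables for the local-invoke and local-start jobs): all tests sharing a group name are sent to the same worker and therefore run one after another, e.g. the durable tests that share port 9014. The sketch below is a simplified, hypothetical model of that guarantee, not pytest-xdist's actual scheduler: grouped tests form one unit, ungrouped tests are their own units, and each unit goes to the least-loaded worker.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def assign_workers(
    tests: List[Tuple[str, Optional[str]]], n_workers: int
) -> Dict[int, List[str]]:
    """Toy model of --dist loadgroup: same xdist_group -> same worker."""
    units: Dict[str, List[str]] = defaultdict(list)
    for name, group in tests:
        # Grouped tests share one scheduling unit; ungrouped tests get their own.
        key = f"group:{group}" if group else f"test:{name}"
        units[key].append(name)

    workers: Dict[int, List[str]] = {i: [] for i in range(n_workers)}
    # Place larger units first, each on the currently least-loaded worker.
    for members in sorted(units.values(), key=len, reverse=True):
        target = min(workers, key=lambda w: len(workers[w]))
        workers[target].extend(members)
    return workers


# Hypothetical pairing: the callback/execution tests share the "durable" group.
plan = assign_workers(
    [
        ("test_tier1_callback", "durable"),
        ("test_tier1_execution", "durable"),
        ("test_calling_proxy_endpoint", None),
        ("test_invoke_with_data", None),
    ],
    n_workers=2,
)
```

Both durable tests land on the same worker, so they can never bind port 9014 at the same time; the ungrouped tests are free to spread across the remaining capacity.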

4. Tier 1 cross-platform smoke tests (~60 curated tests)

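A curated subset marked with @pytest.mark.tier1 runs on every OS/container-runtime combination via the tier1-finch job. Most entries are dedicated test_tier1_* methods that call the existing test logic with one specific parameter set; a few existing tests (e.g. test_building_ruby_3_2, TestBuildWithNestedStacks3LevelWithSymlink) are marked directly.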

| Category | Test | File |
|---|---|---|
| Build: Python | test_tier1_python_build | test_build_cmd_python.py |
| Build: Python (container) | test_tier1_python_build_in_container | test_build_cmd_python.py |
| Build: Java | test_tier1_java_build | test_build_cmd_java.py |
| Build: Java (container) | test_tier1_java_build_in_container | test_build_cmd_java.py |
| Build: Node.js | test_tier1_node_build | test_build_cmd_node.py |
| Build: Node.js (container) | test_tier1_node_build_in_container | test_build_cmd_node.py |
| Build: .NET | test_tier1_dotnet_build | test_build_cmd_dotnet.py |
| Build: .NET (container) | test_tier1_dotnet_build_in_container | test_build_cmd_dotnet.py |
| Build: Ruby | test_building_ruby_3_2 (parameterized) | test_build_cmd.py |
| Build: Rust | test_tier1_rust_build | test_build_cmd_rust.py |
| Build: Provided | test_tier1_provided_build | test_build_cmd_provided.py |
| Build: Provided (container) | test_tier1_provided_build_in_container | test_build_cmd_provided.py |
| Build: Nested stacks | test_nested_build_invoke_in_container | test_build_cmd.py |
| Build: Symlink | TestBuildWithNestedStacks3LevelWithSymlink | test_build_cmd.py |
| Build: Samconfig | test_samconfig_parameters_are_overridden | test_build_samconfig.py |
| Build: Terraform | test_build_and_invoke_lambda_functions | test_build_terraform_applications.py |
| Build: Layer | test_tier1_layer_build | test_build_cmd.py |
| ARM64: Python | test_tier1_python_arm64_build | test_build_cmd_arm64.py |
| ARM64: Java | test_tier1_java_arm64_build | test_build_cmd_arm64.py |
| ARM64: Node.js | test_tier1_node_arm64_build | test_build_cmd_arm64.py |
| ARM64: Ruby | test_tier1_ruby_arm64_build | test_build_cmd_arm64.py |
| ARM64: Provided | test_tier1_provided_arm64_build | test_build_cmd_arm64.py |
| ARM64: Rust | test_tier1_rust_arm64_build | test_build_cmd_arm64.py |
| local invoke | test_invoke_returncode_is_zero | test_integrations_cli.py |
| local invoke (layers) | test_local_zip_layers | test_integrations_cli.py |
| local invoke (durable) | test_tier1_durable_invoke | test_invoke_durable.py |
| local start-api | test_calling_proxy_endpoint | test_start_api.py |
| local start-lambda | test_invoke_with_data | test_start_lambda.py |
| local generate-event | test_generate_event_substitution | test_cli_integ.py |
| local callback | test_tier1_callback | test_callback.py |
| local execution | test_tier1_execution | test_execution.py |
| sam init | test_init_command_passes_and_dir_created | test_init_command.py |
| sam validate | test_default_template_file_choice | test_validate_command.py |
| sam deploy | test_deploy_guided_zip | test_deploy_command.py |
| sam delete | test_tier1_delete | test_delete_command.py |
| sam package | test_tier1_package | test_package_command_zip.py |
| sam sync | test_tier1_sync_infra | test_sync_infra.py |
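
The dedicated-method pattern can be sketched with the standard library's unittest. Everything here is hypothetical (PythonBuildTests, _build_command, the Runtime parameter override, the chosen runtimes): the point is that tier 1 methods reuse the same logic as the broad tests with a single parameter set, and that the _in_container suffix lets a keyword filter such as -k "tier1 and not in_container" separate container from non-container builds. select_by_keywords is only a rough, case-sensitive model of pytest's -k substring matching.

```python
import unittest
from typing import List, Sequence


class PythonBuildTests(unittest.TestCase):
    """Hypothetical stand-in for a build integration test class."""

    def _build_command(self, runtime: str, use_container: bool) -> List[str]:
        # Shared logic: the real tests run `sam build` and validate the output;
        # this sketch only assembles the command line.
        cmd = ["sam", "build", "--parameter-overrides", f"Runtime={runtime}"]
        if use_container:
            cmd.append("--use-container")
        return cmd

    # Broad coverage: remaining parameter sets (parameterized in the real suite).
    def test_python_build_other_runtimes(self) -> None:
        for runtime in ("python3.10", "python3.11"):
            self.assertNotIn("--use-container", self._build_command(runtime, False))

    # Tier 1: one dedicated method per parameter set, reusing the same logic.
    def test_tier1_python_build(self) -> None:
        self.assertNotIn("--use-container", self._build_command("python3.12", False))

    def test_tier1_python_build_in_container(self) -> None:
        self.assertIn("--use-container", self._build_command("python3.12", True))


def select_by_keywords(
    names: Sequence[str], include: Sequence[str], exclude: Sequence[str] = ()
) -> List[str]:
    """Keep names containing every include substring and no exclude substring."""
    return [
        n for n in names
        if all(s in n for s in include) and not any(s in n for s in exclude)
    ]


names = unittest.TestLoader().getTestCaseNames(PythonBuildTests)
tier1_only = select_by_keywords(names, ["tier1"])
tier1_no_container = select_by_keywords(names, ["tier1"], ["in_container"])
```

Keeping each tier 1 case as its own method, rather than one entry in a parameter list, is what makes it addressable by name across every OS/runtime combination.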

5. Test fixes and improvements

  • Fixed verify_pulled_image runtime mismatch (python3.12 → python3.11)
  • Fixed EventBridge schema registry tests with dynamic position lookup
  • Updated credential test runtimes (dotnet10, java25, python3.12, ruby3.4, nodejs22.x)
  • Made delete test robust against CloudFormation EarlyValidation failures
  • Fixed warm container SIGTERM test assertion

6. Infrastructure

  • tests/setup_testing_resources.py — credential setup script
  • tests/reset_testing_resources.py — account reset + S3 report upload
  • tests/setup_finch.sh — Finch installation script
  • Updated CONTRIBUTING.md with integration test guidelines
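
Per the commit log, tests/reset_testing_resources.py always uploads the S3 test report and resets the account only when SKIP_ACCOUNT_RESET is not set; local-only jobs set it because they already released their account early after ECR login. A minimal sketch of that control flow, where the step names, their order, and the accepted truthy spellings of SKIP_ACCOUNT_RESET are assumptions rather than the script's actual code:

```python
from typing import List, Mapping

TRUTHY = {"1", "true", "yes"}  # assumption: spellings treated as "set"


def plan_post_test_steps(env: Mapping[str, str]) -> List[str]:
    """Hypothetical outline: always upload the report, conditionally reset."""
    steps = ["upload_test_report"]
    if env.get("SKIP_ACCOUNT_RESET", "").strip().lower() not in TRUTHY:
        # Jobs that released their test account early skip the final reset.
        steps.append("reset_account")
    return steps
```

In a real script the mapping would typically be os.environ; taking it as a parameter keeps the decision logic testable without touching the process environment.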

What side effects does this change have?

  • Some parameterized test cases moved to dedicated test_tier1_* methods (same coverage, no duplication)
  • Finch tests run as a new matrix entry (~15 min, parallel with other jobs)
  • build-arm64 job now sets up QEMU (needed for ARM64 tier1 local invoke)

Mandatory Checklist

PRs will only be reviewed after the checklist is complete

  • Review the generative AI contribution guidelines
  • Add input/output type hints to new functions/methods
  • Write design document if needed
  • Write/update unit tests
  • Write/update integration tests
  • Write/update functional tests if needed
  • make pr passes
  • make update-reproducible-reqs if dependencies were changed
  • Write documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

… add test tiering, scope Docker cleanup

- Remove container_runtime matrix (docker/finch/no-container) in favor of explicit test_suite entries
- Split build tests into build-x86, build-arm64, build-x86-container, build-arm64-container
- Merge local-start1/local-start2 into local-start with -n 2 parallelism
- Add cloud-based-tests job using @pytest.mark.requires_credential marker
- Skip credential-requiring tests (SAR, layers, STS) in local-only jobs via SAM_CLI_NO_CREDENTIALS
- Scope Docker container/image cleanup to per-test-class snapshots to prevent cross-worker interference
- Extract credential setup/reset into setup_testing_resources.py and reset_testing_resources.py
- Update credential test runtimes: dotnet10, java25, python3.12, ruby3.4, nodejs22.x, provided.al2023 for Go
- Remove Free up disk space step
- Add uv availability check in Makefile init
…dates, schema registry lookup

- Add requires_credential marker to terraform S3 backend and layer tests
- Exclude requires_credential tests from terraform job, run in cloud-based-tests
- Remove AWS credentials from terraform job (no longer needed)
- Add Terraform install to cloud-based-tests job
- Fix java25 dir rename and pom.xml compiler version
- Add go.sum for Go STS credential test
- Remove global Docker image cleanup from WarmContainersRemoteLayers tests
- Add xdist_group markers for durable tests (port 9014) and rapid image tests
- Use --dist loadgroup for local-invoke and local-start jobs
- Fix EventBridge schema registry tests with dynamic position lookup
- Update CONTRIBUTING.md with integration test guidelines
… jobs

- All jobs now get credentials unconditionally (remove conditional OIDC/ECR/reset)
- Use -m "not requires_credential" in build/local jobs instead of SAM_CLI_NO_CREDENTIALS env var
- Remove SKIP_CREDENTIAL_TESTS and skipIf patterns from test files
- Remove requires_credential from terraform tests (run in terraform jobs directly)
- Split terraform into terraform-build (-n 4) and terraform-local (sequential)
- Mark TestBuildCommand_LayerBuilds as requires_credential
- Add xdist groups: durable (callback, execution), remote_layers, docker_watcher
- Add TestSamPython36HelloWorldIntegrationImages to docker_images group
- Update CONTRIBUTING.md with simplified marker-only approach
…st groups

- Fix verify_pulled_image/verify_docker_container_cleanedup to use python3.11
  matching the actual template runtimes (was incorrectly hardcoded to python3.12)
- Make test_delete_no_prompts_with_s3_prefix_present_zip robust against deploy
  failures (CloudFormation EarlyValidation hooks)
- Add xdist groups for TestSamPython36HelloWorldIntegrationImages (docker_images),
  TestLocalCallback/TestLocalExecution (durable),
  WarmContainersRemoteLayers* (remote_layers),
  Watching*Image*/DockerFileLocation* (docker_watcher)
- Split local-start into local-start-api and local-start-lambda
- Split sync into sync-code (-n 2) and sync-watch (sequential)
- Split package-delete-deploy into package (+ delete) and deploy
- Remove Docker image cleanup from layer tests (images persist, reused)
- Add lambda_layers xdist group to all layer test classes
- Add flaky(reruns=3) at class level for TestLayerVersion
- Update Node.js STS SDK to ^3.700.0 (fixes missing @smithy/protocol-http)
- build-x86 -> build-x86-1 (general) + build-x86-2 (language-specific)
- build-x86-container -> build-x86-container-1 + build-x86-container-2
- build-arm64-container -> build-arm64-container-1 (non-java) + build-arm64-container-2 (java)
- terraform-local -> terraform-start-api + terraform-invoke-start-lambda
- Update all setup step conditions for new job names
- Remove samcli image cleanup from layer tests, add lambda_layers xdist group
- Move test_build_cmd_python.py and test_build_cmd_java.py to build-x86-1/container-1
- Move test_sync_build_in_source.py from sync-code to sync-watch
…ts as requires_credential

- Add early reset step after ECR login for build/local jobs to free test account
- Skip final reset for jobs that already released early
- Mark TestSamPython36HelloWorldIntegrationImages and TestDeleteOldRapidImages as requires_credential
…iner assertion

- Merge S3 report upload into reset_testing_resources.py (always uploads, conditionally resets)
- Use SKIP_ACCOUNT_RESET env var for local-only jobs instead of workflow condition
- Delete standalone upload_test_report.py
- Fix TestWarmContainersHandlesSigTermInterrupt to use assertGreaterEqual for container count
- Add @pytest.mark.tier1 to 23 test classes across all feature areas
- Fix S3 report upload: use configure-aws-credentials role-chaining for OIDC->RoleA->RoleB
- Simplify upload_test_reports to use default credentials (set by workflow)
- Add tier1 markers for: durable, layers, sync, deploy, terraform, regression, callback, execution
- Update TIER1_TESTS.md with complete coverage table
…ust import

- Add tier1-finch to matrix with conditional Finch setup, ECR login, and runtime
- Move setup_finch.sh from scripts/ to tests/
- Add tier1-finch to all toolchain setup conditions
- Remove separate tier1-finch job (now part of matrix)
- Fix missing pytest import in test_build_cmd_rust.py
… cases, add missing commands

- Move tier1 from class-level to method-level with dedicated test_tier1_* methods
- Add container + non-container tier1 for each runtime (Python, Java, Node, Dotnet, Rust, Provided)
- Remove duplicated parameterized cases covered by tier1 methods
- Add tier1 for: sam delete, sam package, symlink builds, layer builds, sam sync
- Fix missing pytest imports in delete and package tests
- Revert cargo-lambda to pip install
- Update CONTRIBUTING.md and TIER1_TESTS.md
- Fix tier1 methods that called parameterized methods (inline logic instead)
- Fix dotnet validate_build_command params (mode=None, use_container=True)
- Fix Python tier1 skip condition to check template and codeuri
- Add ARM64 tier1 build tests: Python, Java, Node, Ruby, Provided, Rust
- Update TIER1_TESTS.md with ARM64 section
…ams, skip ARM64 without Docker

- Restore test_tier1_rust_build_in_container (was accidentally deleted)
- Fix dotnet container tier1 to use dotnet8 (dotnet10 container image not available)
- Fix layer tier1 to use correct overrides (LayerContentUri, python3.11)
- Add skipIf(SKIP_DOCKER_TESTS) to ARM64 tier1 tests (need Docker for invoke)
- All container tier1 methods have _in_container suffix for -k filter compatibility
- Dotnet container: use mount_mode=MountMode.WRITE (not use_container=True)
- Layer build: use python3.12 (matching original test_build_single_layer)
- Rust: remove container tier1 (cargo-lambda doesn't support container builds)
- These match the exact parameter combinations that pass in the original tests
…emove temp docs

- Revert ARM64 tier1 names (no _in_container since use_container=False)
- Enable QEMU for build-arm64 job (needed for local invoke)
- Fix mypy import error in setup_testing_resources.py
- Remove Rust container tier1 (cargo-lambda doesn't support container builds)
- Fix dotnet container tier1 to use MountMode.WRITE
- Fix layer tier1 to use python3.12
- Remove CLOUD_VS_LOCAL_REPORT.md and TIER1_TESTS.md
roger-zhangg requested a review from a team as a code owner February 19, 2026 09:02
@roger-zhangg
Member Author

@@ -854,6 +834,9 @@ def tearDownClass(cls):
],
)
@skipIf(SKIP_LAYERS_TESTS, "Skip layers tests in Appveyor only")
Contributor

Is this still true?

Member Author

For all the skipIf conditions, let's review them after we have actually moved out.

…al jobs

- Add tests/free_disk_space.sh: cleanup if <25GB free, reduce swap to 1GB, nohup rm
- Mark git function tests and WarmContainersRemoteLayers as requires_credential
- Skip Get testing resources for non-credential jobs, clear AWS creds after ECR login
- Add skipIf(SKIP_DOCKER_TESTS) to dotnet tier1 non-container test
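
The free_disk_space.sh note describes a conditional cleanup that runs only when less than 25 GB is free. The threshold check alone can be sketched in Python with shutil.disk_usage; treating "GB" as GiB and checking the current directory's filesystem are assumptions, and the swap and rm steps are out of scope:

```python
import shutil

THRESHOLD_GB = 25  # from the commit note: clean up only below 25 GB free


def needs_cleanup(free_bytes: int, threshold_gb: int = THRESHOLD_GB) -> bool:
    """Return True when free space is under the threshold (GB taken as GiB here)."""
    return free_bytes < threshold_gb * 1024**3


def runner_needs_cleanup(path: str = ".") -> bool:
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    return needs_cleanup(shutil.disk_usage(path).free)
```
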
@roger-zhangg
Member Author

roger-zhangg commented Feb 19, 2026

vicheey previously approved these changes Feb 19, 2026
Comment on lines +181 to +182
if container.id not in cls._pre_existing_container_ids:
try:
Contributor

I guess we're now deleting all the containers that didn't exist when this class started. But in theory if there are things running in parallel, we could still be deleting a container created by a different class, as long as it was created after this class started, right?

So basically the change is that we're just "not deleting the containers that existed when this class started". I guess that makes a difference?

Member Author

Yes, that's correct. To make it more bulletproof, we'd probably need to track exactly which containers each class creates, but that doesn't seem easy.
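
The limitation discussed in this thread can be reproduced with an in-memory model. The sketch mirrors the _pre_existing_container_ids check from the diff: container IDs are snapshotted at class setup, and teardown removes every container not in the snapshot. FakeContainerRuntime and SnapshotScopedCleanup are hypothetical stand-ins for the Docker client and the test class, not code from the PR.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FakeContainerRuntime:
    """Hypothetical in-memory stand-in for a Docker client."""

    running: Set[str] = field(default_factory=set)
    _counter: int = 0

    def run(self) -> str:
        self._counter += 1
        container_id = f"c{self._counter}"
        self.running.add(container_id)
        return container_id

    def list_ids(self) -> Set[str]:
        return set(self.running)  # copy, so callers can remove while iterating

    def remove(self, container_id: str) -> None:
        self.running.discard(container_id)


class SnapshotScopedCleanup:
    """Hypothetical test-class lifecycle using a per-class container snapshot."""

    def __init__(self, runtime: FakeContainerRuntime) -> None:
        self.runtime = runtime
        self._pre_existing_container_ids: Set[str] = set()

    def set_up_class(self) -> None:
        # Snapshot every container that exists before this class runs.
        self._pre_existing_container_ids = self.runtime.list_ids()

    def tear_down_class(self) -> Set[str]:
        # Remove anything not in the snapshot, whoever created it.
        removed: Set[str] = set()
        for container_id in self.runtime.list_ids():
            if container_id not in self._pre_existing_container_ids:
                self.runtime.remove(container_id)
                removed.add(container_id)
        return removed


runtime = FakeContainerRuntime()
runtime.run()  # c1: existed before this class started, so it survives
cleanup = SnapshotScopedCleanup(runtime)
cleanup.set_up_class()
runtime.run()  # c2: created by this class
runtime.run()  # c3: created by a parallel class after the snapshot
removed = cleanup.tear_down_class()
```

c3 is removed even though another class owns it, which is the gap acknowledged in the reply; recording the IDs each class actually creates, instead of diffing against a snapshot, would close it.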

- name: Initialize project
run: |
export CONTAINER_RUNTIME=${{ matrix.container_runtime }}
if [[ "${{ matrix.test_suite }}" == "build-x86-1" || "${{ matrix.test_suite }}" == "build-x86-2" || "${{ matrix.test_suite }}" == "build-arm64" ]]; then
Contributor

Not too important, but we probably want to be consistent here, since we use contains(fromJSON( in all the other ifs.

roger-zhangg added this pull request to the merge queue Feb 20, 2026
Merged via the queue into develop with commit 8b878af Feb 20, 2026
63 of 64 checks passed
roger-zhangg added a commit that referenced this pull request Feb 23, 2026
Re-add the samcli/lambda-* Docker image cleanup in tearDown for
TestLayerVersionBase and TestLayerVersionThatDoNotCreateCache.

This was removed in PR #8666 to avoid cross-test interference in
parallel runs, but these test classes are already serialized via
xdist_group markers. Without the cleanup, stale cached images
cause layer version tests to use outdated layers.
github-merge-queue bot pushed a commit that referenced this pull request Feb 23, 2026
* Restore layer test tearDown image cleanup removed in #8666

* limit chardet < 6 in cargo lambda

* dep

* reuse cleanup_samcli_images

* nit
roger-zhangg deleted the test-tiering branch February 27, 2026 23:24

3 participants