Description
Describe the Feature
Atmos cannot apply multiple component instances that share the same Terraform component (metadata.component) in parallel. All instances write to the same component source directory, causing lock contention, checksum races, and corrupted provider binaries. The existing provision.workdir.enabled feature does not solve this — it isolates by <stack>-<component>, so all instances of the same component within the same stack still share one workdir.
Expected Behavior
Running multiple atmos terraform apply commands in parallel for component instances that share the same base component should work without file conflicts. Each instance already has its own Terraform workspace and separate remote state — the only barrier is local filesystem contention that atmos should manage internally.
Use Case
We have 12 ElastiCache clusters, all referencing metadata.component: elasticache, deployed to the same stack. Each has its own Terraform workspace and separate S3 state file. Applying them sequentially is slow. They are completely independent resources with no dependencies between them — there is no reason they can't run concurrently.
This pattern is common: many instances of the same component type (N Redis clusters, N IAM roles, N S3 buckets) in a single stack, all sharing one Terraform module.
Describe Ideal Solution
Option A: The workdir path should incorporate the full component instance path, not just the base metadata.component name. The workdirs should be:
.workdir/terraform/<stack>-elasticache-redis-cluster-1
.workdir/terraform/<stack>-elasticache-redis-cluster-2
.workdir/terraform/<stack>-elasticache-redis-cluster-3
Instead of all mapping to:
.workdir/terraform/<stack>-elasticache
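To make the difference concrete, here is a minimal sketch of the two derivations as plain string construction. The variable names (`stack`, `base_component`, `instance`) are illustrative stand-ins, not real atmos internals:

```shell
# Illustrative only: contrasting today's workdir derivation with Option A.
stack="my-stack-dev"
base_component="elasticache"   # metadata.component, shared by all instances
instance="redis-cluster-1"     # the component instance path

# Current: the instance name is dropped, so every instance collides here.
current_workdir=".workdir/terraform/${stack}-${base_component}"
# Option A: the instance path is included, so each instance is isolated.
proposed_workdir=".workdir/terraform/${stack}-${base_component}-${instance}"

echo "$current_workdir"
echo "$proposed_workdir"
```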
Option B: A built-in parallel apply mechanism:
atmos terraform apply --parallel \
  components/elasticache/redis-cluster-1 \
  components/elasticache/redis-cluster-2 \
  -s my-stack
Alternatives Considered
No response
Additional Context
Investigation details
Root cause analysis
When atmos runs terraform apply for a component, it writes several files to the component source directory:
.terraform/ — provider binaries, module cache, local state lock (terraform.tfstate)
.terraform.lock.hcl — provider dependency checksums
backend.tf.json — generated backend configuration
providers_override.tf.json — generated provider overrides
*.terraform.tfvars.json — generated variable files
*.planfile — plan output files
When 12 processes write to the same directory simultaneously, we observed three distinct failure modes.
Test 1: Naive parallel apply (no isolation)
for component in "${COMPONENTS[@]}"; do
  atmos terraform apply "$component" -s "$STACK" &
done
wait
Result: Most processes fail. .terraform lock file contention, provider checksum mismatches on .terraform.lock.hcl, and corrupted generated files from concurrent writes.
Test 2: TF_DATA_DIR isolation
TF_DATA_DIR is an official Terraform env var that redirects the .terraform directory to a custom path. We gave each parallel process its own:
for component in "${COMPONENTS[@]}"; do
  TF_DATA_DIR="/tmp/work/tf-data/$(basename "$component")" \
    atmos terraform apply "$component" -s "$STACK" &
done
Result: 7/12 succeeded, 5/12 failed. TF_DATA_DIR isolates the .terraform directory, but .terraform.lock.hcl lives in the component source directory, NOT inside .terraform. So all 12 processes still race on writing that file.
Failure mode A: provider checksum mismatch (4 failures)
Error: Required plugins are not installed
the cached package for registry.terraform.io/hashicorp/aws 6.31.0
does not match any of the checksums recorded in the dependency lock file
Process A writes checksums to .terraform.lock.hcl, process B overwrites them, then process A's cached provider no longer matches. Classic TOCTOU race.
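The race can be reduced to a deterministic miniature. The file below is a stand-in temp file, not a real .terraform.lock.hcl; it just shows the last-writer-wins sequence described above:

```shell
# Deterministic miniature of the last-writer-wins race on the lock file.
lockfile="$(mktemp)"
echo "checksum-A" > "$lockfile"   # process A records its provider checksum
echo "checksum-B" > "$lockfile"   # process B overwrites the whole file
observed="$(cat "$lockfile")"     # process A re-reads before verifying its cache
if [ "$observed" != "checksum-A" ]; then
  echo "A's cached provider no longer matches the lock file"
fi
rm -f "$lockfile"
```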
Failure mode B: corrupt provider binary (1 failure)
Error: Failed to load plugin schemas
Could not load the schema for provider registry.terraform.io/hashicorp/aws:
failed to instantiate provider
Unrecognized remote plugin message: Failed to read any lines from plugin's stdout
Multiple processes downloaded the AWS provider to TF_PLUGIN_CACHE_DIR simultaneously. One process read a partially-written binary. The architecture check passed (darwin arm64 matches arm64) — the binary was simply incomplete.
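A sketch of the "populate once, read many" pattern that avoids this failure mode (and that Test 3 below relies on). Here `mkdir` serves as a crude atomic mutex; this is an illustration of the principle, not something atmos or terraform does themselves:

```shell
# Populate the cache exactly once, then let all readers share it.
cache="$(mktemp -d)/plugin-cache"
mkdir -p "$cache"
if mkdir "$cache/.populating" 2>/dev/null; then
  # Only the first process wins the atomic mkdir and writes the binary in full.
  printf 'complete-provider-binary' > "$cache/provider.bin"
fi
# Later readers see a complete file instead of a partial download.
cat "$cache/provider.bin"
echo
```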
Test 3: TF_DATA_DIR + TF_PLUGIN_CACHE_DIR + pre-init (working workaround)
export TF_PLUGIN_CACHE_DIR="/tmp/work/plugin-cache"
# Single init to populate .terraform.lock.hcl and provider cache BEFORE parallel runs
TF_DATA_DIR="/tmp/work/tf-data/first" \
  atmos terraform init "${COMPONENTS[0]}" -s "$STACK"
# Now parallel applies — lock file and cache are already warm
for component in "${COMPONENTS[@]}"; do
  TF_DATA_DIR="/tmp/work/tf-data/$(basename "$component")" \
    atmos terraform apply "$component" -s "$STACK" &
done
Result: 12/12 succeeded. The pre-init populates .terraform.lock.hcl and the plugin cache before any parallel process runs. Subsequent inits read the lock file and symlink from the cache — no concurrent writes.
This works but is a hack. It requires the caller to understand Terraform internals (TF_DATA_DIR, TF_PLUGIN_CACHE_DIR) and manage temp directories, cleanup, and process lifecycle outside of atmos.
Test 4: provision.workdir.enabled: true (atmos native feature — DOES NOT WORK for this case)
After discovering the Component Workdir Isolation feature, we enabled it on all 12 components:
components/elasticache/redis-cluster-1:
  metadata:
    component: elasticache
  provision:
    workdir:
      enabled: true
Then ran a simple parallel apply with no TF_DATA_DIR workarounds:
for component in "${COMPONENTS[@]}"; do
  atmos terraform apply "$component" -s "$STACK" &
done
Result: 1/12 succeeded, 11/12 failed. All 12 components resolved to the exact same workdir:
.workdir/terraform/<stack>-elasticache
The workdir path is derived from <stack>-<component>, where <component> is the metadata.component value (elasticache). Since all 12 instances share the same stack and the same base component, they all map to one directory.
The 11 failures all hit the same local state lock:
Error: Error locking state: Error acquiring the state lock
Error message: resource temporarily unavailable
Lock Info:
  ID:        2c072bde-8527-00a3-49bb-7940faa90d7f
  Path:      .terraform/terraform.tfstate
  Operation: backend from plan
  Version:   1.14.3
The workdir feature solves a different problem: same component across different stacks (e.g. dev-vpc vs prod-vpc → different workdirs). It does not solve multiple instances of the same component within the same stack, because the workdir path doesn't incorporate the component instance path — only the base component name.
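The Test 4 collision is visible from path construction alone. A small sketch mirroring the observed behavior (paths illustrative):

```shell
# With the current <stack>-<component> derivation, every instance of the same
# base component in one stack resolves to a single directory.
stack="my-stack-dev"
base="elasticache"   # metadata.component for all instances
distinct=$(for instance in redis-cluster-1 redis-cluster-2 redis-cluster-3; do
  echo ".workdir/terraform/${stack}-${base}"
done | sort -u | wc -l | tr -d ' ')
echo "3 instances -> ${distinct} distinct workdir(s)"
```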
Current workaround
Our working solution is a bash script that combines TF_DATA_DIR + TF_PLUGIN_CACHE_DIR + a serial pre-init step. It works but shouldn't be necessary — atmos should handle this natively.
I'll share our parallel-apply shell script here, since others may find it useful:
#!/bin/bash
#
# Apply multiple atmos components in parallel.
#
# Usage:
#   ./scripts/multiple-applies.sh <stack> <component> [component ...]
#
# Example:
#   ./scripts/multiple-applies.sh my-stack-dev \
#     infrastructure/dev/us-west-2/elasticache/redis-cluster-1 \
#     infrastructure/dev/us-west-2/elasticache/redis-cluster-2
set -e

STACK="$1"; shift 2>/dev/null || true
COMPONENTS=("$@")

if [ -z "$STACK" ] || [ ${#COMPONENTS[@]} -eq 0 ]; then
  echo "Usage: $0 <stack> <component> [component ...]"
  exit 1
fi

WORK_DIR="/tmp/multiple-applies-$$"
LOG_DIR="${WORK_DIR}/logs"
mkdir -p "$LOG_DIR"

export TF_PLUGIN_CACHE_DIR="${WORK_DIR}/plugin-cache"
mkdir -p "$TF_PLUGIN_CACHE_DIR"

PIDS=()
NAMES=()

cleanup() {
  echo ""
  echo "Interrupted — killing background jobs..."
  for pid in "${PIDS[@]}"; do kill "$pid" 2>/dev/null; done
  wait
  exit 1
}
trap cleanup INT TERM

echo "Applying ${#COMPONENTS[@]} components to ${STACK} in parallel..."

# Single init to warm up plugin cache and .terraform.lock.hcl before parallel runs
echo "Initializing providers..."
TF_DATA_DIR="${WORK_DIR}/tf-data/$(basename "${COMPONENTS[0]}")" \
  atmos terraform init "${COMPONENTS[0]}" -s "$STACK"

for component in "${COMPONENTS[@]}"; do
  name=$(basename "$component")
  echo "Starting: ${name}"
  TF_DATA_DIR="${WORK_DIR}/tf-data/${name}" \
    atmos terraform apply "$component" -s "$STACK" \
    > "$LOG_DIR/${name}.log" 2>&1 &
  PIDS+=($!)
  NAMES+=("$name")
done

TOTAL=${#COMPONENTS[@]}
echo ""
echo "All ${TOTAL} applies launched. To follow a specific component:"
echo "  tail -f ${LOG_DIR}/<name>.log"
echo ""

FAILED=0
for i in "${!PIDS[@]}"; do
  if wait "${PIDS[$i]}"; then
    echo "[SUCCESS] ${NAMES[$i]} ($((i + 1))/${TOTAL})"
  else
    echo "[FAILED] ${NAMES[$i]} ($((i + 1))/${TOTAL}) — see ${LOG_DIR}/${NAMES[$i]}.log"
    FAILED=$((FAILED + 1))
  fi
done

echo ""
echo "=========================================="
if [ "$FAILED" -eq 0 ]; then
  echo "All ${TOTAL} components applied successfully!"
else
  echo "${FAILED}/${TOTAL} components failed."
fi
echo "Logs: ${LOG_DIR}"
echo "Cleanup: rm -rf ${WORK_DIR}"
echo "=========================================="

exit "$FAILED"