Skip to content

Refactor integration handling to support multi-instance configurations#527

Closed
Devesh36 wants to merge 2 commits into
Tracer-Cloud:mainfrom
Devesh36:issue/475
Closed

Refactor integration handling to support multi-instance configurations#527
Devesh36 wants to merge 2 commits into
Tracer-Cloud:mainfrom
Devesh36:issue/475

Conversation

@Devesh36

Copy link
Copy Markdown
Collaborator

Issue #475 — Multi-Account/Region/Cluster Support Implementation
Problem
Real deployments have multiple clusters, regions, teams, and accounts for the same provider. The current integration model only supported one backend per provider.
Implemented Solution

  1. Data Model (app/integrations/models.py)
  • Added IntegrationInstance model with fields:
    • name — user-friendly identifier (e.g., "prod", "staging")
    • tags — flexible metadata list (e.g., "prod", "us-east-1")
    • region — AWS region
    • account_id — AWS account ID
    • credentials — integration credentials
  • Added EffectiveMultiInstanceIntegration container
  1. Store Updates (app/integrations/store.py)
  • Migrated from v1 to v2 schema with instances array
  • Added auto-migration (_migrate_to_v2()) for backward compatibility
  • New helper functions:
    • get_integration(service, name=None, tags=None) — filtered lookup
    • get_integrations(service, name=None, tags=None) — returns all matches
    • upsert_integration(service, entry, instance_name, instance_tags) — adds instances
    • remove_integration(service, instance_name=None) — removes instance or all
  1. Environment Variables (app/nodes/resolve_integrations/node.py)
  • Added *_INSTANCES JSON format support for multi-instance loading
  • Backward compatible with existing single-instance env vars
    Usage:

Multi-instance JSON

GRAFANA_INSTANCES='[{"name":"prod","url":"https://prod.grafana.net","api_key":"...","tags":["prod"]},{"name":"staging","url":"https://staging.grafana.net","api_key":"...","tags":["staging"]}]'

Single-instance (existing - still works)

GRAFANA_INSTANCE_URL=...
GRAFANA_READ_TOKEN=...
4. Integration Resolution (_classify_integrations)

  • Added _get_instances_from_integration() helper for v1→v2 conversion
  • All services now return instance_name and tags in resolved config
  • Supported services: grafana, datadog, honeycomb, coralogix, aws, github, sentry, gitlab, mongodb, postgresql, vercel, opsgenie, mongodb_atlas
  1. Code Usage Example

First instance (backward compatible)

grafana = get_integration("grafana")

Filter by name

grafana_prod = get_integration("grafana", name="prod")

Filter by tags (matches if any tag overlaps)

grafana_prod = get_integration("grafana", tags=["prod"])
Done When (from issue)

  1. ✅ Two instances of the same provider can be loaded together
  2. ✅ Instances can be selected by name or tags
  3. ✅ Existing single instance configs still work (auto-migration)
    Tests
  • Updated existing tests to use new instances[0]["credentials"] format
  • All 147 integration tests pass
    Files Changed
  • app/integrations/models.py — Added IntegrationInstance model
  • app/integrations/store.py — Multi-instance support + migration
  • app/nodes/resolve_integrations/node.py — JSON parsing + classification
  • tests/nodes/resolve_integrations/test_opsgenie.py — Test fixes
  • tests/nodes/resolve_integrations/test_vercel.py — Test fixes

/fix #475

- Introduced helper functions to extract credentials and instances from integrations, accommodating both legacy and new formats.
- Updated the `_classify_integrations` function to process multiple instances for various services, including Grafana, AWS, Datadog, Honeycomb, Coralogix, GitHub, Sentry, MongoDB, PostgreSQL, MongoDB Atlas, Vercel, and OpsGenie.
- Modified the `_load_env_integrations` function to support multi-instance environment variable formats, allowing for backward compatibility with single-instance formats.
- Adjusted tests for OpsGenie and Vercel to reflect changes in the data structure, ensuring proper access to instance credentials.
@greptile-apps

greptile-apps Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors integration handling to support multiple named, tagged instances per provider (e.g., separate prod/staging Grafana clusters), migrating the store from v1 to a v2 schema with an instances array and adding backward-compatible env-var loading. The store, model, and most of the classification logic look solid, but there is one P1 defect in the AWS resolution path.

  • P1 — AWS role-based auth silently broken for all env-loaded and new v2 store entries: _classify_integrations reads role_arn/external_id from the top-level integration dict (integration.get(\"role_arn\", \"\")), but _add_aws() now stores them inside instance.credentials. This causes AWSIntegrationConfig._require_auth_method to raise, which is silently swallowed, leaving AWS unresolved. Only migrated v1 store data (which retains the top-level key) is unaffected.
  • P2 — _add_coralogix contains a redundant nested loop that double-validates every instance; functionally correct today but fragile.

Confidence Score: 4/5

Not safe to merge until the AWS role_arn resolution bug is fixed; all other findings are style/quality.

One P1 defect: role-based AWS auth is silently broken for any user loading credentials via AWS_ROLE_ARN env var or new v2 store entries, because the classification step reads role_arn from the wrong dict level. The fix is a one-liner. All previously flagged issues (merge-by-service data loss, remove_integration persistence, last-instance-wins) appear to be addressed.

app/nodes/resolve_integrations/node.py — the AWS block in _classify_integrations (lines 155-182) and the _add_coralogix function (lines 678-725).

Important Files Changed

Filename Overview
app/nodes/resolve_integrations/node.py Core classification and env-loading logic updated for multi-instance support; contains a P1 bug (AWS role_arn read from integration level instead of credentials), a duplicated nested loop in _add_coralogix, and a missing break in the generic else branch.
app/integrations/store.py Multi-instance store with v1→v2 migration, filtered get_integration/get_integrations, upsert_integration, and fixed remove_integration (previously flagged persistence bug is now resolved).
app/integrations/models.py Adds IntegrationInstance and EffectiveMultiInstanceIntegration models; clean, well-validated additions.
tests/nodes/resolve_integrations/test_opsgenie.py Tests updated to assert on instances[0].credentials for env-loading; store helper still uses v1 format (handled transparently by _get_instances_from_integration).
tests/nodes/resolve_integrations/test_vercel.py Tests updated similarly to opsgenie; env-loading assertions use new instances[0].credentials format.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[node_resolve_integrations] --> B{Auth token?}
    B -- webhook/JWT --> C[Remote API: get_all_integrations]
    B -- no token --> D[_resolve_from_local_sources]

    D --> E[load_integrations from store]
    D --> F[_load_env_integrations]

    E --> G[_migrate_to_v2: v1 → instances array]
    G --> H[v2 store records]

    F --> I{*_INSTANCES JSON env var present?}
    I -- yes --> J[Parse JSON → single record with instances array]
    I -- no --> K[Single-instance env vars → default instance]
    J --> L[env records]
    K --> L

    H --> M[_merge_integrations_by_service: store overrides env]
    L --> M

    C --> N[_classify_integrations]
    M --> N

    N --> O{service key}
    O -- grafana/datadog/etc --> P[Loop instances, break on first valid]
    O -- aws --> Q[Loop instances, no break, reads role_arn from integration level]
    O -- unknown --> R[Loop instances, no break]

    P --> S[resolved dict]
    Q --> S
    R --> S
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 155-182

Comment:
**AWS `role_arn`/`external_id` read from wrong dict level**

`raw_config` pulls `role_arn` and `external_id` from the top-level integration record (`integration.get("role_arn", "")`), but the new multi-instance format stores them inside `instance.credentials` (set by `_add_aws()`). For any env-loaded or v2-store AWS integration, these keys are absent at the integration level, so both fields resolve to `""`. `AWSIntegrationConfig._require_auth_method` then raises a `ValueError`, which is silently swallowed by `except Exception: continue`, and the AWS integration is never resolved. Only migrated v1 store entries — which happen to preserve the top-level `role_arn` key — still work. Static-key auth is unaffected because `access_key_id`/`secret_access_key` are correctly read from `credentials`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 678-725

Comment:
**Duplicate nested loop in `_add_coralogix`**

The outer `for inst in instances_json:` loop (line 682) exists solely to find the first valid instance. When it succeeds, it resets `instances_list = []` and immediately runs an inner loop (line 696) with identical filtering and validation logic — shadowing the outer `inst` variable — then returns. The outer loop is redundant; it exists because the inner loop was accidentally left nested inside it rather than replacing it. The function works by accident: the inner loop correctly processes all instances, and the `return` prevents the outer loop from running a second time. This makes the code fragile and adds unnecessary double-validation. The outer loop body (lines 682–694) should be removed.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 438-445

Comment:
**Generic `else` branch overwrites without `break`**

The fallback `else` block iterates all instances but unconditionally overwrites `resolved[key]` on each iteration, so only the last instance wins. Every named service handler in this function now uses `break` after setting `resolved[key]`; the generic branch is the only exception. Adding a `break` here would make the behaviour consistent.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "feat: enhance integration removal and lo..." | Re-trigger Greptile

Comment thread app/integrations/store.py
Comment on lines 233 to 255
data = _load_raw()
before = len(data.get("integrations", []))
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]

if instance_name is None:
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]
else:
for integration in data.get("integrations", []):
if integration.get("service") == service:
instances = integration.get("instances", [])
integration["instances"] = [i for i in instances if i.get("name") != instance_name]

data["integrations"] = [
i
for i in data.get("integrations", [])
if i.get("service") != service or i.get("instances")
]

removed = len(data["integrations"]) < before
if removed:
_save(data)
return removed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Instance removal never persisted when service still has other instances

When instance_name is provided, _save is only called if the top-level integration record count decreases (len(data["integrations"]) < before). But if a service still has other instances after the removal, its record stays in the list and the count is unchanged — so removed = False and _save is never called. The instance is deleted in-memory but the file on disk is never updated.

Concretely: a grafana record with instances ["prod", "staging"] after remove_integration("grafana", "prod") still writes ["prod", "staging"] to disk, because before=1 and after=1.

Suggested change
data = _load_raw()
before = len(data.get("integrations", []))
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]
if instance_name is None:
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]
else:
for integration in data.get("integrations", []):
if integration.get("service") == service:
instances = integration.get("instances", [])
integration["instances"] = [i for i in instances if i.get("name") != instance_name]
data["integrations"] = [
i
for i in data.get("integrations", [])
if i.get("service") != service or i.get("instances")
]
removed = len(data["integrations"]) < before
if removed:
_save(data)
return removed
if instance_name is None:
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]
removed = len(data["integrations"]) < before
else:
removed = False
for integration in data.get("integrations", []):
if integration.get("service") == service:
instances = integration.get("instances", [])
filtered = [i for i in instances if i.get("name") != instance_name]
if len(filtered) < len(instances):
removed = True
integration["instances"] = filtered
data["integrations"] = [
i
for i in data.get("integrations", [])
if i.get("service") != service or i.get("instances")
]
if removed:
_save(data)
return removed
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/integrations/store.py
Line: 233-255

Comment:
**Instance removal never persisted when service still has other instances**

When `instance_name` is provided, `_save` is only called if the top-level integration record count decreases (`len(data["integrations"]) < before`). But if a service still has other instances after the removal, its record stays in the list and the count is unchanged — so `removed = False` and `_save` is never called. The instance is deleted in-memory but the file on disk is never updated.

Concretely: a grafana record with instances `["prod", "staging"]` after `remove_integration("grafana", "prod")` still writes `["prod", "staging"]` to disk, because `before=1` and `after=1`.

```suggestion
    if instance_name is None:
        data["integrations"] = [
            i for i in data.get("integrations", []) if i.get("service") != service
        ]
        removed = len(data["integrations"]) < before
    else:
        removed = False
        for integration in data.get("integrations", []):
            if integration.get("service") == service:
                instances = integration.get("instances", [])
                filtered = [i for i in instances if i.get("name") != instance_name]
                if len(filtered) < len(instances):
                    removed = True
                integration["instances"] = filtered

        data["integrations"] = [
            i
            for i in data.get("integrations", [])
            if i.get("service") != service or i.get("instances")
        ]

    if removed:
        _save(data)
    return removed
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +488 to +516
def _add_grafana():
instances_json = _parse_instances_json("GRAFANA_INSTANCES")
if instances_json:
for inst in instances_json:
if not inst.get("url") and not inst.get("endpoint"):
continue
config = {
"endpoint": inst.get("url") or inst.get("endpoint", ""),
"api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(),
}
try:
GrafanaIntegrationConfig.model_validate(config)
except Exception:
continue
integrations.append(
{
"id": f"env-grafana-{inst.get('name', 'default')}",
"service": "grafana",
"status": "active",
"instances": [
{
"name": inst.get("name", "default"),
"tags": inst.get("tags", []),
"credentials": config,
}
],
}
)
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Multi-instance JSON env vars lose all but the last instance after merging

Each _add_grafana() call (and all equivalent _add_* functions) appends a separate integration record per instance to integrations, all with service: "grafana". Downstream, _merge_integrations_by_service iterates the list and writes merged_by_service[service] = integration, so every record overwrites the previous one for the same service key — only the last appended instance survives into the pipeline.

Setting GRAFANA_INSTANCES='[{"name":"prod",...},{"name":"staging",...}]' silently discards prod entirely. This directly contradicts the stated goal of multi-instance support from environment variables.

The fix is to consolidate all parsed instances into a single integration record with an instances array, mirroring the store format:

if instances_json:
    instances_list = []
    for inst in instances_json:
        if not inst.get("url") and not inst.get("endpoint"):
            continue
        config = {
            "endpoint": inst.get("url") or inst.get("endpoint", ""),
            "api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(),
        }
        try:
            GrafanaIntegrationConfig.model_validate(config)
        except Exception:
            continue
        instances_list.append({
            "name": inst.get("name", "default"),
            "tags": inst.get("tags", []),
            "credentials": config,
        })
    if instances_list:
        integrations.append({
            "id": "env-grafana-multi",
            "service": "grafana",
            "status": "active",
            "instances": instances_list,
        })
    return

The same pattern applies to _add_datadog, _add_honeycomb, _add_coralogix, _add_aws, _add_github, and _add_sentry.

Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 488-516

Comment:
**Multi-instance JSON env vars lose all but the last instance after merging**

Each `_add_grafana()` call (and all equivalent `_add_*` functions) appends a **separate integration record per instance** to `integrations`, all with `service: "grafana"`. Downstream, `_merge_integrations_by_service` iterates the list and writes `merged_by_service[service] = integration`, so every record overwrites the previous one for the same service key — only the last appended instance survives into the pipeline.

Setting `GRAFANA_INSTANCES='[{"name":"prod",...},{"name":"staging",...}]'` silently discards `prod` entirely. This directly contradicts the stated goal of multi-instance support from environment variables.

The fix is to consolidate all parsed instances into a **single** integration record with an `instances` array, mirroring the store format:

```python
if instances_json:
    instances_list = []
    for inst in instances_json:
        if not inst.get("url") and not inst.get("endpoint"):
            continue
        config = {
            "endpoint": inst.get("url") or inst.get("endpoint", ""),
            "api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(),
        }
        try:
            GrafanaIntegrationConfig.model_validate(config)
        except Exception:
            continue
        instances_list.append({
            "name": inst.get("name", "default"),
            "tags": inst.get("tags", []),
            "credentials": config,
        })
    if instances_list:
        integrations.append({
            "id": "env-grafana-multi",
            "service": "grafana",
            "status": "active",
            "instances": instances_list,
        })
    return
```

The same pattern applies to `_add_datadog`, `_add_honeycomb`, `_add_coralogix`, `_add_aws`, `_add_github`, and `_add_sentry`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 123 to 153
if key in ("grafana", "grafana_local"):
try:
grafana_config = GrafanaIntegrationConfig.model_validate(
{
"endpoint": credentials.get("endpoint", ""),
"api_key": credentials.get("api_key", ""),
"integration_id": integration.get("id", ""),
for instance in instances:
credentials = instance.get("credentials", {})
try:
grafana_config = GrafanaIntegrationConfig.model_validate(
{
"endpoint": credentials.get("endpoint", ""),
"api_key": credentials.get("api_key", ""),
"integration_id": integration.get("id", ""),
}
)
except Exception:
continue
if not grafana_config.endpoint:
continue
instance_name = instance.get("name", "default")
if grafana_config.is_local:
resolved["grafana_local"] = {
"endpoint": grafana_config.endpoint,
"api_key": "",
"integration_id": grafana_config.integration_id,
"instance_name": instance_name,
"tags": instance.get("tags", []),
}
elif grafana_config.api_key and grafana_config.api_key != "local":
resolved["grafana"] = {
**grafana_config.model_dump(),
"instance_name": instance_name,
"tags": instance.get("tags", []),
}
)
except Exception:
continue
if not grafana_config.endpoint:
continue
if grafana_config.is_local:
# Always treat localhost Grafana as grafana_local (Loki only, anonymous auth)
resolved["grafana_local"] = {
"endpoint": grafana_config.endpoint,
"api_key": "",
"integration_id": grafana_config.integration_id,
}
elif grafana_config.api_key and grafana_config.api_key != "local":
resolved["grafana"] = grafana_config.model_dump()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Last instance silently wins in _classify_integrations for all services

Every service block loops for instance in instances: but unconditionally overwrites resolved[key] on each iteration. When an integration record genuinely has multiple instances (which the store now supports), only the final valid instance is surfaced to the pipeline — all earlier ones are discarded without any indication.

For example, a grafana record with instances ["prod", "staging"] will always resolve to staging's credentials. The same pattern repeats for datadog, honeycomb, coralogix, sentry, gitlab, mongodb, postgresql, vercel, and opsgenie.

If the design intent is "pick the first valid instance," add a break after setting resolved.

Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 123-153

Comment:
**Last instance silently wins in `_classify_integrations` for all services**

Every service block loops `for instance in instances:` but unconditionally overwrites `resolved[key]` on each iteration. When an integration record genuinely has multiple instances (which the store now supports), only the final valid instance is surfaced to the pipeline — all earlier ones are discarded without any indication.

For example, a grafana record with instances `["prod", "staging"]` will always resolve to staging's credentials. The same pattern repeats for datadog, honeycomb, coralogix, sentry, gitlab, mongodb, postgresql, vercel, and opsgenie.

If the design intent is "pick the first valid instance," add a `break` after setting `resolved`.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread app/integrations/store.py
Comment on lines +90 to 127
def get_integration(
service: str,
name: str | None = None,
tags: list[str] | None = None,
) -> dict[str, Any] | None:
"""Return the first active integration for a service matching criteria, or None.

Args:
service: The service name (e.g., "grafana", "datadog")
name: Filter by instance name (optional)
tags: Filter by instance tags (optional, matches if any tag overlaps)

Returns:
Integration dict with instances, or None if not found.
"""
for i in load_integrations():
if i.get("service") == service and i.get("status") == "active":
if i.get("service") != service or i.get("status") != "active":
continue

instances = i.get("instances", [])
if not instances:
continue

if name is None and tags is None:
return i

for instance in instances:
instance_name = instance.get("name", "default")
instance_tags = instance.get("tags", [])

if name is not None and instance_name != name:
continue
if tags is not None and not set(tags) & set(instance_tags):
continue

return i

return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 get_integration returns full record (all instances) when filtering by name or tag

When a caller passes name="prod", the function finds an integration that contains a matching instance, but returns the entire integration record — including instances that don't match (e.g., staging). Callers expecting only the prod instance data will inadvertently have access to all instances.

Consider returning only the matched instance inside the record, or document explicitly that the full record is returned and callers must filter instances themselves.

Prompt To Fix With AI
This is a comment left during a code review.
Path: app/integrations/store.py
Line: 90-127

Comment:
**`get_integration` returns full record (all instances) when filtering by name or tag**

When a caller passes `name="prod"`, the function finds an integration that contains a matching instance, but returns the **entire integration record** — including instances that don't match (e.g., `staging`). Callers expecting only the `prod` instance data will inadvertently have access to all instances.

Consider returning only the matched instance inside the record, or document explicitly that the full record is returned and callers must filter `instances` themselves.

How can I resolve this? If you propose a fix, please make it concise.

except Exception:
continue
instances_list = []
for inst in instances_json:
@Devesh36 Devesh36 closed this Apr 11, 2026
Comment on lines 155 to 182
elif key == "aws":
if "aws" in resolved:
continue
raw_config: dict[str, Any] = {
"region": credentials.get("region", "us-east-1"),
"role_arn": integration.get("role_arn", ""),
"external_id": integration.get("external_id", ""),
"integration_id": integration.get("id", ""),
}
if credentials.get("access_key_id") and credentials.get("secret_access_key"):
raw_config["credentials"] = {
"access_key_id": credentials.get("access_key_id", ""),
"secret_access_key": credentials.get("secret_access_key", ""),
"session_token": credentials.get("session_token", ""),
for instance in instances:
credentials = instance.get("credentials", {})
raw_config: dict[str, Any] = {
"region": credentials.get("region", "us-east-1"),
"role_arn": integration.get("role_arn", ""),
"external_id": integration.get("external_id", ""),
"integration_id": integration.get("id", ""),
}
try:
resolved["aws"] = AWSIntegrationConfig.model_validate(raw_config).model_dump(
exclude_none=True
)
except Exception:
continue
if credentials.get("access_key_id") and credentials.get("secret_access_key"):
raw_config["credentials"] = {
"access_key_id": credentials.get("access_key_id", ""),
"secret_access_key": credentials.get("secret_access_key", ""),
"session_token": credentials.get("session_token", ""),
}
try:
resolved["aws"] = {
**AWSIntegrationConfig.model_validate(raw_config).model_dump(
exclude_none=True
),
"instance_name": instance.get("name", "default"),
"tags": instance.get("tags", []),
}
except Exception:
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 AWS role_arn/external_id read from wrong dict level

raw_config pulls role_arn and external_id from the top-level integration record (integration.get("role_arn", "")), but the new multi-instance format stores them inside instance.credentials (set by _add_aws()). For any env-loaded or v2-store AWS integration, these keys are absent at the integration level, so both fields resolve to "". AWSIntegrationConfig._require_auth_method then raises a ValueError, which is silently swallowed by except Exception: continue, and the AWS integration is never resolved. Only migrated v1 store entries — which happen to preserve the top-level role_arn key — still work. Static-key auth is unaffected because access_key_id/secret_access_key are correctly read from credentials.

Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 155-182

Comment:
**AWS `role_arn`/`external_id` read from wrong dict level**

`raw_config` pulls `role_arn` and `external_id` from the top-level integration record (`integration.get("role_arn", "")`), but the new multi-instance format stores them inside `instance.credentials` (set by `_add_aws()`). For any env-loaded or v2-store AWS integration, these keys are absent at the integration level, so both fields resolve to `""`. `AWSIntegrationConfig._require_auth_method` then raises a `ValueError`, which is silently swallowed by `except Exception: continue`, and the AWS integration is never resolved. Only migrated v1 store entries — which happen to preserve the top-level `role_arn` key — still work. Static-key auth is unaffected because `access_key_id`/`secret_access_key` are correctly read from `credentials`.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

P0 Support multiple accounts, regions, and clusters in one deployment

2 participants