Refactor integration handling to support multi-instance configurations#527
Refactor integration handling to support multi-instance configurations#527Devesh36 wants to merge 2 commits into
Conversation
- Introduced helper functions to extract credentials and instances from integrations, accommodating both legacy and new formats. - Updated the `_classify_integrations` function to process multiple instances for various services, including Grafana, AWS, Datadog, Honeycomb, Coralogix, GitHub, Sentry, MongoDB, PostgreSQL, MongoDB Atlas, Vercel, and OpsGenie. - Modified the `_load_env_integrations` function to support multi-instance environment variable formats, allowing for backward compatibility with single-instance formats. - Adjusted tests for OpsGenie and Vercel to reflect changes in the data structure, ensuring proper access to instance credentials.
Greptile SummaryThis PR refactors integration handling to support multiple named, tagged instances per provider (e.g., separate
Confidence Score: 4/5Not safe to merge until the AWS role_arn resolution bug is fixed; all other findings are style/quality. One P1 defect: role-based AWS auth is silently broken for any user loading credentials via AWS_ROLE_ARN env var or new v2 store entries, because the classification step reads role_arn from the wrong dict level. The fix is a one-liner. All previously flagged issues (merge-by-service data loss, remove_integration persistence, last-instance-wins) appear to be addressed. app/nodes/resolve_integrations/node.py — the AWS block in _classify_integrations (lines 155-182) and the _add_coralogix function (lines 678-725). Important Files Changed
|
| data = _load_raw() | ||
| before = len(data.get("integrations", [])) | ||
| data["integrations"] = [ | ||
| i for i in data.get("integrations", []) if i.get("service") != service | ||
| ] | ||
|
|
||
| if instance_name is None: | ||
| data["integrations"] = [ | ||
| i for i in data.get("integrations", []) if i.get("service") != service | ||
| ] | ||
| else: | ||
| for integration in data.get("integrations", []): | ||
| if integration.get("service") == service: | ||
| instances = integration.get("instances", []) | ||
| integration["instances"] = [i for i in instances if i.get("name") != instance_name] | ||
|
|
||
| data["integrations"] = [ | ||
| i | ||
| for i in data.get("integrations", []) | ||
| if i.get("service") != service or i.get("instances") | ||
| ] | ||
|
|
||
| removed = len(data["integrations"]) < before | ||
| if removed: | ||
| _save(data) | ||
| return removed |
There was a problem hiding this comment.
Instance removal never persisted when service still has other instances
When instance_name is provided, _save is only called if the top-level integration record count decreases (len(data["integrations"]) < before). But if a service still has other instances after the removal, its record stays in the list and the count is unchanged — so removed = False and _save is never called. The instance is deleted in-memory but the file on disk is never updated.
Concretely: a grafana record with instances ["prod", "staging"] after remove_integration("grafana", "prod") still writes ["prod", "staging"] to disk, because before=1 and after=1.
| data = _load_raw() | |
| before = len(data.get("integrations", [])) | |
| data["integrations"] = [ | |
| i for i in data.get("integrations", []) if i.get("service") != service | |
| ] | |
| if instance_name is None: | |
| data["integrations"] = [ | |
| i for i in data.get("integrations", []) if i.get("service") != service | |
| ] | |
| else: | |
| for integration in data.get("integrations", []): | |
| if integration.get("service") == service: | |
| instances = integration.get("instances", []) | |
| integration["instances"] = [i for i in instances if i.get("name") != instance_name] | |
| data["integrations"] = [ | |
| i | |
| for i in data.get("integrations", []) | |
| if i.get("service") != service or i.get("instances") | |
| ] | |
| removed = len(data["integrations"]) < before | |
| if removed: | |
| _save(data) | |
| return removed | |
| if instance_name is None: | |
| data["integrations"] = [ | |
| i for i in data.get("integrations", []) if i.get("service") != service | |
| ] | |
| removed = len(data["integrations"]) < before | |
| else: | |
| removed = False | |
| for integration in data.get("integrations", []): | |
| if integration.get("service") == service: | |
| instances = integration.get("instances", []) | |
| filtered = [i for i in instances if i.get("name") != instance_name] | |
| if len(filtered) < len(instances): | |
| removed = True | |
| integration["instances"] = filtered | |
| data["integrations"] = [ | |
| i | |
| for i in data.get("integrations", []) | |
| if i.get("service") != service or i.get("instances") | |
| ] | |
| if removed: | |
| _save(data) | |
| return removed |
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/integrations/store.py
Line: 233-255
Comment:
**Instance removal never persisted when service still has other instances**
When `instance_name` is provided, `_save` is only called if the top-level integration record count decreases (`len(data["integrations"]) < before`). But if a service still has other instances after the removal, its record stays in the list and the count is unchanged — so `removed = False` and `_save` is never called. The instance is deleted in-memory but the file on disk is never updated.
Concretely: a grafana record with instances `["prod", "staging"]` after `remove_integration("grafana", "prod")` still writes `["prod", "staging"]` to disk, because `before=1` and `after=1`.
```suggestion
if instance_name is None:
data["integrations"] = [
i for i in data.get("integrations", []) if i.get("service") != service
]
removed = len(data["integrations"]) < before
else:
removed = False
for integration in data.get("integrations", []):
if integration.get("service") == service:
instances = integration.get("instances", [])
filtered = [i for i in instances if i.get("name") != instance_name]
if len(filtered) < len(instances):
removed = True
integration["instances"] = filtered
data["integrations"] = [
i
for i in data.get("integrations", [])
if i.get("service") != service or i.get("instances")
]
if removed:
_save(data)
return removed
```
How can I resolve this? If you propose a fix, please make it concise.| def _add_grafana(): | ||
| instances_json = _parse_instances_json("GRAFANA_INSTANCES") | ||
| if instances_json: | ||
| for inst in instances_json: | ||
| if not inst.get("url") and not inst.get("endpoint"): | ||
| continue | ||
| config = { | ||
| "endpoint": inst.get("url") or inst.get("endpoint", ""), | ||
| "api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(), | ||
| } | ||
| try: | ||
| GrafanaIntegrationConfig.model_validate(config) | ||
| except Exception: | ||
| continue | ||
| integrations.append( | ||
| { | ||
| "id": f"env-grafana-{inst.get('name', 'default')}", | ||
| "service": "grafana", | ||
| "status": "active", | ||
| "instances": [ | ||
| { | ||
| "name": inst.get("name", "default"), | ||
| "tags": inst.get("tags", []), | ||
| "credentials": config, | ||
| } | ||
| ], | ||
| } | ||
| ) | ||
| return |
There was a problem hiding this comment.
Multi-instance JSON env vars lose all but the last instance after merging
Each _add_grafana() call (and all equivalent _add_* functions) appends a separate integration record per instance to integrations, all with service: "grafana". Downstream, _merge_integrations_by_service iterates the list and writes merged_by_service[service] = integration, so every record overwrites the previous one for the same service key — only the last appended instance survives into the pipeline.
Setting GRAFANA_INSTANCES='[{"name":"prod",...},{"name":"staging",...}]' silently discards prod entirely. This directly contradicts the stated goal of multi-instance support from environment variables.
The fix is to consolidate all parsed instances into a single integration record with an instances array, mirroring the store format:
if instances_json:
instances_list = []
for inst in instances_json:
if not inst.get("url") and not inst.get("endpoint"):
continue
config = {
"endpoint": inst.get("url") or inst.get("endpoint", ""),
"api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(),
}
try:
GrafanaIntegrationConfig.model_validate(config)
except Exception:
continue
instances_list.append({
"name": inst.get("name", "default"),
"tags": inst.get("tags", []),
"credentials": config,
})
if instances_list:
integrations.append({
"id": "env-grafana-multi",
"service": "grafana",
"status": "active",
"instances": instances_list,
})
returnThe same pattern applies to _add_datadog, _add_honeycomb, _add_coralogix, _add_aws, _add_github, and _add_sentry.
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 488-516
Comment:
**Multi-instance JSON env vars lose all but the last instance after merging**
Each `_add_grafana()` call (and all equivalent `_add_*` functions) appends a **separate integration record per instance** to `integrations`, all with `service: "grafana"`. Downstream, `_merge_integrations_by_service` iterates the list and writes `merged_by_service[service] = integration`, so every record overwrites the previous one for the same service key — only the last appended instance survives into the pipeline.
Setting `GRAFANA_INSTANCES='[{"name":"prod",...},{"name":"staging",...}]'` silently discards `prod` entirely. This directly contradicts the stated goal of multi-instance support from environment variables.
The fix is to consolidate all parsed instances into a **single** integration record with an `instances` array, mirroring the store format:
```python
if instances_json:
instances_list = []
for inst in instances_json:
if not inst.get("url") and not inst.get("endpoint"):
continue
config = {
"endpoint": inst.get("url") or inst.get("endpoint", ""),
"api_key": inst.get("api_key") or os.getenv("GRAFANA_READ_TOKEN", "").strip(),
}
try:
GrafanaIntegrationConfig.model_validate(config)
except Exception:
continue
instances_list.append({
"name": inst.get("name", "default"),
"tags": inst.get("tags", []),
"credentials": config,
})
if instances_list:
integrations.append({
"id": "env-grafana-multi",
"service": "grafana",
"status": "active",
"instances": instances_list,
})
return
```
The same pattern applies to `_add_datadog`, `_add_honeycomb`, `_add_coralogix`, `_add_aws`, `_add_github`, and `_add_sentry`.
How can I resolve this? If you propose a fix, please make it concise.| if key in ("grafana", "grafana_local"): | ||
| try: | ||
| grafana_config = GrafanaIntegrationConfig.model_validate( | ||
| { | ||
| "endpoint": credentials.get("endpoint", ""), | ||
| "api_key": credentials.get("api_key", ""), | ||
| "integration_id": integration.get("id", ""), | ||
| for instance in instances: | ||
| credentials = instance.get("credentials", {}) | ||
| try: | ||
| grafana_config = GrafanaIntegrationConfig.model_validate( | ||
| { | ||
| "endpoint": credentials.get("endpoint", ""), | ||
| "api_key": credentials.get("api_key", ""), | ||
| "integration_id": integration.get("id", ""), | ||
| } | ||
| ) | ||
| except Exception: | ||
| continue | ||
| if not grafana_config.endpoint: | ||
| continue | ||
| instance_name = instance.get("name", "default") | ||
| if grafana_config.is_local: | ||
| resolved["grafana_local"] = { | ||
| "endpoint": grafana_config.endpoint, | ||
| "api_key": "", | ||
| "integration_id": grafana_config.integration_id, | ||
| "instance_name": instance_name, | ||
| "tags": instance.get("tags", []), | ||
| } | ||
| elif grafana_config.api_key and grafana_config.api_key != "local": | ||
| resolved["grafana"] = { | ||
| **grafana_config.model_dump(), | ||
| "instance_name": instance_name, | ||
| "tags": instance.get("tags", []), | ||
| } | ||
| ) | ||
| except Exception: | ||
| continue | ||
| if not grafana_config.endpoint: | ||
| continue | ||
| if grafana_config.is_local: | ||
| # Always treat localhost Grafana as grafana_local (Loki only, anonymous auth) | ||
| resolved["grafana_local"] = { | ||
| "endpoint": grafana_config.endpoint, | ||
| "api_key": "", | ||
| "integration_id": grafana_config.integration_id, | ||
| } | ||
| elif grafana_config.api_key and grafana_config.api_key != "local": | ||
| resolved["grafana"] = grafana_config.model_dump() | ||
|
|
There was a problem hiding this comment.
Last instance silently wins in
_classify_integrations for all services
Every service block loops for instance in instances: but unconditionally overwrites resolved[key] on each iteration. When an integration record genuinely has multiple instances (which the store now supports), only the final valid instance is surfaced to the pipeline — all earlier ones are discarded without any indication.
For example, a grafana record with instances ["prod", "staging"] will always resolve to staging's credentials. The same pattern repeats for datadog, honeycomb, coralogix, sentry, gitlab, mongodb, postgresql, vercel, and opsgenie.
If the design intent is "pick the first valid instance," add a break after setting resolved.
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 123-153
Comment:
**Last instance silently wins in `_classify_integrations` for all services**
Every service block loops `for instance in instances:` but unconditionally overwrites `resolved[key]` on each iteration. When an integration record genuinely has multiple instances (which the store now supports), only the final valid instance is surfaced to the pipeline — all earlier ones are discarded without any indication.
For example, a grafana record with instances `["prod", "staging"]` will always resolve to staging's credentials. The same pattern repeats for datadog, honeycomb, coralogix, sentry, gitlab, mongodb, postgresql, vercel, and opsgenie.
If the design intent is "pick the first valid instance," add a `break` after setting `resolved`.
How can I resolve this? If you propose a fix, please make it concise.| def get_integration( | ||
| service: str, | ||
| name: str | None = None, | ||
| tags: list[str] | None = None, | ||
| ) -> dict[str, Any] | None: | ||
| """Return the first active integration for a service matching criteria, or None. | ||
|
|
||
| Args: | ||
| service: The service name (e.g., "grafana", "datadog") | ||
| name: Filter by instance name (optional) | ||
| tags: Filter by instance tags (optional, matches if any tag overlaps) | ||
|
|
||
| Returns: | ||
| Integration dict with instances, or None if not found. | ||
| """ | ||
| for i in load_integrations(): | ||
| if i.get("service") == service and i.get("status") == "active": | ||
| if i.get("service") != service or i.get("status") != "active": | ||
| continue | ||
|
|
||
| instances = i.get("instances", []) | ||
| if not instances: | ||
| continue | ||
|
|
||
| if name is None and tags is None: | ||
| return i | ||
|
|
||
| for instance in instances: | ||
| instance_name = instance.get("name", "default") | ||
| instance_tags = instance.get("tags", []) | ||
|
|
||
| if name is not None and instance_name != name: | ||
| continue | ||
| if tags is not None and not set(tags) & set(instance_tags): | ||
| continue | ||
|
|
||
| return i | ||
|
|
||
| return None |
There was a problem hiding this comment.
get_integration returns full record (all instances) when filtering by name or tag
When a caller passes name="prod", the function finds an integration that contains a matching instance, but returns the entire integration record — including instances that don't match (e.g., staging). Callers expecting only the prod instance data will inadvertently have access to all instances.
Consider returning only the matched instance inside the record, or document explicitly that the full record is returned and callers must filter instances themselves.
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/integrations/store.py
Line: 90-127
Comment:
**`get_integration` returns full record (all instances) when filtering by name or tag**
When a caller passes `name="prod"`, the function finds an integration that contains a matching instance, but returns the **entire integration record** — including instances that don't match (e.g., `staging`). Callers expecting only the `prod` instance data will inadvertently have access to all instances.
Consider returning only the matched instance inside the record, or document explicitly that the full record is returned and callers must filter `instances` themselves.
How can I resolve this? If you propose a fix, please make it concise.| except Exception: | ||
| continue | ||
| instances_list = [] | ||
| for inst in instances_json: |
| elif key == "aws": | ||
| if "aws" in resolved: | ||
| continue | ||
| raw_config: dict[str, Any] = { | ||
| "region": credentials.get("region", "us-east-1"), | ||
| "role_arn": integration.get("role_arn", ""), | ||
| "external_id": integration.get("external_id", ""), | ||
| "integration_id": integration.get("id", ""), | ||
| } | ||
| if credentials.get("access_key_id") and credentials.get("secret_access_key"): | ||
| raw_config["credentials"] = { | ||
| "access_key_id": credentials.get("access_key_id", ""), | ||
| "secret_access_key": credentials.get("secret_access_key", ""), | ||
| "session_token": credentials.get("session_token", ""), | ||
| for instance in instances: | ||
| credentials = instance.get("credentials", {}) | ||
| raw_config: dict[str, Any] = { | ||
| "region": credentials.get("region", "us-east-1"), | ||
| "role_arn": integration.get("role_arn", ""), | ||
| "external_id": integration.get("external_id", ""), | ||
| "integration_id": integration.get("id", ""), | ||
| } | ||
| try: | ||
| resolved["aws"] = AWSIntegrationConfig.model_validate(raw_config).model_dump( | ||
| exclude_none=True | ||
| ) | ||
| except Exception: | ||
| continue | ||
| if credentials.get("access_key_id") and credentials.get("secret_access_key"): | ||
| raw_config["credentials"] = { | ||
| "access_key_id": credentials.get("access_key_id", ""), | ||
| "secret_access_key": credentials.get("secret_access_key", ""), | ||
| "session_token": credentials.get("session_token", ""), | ||
| } | ||
| try: | ||
| resolved["aws"] = { | ||
| **AWSIntegrationConfig.model_validate(raw_config).model_dump( | ||
| exclude_none=True | ||
| ), | ||
| "instance_name": instance.get("name", "default"), | ||
| "tags": instance.get("tags", []), | ||
| } | ||
| except Exception: | ||
| continue | ||
|
|
There was a problem hiding this comment.
AWS
role_arn/external_id read from wrong dict level
raw_config pulls role_arn and external_id from the top-level integration record (integration.get("role_arn", "")), but the new multi-instance format stores them inside instance.credentials (set by _add_aws()). For any env-loaded or v2-store AWS integration, these keys are absent at the integration level, so both fields resolve to "". AWSIntegrationConfig._require_auth_method then raises a ValueError, which is silently swallowed by except Exception: continue, and the AWS integration is never resolved. Only migrated v1 store entries — which happen to preserve the top-level role_arn key — still work. Static-key auth is unaffected because access_key_id/secret_access_key are correctly read from credentials.
Prompt To Fix With AI
This is a comment left during a code review.
Path: app/nodes/resolve_integrations/node.py
Line: 155-182
Comment:
**AWS `role_arn`/`external_id` read from wrong dict level**
`raw_config` pulls `role_arn` and `external_id` from the top-level integration record (`integration.get("role_arn", "")`), but the new multi-instance format stores them inside `instance.credentials` (set by `_add_aws()`). For any env-loaded or v2-store AWS integration, these keys are absent at the integration level, so both fields resolve to `""`. `AWSIntegrationConfig._require_auth_method` then raises a `ValueError`, which is silently swallowed by `except Exception: continue`, and the AWS integration is never resolved. Only migrated v1 store entries — which happen to preserve the top-level `role_arn` key — still work. Static-key auth is unaffected because `access_key_id`/`secret_access_key` are correctly read from `credentials`.
How can I resolve this? If you propose a fix, please make it concise.
Issue #475 — Multi-Account/Region/Cluster Support Implementation
Problem
Real deployments have multiple clusters, regions, teams, and accounts for the same provider. The current integration model only supported one backend per provider.
Implemented Solution
Usage:
Multi-instance JSON
GRAFANA_INSTANCES='[{"name":"prod","url":"https://prod.grafana.net","api_key":"...","tags":["prod"]},{"name":"staging","url":"https://staging.grafana.net","api_key":"...","tags":["staging"]}]'
Single-instance (existing - still works)
GRAFANA_INSTANCE_URL=...
GRAFANA_READ_TOKEN=...
4. Integration Resolution (_classify_integrations)
First instance (backward compatible)
grafana = get_integration("grafana")
Filter by name
grafana_prod = get_integration("grafana", name="prod")
Filter by tags (matches if any tag overlaps)
grafana_prod = get_integration("grafana", tags=["prod"])
Done When (from issue)
Tests
Files Changed
/fix #475