Conversation

@onematchfox (Contributor) commented Nov 25, 2025

This PR aims to make it possible to scale the number of replicas for the controller, enabling an HA deployment of it. The current shortcomings it addresses include:

  • supporting the use of environment variables for configuring the controller, thereby enabling the use of a Secret to configure a Postgres database URL (see the sketch after this list)
  • ensuring that an error is returned if someone tries to use the Helm chart to deploy multiple controller replicas whilst still using SQLite
  • enabling leader election if multiple replicas are configured when deploying via Helm
  • ensuring that the A2A API is available across all replicas when leader election is enabled
  • adding the RBAC resources needed for leader election
  • fixing upserts on Postgres
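
As a rough illustration of the first point, a LoadFromEnv-style helper can map environment variables onto the controller's registered flags, so a Secret-sourced variable can stand in for a command-line argument. This is a minimal sketch under assumed naming (a `KAGENT_` prefix with dashes upper-cased to underscores); it is not the PR's exact implementation:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
	"time"
)

// loadFromEnv is a hedged sketch of env-var based flag loading: for every
// registered flag not already set on the command line, look up an
// environment variable named KAGENT_<FLAG-NAME> (dashes to underscores,
// upper-cased) and apply it. flag.Value.Set does the parsing, so string,
// bool, and duration flags are all handled the same way.
func loadFromEnv(fs *flag.FlagSet) error {
	explicit := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { explicit[f.Name] = true })

	var err error
	fs.VisitAll(func(f *flag.Flag) {
		if err != nil || explicit[f.Name] {
			return // command-line values win over the environment
		}
		key := "KAGENT_" + strings.ToUpper(strings.ReplaceAll(f.Name, "-", "_"))
		if v, ok := os.LookupEnv(key); ok {
			err = fs.Set(f.Name, v)
		}
	})
	return err
}

func main() {
	fs := flag.NewFlagSet("controller", flag.ExitOnError)
	dbURL := fs.String("database-url", "", "database connection URL")
	leaderElect := fs.Bool("leader-elect", false, "enable leader election")
	timeout := fs.Duration("streaming-timeout", 10*time.Minute, "streaming timeout")

	_ = fs.Parse(os.Args[1:])
	// e.g. KAGENT_DATABASE_URL injected from a Secret via envFrom
	if err := loadFromEnv(fs); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(*dbURL, *leaderElect, *timeout)
}
```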

Code can be tested locally by running:

$ make kagent-addon-install
$ KAGENT_HELM_EXTRA_ARGS="--set controller.replicas=3 --set database.type=postgres --set database.postgres.url=postgres://postgres:kagent@postgres.kagent.svc.cluster.local:5432/kagent" \
  make helm-install

Copilot AI review requested due to automatic review settings November 25, 2025 11:31
@onematchfox onematchfox marked this pull request as draft November 25, 2025 11:31

Copilot AI left a comment

Pull request overview

This PR enables high availability (HA) deployments of the kagent controller by supporting multiple replicas with leader election and shared state via PostgreSQL. The changes refactor configuration management to use environment variables (allowing Secrets for sensitive database credentials), add leader election RBAC resources, and restructure A2A API registration to work across all replicas rather than only in the reconciliation loop.

  • Introduces environment variable-based configuration with the LoadFromEnv function, enabling Secret-based database URL configuration
  • Adds Helm validation preventing SQLite use with multiple controller replicas, and auto-enables leader election when replicas > 1
  • Refactors A2A handler registration from reconciler-based to cache informer-based via A2ARegistrar, ensuring A2A APIs are available on all replicas regardless of leader election status

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Summary per file:
helm/kagent/values.yaml: Adds envFrom field to controller configuration and fixes indentation in volume/volumeMount comment examples
helm/kagent/templates/rbac/leader-election-role.yaml: New Role granting permissions for leader election (leases and events)
helm/kagent/templates/rbac/leader-election-rolebinding.yaml: New RoleBinding associating leader election Role with controller ServiceAccount
helm/kagent/templates/controller-deployment.yaml: Refactors to use ConfigMap-based environment variables, conditionalizes volumes/volumeMounts, and removes inline command args
helm/kagent/templates/controller-configmap.yaml: New ConfigMap consolidating all controller configuration as environment variables, including leader election flag based on replica count
helm/kagent/templates/_helpers.tpl: Adds validation preventing SQLite use with multiple replicas
go/pkg/app/app_test.go: Comprehensive tests for LoadFromEnv function covering string, bool, duration, and quantity flag types
go/pkg/app/app.go: Implements LoadFromEnv to load configuration from environment variables and restructures A2A registration to use new registrar
go/internal/controller/translator/agent/adk_api_translator.go: Extracts agent card building logic to buildAgentCard method and adds TranslateAgentCard interface method for use by A2A registrar
go/internal/controller/reconciler/reconciler.go: Removes A2A reconciler dependency as A2A registration is now handled by separate registrar component
go/internal/controller/a2a/a2a_reconciler.go: Deleted file; reconciler-based A2A registration replaced with cache-based registrar
go/internal/a2a/a2a_registrar.go: New registrar component using cache informer to register/update/remove A2A handlers on all replicas
contrib/addons/postgres.yaml: Adds PostgreSQL deployment for testing HA controller setup with shared database
Makefile: Adds Postgres to addon installation and removes outdated Istio reference from comments
Comments suppressed due to low confidence (1)

helm/kagent/templates/controller-deployment.yaml:35

  • The sqlite-volume is defined unconditionally (lines 32-34) even when database.type is "postgres". This will create an unused volume. The volume definition should also be conditionally included based on the database type.

Suggestion: Wrap the sqlite-volume definition with {{- if eq .Values.database.type "sqlite" }} and update the outer condition on line 30 to be: {{- if or (eq .Values.database.type "sqlite") (gt (len .Values.controller.volumes) 0) }}

      - name: sqlite-volume
        emptyDir:
          sizeLimit: 500Mi
          medium: Memory

@onematchfox onematchfox force-pushed the controller-scaling branch 6 times, most recently from 194dccf to 72dac59 on November 25, 2025 13:24
@onematchfox onematchfox marked this pull request as ready for review November 25, 2025 13:42
@onematchfox onematchfox changed the title from "Support scaling controller deployment" to "feat(controller): support HA deployment" Nov 25, 2025
@onematchfox onematchfox force-pushed the controller-scaling branch 2 times, most recently from ec1572f to c0f9be4 on November 25, 2025 14:11
EItanya pushed a commit that referenced this pull request Dec 1, 2025
This PR ensures that the log level specified when installing kagent is
respected by the controller. Currently this is not respected because
when `Development: true` is set, controller-runtime's zap logger enables
verbosity level (`V(1)`) logs by default, which get logged as
[`log.V(1).Info(...)`](https://github.com/kagent-dev/kagent/blob/9438c9c0f2c79daf632555df1d7d3cb2d04b7b81/go/internal/httpserver/handlers/health.go#L21)
but displayed as `DEBUG` in the output. `-zap-log-level` controls the
minimum log level (info, error, etc.), but does NOT control the
verbosity threshold for V(n) logs.
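
For reference, the general controller-runtime wiring involved looks roughly like the following - a minimal sketch, not this commit's exact diff. With `Development: false`, `zap.New` defaults to the JSON encoder and suppresses V(1) output, while `-zap-log-level` (registered by `BindFlags`) moves the info/error threshold:

```go
package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Development: true would switch to the console encoder and enable
	// V(1) ("debug"-looking) logs by default; false gives JSON + info.
	opts := zap.Options{Development: false}
	opts.BindFlags(flag.CommandLine) // adds -zap-log-level, -zap-devel, -zap-encoder, ...
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

	log := ctrl.Log.WithName("setup")
	log.Info("always shown at info level")
	log.V(1).Info("only shown when verbosity is raised") // suppressed here
}
```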

Note:
- Switching away from development logging by default also switches the
default log format from `ConsoleEncoder` to `JsonEncoder` (see examples
below)[^1].

Pre this PR:

```
2025-11-26T07:07:20Z	INFO	Starting KAgent Controller	{"version": "v0.7.5-1-g9438c9c0", "git_commit": "9438c9c0", "build_date": "2025-11-26"}
2025-11-26T07:07:20Z	INFO	Config	{"config": {"Metrics":{"Addr":"0","CertPath":"","CertName":"tls.crt","CertKey":"tls.key"},"Streaming":{"MaxBufSize":"1Mi","InitialBufSize":"4Ki","Timeout":600000000000},"LeaderElection":false,"ProbeAddr":":8082","SecureMetrics":true,"EnableHTTP2":false,"DefaultModelConfig":{"Namespace":"kagent","Name":"default-model-config"},"HttpServerAddr":":8083","WatchNamespaces":"","A2ABaseUrl":"http://127.0.0.1:8083","Database":{"Type":"sqlite","Path":"/sqlite-volume/kagent.db","Url":"postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud"}}}
2025-11-26T07:07:20Z	INFO	Updating GOMAXPROCS=2: determined from CPU quota
2025-11-26T07:07:20Z	INFO	setup	Starting KAgent Controller	{"version": "v0.7.5-1-g9438c9c0", "git_commit": "9438c9c0", "build_date": "2025-11-26"}
2025-11-26T07:07:20Z	INFO	setup	Watching all namespaces (no valid namespaces specified)
2025-11-26T07:07:20Z	INFO	MCPServer CRD not found - controller will not be started	{"controller": "mcpserver"}
2025-11-26T07:07:20Z	INFO	Could not find CRD for tool server - API integration will be disabled	{"toolServerType": "MCPServer.kagent.dev"}
2025-11-26T07:07:20Z	INFO	setup	starting manager
2025-11-26T07:07:20Z	INFO	starting server	{"name": "health probe", "addr": "[::]:8082"}
2025-11-26T07:07:20Z	INFO	http-server	Starting HTTP server	{"address": ":8083"}
2025-11-26T07:07:20Z	INFO	Starting EventSource	{"controller": "service", "controllerGroup": "", "controllerKind": "Service", "source": "kind source: *v1.Service"}
2025-11-26T07:07:20Z	INFO	Starting Controller	{"controller": "service", "controllerGroup": "", "controllerKind": "Service"}
2025-11-26T07:07:20Z	INFO	Starting workers	{"controller": "service", "controllerGroup": "", "controllerKind": "Service", "worker count": 1}
2025-11-26T07:07:20Z	INFO	Starting EventSource	{"controller": "modelconfig", "controllerGroup": "kagent.dev", "controllerKind": "ModelConfig", "source": "kind source: *v1.Secret"}
2025-11-26T07:07:20Z	INFO	Starting EventSource	{"controller": "agent", "controllerGroup": "kagent.dev", "controllerKind": "Agent", "source": "kind source: *v1.Service"}
2025-11-26T07:07:20Z	INFO	setup	starting pprof server
2025-11-26T07:07:20Z	INFO	setup	pprof server started	{"address": ":6060"}
...
2025-11-26T07:07:20Z	INFO	Starting Controller	{"controller": "agent", "controllerGroup": "kagent.dev", "controllerKind": "Agent"}
2025-11-26T07:07:20Z	INFO	Starting workers	{"controller": "agent", "controllerGroup": "kagent.dev", "controllerKind": "Agent", "worker count": 1}
2025-11-26T07:07:20Z	INFO	Starting Controller	{"controller": "remotemcpserver", "controllerGroup": "kagent.dev", "controllerKind": "RemoteMCPServer"}
2025-11-26T07:07:20Z	INFO	Starting workers	{"controller": "remotemcpserver", "controllerGroup": "kagent.dev", "controllerKind": "RemoteMCPServer", "worker count": 1}
2025-11-26T07:07:48Z	DEBUG	http	Request started	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:34006"}
2025-11-26T07:07:48Z	DEBUG	http.health-handler	Handling health check request	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:34006"}
2025-11-26T07:07:48Z	INFO	http	Request completed	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:34006", "status": 200, "duration": "117.034µs"}
2025-11-26T07:07:49Z	DEBUG	http	Request started	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:47646"}
2025-11-26T07:07:49Z	DEBUG	http.health-handler	Handling health check request	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:47646"}
2025-11-26T07:07:49Z	INFO	http	Request completed	{"method": "GET", "path": "/health", "remote_addr": "10.244.0.1:47646", "status": 200, "duration": "115.967µs"}

```

Post this PR (without the log level being overridden, as we do when running
locally - note the missing "debug" logs):

```
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"setup","msg":"Starting KAgent Controller","version":"v0.7.5-3-g44e8c6d3","git_commit":"44e8c6d3","build_date":"2025-11-26","config":{"Metrics":{"Addr":"0","CertPath":"","CertName":"tls.crt","CertKey":"tls.key"},"Streaming":{"MaxBufSize":"1Mi","InitialBufSize":"4Ki","Timeout":600000000000},"LeaderElection":false,"ProbeAddr":":8082","SecureMetrics":true,"EnableHTTP2":false,"DefaultModelConfig":{"Namespace":"kagent","Name":"default-model-config"},"HttpServerAddr":":8083","WatchNamespaces":"","A2ABaseUrl":"http://127.0.0.1:8083","Database":{"Type":"sqlite","Path":"/sqlite-volume/kagent.db","Url":"postgres://postgres:kagent@db.kagent.svc.cluster.local:5432/crud"}}}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Updating GOMAXPROCS=2: determined from CPU quota"}
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"setup","msg":"Watching all namespaces (no valid namespaces specified)"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"MCPServer CRD not found - controller will not be started","controller":"mcpserver"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Could not find CRD for tool server - API integration will be disabled","toolServerType":"MCPServer.kagent.dev"}
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"starting server","name":"health probe","addr":"[::]:8082"}
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"http-server","msg":"Starting HTTP server","address":":8083"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting EventSource","controller":"service","controllerGroup":"","controllerKind":"Service","source":"kind source: *v1.Service"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting Controller","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting workers","controller":"service","controllerGroup":"","controllerKind":"Service","worker count":1}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting EventSource","controller":"modelconfig","controllerGroup":"kagent.dev","controllerKind":"ModelConfig","source":"kind source: *v1.Secret"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting EventSource","controller":"agent","controllerGroup":"kagent.dev","controllerKind":"Agent","source":"kind source: *v1.Service"}
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"setup","msg":"starting pprof server"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting EventSource","controller":"agent","controllerGroup":"kagent.dev","controllerKind":"Agent","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2025-11-26T07:32:28Z","logger":"setup","msg":"pprof server started","address":":6060"}
...
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting Controller","controller":"modelconfig","controllerGroup":"kagent.dev","controllerKind":"ModelConfig"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting workers","controller":"modelconfig","controllerGroup":"kagent.dev","controllerKind":"ModelConfig","worker count":1}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting Controller","controller":"agent","controllerGroup":"kagent.dev","controllerKind":"Agent"}
{"level":"info","ts":"2025-11-26T07:32:28Z","msg":"Starting workers","controller":"agent","controllerGroup":"kagent.dev","controllerKind":"Agent","worker count":1}
{"level":"info","ts":"2025-11-26T07:32:55Z","logger":"http","msg":"Request completed","method":"GET","path":"/health","remote_addr":"10.244.0.1:34516","status":200,"duration":0.000057265}
{"level":"info","ts":"2025-11-26T07:32:55Z","logger":"http","msg":"Request completed","method":"GET","path":"/health","remote_addr":"10.244.0.1:34524","status":200,"duration":0.000050881}
```

[^1]: If anyone is opposed to JSON logs locally, we can reconfigure this,
but doing so would require updates to the Helm chart that I'm trying to
avoid pending the switch to using env vars instead of args in
#1133. Specifically, I don't want to add `controller.args` to the values
now when that could subsequently be removed (a breaking change) in the
linked PR.

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
EItanya pushed a commit that referenced this pull request Dec 1, 2025
Split this out of #1133 to try to reduce the size of that PR - but also
because it's not strictly related to being able to scale the controller;
it simply manifested when needing to switch to Postgres when running
multiple controller replicas.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox onematchfox force-pushed the controller-scaling branch 3 times, most recently from aa726b5 to 14211de Compare December 2, 2025 08:15
EItanya pushed a commit that referenced this pull request Dec 2, 2025
Another artifact of #1133. No need for the SQLite volume+mount when the
database is set to Postgres.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
EItanya pushed a commit that referenced this pull request Dec 2, 2025
…te database (#1144)

Running multiple controller replicas when using a local SQLite database
will lead to errors, as API requests will inevitably end up being handled
by a replica that does not have the local state (e.g. an A2A session).
This check/error hopefully prevents users from making this mistake.

Split out from #1133

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
EItanya added a commit that referenced this pull request Dec 2, 2025
Enables local testing using Postgres as a backing store for the controller.

Split out from #1133 (with added docs).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
EItanya pushed a commit that referenced this pull request Dec 2, 2025
**Yet another PR split out from #1133 to try to reduce the review burden** -
keeping that one open for now as all of these other PRs are ultimately
working towards that goal.

This PR refactors the kagent controller to support the use of
environment variables for configuration in addition to command-line
arguments. It also updates the Helm chart to use env vars instead of
command-line args and adds the ability for users to supply their own
environment variables with custom configuration. This allows users to
supply sensitive configuration (e.g. a Postgres database URL) via
Secrets instead of exposing it via `args`. Env vars are also easier
to patch when working with rendered manifests if needed.

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
onematchfox and others added 3 commits December 2, 2025 15:58
Removed the A2A "reconciler" and replaced it with a registrar responsible for registering/deregistering mux handlers. The main controller still manages cluster resources and DB/status.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
…oller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox (Contributor, Author) commented:

Closing since everything in here has now been split out and only 2 open PRs remain.

@onematchfox onematchfox closed this Dec 2, 2025
EItanya pushed a commit that referenced this pull request Dec 2, 2025
This PR enables leader election on the controller if it is configured
with more than one replica, ensuring that only one replica is actively
reconciling watched manifests. It also ensures that the necessary RBAC
manifests are created.
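
The manager-side wiring looks roughly like this - a hedged sketch using controller-runtime's standard leader election options; the election ID and namespace shown are illustrative, not necessarily the values the chart renders:

```go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Leader election uses a coordination.k8s.io/v1 Lease, which is why the
	// chart must also grant the ServiceAccount RBAC on leases (and events).
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,                // enabled when replicas > 1
		LeaderElectionID:        "kagent-controller", // illustrative lock name
		LeaderElectionNamespace: "kagent",            // illustrative namespace
	})
	if err != nil {
		os.Exit(1)
	}
	// Reconcilers registered on mgr only run on the elected leader;
	// runnables added via mgr.Add can opt out of election and run everywhere.
	_ = mgr.Start(ctrl.SetupSignalHandler())
}
```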

Final part of #1133 (excluding #1138).

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox onematchfox deleted the controller-scaling branch December 3, 2025 10:17
EItanya pushed a commit that referenced this pull request Dec 3, 2025
…econcilation (#1138)

**Decided to split this out of
#1133 to try to make review a
little easier, as it's a chunky commit that can live in isolation from the
rest of the changes in that PR**

This change separates A2A handler registration from the main `Agent`
controller reconciliation loop by introducing a dedicated `A2ARegistrar`
that manages the A2A routing table independently from the main
controller.

Currently, A2A handler registration is tightly coupled to the `Agent`
controller's reconciliation loop, which performs the following
operations:
1. Reconcile Kubernetes resources (Deployment, Service, etc.)
2. Store agent metadata in database
3. Register A2A handler in routing table
4. Update resource status

This coupling is problematic for a number of reasons:
1. Breaks horizontal scaling - with leader election enabled (required to
prevent duplicate reconciliation), only the leader pod performs
reconciliation and registers A2A handlers. When API requests hit
non-leader replicas, they fail because those replicas lack the necessary
handler registrations.
2. It could be argued that this violates separation of concerns - the
controller handles both cluster resource management (its core
responsibility) and API routing configuration (an orthogonal concern).
3. Makes future architectural changes (e.g., splitting API and control
plane) unnecessarily complex.

This PR addresses those concerns by ensuring that all controller
replicas, when scaled, maintain consistent A2A routing tables,
enabling transparent load balancing across replicas. A2A logic is also
consolidated into a dedicated package rather than scattered across
controller code, ensuring a clean separation of API and control plane
such that these could be split into independent deployments in future
without significant refactoring.
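
A minimal sketch of the informer-based wiring this describes - the Registrar interface and Setup function below are illustrative stand-ins for the PR's actual A2ARegistrar, and assume a recent controller-runtime where AddEventHandler returns a registration:

```go
package a2a

import (
	"context"

	toolscache "k8s.io/client-go/tools/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// Registrar is an illustrative stand-in for the PR's A2ARegistrar: it owns
// the in-process A2A routing table on each replica.
type Registrar interface {
	Register(obj client.Object) // add or refresh the handler for an Agent
	Remove(obj client.Object)   // drop the handler when the Agent goes away
}

// Setup wires the registrar to the manager's shared cache. Informer event
// handlers run on every replica regardless of leader election, so each pod
// keeps its own routing table in sync.
func Setup(ctx context.Context, mgr manager.Manager, agent client.Object, r Registrar) error {
	inf, err := mgr.GetCache().GetInformer(ctx, agent)
	if err != nil {
		return err
	}
	_, err = inf.AddEventHandler(toolscache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj any) { r.Register(obj.(client.Object)) },
		UpdateFunc: func(_, obj any) { r.Register(obj.(client.Object)) },
		DeleteFunc: func(obj any) {
			if tomb, ok := obj.(toolscache.DeletedFinalStateUnknown); ok {
				obj = tomb.Obj // unwrap a tombstone from a missed delete
			}
			if o, ok := obj.(client.Object); ok {
				r.Remove(o)
			}
		},
	})
	return err
}
```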

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
GTRekter pushed commits to GTRekter/kagent that referenced this pull request Dec 4, 2025 (mirrors of the commits above).