Skip to content

Conversation

@onematchfox
Copy link
Contributor

@onematchfox onematchfox commented Aug 7, 2025

Controllers shouldn't be updating objects they do not own/manage. Instead, controllers should explicitly watch the unowned resources that they do not own and ensure that any affected objects that they do own are queued for reconciliation. Ref

This PR refactors the existing implementation so that controllers, such as the toolserver-controller, do not reconcile agents, and instead, the agent controller watches for changes to toolserver objects. For now, I've left all relationships in place with the most major difference being the removal of the secrets controller, whereby agents were reconciled based on secret updates. Now, the modelconfig controller watches for secret updates, and the agent controller, in turn, watches for model config updates. With these changes in place, we can also start introducing the use of predicates to limit unnecessary reconciliation and/or even revisiting whether these relationships are needed at all.

Side notes:

  • In the current implementation (both existing code and post this PR), updates to related resources don't have much of an effect. I imagine, for example, that watching secrets was put in place with the idea being that if the API key for a model is updated, then agents would be restarted to ensure they use the new config. This isn't going to happen - not in the current code nor as a result of these changes. Perhaps it did in the past (I haven't gone back in history to look at how things worked when agents were defined in autogen). But going forward, we should either try and make sure that changes are correctly propagated or remove the watch as it just results in unnecessary reconciliation. Another example would be agents use of memory since memory isn't reflected anywhere in the representation of an agent at the moment but perhaps this is was just missed when switching over to ADK?
  • I've left the shared Reconciler in place for the time being, although arguably this whole situation could have been avoided if the controller code had been kept isolated (with shared functionality/code exposed through a package where needed). I did start looking at extracting out controller-specific code, but stopped when I noticed the "dbclient Mutex" that is shared between the functions used by AgentReconciler and ToolServerReconciler

Related to #700

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Agent watches ModelConfig -> ModelConfig watches Secret

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
This isn't needed any more - we only ever reconcile a single agent at a time now.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox onematchfox marked this pull request as ready for review August 7, 2025 07:47
Copilot AI review requested due to automatic review settings August 7, 2025 07:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the controller architecture to follow Kubernetes best practices by having controllers watch secondary resources instead of directly updating unowned resources. The main change removes cross-controller reconciliation where controllers like toolserver-controller would directly reconcile agents they don't own.

Key changes:

  • Removes the secrets controller that directly reconciled agents based on secret updates
  • Adds watch relationships where agent controller watches for changes to toolserver, modelconfig, and memory resources
  • Refactors finder methods to return resources instead of error-prone reconciliation chains

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
go/controller/internal/reconciler/reconciler.go Removes cross-controller reconciliation logic and refactors finder methods to support watching pattern
go/controller/internal/controller/secret_controller.go Completely removes the secret controller
go/controller/internal/controller/modelconfig_controller.go Adds watch for Secret resources to trigger ModelConfig reconciliation
go/controller/internal/controller/agent_controller.go Adds watches for Memory, ModelConfig, and ToolServer resources to trigger Agent reconciliation
go/controller/cmd/main.go Removes registration of the deleted SecretReconciler

ctx,
modelConfig,
a.reconcileAgents(ctx, agents...),
err,
Copy link

Copilot AI Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err variable from the secret validation is being passed as the reconciliation error to reconcileModelConfigStatus. This will mark the ModelConfig as failed when the secret doesn't exist, but the ModelConfig reconciliation itself may have succeeded. The error should only be passed if it represents an actual reconciliation failure.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should only be passed if it represents an actual reconciliation failure.

This is desired behaviour. If the secret doesn't exist, this should reflect in the status of ModelConfig.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox
Copy link
Contributor Author

onematchfox commented Aug 7, 2025

An example of what the changes in this PR enable can be seen here, where we can start to make use of GenerationChangedPredicate to limit unnecessary reconciliation based on updates to status subresources. This would not be nearly as effective if applied against main as it stands today, because agents are explicitly reconciled based on updates to secondary resources. Note: That commit only applies the predicate to primary resources for the time being since it may, for example, be desirable to reconcile an Agent if the Status of a ModelConfig changes.

@EItanya
Copy link
Contributor

EItanya commented Aug 7, 2025

First of all, awesome work! I'll be honest that I'm not the foremost expert on controller-runtime, so super thankful for these fixes/pointers.

In terms of your side points:

  • You are correct that was the original intention, but awesome catch that it's not working anymore. In terms of the Memory object you are also correct. In the move to ADK, that particular object didn't move cleanly, I will create an issue explaining what happened, and add a note to the release.
  • TBH I didn't add the mutex originally, I have an idea of why it's there, but now that we're backed by a DB I'm 99.9999% sure it's unnecessary and we should remove it.

I'm happy to work with you to make more of the proposed changes you've mentioned as I think they would be very important for the consistency of the reconcilers.

@EItanya EItanya merged commit 1899ea1 into kagent-dev:main Aug 7, 2025
10 of 11 checks passed
UriZafrir pushed a commit to UriZafrir/kagent that referenced this pull request Aug 7, 2025
…d resources (kagent-dev#703)

* fix(controller): watch secrets from agents controller

Ref: https://book.kubebuilder.io/reference/watching-resources/secondary-resources-not-owned
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch memory from agents controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch toolservers from agents controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch modelconfig from agent controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch secrets from model config controller

Agent watches ModelConfig -> ModelConfig watches Secret

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): consistent error logging

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): remove `reconcileAgents`

This isn't needed any more - we only ever reconcile a single agent at a time now.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): ensure api key secret exists for model config

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): explicitly set error to nil for memory status

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>
peterj added a commit that referenced this pull request Aug 8, 2025
* feat: implement backend for models ui

Signed-off-by: urizaf-work <uri.zafrir@kaltura.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* add missing providers so they all appear in ui

Signed-off-by: urizaf-work <uri.zafrir@kaltura.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* fix(ui): correctly display args for tool calls in chat (#688)

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* update READMEs based on new architecture (#684)

* update READMEs based on new architecture

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

* PR comments

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

* Update ui/README.md

---------

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Peter Jausovec <peterj@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* [FIX ] - fixes adk performance tuning (#689)

* - fixes adk performance tuning
- dependency versions update

Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>

* fix VERSION in case forked repository without tags

Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>

* revert uv version

Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>

* updated golden e2e

Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>

* fix helm unit tests

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

---------

Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
Signed-off-by: urizaf <urizaf@gmail.com>

* feat: make streaming buffer size configurable (#696)

* feat: make streaming buffer size configurable

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

* switch to resource quantities for buffer size

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

---------

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Signed-off-by: urizaf <urizaf@gmail.com>

* eitanya/fix-python-release (#698)

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Signed-off-by: urizaf <urizaf@gmail.com>

* EP-685-kmcp (#686)

* EP-685-kmcp

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

* Update design/EP-685-kmcp.md

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>

---------

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Lin Sun <lin.sun@solo.io>
Signed-off-by: urizaf <urizaf@gmail.com>

* use the types defined in pkg/client/model.go

Signed-off-by: urizaf <urizaf@gmail.com>

* fix(ui): correct link to switch agent from within chat (#667)

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Co-authored-by: Peter Jausovec <peterj@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* fix(ui): display description for agent tools when editing an agent (#692)

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Co-authored-by: Peter Jausovec <peterj@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* fix(controller): watch secondary resources instead of updating unowned resources (#703)

* fix(controller): watch secrets from agents controller

Ref: https://book.kubebuilder.io/reference/watching-resources/secondary-resources-not-owned
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch memory from agents controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch toolservers from agents controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch modelconfig from agent controller

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): watch secrets from model config controller

Agent watches ModelConfig -> ModelConfig watches Secret

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): consistent error logging

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): remove `reconcileAgents`

This isn't needed any more - we only ever reconcile a single agent at a time now.

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* fix(controller): ensure api key secret exists for model config

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

* refactor(controller): explicitly set error to nil for memory status

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>

---------

Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: urizaf <urizaf@gmail.com>

* change to correct Gemini icon

Signed-off-by: urizaf <urizaf@gmail.com>

---------

Signed-off-by: urizaf-work <uri.zafrir@kaltura.com>
Signed-off-by: urizaf <urizaf@gmail.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Signed-off-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Co-authored-by: urizaf-work <uri.zafrir@kaltura.com>
Co-authored-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Peter Jausovec <peterj@users.noreply.github.com>
Co-authored-by: Dmytro Rashko <dmitriy.rashko@amdocs.com>
Co-authored-by: Lin Sun <lin.sun@solo.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@onematchfox onematchfox deleted the refactor-controller branch September 5, 2025 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants