Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRestructures config loading to return a LoadResult (Config + RawProviders), adds global resilience with per-provider overrides and cache RefreshInterval, changes ProviderFactory API from Register→Add to pass ProviderOptions (Hooks + Resilience) to provider constructors, and threads Retry/CircuitBreaker into llmclient and providers. Changes
Sequence DiagramsequenceDiagram
participant App as App
participant Loader as Config Loader
participant Resolver as Provider Resolver
participant Factory as Provider Factory
participant Provider as Provider
participant LLM as LLM Client
App->>Loader: Load() → LoadResult{Config, RawProviders}
Loader->>Resolver: resolveProviders(RawProviders, Config.Resilience)
Resolver-->>Factory: finalized ProviderConfigs
App->>Factory: Init(ctx, LoadResult, Factory)
Factory->>Provider: Create(apiKey, ProviderOptions{Hooks, Resilience})
Provider->>LLM: llmclient.New(cfg with Retry + CircuitBreaker)
LLM-->>Provider: client
Provider-->>Factory: provider instance
Factory-->>App: InitResult (registry, cache, router)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
internal/providers/gemini/gemini_test.go (1)
33-39: 🧹 Nitpick | 🔵 TrivialLGTM — consider adding a resilience-propagation test for the new feature.
The constructor change is correct. However,
TestNew_ReturnsProvideronly asserts non-nil, giving no coverage for the primary new behavior: that a populatedProviderOptions.Resilience.Retryis wired into the underlyingllmclient.Config.Retry. Consider a focused test:💡 Suggested addition
func TestNew_PropagatesResilienceConfig(t *testing.T) { // Verify the new ProviderOptions.Resilience path is exercised. // Use NewWithHTTPClient to access the concrete type. hooks := llmclient.Hooks{} p := NewWithHTTPClient("test-api-key", nil, hooks) if p.client == nil { t.Fatal("client should not be nil") } // Further assertions on p.client internals would require // an exported accessor or reflection — adding one guards // against silent zero-value misconfiguration. }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini_test.go` around lines 33 - 39, Add a unit test that asserts ProviderOptions.Resilience.Retry is propagated into the underlying llmclient config: create a test (e.g., TestNew_PropagatesResilienceConfig) that constructs the provider via NewWithHTTPClient (so you can inspect the concrete provider), supply a ProviderOptions with a non-default Resilience.Retry configuration, then assert that p.client (or the exported accessor/reflection on the concrete provider) contains the same llmclient.Config.Retry values; if necessary add a small exported accessor on the provider to read the client's config for test visibility.internal/providers/xai/xai.go (1)
49-60: 🧹 Nitpick | 🔵 Trivial
NewWithHTTPClientstill uses rawllmclient.Hooks— not updated toProviderOptions.The primary
Newconstructor now acceptsProviderOptionsand configures resilience (retry + circuit breaker), butNewWithHTTPClientstill takes barehooks llmclient.Hooksand usesDefaultConfig, skipping the resilience configuration entirely. If this path is only for unit tests, document that intent; otherwise it's an inconsistency that could cause subtle behavior differences.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/xai/xai.go` around lines 49 - 60, The NewWithHTTPClient constructor currently takes raw llmclient.Hooks and uses llmclient.DefaultConfig, skipping the resilience setup performed by New; update NewWithHTTPClient to accept ProviderOptions (or at least a ProviderOptions-like param) and apply the same resilience configuration (retries/circuit breaker) when building cfg—i.e., mirror the logic in New that merges options into ProviderOptions and sets up cfg with retries and breaker before calling llmclient.NewWithHTTPClient and p.setHeaders; if this helper is intentionally test-only, add a clear comment on NewWithHTTPClient stating it bypasses resilience for tests and keep the signature but document it.internal/providers/gemini/gemini.go (1)
144-161:⚠️ Potential issue | 🟠 Major
ListModelscreates a newllmclient.Clienton every call — inefficient and missing resilience config.A new
llmclient.New(...)is instantiated per invocation ofListModels. This:
- Allocates a new HTTP client and circuit breaker each time (allocation churn, no state continuity for the CB).
- Uses
DefaultConfigwithout the retry/circuit-breaker settings fromopts.Resilience, so the models endpoint lacks the resilience guarantees the main client has.Create the models client once in the
Newconstructor and store it on theProviderstruct, configured with the same resilience settings.Proposed fix sketch
In the
Providerstruct, add amodelsClient *llmclient.Client, then initialise it inNew:type Provider struct { client *llmclient.Client - hooks llmclient.Hooks + modelsClient *llmclient.Client apiKey string modelsURL string } func New(apiKey string, opts providers.ProviderOptions) core.Provider { p := &Provider{ apiKey: apiKey, - hooks: opts.Hooks, modelsURL: defaultModelsBaseURL, } cfg := llmclient.Config{ /* ... main client cfg ... */ } p.client = llmclient.New(cfg, p.setHeaders) + + modelsCfg := llmclient.Config{ + ProviderName: "gemini", + BaseURL: defaultModelsBaseURL, + Retry: opts.Resilience.Retry, + Hooks: opts.Hooks, + CircuitBreaker: &llmclient.CircuitBreakerConfig{ + FailureThreshold: 5, + SuccessThreshold: 2, + Timeout: 30 * time.Second, + }, + } + p.modelsClient = llmclient.New(modelsCfg, func(req *http.Request) { + q := req.URL.Query() + q.Add("key", p.apiKey) + req.URL.RawQuery = q.Encode() + }) return p }Then in
ListModels, usep.modelsClient.Do(...)directly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 144 - 161, ListModels currently constructs a new llmclient.Client on every call which creates allocation churn and omits resilience settings; add a modelsClient *llmclient.Client field to the Provider struct, initialize it in the New constructor using llmclient.DefaultConfig("gemini", p.modelsURL) and copy the same resilience/retry/circuit-breaker settings from opts.Resilience (and set Hooks), and include the existing query-parameter hook that adds p.apiKey; then modify ListModels to call p.modelsClient.Do(...) (or the equivalent request method) instead of creating a new client instance so the models endpoint shares the same resilience config and a single long-lived client.config/config.go (1)
586-609:⚠️ Potential issue | 🟡 MinorDead code: Ollama-specific check at line 588 is redundant.
Lines 584–586 already skip providers with
apiKey == "" && baseURL == "". The Ollama-specific check at line 588 (if kp.providerType == "ollama" && apiKey == "" && baseURL == "") will never be true at that point.🧹 Suggested fix
if apiKey == "" && baseURL == "" { continue } - if kp.providerType == "ollama" && apiKey == "" && baseURL == "" { - continue - } - existing, exists := cfg.Providers[kp.name]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config.go` around lines 586 - 609, The ollama-specific guard (if kp.providerType == "ollama" && apiKey == "" && baseURL == "") is dead code because earlier logic already skips any provider when apiKey == "" && baseURL == ""; remove that redundant check from the loop and rely on the existing skip logic, keeping the rest of the merge/update logic for cfg.Providers, existing (ProviderConfigInput) and kp.providerType unchanged.internal/providers/factory_test.go (1)
177-244: 🧹 Nitpick | 🔵 Trivial
TestProviderFactory_SetHooksandTestProviderFactory_HooksPassedToBuilderare duplicates.Both tests perform identical steps: set hooks → register builder → create → assert
receivedOpts.Hooks.OnRequestStart != nil. Remove one of them to avoid maintenance overhead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/factory_test.go` around lines 177 - 244, Remove the duplicate test by deleting either TestProviderFactory_SetHooks or TestProviderFactory_HooksPassedToBuilder (they perform identical steps: NewProviderFactory, SetHooks, Register with the same builder, Create, and assert receivedOpts.Hooks.OnRequestStart), keeping a single test (e.g., TestProviderFactory_SetHooks) that verifies hooks are passed to the builder; ensure the remaining test still uses NewProviderFactory(), SetHooks(mockHooks), Register(Registration{ Type: "test", New: func(apiKey string, opts ProviderOptions) core.Provider { ... } }), Create(cfg) and the final assertion that receivedOpts.Hooks.OnRequestStart != nil so behavior is preserved.internal/providers/ollama/ollama.go (1)
54-63: 🧹 Nitpick | 🔵 Trivial
NewWithHTTPClientis test-only; consider standardizing to match production pattern.This constructor is used exclusively in tests but accepts raw
llmclient.Hooksinstead ofProviderOptions, bypassing the resilience config (it callsllmclient.DefaultConfiginstead of usingopts.Resilience.Retrylike the productionNewconstructor). For consistency with other providers and the factory pattern, update the signature to acceptProviderOptionsand extract hooks from it, or add a doc comment explaining the divergence.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/ollama/ollama.go` around lines 54 - 63, NewWithHTTPClient currently accepts llmclient.Hooks and bypasses production resilience settings; change its signature to accept ProviderOptions (e.g., func NewWithHTTPClient(apiKey string, httpClient *http.Client, opts ProviderOptions) *Provider), then extract hooks from opts (opts.Hooks) and apply the resilience retry settings to the llmclient config (cfg.Retry = opts.Resilience.Retry) before calling llmclient.NewWithHTTPClient(..., cfg, p.setHeaders); this makes NewWithHTTPClient consistent with New and the factory pattern. If you intentionally want it test-only instead, add a clear doc comment on NewWithHTTPClient explaining why it accepts raw llmclient.Hooks and that it intentionally bypasses opts.Resilience.Retry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/llmclient/client.go`:
- Around line 20-21: The llmclient package currently imports gomodel/config to
use RetryConfig which creates an upward dependency; remove that coupling by
either (A) defining a RetryConfig type inside internal/llmclient (e.g., type
RetryConfig struct {...}) and change references in client.go to use that local
type, or (B) extract a small shared type into a new internal package (e.g.,
internal/types with RetryConfig) and have both gomodel/config and
internal/llmclient import that shared package; update the imports in client.go
to point to the new local/shared type and adjust config initialization code to
map/configure the new type accordingly.
In `@internal/providers/factory_test.go`:
- Around line 274-310: Update the test
TestProviderFactory_Create_PassesResilienceConfig to also assert that
InitialBackoff, MaxBackoff, and BackoffFactor from the config.ResilienceConfig
are propagated into the received ProviderOptions
(receivedOpts.Resilience.Retry.InitialBackoff,
receivedOpts.Resilience.Retry.MaxBackoff,
receivedOpts.Resilience.Retry.BackoffFactor) when calling
ProviderFactory.Create; this ensures ProviderConfig → ProviderOptions copying in
New (registration New func) covers all resilience fields and prevents selective
field-copying bugs.
In `@internal/providers/init.go`:
- Around line 40-50: Update the stale Go doc for the exported function Init to
reflect that background refresh scheduling is configurable rather than hourly:
change the "Background refresh scheduling (hourly)" line to reference
cfg.Cache.RefreshInterval (or "configured interval via
cfg.Cache.RefreshInterval") and ensure the comment still mentions that the
caller must call InitResult.Close() on shutdown; locate the comment above func
Init in internal/providers/init.go and update the text accordingly.
In `@internal/providers/ollama/ollama.go`:
- Around line 42-46: Duplicate hardcoded circuit breaker settings are used
across provider constructors; extract a shared default and wire it into provider
setup. Add a package-level default (e.g., defaultCircuitBreaker :=
llmclient.CircuitBreakerConfig{FailureThreshold:5, SuccessThreshold:2,
Timeout:30*time.Second}) or add a field to ResilienceConfig like
DefaultCircuitBreaker llmclient.CircuitBreakerConfig, then update each provider
New function (the constructors in internal/providers/ollama/ollama.go and the
corresponding openai/xai/groq/gemini providers) to use the shared default and
allow an override from the provider-specific ResilienceConfig when present (use
the config value if non-zero, otherwise fall back to the default).
In `@internal/providers/xai/xai.go`:
- Around line 34-44: The CircuitBreaker settings are hardcoded in
llmclient.Config (the CircuitBreaker field with FailureThreshold,
SuccessThreshold, Timeout) and must be made configurable like Retry via
opts.Resilience; add a CircuitBreakerConfig field to the ResilienceConfig type
(or use config.CircuitBreakerConfig) and wire opts.Resilience.CircuitBreaker
into the llmclient.Config for the xai provider (and mirror the change for other
providers), using the existing defaults (5,2,30s) when
opts.Resilience.CircuitBreaker is nil; additionally extract a helper (e.g.,
defaultLLMClientConfig or providers.defaultLLMClientConfig) that builds the
llmclient.Config from providerName, baseURL, opts.Hooks, opts.Resilience.Retry
and opts.Resilience.CircuitBreaker to remove the duplicated CircuitBreaker block
across providers.
---
Outside diff comments:
In `@config/config.go`:
- Around line 586-609: The ollama-specific guard (if kp.providerType == "ollama"
&& apiKey == "" && baseURL == "") is dead code because earlier logic already
skips any provider when apiKey == "" && baseURL == ""; remove that redundant
check from the loop and rely on the existing skip logic, keeping the rest of the
merge/update logic for cfg.Providers, existing (ProviderConfigInput) and
kp.providerType unchanged.
In `@internal/providers/factory_test.go`:
- Around line 177-244: Remove the duplicate test by deleting either
TestProviderFactory_SetHooks or TestProviderFactory_HooksPassedToBuilder (they
perform identical steps: NewProviderFactory, SetHooks, Register with the same
builder, Create, and assert receivedOpts.Hooks.OnRequestStart), keeping a single
test (e.g., TestProviderFactory_SetHooks) that verifies hooks are passed to the
builder; ensure the remaining test still uses NewProviderFactory(),
SetHooks(mockHooks), Register(Registration{ Type: "test", New: func(apiKey
string, opts ProviderOptions) core.Provider { ... } }), Create(cfg) and the
final assertion that receivedOpts.Hooks.OnRequestStart != nil so behavior is
preserved.
In `@internal/providers/gemini/gemini_test.go`:
- Around line 33-39: Add a unit test that asserts
ProviderOptions.Resilience.Retry is propagated into the underlying llmclient
config: create a test (e.g., TestNew_PropagatesResilienceConfig) that constructs
the provider via NewWithHTTPClient (so you can inspect the concrete provider),
supply a ProviderOptions with a non-default Resilience.Retry configuration, then
assert that p.client (or the exported accessor/reflection on the concrete
provider) contains the same llmclient.Config.Retry values; if necessary add a
small exported accessor on the provider to read the client's config for test
visibility.
In `@internal/providers/gemini/gemini.go`:
- Around line 144-161: ListModels currently constructs a new llmclient.Client on
every call which creates allocation churn and omits resilience settings; add a
modelsClient *llmclient.Client field to the Provider struct, initialize it in
the New constructor using llmclient.DefaultConfig("gemini", p.modelsURL) and
copy the same resilience/retry/circuit-breaker settings from opts.Resilience
(and set Hooks), and include the existing query-parameter hook that adds
p.apiKey; then modify ListModels to call p.modelsClient.Do(...) (or the
equivalent request method) instead of creating a new client instance so the
models endpoint shares the same resilience config and a single long-lived
client.
In `@internal/providers/ollama/ollama.go`:
- Around line 54-63: NewWithHTTPClient currently accepts llmclient.Hooks and
bypasses production resilience settings; change its signature to accept
ProviderOptions (e.g., func NewWithHTTPClient(apiKey string, httpClient
*http.Client, opts ProviderOptions) *Provider), then extract hooks from opts
(opts.Hooks) and apply the resilience retry settings to the llmclient config
(cfg.Retry = opts.Resilience.Retry) before calling
llmclient.NewWithHTTPClient(..., cfg, p.setHeaders); this makes
NewWithHTTPClient consistent with New and the factory pattern. If you
intentionally want it test-only instead, add a clear doc comment on
NewWithHTTPClient explaining why it accepts raw llmclient.Hooks and that it
intentionally bypasses opts.Resilience.Retry.
In `@internal/providers/xai/xai.go`:
- Around line 49-60: The NewWithHTTPClient constructor currently takes raw
llmclient.Hooks and uses llmclient.DefaultConfig, skipping the resilience setup
performed by New; update NewWithHTTPClient to accept ProviderOptions (or at
least a ProviderOptions-like param) and apply the same resilience configuration
(retries/circuit breaker) when building cfg—i.e., mirror the logic in New that
merges options into ProviderOptions and sets up cfg with retries and breaker
before calling llmclient.NewWithHTTPClient and p.setHeaders; if this helper is
intentionally test-only, add a clear comment on NewWithHTTPClient stating it
bypasses resilience for tests and keep the signature but document it.
---
Duplicate comments:
In `@internal/providers/anthropic/anthropic.go`:
- Around line 41-53: The New function currently hardcodes a
llmclient.CircuitBreakerConfig in Provider creation which duplicates logic
across providers; update New (and where cfg is built) to use a circuit breaker
config coming from opts.Resilience (e.g., opts.Resilience.CircuitBreaker) if
provided, otherwise fall back to the existing defaults (FailureThreshold:5,
SuccessThreshold:2, Timeout:30*time.Second); ensure you reference
llmclient.CircuitBreakerConfig and set cfg.CircuitBreaker to the chosen config
so the provider honors per-instance resilience settings instead of the hardcoded
values.
In `@internal/providers/gemini/gemini.go`:
- Around line 46-56: The circuit breaker values are hardcoded in the
llmclient.Config construction for cfg; instead make the code reuse the
resilience settings from opts (e.g., opts.Resilience or a shared helper) rather
than embedding literal values: read the circuit breaker config from
opts.Resilience.CircuitBreaker (or construct it from opts.Resilience fields) and
assign that to cfg.CircuitBreaker so Gemini uses the same configurable values as
other providers (refer to llmclient.Config, CircuitBreakerConfig, cfg, and
opts.Resilience).
In `@internal/providers/groq/groq.go`:
- Around line 35-47: The New function in Provider (New, Provider) hardcodes a
llmclient.CircuitBreakerConfig (FailureThreshold, SuccessThreshold, Timeout)
duplicating logic across providers; change it to read the circuit breaker
settings from the incoming opts (e.g., opts.Resilience.CircuitBreaker or a
shared providers.DefaultCircuitBreakerConfig) and pass that value into
llmclient.Config.CircuitBreaker so callers can override and to remove duplicated
hardcoded values; update New to fallback to a shared default config when opts
does not provide one and ensure types match llmclient.CircuitBreakerConfig.
In `@internal/providers/openai/openai.go`:
- Around line 34-44: This block duplicates a hardcoded CircuitBreakerConfig when
constructing llmclient.Config (ProviderName: "openai", CircuitBreaker:
&llmclient.CircuitBreakerConfig{FailureThreshold:5,SuccessThreshold:2,Timeout:30*time.Second}),
so replace the duplicated literal with a shared helper or reuse the resilience
settings from opts: either call a common helper like buildLLMClientConfig(opts,
"openai") that returns an llmclient.Config with the CircuitBreakerConfig
standardized, or read values from opts.Resilience.CircuitBreaker and assign them
to cfg.CircuitBreaker to avoid DRY; update the openai config construction (cfg
variable and its CircuitBreaker field) to use that shared helper or opt-driven
values and remove the hardcoded numeric literals.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
internal/providers/factory_test.go (1)
177-244: 🧹 Nitpick | 🔵 Trivial
TestProviderFactory_SetHooksandTestProviderFactory_HooksPassedToBuilderare identical.Both tests create a factory, set hooks, register a provider capturing
ProviderOptions, callCreate, and assertreceivedOpts.Hooks.OnRequestStart != nil. One of these should be removed or differentiated (e.g., testOnRequestEndin the second).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/factory_test.go` around lines 177 - 244, These two tests (TestProviderFactory_SetHooks and TestProviderFactory_HooksPassedToBuilder) are duplicates; either remove one or change the second to validate a different hook. For a minimal change, modify TestProviderFactory_HooksPassedToBuilder so its mockHooks sets OnRequestEnd (instead of OnRequestStart) and assert receivedOpts.Hooks.OnRequestEnd != nil after calling factory.Create; keep the same Registration.New capture of ProviderOptions and use NewProviderFactory, SetHooks, ProviderOptions, Registration, Create and receivedOpts to locate where to change the mock and the assertion.internal/app/app.go (1)
46-58:⚠️ Potential issue | 🟡 MinorMissing nil check on
cfg.AppConfig.Config— potential nil-pointer dereference.Line 47 validates
cfg.AppConfig != nil, but line 54 dereferencescfg.AppConfig.Configwithout checking it. IfLoad()or test code ever returns aLoadResultwith a nilConfig, this will panic.🛡️ Proposed fix
if cfg.AppConfig == nil { return nil, fmt.Errorf("app config is required") } + if cfg.AppConfig.Config == nil { + return nil, fmt.Errorf("app config is required") + } if cfg.Factory == nil { return nil, fmt.Errorf("factory is required") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/app/app.go` around lines 46 - 58, The constructor New currently dereferences cfg.AppConfig.Config without checking for nil; update New to validate that cfg.AppConfig is non-nil and that cfg.AppConfig.Config is also non-nil before using it (the variable appCfg and the App struct initialization). If cfg.AppConfig.Config is nil return a descriptive error (e.g., "app config content is required") so the function fails gracefully instead of panicking.internal/providers/init.go (1)
145-186: 🧹 Nitpick | 🔵 Trivial
initializeProvidersnever returns a non-nil error.The function signature returns
(int, error)but always returnsnilfor the error (line 186). The caller at line 67 checkserr != niland performs cleanup, but this branch is unreachable. Either remove the error return value, or consider returning an error for fatal conditions (e.g., if all providers fail with non-availability errors).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/init.go` around lines 145 - 186, The function initializeProviders currently always returns nil error; change it to return a meaningful error when initialization is fatal: track create errors (e.g., capture the first non-availability error from factory.Create) while iterating over providerMap in initializeProviders, continue skipping providers on availability failures as before, and after the loop if count == 0 and you recorded at least one create error return 0 and a wrapped error (e.g., fmt.Errorf("failed to initialize any providers: %w", firstErr)); otherwise return count, nil; use the existing symbols factory.Create, core.AvailabilityChecker / checker.CheckAvailability, registry.RegisterProviderWithType and ProviderConfig/ProviderFactory/ModelRegistry to locate the logic to modify.config/config_test.go (1)
10-22: 🧹 Nitpick | 🔵 Trivial
t.Setenv+os.Unsetenvpattern is correct but could use a brief comment.The double-call ensures the variable is truly unset (not just empty) during the test, while
t.Setenv's cleanup restores the original value afterward. Consider a one-line comment explaining the intent for future maintainers unfamiliar with this pattern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config_test.go` around lines 10 - 22, Add a one-line comment above clearProviderEnvVars explaining why both t.Setenv and os.Unsetenv are used: t.Setenv("") ensures the test restores the original env after completion while os.Unsetenv actually removes the variable for the duration of the test, so we call both to guarantee the variable is truly unset but still restored by the test harness; reference the clearProviderEnvVars function and the t.Setenv/os.Unsetenv calls when placing the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/gomodel/main.go`:
- Around line 57-62: The code now uses factory.Add (seen in main.go calls to
factory.Add with providers like openai.Registration, anthropic.Registration,
etc.) but the documentation still references factory.Register; update the docs
(e.g., CLAUDE.md and any README or guideline files) to reflect the new API name
factory.Add (or note the alias if both exist) and adjust examples to show
factory.Add(provider.Registration) so documentation and code are consistent.
In `@config/config_test.go`:
- Around line 331-376: Add unit tests in internal/providers/config_test.go to
cover resolveProviders, buildProviderConfigs, buildProviderConfig,
filterEmptyProviders, and applyProviderEnvVars: write focused table-driven tests
that validate resilience merge behavior (global vs per-provider overrides)
including partial overrides, nil resilience, nil Retry, and zero-value Retry
fields; test filterEmptyProviders special-case handling for "ollama" vs other
empty providers; and test applyProviderEnvVars env-var precedence and overlay
semantics by setting env vars and verifying resulting provider configs.
Reference the functions resolveProviders, buildProviderConfigs,
buildProviderConfig, filterEmptyProviders, and applyProviderEnvVars when adding
tests and assert expected RawProviders/provider configs for each scenario.
In `@config/config.go`:
- Around line 286-310: The ResilienceConfig and RetryConfig types (RetryConfig,
ResilienceConfig, DefaultRetryConfig) must be made overrideable via environment
variables and the env override routine must support their types: add `env`
struct tags to each RetryConfig field (MaxRetries, InitialBackoff, MaxBackoff,
BackoffFactor, JitterFactor) and to ResilienceConfig.Retry so env keys can be
discovered, then extend applyEnvOverrides to handle float64 (use
strconv.ParseFloat) and time.Duration (use time.ParseDuration) in addition to
string/bool/int parsing so InitialBackoff/MaxBackoff and
BackoffFactor/JitterFactor are applied when env vars are present; keep existing
behavior for ints/strings/bools and ensure parsing errors are handled/logged or
returned consistently.
In `@internal/app/app.go`:
- Around line 38-42: Add Go doc comments for the exported fields of the Config
struct: document AppConfig (e.g., "AppConfig holds the loaded application
configuration used to initialize the App" referencing config.LoadResult) and
Factory (e.g., "Factory provides the providers.ProviderFactory used to construct
provider instances for the App"); update the comment style to full sentences and
ensure each exported field in Config has its own line comment immediately above
it.
In `@internal/providers/config.go`:
- Around line 10-18: Remove the misleading yaml struct tags from the resolved
internal type ProviderConfig (defined in internal/providers/config.go) because
this type is produced by buildProviderConfig and not unmarshalled from YAML;
update the struct declaration for ProviderConfig to keep the fields Type,
APIKey, BaseURL, Models, and Resilience but remove the `yaml:"..."` tags so the
code reflects its role as a resolved/internal config type.
- Around line 47-80: The current applyProviderEnvVars function shallow-copies
RawProviderConfig entries, causing RawProviderConfig.Resilience (a
*RawResilienceConfig) to be shared between original and copy; change the initial
copy logic so each entry gets a deep copy of Resilience: when iterating the raw
map in applyProviderEnvVars, create a local copy of v (e.g., copy := v) and if
v.Resilience != nil allocate a new RawResilienceConfig, copy the pointed value
into it, and set copy.Resilience = &newValue before assigning result[k] = copy;
do the same defensive clone before mutating existing (the existing variable) so
updates to APIKey/BaseURL never leave Resilience aliased.
- Around line 82-96: The filterEmptyProviders function currently skips Ollama
unless p.BaseURL != "" so providers relying on Ollama's built-in default are
dropped; update the logic in filterEmptyProviders to treat any provider with
p.Type == "ollama" as valid (i.e., remove the BaseURL check) so the Ollama entry
is added to result unconditionally, while keeping the existing APIKey check for
other providers (references: function filterEmptyProviders, variables p.Type,
p.BaseURL, p.APIKey, and map result).
In `@internal/providers/factory.go`:
- Around line 49-54: In ProviderFactory.Add, validate the incoming Registration
before storing it: ensure reg.Type is not empty and reg.New (the constructor) is
not nil; if either check fails, do not insert into f.builders and surface a
clear error (either return an error from Add or panic/log.Fatalf with a
descriptive message) so that Create (which calls the constructor) cannot later
panic on a nil function; update the Add implementation to perform these guards
referencing ProviderFactory.Add, the Registration struct fields Type and New,
and the Create path that invokes the stored constructor.
In `@internal/providers/init.go`:
- Around line 51-56: Init currently lacks a nil check for the result parameter
and will panic when accessing result.RawProviders or result.Config.Resilience;
update the Init function to validate that result is not nil at the top (similar
to the existing factory nil-check) and return a descriptive error if it is nil
before calling resolveProviders or referencing result.Config.Resilience.
---
Outside diff comments:
In `@config/config_test.go`:
- Around line 10-22: Add a one-line comment above clearProviderEnvVars
explaining why both t.Setenv and os.Unsetenv are used: t.Setenv("") ensures the
test restores the original env after completion while os.Unsetenv actually
removes the variable for the duration of the test, so we call both to guarantee
the variable is truly unset but still restored by the test harness; reference
the clearProviderEnvVars function and the t.Setenv/os.Unsetenv calls when
placing the comment.
In `@internal/app/app.go`:
- Around line 46-58: The constructor New currently dereferences
cfg.AppConfig.Config without checking for nil; update New to validate that
cfg.AppConfig is non-nil and that cfg.AppConfig.Config is also non-nil before
using it (the variable appCfg and the App struct initialization). If
cfg.AppConfig.Config is nil return a descriptive error (e.g., "app config
content is required") so the function fails gracefully instead of panicking.
In `@internal/providers/factory_test.go`:
- Around line 177-244: These two tests (TestProviderFactory_SetHooks and
TestProviderFactory_HooksPassedToBuilder) are duplicates; either remove one or
change the second to validate a different hook. For a minimal change, modify
TestProviderFactory_HooksPassedToBuilder so its mockHooks sets OnRequestEnd
(instead of OnRequestStart) and assert receivedOpts.Hooks.OnRequestEnd != nil
after calling factory.Create; keep the same Registration.New capture of
ProviderOptions and use NewProviderFactory, SetHooks, ProviderOptions,
Registration, Create and receivedOpts to locate where to change the mock and the
assertion.
In `@internal/providers/init.go`:
- Around line 145-186: The function initializeProviders currently always returns
nil error; change it to return a meaningful error when initialization is fatal:
track create errors (e.g., capture the first non-availability error from
factory.Create) while iterating over providerMap in initializeProviders,
continue skipping providers on availability failures as before, and after the
loop if count == 0 and you recorded at least one create error return 0 and a
wrapped error (e.g., fmt.Errorf("failed to initialize any providers: %w",
firstErr)); otherwise return count, nil; use the existing symbols
factory.Create, core.AvailabilityChecker / checker.CheckAvailability,
registry.RegisterProviderWithType and
ProviderConfig/ProviderFactory/ModelRegistry to locate the logic to modify.
---
Duplicate comments:
In `@internal/providers/factory_test.go`:
- Around line 274-310: The test
TestProviderFactory_Create_PassesResilienceConfig only asserts MaxRetries and
JitterFactor and should assert the remaining RetryConfig fields to prevent
selective-copy regressions; update the test that constructs cfg (ProviderConfig
with Resilience ResilienceConfig) and after factory.Create(...) add assertions
on receivedOpts (the ProviderOptions captured in the New func) to check
Retry.InitialBackoff, Retry.MaxBackoff, and Retry.BackoffFactor match the values
set in the resilience variable so all five RetryConfig fields are validated.
In `@internal/providers/init.go`:
- Around line 43-48: The doc comment mentioning "Background refresh scheduling
(hourly)" is stale; update the comment that documents background refresh
behavior to state the refresh interval is configurable via
result.Config.Cache.RefreshInterval with a fallback/default of 1 hour rather
than hardcoding "hourly"—locate the comment block that lists "Background refresh
scheduling (hourly)" (near the provider init logic and cache usage) and change
the wording to explicitly mention the configurable RefreshInterval and its
default.
There was a problem hiding this comment.
Pull request overview
This pull request adds resilience configuration management to the GOModel gateway, enabling global retry settings with optional per-provider overrides. The changes introduce a clean separation between configuration loading and provider resolution, moving provider-specific logic out of the config package and into the providers package.
Changes:
- Introduced
ResilienceConfigandRetryConfigtypes shared between config and llmclient packages - Refactored provider initialization to use
ProviderOptionspattern (bundles hooks + resilience config) - Split configuration loading into
LoadResultcontaining both app config and raw provider data - Added provider config resolution logic (env var overlay, credential filtering, resilience merging)
- Renamed factory method from
Register()toAdd()for consistency
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| config/config.go | Added ResilienceConfig, RetryConfig, RawProviderConfig types; refactored Load() to return LoadResult |
| config/config.example.yaml | Added resilience configuration example with global and per-provider override documentation |
| config/config_test.go | Updated tests for LoadResult return type; added test for resilience config merging |
| config/config_helpers_test.go | Removed obsolete provider filtering tests; added CACHE_REFRESH_INTERVAL test |
| config/config_example_test.go | Simplified test to use LoadResult |
| internal/providers/config.go | New file with provider resolution logic (env vars, filtering, resilience merging) |
| internal/providers/factory.go | Introduced ProviderOptions pattern; renamed Register() to Add() |
| internal/providers/factory_test.go | Updated tests for ProviderOptions; added resilience config passing test |
| internal/providers/init.go | Refactored Init() to accept LoadResult and handle provider resolution |
| internal/providers/*/*.go | Updated all provider constructors to use ProviderOptions instead of Hooks |
| internal/llmclient/client.go | Refactored to use config.RetryConfig instead of individual retry fields |
| internal/llmclient/client_test.go | Updated tests to access retry settings via config.Retry field |
| internal/app/app.go | Updated to use LoadResult and pass to providers.Init() |
| cmd/gomodel/main.go | Updated to use LoadResult and factory.Add() instead of Register() |
| tests/integration/setup_test.go | Updated test setup to use LoadResult and factory.Add() |
| CLAUDE.md | Updated documentation for resilience config and factory.Add() usage (one typo remains) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CLAUDE.md (1)
108-108:⚠️ Potential issue | 🟡 MinorStale
factory.Register()references — update tofactory.Add()Line 50 was correctly updated to
factory.Add(), but lines 108 and 180 still reference the oldfactory.Register()name. A developer following the "Adding a Provider" guide (step 3) or the Key Details list will use the wrong function name.📝 Proposed fix
-3. Register in `cmd/gomodel/main.go` via `factory.Register({name}.Registration)` +3. Register in `cmd/gomodel/main.go` via `factory.Add({name}.Registration)`-1. Providers are registered explicitly via `factory.Register()` in main.go — order matters, first registered wins for duplicate model names +1. Providers are registered explicitly via `factory.Add()` in main.go — order matters, first registered wins for duplicate model namesAlso applies to: 180-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` at line 108, Update the documentation references from the old factory.Register() to the current factory.Add() so users call the correct function; specifically change occurrences that instruct to "Register in cmd/gomodel/main.go via factory.Register({name}.Registration)" to "Add in cmd/gomodel/main.go via factory.Add({name}.Registration)" and ensure any other mentions (e.g., in the "Adding a Provider" step and Key Details list) use factory.Add() consistently instead of factory.Register().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 194: The documentation references a non-existent type
ProviderConfigInput; update the text to use the actual per-provider override
type RawProviderConfig (as defined in config/config.go) so readers aren't
confused. Change the sentence that currently reads "Provider-level overrides use
nullable `ProviderConfigInput`" to instead reference `RawProviderConfig`, and
keep the rest of the sentence intact (the merge via `buildProviderConfig()` and
use of `config.RetryConfig` stay the same).
---
Outside diff comments:
In `@CLAUDE.md`:
- Line 108: Update the documentation references from the old factory.Register()
to the current factory.Add() so users call the correct function; specifically
change occurrences that instruct to "Register in cmd/gomodel/main.go via
factory.Register({name}.Registration)" to "Add in cmd/gomodel/main.go via
factory.Add({name}.Registration)" and ensure any other mentions (e.g., in the
"Adding a Provider" step and Key Details list) use factory.Add() consistently
instead of factory.Register().
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
docs/adr/0001-explicit-provider-registration.md (1)
28-28:⚠️ Potential issue | 🟡 MinorStale reference: "Register calls" should be "Add calls".
Line 13 was updated to
factory.Add(...)but the Consequences section still says "6 explicit Register calls." This inconsistency could confuse readers.📝 Proposed fix
-- Slightly more boilerplate in main.go (6 explicit Register calls) +- Slightly more boilerplate in main.go (6 explicit Add calls)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/adr/0001-explicit-provider-registration.md` at line 28, Update the stale phrasing in the Consequences section: replace the phrase "6 explicit Register calls" with "6 explicit Add calls" (or rephrase to "6 explicit factory.Add calls") to match the change on line 13 where factory.Add(...) is used; edit the text in the Consequences paragraph to use "Add" instead of "Register" so the terminology is consistent with the implementation (look for the literal "Register calls" in the ADR and update to "Add calls" referencing factory.Add).internal/providers/openai/openai.go (1)
44-55: 🧹 Nitpick | 🔵 Trivial
NewWithHTTPClientis test-only code; consider aligning test resilience configuration with production.This constructor uses
llmclient.DefaultConfigwith onlyHooksconfigured, meaning tests use hardcoded retry/circuit-breaker defaults rather than the application's resilience configuration. To keep test behavior aligned with production, these test helpers could acceptProviderOptionsorconfig.ResilienceConfigto apply the same resilience settings used in production. This pattern appears consistently across all provider implementations (openai, xai, groq, gemini, anthropic, ollama).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/openai/openai.go` around lines 44 - 55, NewWithHTTPClient currently builds a llmclient config with hardcoded defaults (using llmclient.DefaultConfig) so tests don't pick up the app's resilience settings; modify NewWithHTTPClient (and similar provider constructors) to accept and forward a Producer-level resilience config (e.g., add a ProviderOptions or config.ResilienceConfig parameter) and use that to populate the llmclient config instead of relying on DefaultConfig (i.e., set retry/circuit-breaker/timeouts on the cfg you pass to llmclient.NewWithHTTPClient); update the Provider struct/constructor signature (Provider, NewWithHTTPClient) to store or consume the resilience settings so tests and production share the same resilience behavior.internal/providers/gemini/gemini.go (3)
140-157:⚠️ Potential issue | 🟡 Minor
ListModelscreates a temporary client without resilience settings.The
modelsClientat line 146 usesllmclient.DefaultConfig(...)which doesn't carry the retry or circuit-breaker config from the provider'sProviderOptions. Model listing requests to the native Gemini API won't benefit from the configured resilience settings, unlike chat/completion requests which go throughp.client.Consider storing the resilience config on the
Providerstruct and applying it to the models client config as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 140 - 157, ListModels creates a temporary modelsClient using llmclient.DefaultConfig("gemini", p.modelsURL) which omits the Provider's resilience settings, so propagate the same retry/circuit-breaker config used by p.client (or stored ProviderOptions) into modelsCfg before creating modelsClient; update the Provider struct to persist the resilience config (or expose p.client.Config/resilience fields), then copy those settings into modelsCfg (e.g., attach retry, timeout, and circuit-breaker hooks/transport options) so model listing benefits from the same resilience policies as chat/completion paths.
57-72: 🧹 Nitpick | 🔵 Trivial
NewWithHTTPClientuses oldDefaultConfigpath — same gap as Ollama.Same issue as in
ollama.go: this constructor doesn't wire resilience config. If it's a test-only path, a comment would help clarify intent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 57 - 72, NewWithHTTPClient currently builds cfg via llmclient.DefaultConfig(...) but doesn't wire the resilience settings like the primary constructor does; update NewWithHTTPClient so the local cfg includes the same resilience configuration used by the standard Provider constructor before creating p.client (i.e., populate cfg.Resilience or call the same helper used elsewhere), referencing NewWithHTTPClient, Provider, cfg and the p.client = llmclient.NewWithHTTPClient(...) line; if this path is intended only for tests, add a clear comment stating it's test-only instead of changing behavior.
84-86:⚠️ Potential issue | 🟡 MinorStandardize
X-Request-IDheader casing to uppercase D across all providers.Gemini and Anthropic use
X-Request-Id(lowercase d) while Groq, XAI, Ollama, and the rest of the codebase useX-Request-ID(uppercase D). The coding guideline specifiesX-Request-IDas the standard format. Although HTTP headers are case-insensitive, align to the documented standard to maintain consistency across providers and the audit log propagation chain.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 84 - 86, Update the header key used in the Gemini provider to match the project's standard `X-Request-ID` casing: locate the usage where core.GetRequestID(req.Context()) is retrieved and req.Header.Set("X-Request-Id", ...) is called in the function handling outgoing requests, and change the header name string to "X-Request-ID" so it matches other providers and the documented convention.internal/providers/factory_test.go (1)
199-265: 🧹 Nitpick | 🔵 Trivial
TestProviderFactory_SetHooksandTestProviderFactory_HooksPassedToBuildertest the same behavior.These two tests are effectively identical — both register a provider with hooks, call
Create, and assertreceivedOpts.Hooks.OnRequestStart != nil. Consider consolidating them to reduce maintenance burden.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/factory_test.go` around lines 199 - 265, Two tests (TestProviderFactory_SetHooks and TestProviderFactory_HooksPassedToBuilder) are duplicates; remove one and keep a single clear test that verifies ProviderFactory.SetHooks results in hooks being passed into the provider builder via ProviderOptions. Update the remaining test (e.g., keep TestProviderFactory_HooksPassedToBuilder) to perform the existing setup: create NewProviderFactory(), call SetHooks(mockHooks), Add a Registration with New capturing receivedOpts, call factory.Create(cfg) and assert receivedOpts.Hooks.OnRequestStart != nil; delete the redundant TestProviderFactory_SetHooks to avoid duplicated coverage.internal/llmclient/client.go (3)
140-145: 🧹 Nitpick | 🔵 Trivial
getBaseURLis redundant withBaseURL()and carries a misleading comment.Both methods are byte-for-byte identical, yet
getBaseURL's comment says "already holding lock or single-threaded" while the function actually acquiresc.mu.RLock(). If any goroutine callsRWMutex.Lockwhile the lock is already held by one or more readers, concurrent calls toRWMutex.RLockwill block, and recursive read-locking is prohibited. The misleading comment risks a future caller holding a write lock calling this method (deadlock), or believing the lock is unnecessary (data race). RemovegetBaseURLand useBaseURL()directly frombuildRequest.♻️ Proposed fix
-// getBaseURL returns the base URL for internal use (already holding lock or single-threaded) -func (c *Client) getBaseURL() string { - c.mu.RLock() - defer c.mu.RUnlock() - return c.config.BaseURL -}In
buildRequest(line 522):- url := c.getBaseURL() + req.Endpoint + url := c.BaseURL() + req.Endpoint🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/llmclient/client.go` around lines 140 - 145, Remove the redundant getBaseURL method from type Client (which misleadingly acquires a read-lock despite its comment) and update buildRequest to call the existing BaseURL() accessor instead; ensure all other callers of getBaseURL are replaced with BaseURL(), and verify no code retains the misleading comment or extra read-lock acquisition (references: getBaseURL, BaseURL, buildRequest, Client.mu, config.BaseURL).
89-124: 🧹 Nitpick | 🔵 TrivialDuplicate circuit-breaker initialization in
NewandNewWithHTTPClient.Lines 96–102 and 115–121 are identical. Extract to a private helper to avoid future divergence.
♻️ Proposed refactor
+func initCircuitBreaker(cfg Config) *circuitBreaker { + if cfg.CircuitBreaker.FailureThreshold > 0 { + return newCircuitBreaker( + cfg.CircuitBreaker.FailureThreshold, + cfg.CircuitBreaker.SuccessThreshold, + cfg.CircuitBreaker.Timeout, + ) + } + return nil +} + func New(cfg Config, headerSetter HeaderSetter) *Client { c := &Client{ httpClient: httpclient.NewDefaultHTTPClient(), config: cfg, headerSetter: headerSetter, + circuitBreaker: initCircuitBreaker(cfg), } - - if cfg.CircuitBreaker.FailureThreshold > 0 { - c.circuitBreaker = newCircuitBreaker( - cfg.CircuitBreaker.FailureThreshold, - cfg.CircuitBreaker.SuccessThreshold, - cfg.CircuitBreaker.Timeout, - ) - } return c } func NewWithHTTPClient(httpClient *http.Client, cfg Config, headerSetter HeaderSetter) *Client { c := &Client{ httpClient: httpClient, config: cfg, headerSetter: headerSetter, + circuitBreaker: initCircuitBreaker(cfg), } - - if cfg.CircuitBreaker.FailureThreshold > 0 { - c.circuitBreaker = newCircuitBreaker( - cfg.CircuitBreaker.FailureThreshold, - cfg.CircuitBreaker.SuccessThreshold, - cfg.CircuitBreaker.Timeout, - ) - } return c }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/llmclient/client.go` around lines 89 - 124, Both New and NewWithHTTPClient duplicate the circuit-breaker init block; extract that logic into a private helper (e.g., initCircuitBreaker or setCircuitBreaker) that accepts the Client pointer and Config (or reads c.config) and sets c.circuitBreaker when cfg.CircuitBreaker.FailureThreshold > 0, then call this helper from both New and NewWithHTTPClient to remove the duplicated lines in functions New and NewWithHTTPClient.
235-301:⚠️ Potential issue | 🟠 MajorCircuit breaker is not re-evaluated between retry iterations — open circuit is bypassed.
Allow()is called once before the retry loop (line 235). Inside the loop,RecordFailure()(lines 267–268, 274–276) can open the circuit. No subsequent retry iteration re-checksAllow()before callingdoRequest(). Any failure that trips the breaker mid-loop still sends the remaining retry attempts directly to a degraded provider, defeating the purpose of the circuit breaker.This is especially dangerous with a low
FailureThreshold(e.g., 1–2 in tests) or under concurrent load where failures from other goroutines open the circuit between retries of a single request.🛡️ Proposed fix: re-check Allow() at the top of each retry iteration
for attempt := 0; attempt < maxAttempts; attempt++ { + // Re-check circuit breaker before each retry to abort if it opened mid-loop. + if attempt > 0 && c.circuitBreaker != nil && !c.circuitBreaker.Allow() { + err := core.NewProviderError(c.config.ProviderName, http.StatusServiceUnavailable, + "circuit breaker is open - provider temporarily unavailable", nil) + callEndHook(http.StatusServiceUnavailable, err) + return nil, err + } if attempt > 0 { // Calculate backoff duration with jitter🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/llmclient/client.go` around lines 235 - 301, The circuit breaker Allow() is only checked once before the retry loop so retries can proceed even if RecordFailure() opens the circuit mid-loop; inside the for attempt loop in client.go (the block using calculateBackoff, doRequest, isRetryable, RecordFailure, RecordSuccess and callEndHook) add a re-check at the top of each iteration: call c.circuitBreaker.Allow() (if c.circuitBreaker != nil) and if it returns false create the same provider error used earlier (core.NewProviderError with http.StatusServiceUnavailable and the existing message), call callEndHook with StatusServiceUnavailable and that error, and return nil,error to stop further retries.config/config_test.go (1)
26-46:⚠️ Potential issue | 🟡 MinorAdd missing resilience, cache refresh, and admin/guardrails environment variables to the cleanup function.
The cleanup function omits several environment variables that have
env:struct tags inconfig.go:
CACHE_REFRESH_INTERVALRETRY_MAX_RETRIES,RETRY_INITIAL_BACKOFF,RETRY_MAX_BACKOFF,RETRY_BACKOFF_FACTOR,RETRY_JITTER_FACTORCIRCUIT_BREAKER_FAILURE_THRESHOLD,CIRCUIT_BREAKER_SUCCESS_THRESHOLD,CIRCUIT_BREAKER_TIMEOUTADMIN_ENDPOINTS_ENABLED,ADMIN_UI_ENABLED,GUARDRAILS_ENABLEDWithout clearing these, tests that set them will leak state to subsequent tests, causing non-deterministic failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/config_test.go` around lines 26 - 46, The clearAllConfigEnvVars helper is missing several env vars used in config.go; update the slice inside clearAllConfigEnvVars (and ensure clearProviderEnvVars remains called) to also unset CACHE_REFRESH_INTERVAL, the retry set RETRY_MAX_RETRIES RETRY_INITIAL_BACKOFF RETRY_MAX_BACKOFF RETRY_BACKOFF_FACTOR RETRY_JITTER_FACTOR, the circuit breaker set CIRCUIT_BREAKER_FAILURE_THRESHOLD CIRCUIT_BREAKER_SUCCESS_THRESHOLD CIRCUIT_BREAKER_TIMEOUT, and the admin/guardrails flags ADMIN_ENDPOINTS_ENABLED ADMIN_UI_ENABLED GUARDRAILS_ENABLED so tests that modify those env vars are reliably cleaned up by clearAllConfigEnvVars.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 181: The phrase "Router requires initialized registry — check
`ModelCount() > 0` before routing" has a minor grammar issue; update it to read
"Router requires an initialized registry — check `ModelCount() > 0` before
routing" by inserting "an" before "initialized" in the CLAUDE.md line containing
that exact sentence.
In `@config/config.go`:
- Around line 515-528: The reflect.Int64 switch branch currently assumes every
int64 field is a time.Duration; update the branch in the config parsing function
to first check the concrete field type (e.g., compare field.Type or field.Type()
to reflect.TypeOf(time.Duration(0))). If the field is time.Duration, parse with
time.ParseDuration and SetInt as done; otherwise handle a plain int64 (e.g., use
strconv.ParseInt and fieldVal.SetInt) or return a clear error. Update or add a
brief comment explaining the assumption and the disambiguation using
reflect.TypeOf(time.Duration(0)) and reference the existing symbols: the
reflect.Int64 case, field.Type/field.Type(), time.ParseDuration,
fieldVal.SetInt, and strconv.ParseInt.
In `@internal/app/app.go`:
- Around line 58-62: The code dereferences cfg.AppConfig.Config into appCfg
without validating it, risking a nil-pointer panic (later used as
appCfg.Usage.Enabled); update the New App initialization to check whether
cfg.AppConfig.Config is nil after confirming cfg.AppConfig is non-nil, and
return an error (or set a safe default) if it is nil before constructing
&App{config: appCfg}; ensure references to appCfg (e.g., in App methods or where
Usage.Enabled is read) are only done when appCfg is non-nil.
In `@internal/llmclient/client_test.go`:
- Around line 731-733: Update the TestDefaultConfig to assert the concrete
expected defaults for the circuit breaker instead of only checking
FailureThreshold != 0: verify that the DefaultCircuitBreakerConfig() (or the
config returned in TestDefaultConfig) sets FailureThreshold == 5,
SuccessThreshold == 2, and Timeout == 30*time.Second (reference the
CircuitBreaker struct fields FailureThreshold, SuccessThreshold, Timeout and the
DefaultCircuitBreakerConfig/ TestDefaultConfig functions to locate where to
change the assertions).
In `@internal/llmclient/client.go`:
- Around line 57-64: Add Go doc comments for the exported Config fields Retry
and CircuitBreaker by updating the Config struct declarations (Config.Retry and
Config.CircuitBreaker) to include short descriptive comments above each field
following Go doc conventions (e.g., "Retry specifies ...", "CircuitBreaker
configures ..."), ensuring the comments explain purpose and behaviour for public
API consumers and match style used for other exported fields like ProviderName
and BaseURL.
In `@internal/providers/config_test.go`:
- Around line 341-346: The tests TestApplyProviderEnvVars_SkipsWhenNoEnvVars and
TestResolveProviders_NoProvidersNoEnvVars are fragile because
applyProviderEnvVars reads real process env vars; before calling
applyProviderEnvVars (or resolving providers) call t.Setenv for each known
provider environment variable (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY,
COHERE_API_KEY, AZURE_OPENAI_API_KEY, etc.) to explicitly clear them in the test
harness so the function returns an empty map and the assertion is deterministic;
update both tests to clear that same canonical list of provider env var names
using t.Setenv("",) for each variable prior to assertions.
---
Outside diff comments:
In `@config/config_test.go`:
- Around line 26-46: The clearAllConfigEnvVars helper is missing several env
vars used in config.go; update the slice inside clearAllConfigEnvVars (and
ensure clearProviderEnvVars remains called) to also unset
CACHE_REFRESH_INTERVAL, the retry set RETRY_MAX_RETRIES RETRY_INITIAL_BACKOFF
RETRY_MAX_BACKOFF RETRY_BACKOFF_FACTOR RETRY_JITTER_FACTOR, the circuit breaker
set CIRCUIT_BREAKER_FAILURE_THRESHOLD CIRCUIT_BREAKER_SUCCESS_THRESHOLD
CIRCUIT_BREAKER_TIMEOUT, and the admin/guardrails flags ADMIN_ENDPOINTS_ENABLED
ADMIN_UI_ENABLED GUARDRAILS_ENABLED so tests that modify those env vars are
reliably cleaned up by clearAllConfigEnvVars.
In `@docs/adr/0001-explicit-provider-registration.md`:
- Line 28: Update the stale phrasing in the Consequences section: replace the
phrase "6 explicit Register calls" with "6 explicit Add calls" (or rephrase to
"6 explicit factory.Add calls") to match the change on line 13 where
factory.Add(...) is used; edit the text in the Consequences paragraph to use
"Add" instead of "Register" so the terminology is consistent with the
implementation (look for the literal "Register calls" in the ADR and update to
"Add calls" referencing factory.Add).
In `@internal/llmclient/client.go`:
- Around line 140-145: Remove the redundant getBaseURL method from type Client
(which misleadingly acquires a read-lock despite its comment) and update
buildRequest to call the existing BaseURL() accessor instead; ensure all other
callers of getBaseURL are replaced with BaseURL(), and verify no code retains
the misleading comment or extra read-lock acquisition (references: getBaseURL,
BaseURL, buildRequest, Client.mu, config.BaseURL).
- Around line 89-124: Both New and NewWithHTTPClient duplicate the
circuit-breaker init block; extract that logic into a private helper (e.g.,
initCircuitBreaker or setCircuitBreaker) that accepts the Client pointer and
Config (or reads c.config) and sets c.circuitBreaker when
cfg.CircuitBreaker.FailureThreshold > 0, then call this helper from both New and
NewWithHTTPClient to remove the duplicated lines in functions New and
NewWithHTTPClient.
- Around line 235-301: The circuit breaker Allow() is only checked once before
the retry loop so retries can proceed even if RecordFailure() opens the circuit
mid-loop; inside the for attempt loop in client.go (the block using
calculateBackoff, doRequest, isRetryable, RecordFailure, RecordSuccess and
callEndHook) add a re-check at the top of each iteration: call
c.circuitBreaker.Allow() (if c.circuitBreaker != nil) and if it returns false
create the same provider error used earlier (core.NewProviderError with
http.StatusServiceUnavailable and the existing message), call callEndHook with
StatusServiceUnavailable and that error, and return nil,error to stop further
retries.
In `@internal/providers/factory_test.go`:
- Around line 199-265: Two tests (TestProviderFactory_SetHooks and
TestProviderFactory_HooksPassedToBuilder) are duplicates; remove one and keep a
single clear test that verifies ProviderFactory.SetHooks results in hooks being
passed into the provider builder via ProviderOptions. Update the remaining test
(e.g., keep TestProviderFactory_HooksPassedToBuilder) to perform the existing
setup: create NewProviderFactory(), call SetHooks(mockHooks), Add a Registration
with New capturing receivedOpts, call factory.Create(cfg) and assert
receivedOpts.Hooks.OnRequestStart != nil; delete the redundant
TestProviderFactory_SetHooks to avoid duplicated coverage.
In `@internal/providers/gemini/gemini.go`:
- Around line 140-157: ListModels creates a temporary modelsClient using
llmclient.DefaultConfig("gemini", p.modelsURL) which omits the Provider's
resilience settings, so propagate the same retry/circuit-breaker config used by
p.client (or stored ProviderOptions) into modelsCfg before creating
modelsClient; update the Provider struct to persist the resilience config (or
expose p.client.Config/resilience fields), then copy those settings into
modelsCfg (e.g., attach retry, timeout, and circuit-breaker hooks/transport
options) so model listing benefits from the same resilience policies as
chat/completion paths.
- Around line 57-72: NewWithHTTPClient currently builds cfg via
llmclient.DefaultConfig(...) but doesn't wire the resilience settings like the
primary constructor does; update NewWithHTTPClient so the local cfg includes the
same resilience configuration used by the standard Provider constructor before
creating p.client (i.e., populate cfg.Resilience or call the same helper used
elsewhere), referencing NewWithHTTPClient, Provider, cfg and the p.client =
llmclient.NewWithHTTPClient(...) line; if this path is intended only for tests,
add a clear comment stating it's test-only instead of changing behavior.
- Around line 84-86: Update the header key used in the Gemini provider to match
the project's standard `X-Request-ID` casing: locate the usage where
core.GetRequestID(req.Context()) is retrieved and req.Header.Set("X-Request-Id",
...) is called in the function handling outgoing requests, and change the header
name string to "X-Request-ID" so it matches other providers and the documented
convention.
In `@internal/providers/openai/openai.go`:
- Around line 44-55: NewWithHTTPClient currently builds a llmclient config with
hardcoded defaults (using llmclient.DefaultConfig) so tests don't pick up the
app's resilience settings; modify NewWithHTTPClient (and similar provider
constructors) to accept and forward a Producer-level resilience config (e.g.,
add a ProviderOptions or config.ResilienceConfig parameter) and use that to
populate the llmclient config instead of relying on DefaultConfig (i.e., set
retry/circuit-breaker/timeouts on the cfg you pass to
llmclient.NewWithHTTPClient); update the Provider struct/constructor signature
(Provider, NewWithHTTPClient) to store or consume the resilience settings so
tests and production share the same resilience behavior.
---
Duplicate comments:
In `@config/config_test.go`:
- Around line 336-381: Add unit tests in internal/providers/config_test.go that
directly exercise resolveProviders, buildProviderConfig, filterEmptyProviders,
and applyProviderEnvVars: create controlled provider YAML/struct inputs and env
var overlays to assert correct merging/precedence and that Ollama providers are
filtered by filterEmptyProviders when empty; for applyProviderEnvVars assert env
vars override YAML values; for buildProviderConfig and resolveProviders assert
per-provider resilience merge behavior. Also strengthen the existing
TestLoad_ProviderResilienceInRawProviders by asserting that the anthropic
provider’s Resilience is nil (no override) in result.RawProviders to ensure
absent overrides remain unset.
In `@config/config.go`:
- Around line 295-337: Retry and circuit-breaker structs now have env tags but
you must ensure the env-override code actually parses and sets those types:
update applyEnvOverridesValue to read env vars for fields on RetryConfig and
CircuitBreakerConfig (e.g., env names like RETRY_MAX_RETRIES,
CIRCUIT_BREAKER_FAILURE_THRESHOLD) and correctly handle time.Duration (use
reflect.Int64 or parse as duration string) and float64 values so the defaults
from DefaultRetryConfig and DefaultCircuitBreakerConfig are properly overridden;
verify ResilienceConfig composition is respected when applying overrides.
In `@internal/app/app.go`:
- Around line 40-42: Update the comment for the exported field AppConfig so it
follows Go doc conventions by starting with "AppConfig" and clearly states that
the field is a pointer to config.LoadResult containing the loaded application
configuration and the raw provider data produced by config.Load; edit the
comment above the AppConfig *config.LoadResult declaration to be a short,
complete sentence beginning with "AppConfig".
In `@internal/providers/config.go`:
- Around line 10-18: ProviderConfig should omit YAML struct tags because the
resolved config is produced by buildProviderConfig rather than direct YAML
unmarshalling; ensure the struct ProviderConfig has no `yaml` tags and that
buildProviderConfig returns/constructs this resolved ProviderConfig (verify
function name buildProviderConfig) so that callers use the fully merged
ProviderConfig rather than unmarshalling into it.
- Around line 82-96: Ensure filterEmptyProviders preserves Ollama providers
unconditionally by checking p.Type == "ollama" (in filterEmptyProviders) and not
requiring BaseURL or APIKey; keep the early branch that assigns result[name] = p
when p.Type == "ollama" and only apply the APIKey presence check for non-Ollama
providers so Ollama can rely on its built-in default base URL.
In `@internal/providers/factory_test.go`:
- Around line 296-345: The test
TestProviderFactory_Create_PassesResilienceConfig correctly asserts all five
retry fields but the review message contains stray review tokens
([approve_code_changes], [duplicate_comment]); remove those stray tokens from
the review/comment metadata and keep the test as-is (ensure assertions in
TestProviderFactory_Create_PassesResilienceConfig verify MaxRetries,
InitialBackoff, MaxBackoff, BackoffFactor, and JitterFactor are propagated into
receivedOpts.Resilience.Retry).
In `@internal/providers/factory.go`:
- Around line 49-62: No change required: keep the validation panics in the Add
method as implemented; ensure the Registration type checks (reg.Type and
reg.New) remain and the constructor is stored under f.builders[reg.Type] while
holding f.mu (refer to Add, Registration, f.mu, and f.builders) so startup
detects programming errors early.
In `@internal/providers/init.go`:
- Around line 51-57: In Init(ctx, result *config.LoadResult, factory
*ProviderFactory) add a nil check for result.Config (before any access like
result.Config.Resilience/result.Config.SomeField) and return a descriptive error
(e.g., "result.Config is required") to avoid a nil dereference; update any
downstream logic that assumes result.Config is non-nil to run only after this
check so uses of result.Config.Resilience are safe.
- Around line 40-48: The doc comment for Init was updated to mention the
configurable refresh interval but the function-level documentation still doesn't
reference the cfg.Cache.RefreshInterval symbol; update the Init doc comment to
explicitly mention and align with the implementation by calling out
cfg.Cache.RefreshInterval (and any default behavior) so readers know the
background refresh scheduling is driven by cfg.Cache.RefreshInterval in Init;
locate the Init function/method and its top comment and add a short sentence
like "Background refresh scheduling uses cfg.Cache.RefreshInterval" (and note
default if applicable) to match the actual behavior.
In `@internal/providers/xai/xai.go`:
- Around line 31-42: The previous hardcoded circuit breaker has been addressed:
ensure the constructor New correctly assigns opts.Resilience.CircuitBreaker into
llmclient.Config (verify in function New that cfg.CircuitBreaker uses
opts.Resilience.CircuitBreaker), confirm Retry comes from opts.Resilience.Retry
and Hooks from opts.Hooks, and return the initialized Provider (type Provider
with field client set via llmclient.New(cfg, p.setHeaders)); no further code
changes required if these assignments are already present and correct.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 194: The sentence currently reduces "resilience config" to "(retry
settings)"; update it to explicitly include both retries and circuit-breaker
settings and correct the canonical type reference: change the parenthetical to
"(retry and circuit-breaker settings)" and replace any mention of
config.RetryConfig with config.ResilienceConfig while keeping the rest of the
sentence about RawProviderConfig and buildProviderConfig unchanged.
- Line 156: Add the global resilience.circuit_breaker defaults to the example
config under the existing resilience section: include fields failure_threshold
(5), success_threshold (2), and timeout (30s) so users know these are
configurable and not hardcoded; also show that providers can override them via
providers.<name>.resilience.circuit_breaker.* to mirror the retry override
pattern already documented for resilience.retry.*.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
It encapsulates retry config which is now manageable both globally and per provider.
Summary by CodeRabbit
New Features
Chores
Documentation