Add Ollama Support with Local Model Discovery and Embeddings#875
Add Ollama Support with Local Model Discovery and Embeddings#875Milofax wants to merge 20 commits into
Conversation
- Implement Ollama integration for LLM and embeddings - Add OllamaConfigurationPanel for UI configuration - Add model discovery service for automatic model detection - Update RAG settings to support Ollama embeddings - Add documentation (INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md) - Add E2E tests for Ollama functionality 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds per-instance Ollama authentication end-to-end (UI, persistence, and backend propagation), a native Ollama embeddings adapter, auth-aware model discovery and health checks, Supabase infra docs/quickstart, Playwright tests, Docker restart policies, multi-dimensional search RPCs, and PDF/OCR extraction improvements. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant UI as Frontend (RAGSettings / OllamaConfig)
participant API as Archon API (ollama_api)
participant Discovery as ModelDiscoveryService
participant Ollama as Ollama Instance
User->>UI: Configure Ollama URL + useAuth + authToken
UI->>API: Save ragSettings (includes per-instance tokens)
User->>UI: Trigger health check / discover models
UI->>API: GET /health or /discover with instance URLs
API->>API: Normalize URLs and map per-URL tokens from rag_strategy
API->>Discovery: check_instance_health/discover_models(url, auth_token)
Discovery->>Ollama: HTTP request (Authorization: Bearer token if provided)
Ollama-->>Discovery: health/models response
Discovery-->>API: aggregated result
API-->>UI: health/discovery result
UI-->>User: display status and available models
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…if block 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
- Add test for Ollama with auth token - Add test for Ollama without auth token (local setup) - Update fallback test to expect 'required-but-ignored' placeholder - Add RAG settings mock to multi-provider sequence test 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
archon-ui-main/src/services/credentialsService.ts (1)
495-547: Critical: Authentication tokens are not persisted to the database.The
authTokenfield is added to theOllamaInstanceinterface (line 68) and collected in the UI, but it is never saved in thesetOllamaInstancesmethod. Lines 507-528 define thefieldsobject that gets persisted, andauthTokenis not included.Impact: Users can enter authentication tokens in the UI, but these tokens will be silently discarded and not saved to the database. This breaks the per-instance authentication feature entirely.
Fix: Add
authTokento the persisted fields:const fields: Record<string, any> = { name: instance.name, baseUrl: instance.baseUrl, isEnabled: instance.isEnabled, isPrimary: instance.isPrimary, instanceType: instance.instanceType || 'both', loadBalancingWeight: instance.loadBalancingWeight || 100 }; + // Add optional authToken if present + if (instance.authToken) { + fields.authToken = instance.authToken; + } + // Add optional health-related fields if (instance.isHealthy !== undefined) {Also update the loading logic in
getOllamaInstances(around line 465) to handle theauthTokenfield when reconstructing instances from credentials.python/src/server/services/llm_provider_service.py (1)
1211-1245: Ollama instance validation doesn’t pass auth_token to health check
validate_provider_instancestill calls:health_status = await model_discovery_service.check_instance_health(instance_url)with no
auth_token, whilecheck_instance_healthnow supports token-based auth and/api/ollama/instances/healthalready uses it. For instances with authentication enabled,/api/ollama/validatewill incorrectly report them as unavailable.Consider resolving an auth token the same way as in
health_check_endpointand passing it through:from .credential_service import credential_service rag_settings = await credential_service.get_credentials_by_category("rag_strategy") llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/") embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/") chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or "" embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or "" normalized = instance_url.replace("/v1", "").rstrip("/") auth_token = None if normalized == llm_base_url: auth_token = chat_auth_token or None elif normalized == embedding_base_url: auth_token = embedding_auth_token or None health_status = await model_discovery_service.check_instance_health(instance_url, auth_token=auth_token)(ideally sharing a helper with the API route to avoid duplication).
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
786-805: Resetting instance configs dropsuseAuth/authTokenfieldsIn several places the instance config state is reset or auto‑populated without including the new auth fields:
handleDeleteLLMInstanceandhandleDeleteEmbeddingInstancereset configs to{ name: '', url: '' }.- The “auto‑populate embedding instance if it’s empty” path in the LLM URL change handler constructs an embedding config with just
nameandurl.Given the initial state includes
useAuthandauthToken, this:
- Conflicts with the inferred state type under
strictTypeScript.- Can cause React to flip checkboxes between controlled and uncontrolled (
checked={undefined}).- Silently clears any auth state when users delete or auto‑populate instances.
A simple fix is to always provide the full shape when setting these configs, e.g.:
// LLM delete - setLLMInstanceConfig({ - name: '', - url: '' - }); + setLLMInstanceConfig({ + name: '', + url: '', + useAuth: false, + authToken: '' + }); // Embedding delete - setEmbeddingInstanceConfig({ - name: '', - url: '' - }); + setEmbeddingInstanceConfig({ + name: '', + url: '', + useAuth: false, + authToken: '' + }); // Auto-populate embedding from LLM URL - if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) { - setEmbeddingInstanceConfig({ - name: llmInstanceConfig.name || 'Default Ollama', - url: newUrl - }); - } + if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) { + setEmbeddingInstanceConfig({ + name: llmInstanceConfig.name || 'Default Ollama', + url: newUrl, + useAuth: embeddingInstanceConfig.useAuth ?? false, + authToken: embeddingInstanceConfig.authToken ?? '', + }); + }This keeps state shape consistent and avoids subtle UI glitches.
Also applies to: 809-827, 2329-2354
1845-1899: Selected model summary usesCHAT_MODEL, which is never written hereThe new summary row shows:
{activeSelection === 'chat' ? ( <span className="text-white"> {ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span>} </span> ) : ( ... )}But throughout this component the chat model is managed via
MODEL_CHOICE:
- Provider buttons set
MODEL_CHOICE.OllamaModelSelectionModal’sonSelectModelupdatesMODEL_CHOICE.getDisplayedChatModelreadsMODEL_CHOICEand the provider to display the active chat model.
CHAT_MODELis defined in the props type but never written here, so the summary will often show “Not selected” even when a chat model is configured.Consider either:
- Reusing
getDisplayedChatModel(ragSettings)in this row, or- Keeping
CHAT_MODELas the canonical key and updating it wheneverMODEL_CHOICEchanges.For example:
- {ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span>} + {getDisplayedChatModel(ragSettings) || <span className="text-gray-500 italic">Not selected</span>}The authentication status row itself looks good and does not leak token values.
🧹 Nitpick comments (12)
QUICKSTART.md (1)
1-47: LGTM! Clear and concise quickstart guide.The quickstart guide provides a focused, step-by-step workflow for local development setup. The structure is clean and the instructions are easy to follow.
Optional: For improved markdown formatting, consider wrapping bare URLs in angle brackets per markdown best practices (lines 25-28):
-| **Archon UI** | http://localhost:3737 | Hauptanwendung | +| **Archon UI** | <http://localhost:3737> | Hauptanwendung |This is a minor style preference and does not affect functionality.
python/src/server/api_routes/ollama_api.py (1)
83-142: Align all Ollama health/model flows with auth-token supportYou’ve wired per-instance auth tokens into
/modelsand/instances/health, but other Ollama flows in this module still call discovery/health without tokens:
/validate→validate_provider_instance("ollama", instance_url)(service currently callscheck_instance_healthwith noauth_token)./models/discover-and-storeand/models/discover-with-detailsstill hit Ollama without any Authorization header.For auth-protected instances these will now systematically report failures even though
/instances/healthcan succeed with tokens.Recommend:
- Updating
validate_provider_instance(inllm_provider_service.py) to resolve anauth_tokenexactly the wayhealth_check_endpointdoes and pass it through tocheck_instance_health.- Planning a follow-up to add token support to the
discover-and-store/discover-with-detailsendpoints as well, or explicitly document that they currently require unauthenticated instances.Also applies to: 166-217, 262-305
archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (1)
1-191: Reduce hard-coded timeouts and brittle selectors in Playwright testsThe flows are clear, but they’re quite fragile:
- Almost every step uses
waitForTimeout(...)instead of waiting on concrete UI conditions (waitForSelector,expect(...).toBeVisible()/toBeEnabled()). This slows tests and risks flakiness.- Many selectors are generic text queries like
text=Embeddingorbutton:has-text("Config"), which can easily break on copy changes or multiple matches.- Navigation to Settings → RAG Settings → Embedding → Ollama Config is duplicated across tests.
Suggested improvements (can be incremental):
- Replace sleeps with condition-based waits around key elements.
- Prefer role-/test-id–based locators (
getByRole,getByTestId) over raw text where possible.- Extract the repeated “open Ollama config in RAG Settings” flow into a helper to DRY the suite.
python/src/server/services/embeddings/embedding_service.py (2)
107-172: NativeOllamaEmbeddingAdapter looks correct but has unbounded per-batch concurrencyThe adapter correctly targets
/api/embeddingswith{ "model": ..., "prompt": ... }and wraps HTTP/JSON failures inEmbeddingAPIError, which is consistent with the rest of the service.Two things to consider:
asyncio.gather(*...)will fire one HTTP request per text concurrently for each batch. With largeEMBEDDING_BATCH_SIZEvalues this could mean dozens or hundreds of parallel requests to a single Ollama instance. You may want to cap per-batch concurrency via a semaphore or chunking to avoid overloading local/remote instances.- The
dimensionsargument is currently ignored for native Ollama. That’s fine (Ollama controls embedding size), but it may be worth documenting in the docstring or via a comment to avoid confusion later.
287-303: Adapter selection for Ollama depends onapi_modestring value – normalize it
_get_embedding_adapterchecks:if provider_name == "ollama" and base_url: if (api_mode or "native") == "native": ...If
OLLAMA_API_MODEever comes through as"Native"/"NATIVE"or with whitespace, this will silently fall into the OpenAI-compatible branch even though the intent is “native”.Given you already read/derive
ollama_api_modeabove, consider normalizing it once (e.g.,.lower().strip()) before passing into_get_embedding_adapter, and/or do the normalization inside_get_embedding_adapter:- if (api_mode or "native") == "native": + mode = (api_mode or "native").strip().lower() + if mode == "native":This makes the behavior robust to minor config variations.
python/src/server/services/ollama/model_discovery_service.py (3)
122-133: Auth token support for/api/tagsis correct but not propagated to detail/capability callsThe new
auth_tokenparameter is correctly wired into the/api/tagsdiscovery call and keeps behavior unchanged when no token is provided. However, follow‑up calls such as_get_model_details()and the capability tests still hit Ollama endpoints without any auth header. On protected instances this will silently 401 and you’ll lose detailed metadata and accurate capability detection, even though listing itself succeeded with a token.If you want full parity for authenticated instances, consider threading
auth_tokenthrough the enrichment pipeline, e.g.:- async def discover_models(self, instance_url: str, fetch_details: bool = False, auth_token: str | None = None) -> list[OllamaModel]: + async def discover_models( + self, + instance_url: str, + fetch_details: bool = False, + auth_token: str | None = None, + ) -> list[OllamaModel]: @@ - enriched_models = await self._enrich_model_capabilities(models, instance_url, fetch_details=fetch_details) + enriched_models = await self._enrich_model_capabilities( + models, + instance_url, + fetch_details=fetch_details, + auth_token=auth_token, + )and then update
_enrich_model_capabilities,_get_model_details, and any capability helpers that hit Ollama directly to accept and forwardauth_tokenin their HTTP headers.This keeps non‑auth flows identical while making “fetch_details=True” actually usable with locked‑down Ollama instances.
Also applies to: 192-199
964-1004: Health check auth header wiring looks good; cache key ignores token changesPassing
auth_tokenand conditionally adding aBearerheader here is straightforward and matches the new health‑check API behavior. One nuance: thehealth_cachekey is onlyhealth_{instance_url}, so a failure without a token and a success with a token (or vice versa) will be cached identically for 30 seconds.If you expect users to toggle/authenticate instances and immediately rely on the health UI, consider including the auth presence in the cache key, e.g.
health_{instance_url}_{bool(auth_token)}or similar, so that “with token” and “without token” statuses don’t mask each other in the short term.
1026-1060: Per-instance auth_tokens map depends on exact URL string matchingThe
auth_tokensdict is plumbed correctly intodiscover_models_from_multiple_instances, and usingauth_tokens.get(url)is nice and simple. This does assume that the keys inauth_tokensare in the same normalized form as theinstance_urlspassed here (e.g. either both with/v1stripped and no trailing slash, or both unnormalized).If the router that constructs
auth_tokensis normalizing URLs (as in the health‑check endpoint), it would be safer to normalizeurlbefore doingauth_tokens.get(...):- tasks = [ - self.discover_models(url, fetch_details=fetch_details, auth_token=auth_tokens.get(url)) - for url in instance_urls - ] + tasks = [ + self.discover_models( + url, + fetch_details=fetch_details, + auth_token=auth_tokens.get(url.rstrip("/").replace("/v1", "")), + ) + for url in instance_urls + ]so a mismatch in
/v1or trailing slashes doesn’t quietly drop the token.archon-ui-main/src/components/settings/RAGSettings.tsx (4)
208-221: Instance config + token sync logic is solid; consider typing RAG token fieldsThe new
llmInstanceConfig/embeddingInstanceConfigshapes and theuseEffectblocks that sync fromLLM_BASE_URL/OLLAMA_*fields into these configs look logically correct:
- Refs (
lastLLMConfigRef/lastEmbeddingConfigRef) prevent infinite loops.useAuthis derived from token presence andauthTokenis updated only when the incoming value changes.The only minor concern is the reliance on
(ragSettings as any).OLLAMA_*_AUTH_TOKENto satisfy TypeScript. Since these keys are now part of your configuration surface, it would be cleaner to add them toRAGSettingsProps["ragSettings"]explicitly:ragSettings: { MODEL_CHOICE: string; CHAT_MODEL?: string; + OLLAMA_CHAT_AUTH_TOKEN?: string; + OLLAMA_EMBEDDING_AUTH_TOKEN?: string; @@ OLLAMA_EMBEDDING_URL?: string; OLLAMA_EMBEDDING_INSTANCE_NAME?: string;so you can drop the
as anycasts and keep strict typing end‑to‑end.Also applies to: 224-271
570-583: Model discovery fetches work but bypass TanStack Query and may over-fetch
ollamaMetricsnow tracksllmAvailableModels/embeddingAvailableModels, andloadAvailableModels()hits/api/ollama/modelsboth:
- Automatically when the instance URLs change, and
- After a successful
manualTestConnectionwhen acontextis provided.This is functionally fine and the error handling is intentionally soft, but two points:
Project guidelines say “Use TanStack Query for all data fetching”. These new fetches bypass that layer, so if you already have React Query plumbing for Ollama endpoints, consider converting
loadAvailableModelsinto a query (withenabledflags keyed on URLs) instead of a bespokefetch. That would centralize retries, caching, and error behavior.The URL‑change effects fire regardless of the active provider; if a user temporarily switches away from Ollama, you still ping
/api/ollama/modelswhenever the URL state changes. You may want to guard them:- useEffect(() => { - if (llmInstanceConfig.url && llmInstanceConfig.url !== DEFAULT_OLLAMA_URL) { + useEffect(() => { + if ( + chatProvider === 'ollama' && + llmInstanceConfig.url && + llmInstanceConfig.url !== DEFAULT_OLLAMA_URL + ) { loadAvailableModels(llmInstanceConfig.url, 'chat'); } - }, [llmInstanceConfig.url]); + }, [chatProvider, llmInstanceConfig.url]);Same idea for the embedding instance. This reduces unnecessary calls when Ollama isn’t currently in use.
Also applies to: 640-691, 751-755
1595-1652: OLLAMA_API_MODE UI is clear; consider typing the setting keyThe API mode toggle between “Native Ollama API” and “OpenAI‑Compatible” is well‑designed and the defaulting logic (
nativewhen unset) is clear. The only gap is thatOLLAMA_API_MODEisn’t declared inRAGSettingsProps["ragSettings"], so allsetRagSettings({ ...ragSettings, OLLAMA_API_MODE: ... })calls rely on structural typing and thesetRagSettings: anyescape hatch.For better static safety, add:
ragSettings: { @@ OLLAMA_EMBEDDING_INSTANCE_NAME?: string; + OLLAMA_API_MODE?: 'native' | 'openai-compatible';so backend and frontend stay in sync as this option evolves.
2351-2387: Auth settings in modals are wired correctly; consider clearing tokens on deleteThe new auth sections in the LLM and Embedding edit modals behave as expected:
useAuthtoggles a password input.- Tokens are stored in
llmInstanceConfig.authToken/embeddingInstanceConfig.authToken.- Saving the modal pushes tokens into
ragSettingsunderOLLAMA_CHAT_AUTH_TOKENandOLLAMA_EMBEDDING_AUTH_TOKEN.Two small follow‑ups:
When deleting an instance (LLM or Embedding), you clear
LLM_BASE_URL/OLLAMA_EMBEDDING_URLbut leave the correspondingOLLAMA_*_AUTH_TOKENinragSettings. It might be more intuitive (and safer) to also blank those keys so “delete instance” truly removes all associated configuration.The modal “Save Changes” handlers currently only update local
ragSettings; persistence happens when the user later clicks the top‑level “Save Settings” button. That matches existing behavior for other fields, but the success toast (“instance updated successfully”) might be interpreted as “persisted to server”. If you’ve had user confusion around this, consider callingcredentialsService.updateRagSettings(updatedSettings)here as well.Functionally, the core token wiring looks good and doesn’t expose secrets in the UI.
Also applies to: 2400-2405, 2446-2477, 2490-2495
| def _get_embedding_adapter( | ||
| provider: str, client: Any, base_url: str | None = None, auth_token: str | None = None, api_mode: str | None = None | ||
| ) -> EmbeddingProviderAdapter: | ||
| provider_name = (provider or "").lower() | ||
| if provider_name == "google": | ||
| search_logger.info("Using Google native embedding adapter") | ||
| return GoogleEmbeddingAdapter() | ||
| if provider_name == "ollama" and base_url: | ||
| # Check API mode - default to native if not specified or if set to native | ||
| if (api_mode or "native") == "native": | ||
| search_logger.info(f"Using Ollama native API adapter with base URL: {base_url}") | ||
| return NativeOllamaEmbeddingAdapter(base_url, auth_token) | ||
| else: | ||
| search_logger.info(f"Using OpenAI-compatible adapter for Ollama with base URL: {getattr(client, 'base_url', 'N/A')}") | ||
| return OpenAICompatibleEmbeddingAdapter(client) | ||
| search_logger.info(f"Using OpenAI-compatible embedding adapter for provider: {provider_name}") | ||
| return OpenAICompatibleEmbeddingAdapter(client) |
There was a problem hiding this comment.
Handle missing OLLAMA_EMBEDDING_URL more cleanly and avoid unnecessary Ollama config work for other providers
In create_embeddings_batch you do:
ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")If OLLAMA_EMBEDDING_URL is present but set to None, the .rstrip call will raise and force the except path, even though this is just “no embedding URL configured”. You also compute these Ollama-specific values even when embedding_provider is not "ollama".
You could make this more robust and targeted as:
- ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
- ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
- ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")
+ ollama_base_url_raw = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip()
+ ollama_base_url = ollama_base_url_raw.rstrip("/v1").rstrip("/") if ollama_base_url_raw else ""
+ ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") or ""
+ ollama_api_mode = (rag_settings.get("OLLAMA_API_MODE") or "native").lower()and only pass ollama_base_url/ollama_auth_token into _get_embedding_adapter when embedding_provider.lower() == "ollama".
This avoids noisy warnings from benign None values and keeps non-Ollama providers from incurring extra config work. Normalizing ollama_api_mode to lowercase also makes _get_embedding_adapter resilient to casing differences.
Also applies to: 433-469
🤖 Prompt for AI Agents
python/src/server/services/embeddings/embedding_service.py around lines 287-303
(also apply same change to 433-469): the code currently computes
ollama_base_url/auth_token/api_mode unconditionally and calls .rstrip on a value
that may be None, which can raise and causes unnecessary Ollama handling for
other providers; change the call sites to only compute and sanitize
OLLAMA_EMBEDDING_URL/AUTH_TOKEN/API_MODE when embedding_provider.lower() ==
"ollama" (guard against None before calling .rstrip), normalize api_mode to
lowercase, and only pass ollama_base_url and ollama_auth_token into
_get_embedding_adapter when the provider is Ollama so non-Ollama providers avoid
extra config work and you won't call .rstrip on None.
🧩 Analysis chain
Ensure OLLAMA_*_AUTH_TOKEN values from rag_settings are actual tokens, not “[ENCRYPTED]” placeholders
Both here and in llm_provider_service.get_llm_client you read:
ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
# or
ollama_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "")The provided credential_service.get_credentials_by_category("rag_strategy") implementation returns a dict with {"value": "[ENCRYPTED]", "is_encrypted": True, ...} for encrypted keys, not the real secret value. If the Ollama auth tokens are stored as encrypted rag_strategy entries, these lookups will hand [ENCRYPTED] placeholders (or dicts) to the HTTP layer, and auth will consistently fail.
Please double-check:
- How OLLAMA_*_AUTH_TOKEN is stored (encrypted vs plaintext) in
archon_settingsfor therag_strategycategory. - Whether there is a backend-only path for retrieving decrypted values (while still redacting them in the UI) and use that instead of the generic
get_credentials_by_category.
If tokens are currently encrypted and only exposed through [ENCRYPTED] placeholders, this retrieval logic will need to be adjusted for Ollama auth to actually work.
Also applies to: 433-469
I'll verify the review comment by examining the codebase to understand how OLLAMA auth tokens are retrieved and whether they may be "[ENCRYPTED]" placeholders.
---
🏁 Script executed:
```shell
#!/bin/bash
# Locate key files
fd embedding_service.py
fd credential_service.py
fd llm_provider_service.py
Length of output: 444
🏁 Script executed:
#!/bin/bash
# Examine the embedding_service.py lines around 287-303 and 433-469
cat python/src/server/services/embeddings/embedding_service.py | sed -n '280,310p'
cat python/src/server/services/embeddings/embedding_service.py | sed -n '430,475p'Length of output: 4073
🏁 Script executed:
#!/bin/bash
# Find how rag_settings are obtained and where OLLAMA_EMBEDDING_AUTH_TOKEN is used
rg -n "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN" --type py -B 2 -A 2Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find the credential_service implementation
fd credential_service.py -type fLength of output: 170
🏁 Script executed:
#!/bin/bash
# Search for get_credentials_by_category and how it returns encrypted values
rg -n "get_credentials_by_category\|ENCRYPTED" --type py -B 2 -A 2Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine credential_service.py to see get_credentials_by_category implementation
cat python/src/server/services/credential_service.py
Length of output: 25935
🏁 Script executed:
#!/bin/bash
# Search for OLLAMA token references using simpler patterns
rg "OLLAMA_EMBEDDING_AUTH_TOKEN" --type py -B 3 -A 3
rg "OLLAMA_CHAT_AUTH_TOKEN" --type py -B 3 -A 3Length of output: 7033
🏁 Script executed:
#!/bin/bash
# Examine llm_provider_service.py to find the get_llm_client function and token usage
cat python/src/server/services/llm_provider_service.py | head -500Length of output: 20625
🏁 Script executed:
#!/bin/bash
# Search for ENCRYPTED placeholder or decryption logic
rg "ENCRYPTED" --type py -B 2 -A 2
rg "decrypt\|is_encrypted" --type py -B 2 -A 2Length of output: 1368
---
🏁 Script executed:
```shell
#!/bin/bash
# Check ollama_api.py to see if it has the same issue with auth tokens
cat python/src/server/api_routes/ollama_api.py | grep -n -B 5 -A 5 "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN"
Length of output: 1237
🏁 Script executed:
#!/bin/bash
# Verify if OLLAMA auth tokens are actually stored as encrypted in the database schema or init
rg "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN" --type py -B 5 -A 5 | grep -i "encrypt\|set_credential"Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check how settings are initialized - are auth tokens marked encrypted?
fd "init\|seed\|migration" --type py | head -20
cat python/src/server/services/settings_service.py 2>/dev/null || echo "File not found"Length of output: 184
Replace category-based credential retrieval with backend-only decryption for Ollama auth tokens
The identified issue is confirmed: credential_service.get_credentials_by_category("rag_strategy") returns {"value": "[ENCRYPTED]", "is_encrypted": True} for encrypted entries, causing the literal string "[ENCRYPTED]" to be passed as auth tokens instead of actual decrypted values.
This affects multiple locations:
- embedding_service.py (line ~453):
ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") - llm_provider_service.py (lines ~290, ~340): Similar pattern for both
OLLAMA_EMBEDDING_AUTH_TOKENandOLLAMA_CHAT_AUTH_TOKEN - ollama_api.py (lines ~107, ~192): Same problematic pattern
Replace with backend-only credential retrieval:
ollama_auth_token = await credential_service.get_credential("OLLAMA_EMBEDDING_AUTH_TOKEN", default="", decrypt=True)The get_credential(key, decrypt=True) method already exists in credential_service and properly handles decryption while keeping UI redaction intact.
🤖 Prompt for AI Agents
python/src/server/services/embeddings/embedding_service.py around lines 287-303,
the code currently pulls Ollama auth tokens from settings/credential categories
which can yield the literal "[ENCRYPTED]" string; replace those lookups with a
backend-only decrypted retrieval by calling
credential_service.get_credential("OLLAMA_EMBEDDING_AUTH_TOKEN", default="",
decrypt=True) (awaiting it if async) and use that returned value when
constructing Ollama adapters so the actual decrypted token is passed instead of
the encrypted placeholder.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
python/src/agent_work_orders/utils/state_reconciliation.py (1)
159-161: Fix Windows root directory detection logic.The check
len(path_obj.parts) == 2 and path_obj.parts[1] == ""will never match Windows drive roots. On Windows,Path("C:\\").partsreturns("C:\\",)(1 element, not 2). This allows dangerous paths like "C:" to bypass the root directory safety check.Apply this diff to correctly detect Windows drive roots:
- is_not_root = not ( - orphan_path_resolved in ("/", "\\") or - (os.name == "nt" and len(path_obj.parts) == 2 and path_obj.parts[1] == "") - ) + is_not_root = not ( + orphan_path_resolved in ("/", "\\") or + (os.name == "nt" and len(path_obj.parts) == 1) + )archon-ui-main/src/services/credentialsService.ts (1)
62-75:OllamaInstance.authTokenis never persisted or reloaded, breaking per-instance authVerification confirms the issue. In
getOllamaInstances(lines 437-485), the credential parsing correctly extracts all fields intoinstanceMap, but the final instance object construction (lines 477-487) explicitly omitsauthTokenfrom the returned array. Similarly, insetOllamaInstances(lines 495-543), thefieldsmap excludesauthTokenentirely, so any token set in the UI is never persisted to credentials storage.To fix this:
- Include
authTokenin the returned instance object (line ~487):instances.push({ id: instance.id, name: instance.name, baseUrl: instance.baseUrl, isEnabled: instance.isEnabled ?? true, isPrimary: instance.isPrimary ?? false, instanceType: instance.instanceType ?? 'both', loadBalancingWeight: instance.loadBalancingWeight ?? 100, isHealthy: instance.isHealthy, responseTimeMs: instance.responseTimeMs, modelsAvailable: instance.modelsAvailable, lastHealthCheck: instance.lastHealthCheck, + authToken: instance.authToken, });
- Add
authTokento the persisted fields insetOllamaInstances(after line 514):if (instance.lastHealthCheck) { fields.lastHealthCheck = instance.lastHealthCheck; } + +if (instance.authToken) { + fields.authToken = instance.authToken; +}Alternatively, if tokens should live only in global credentials (e.g.,
OLLAMA_CHAT_AUTH_TOKEN), removeauthTokenfrom theOllamaInstanceinterface entirely to avoid the false expectation of per-instance storage.archon-ui-main/src/components/settings/RAGSettings.tsx (3)
2346-2354: Sync authentication settings when using same host for both instances.When the "Use same host for embedding instance" checkbox is enabled, only the
urlandnamefields are synced. If the LLM instance has authentication enabled, the embedding instance should inherit those settings to avoid configuration mismatch.Apply this diff:
onChange={(e) => { if (e.target.checked) { // Sync embedding instance with LLM instance setEmbeddingInstanceConfig({ name: llmInstanceConfig.name || 'Default Ollama', url: llmInstanceConfig.url, - useAuth: embeddingInstanceConfig.useAuth, - authToken: embeddingInstanceConfig.authToken + useAuth: llmInstanceConfig.useAuth, + authToken: llmInstanceConfig.authToken }); } }}
787-828: Clear authentication tokens when deleting instance configurations.The instance deletion handlers reset the instance config objects but don't remove the corresponding auth token keys from
ragSettings. This leaves orphaned authentication data.Apply these diffs:
const handleDeleteLLMInstance = () => { if (window.confirm('Are you sure you want to delete the current LLM instance configuration?')) { // Reset LLM instance configuration setLLMInstanceConfig({ name: '', - url: '' + url: '', + useAuth: false, + authToken: '' }); // Clear related RAG settings const updatedSettings = { ...ragSettings }; delete updatedSettings.LLM_BASE_URL; + delete updatedSettings.LLM_INSTANCE_NAME; delete updatedSettings.MODEL_CHOICE; + delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN; setRagSettings(updatedSettings);Apply similar changes to
handleDeleteEmbeddingInstance(lines 809-828).
1845-1899: Frontend will display "Not selected" for CHAT_MODEL due to field name mismatch.The concern in the original review is valid. Backend stores the Ollama chat model selection as
OLLAMA_CHAT_MODELin the database, but frontend line 1892 attempts to displayragSettings.CHAT_MODEL, which doesn't exist.Backend flow: Database stores
OLLAMA_CHAT_MODEL→credential_service.get_credentials_by_category("rag_strategy")returns raw dict withOLLAMA_CHAT_MODELkey → Frontend receives dict withoutCHAT_MODELfield.Fix required: Either update the backend to map
OLLAMA_CHAT_MODELtoCHAT_MODELin the response, or update frontend line 1892 to displayragSettings.OLLAMA_CHAT_MODELinstead.
♻️ Duplicate comments (6)
archon-ui-main/test-results/.last-run.json (1)
1-6: Do not commit.last-run.json; ignore test-results in VCS.This file is a transient Playwright/Vitest artifact and will churn frequently. Please:
- Add
test-results/(or at leasttest-results/.last-run.json) toarchon-ui-main/.gitignore.- Remove the tracked file via
git rm --cached archon-ui-main/test-results/.last-run.json.This matches the earlier review feedback.
INFRASTRUCTURE.md (1)
108-121: Remove hard-coded Supabase JWT from docs and use a placeholder instead.The
SUPABASE_SERVICE_KEYhere is a real JWT with a long expiry. Even if intended for local use, committing it is a credential leak and has already been flagged by gitleaks and a prior review.Recommend:
-# JWT Service Role Key (generiert mit lokalem JWT_SECRET) -SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoic2VydmljZV9yb2xlIiwiaXNzIjoic3VwYWJhc2UiLCJpYXQiOjE3NjM2NTg1MDYsImV4cCI6MjA3OTAxODUwNn0.HVH5TgwW70JZtiGdnjU4RGexDDVbGnI3mXt-diQhVy8 +# JWT Service Role Key (mit lokalem JWT_SECRET generieren – nicht committen) +SUPABASE_SERVICE_KEY=<generate-using-script-below>…and explicitly instruct readers to run the Python script in this doc to generate their own key locally, add it to their
.env, and never commit it.python/src/server/services/embeddings/embedding_service.py (1)
447-473: Ollama embedding config still fragile and likely passing “[ENCRYPTED]” placeholders instead of real auth tokensThis block reintroduces both issues previously flagged in earlier reviews:
rag_settings = await _maybe_await( credential_service.get_credentials_by_category("rag_strategy") ) ... ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/") ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native") ... adapter = _get_embedding_adapter( embedding_provider, client, base_url=ollama_base_url, auth_token=ollama_auth_token, api_mode=ollama_api_mode, )
.rstripon possibly non‑string values:get_credentials_by_category("rag_strategy")returns objects like{"value": "[ENCRYPTED]", "is_encrypted": True, ...}for encrypted rows. IfOLLAMA_EMBEDDING_URLor related keys are missing or malformed, you can end up calling.rstriponNoneor on a dict, which will raise and force the generic “Failed to load embedding settings” path even though this is just misconfiguration. Guard and normalize the raw value before stripping.Auth token is almost certainly the
[ENCRYPTED]placeholder, not the decrypted secret: Becauseget_credentials_by_categoryintentionally returns masked values for encrypted entries,OLLAMA_EMBEDDING_AUTH_TOKENhere will be either a placeholder dict or the literal string"[ENCRYPTED]". That value is then used as the Bearer token inNativeOllamaEmbeddingAdapter, so auth will consistently fail even when a valid token is stored.To address both, you can:
- Only compute Ollama‑specific settings when
embedding_provider.lower() == "ollama".- Pull the real token via the backend‑only decryption API instead of from the category map.
- Normalize the base URL defensively.
For example:
- rag_settings = await _maybe_await( - credential_service.get_credentials_by_category("rag_strategy") - ) - batch_size = int(rag_settings.get("EMBEDDING_BATCH_SIZE", "100")) - embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536")) - - # For Ollama, get native API URL, auth token, and API mode - ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/") - ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") - ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native") + rag_settings = await _maybe_await( + credential_service.get_credentials_by_category("rag_strategy") + ) + batch_size = int(rag_settings.get("EMBEDDING_BATCH_SIZE", "100")) + embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536")) + + ollama_base_url = "" + ollama_auth_token = "" + ollama_api_mode = "native" + + if (embedding_provider or "").lower() == "ollama": + raw_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip() + if isinstance(raw_url, str) and raw_url: + ollama_base_url = raw_url.rstrip("/v1").rstrip("/") + + # Use backend-only decrypted credential for the token + try: + ollama_auth_token = await credential_service.get_credential( + "OLLAMA_EMBEDDING_AUTH_TOKEN", + default="", + decrypt=True, + ) + except TypeError: + # Fallback if get_credential signature differs; keep empty token + pass + + mode_raw = rag_settings.get("OLLAMA_API_MODE") or "native" + if isinstance(mode_raw, str): + ollama_api_mode = mode_raw.strip().lower()and keep the default values in the
exceptblock as you already do.This avoids noisy errors from benign misconfig, and—more importantly—ensures the adapter receives a real decrypted Bearer token instead of masked placeholders.
python/src/server/api_routes/ollama_api.py (2)
99-131: [Duplicate] Normalize instance URLs consistently when mapping Ollama auth tokensThis issue has already been flagged in a previous review. The configured base URLs strip
/v1and then trailing/, but incoming instance URLs only strip trailing/. When callers pass URLs with/v1(common from UI), the equality checks fail and no auth token is attached, causing 401 errors.
180-187: [Duplicate] Do not log full rag_settings – it may contain secretsThis issue has already been flagged in a previous review. Logging the entire
rag_settingsdictionary exposes sensitive data including Ollama auth tokens and other credentials. This is a security/compliance risk.PLAN.md (1)
253-261: [Duplicate] Remove real auth tokens from documentation and rotate themThis critical issue has already been flagged in a previous review. The DB snapshot contains what appears to be a real Ollama auth token (
ollama_13107e338aa16a6a8295592ce050f6cb) that should be immediately rotated and replaced with a placeholder.
🧹 Nitpick comments (12)
python/src/agent_work_orders/utils/state_reconciliation.py (1)
184-186: Catch specific exception types where possible.Catching bare
Exceptionduring file operations masks specific errors. Consider catchingOSErrorand its subclasses (e.g.,PermissionError,FileNotFoundError) for more precise diagnostics, while allowing unexpected errors to propagate.- except Exception as e: + except OSError as e:This improves observability by distinguishing expected file-system errors from unexpected runtime failures, making it easier to diagnose issues during reconciliation.
archon-ui-main/package.json (1)
64-67: Playwright devDependency looks good; consider adding an e2e script.Adding
@playwright/testas a devDependency is appropriate for the new e2e tests. Optionally, add a"test:e2e": "playwright test"script so CI and devs have a clear entrypoint for those tests.QUICKSTART.md (1)
7-19: Replace machine-specific absolute paths with repo-relative or placeholder paths.The commands hard-code
/Volumes/DATEN/Coding/archon/..., which is specific to one machine and will not work for most contributors. Consider using repo-relative paths or a placeholder like<ARCHON_REPO_ROOT>instead, for example:- cd /Volumes/DATEN/Coding/archon/supabase + cd path/to/archon/supabase - cd /Volumes/DATEN/Coding/archon + cd path/to/archonThis keeps the quick-start copy‑pasteable for other environments.
Also applies to: 78-89, 125-128
INFRASTRUCTURE.md (1)
78-89: Generalize hard-coded local filesystem paths for broader reuse.Multiple snippets use absolute paths like
/Volumes/DATEN/Coding/archon/..., which are specific to one machine. To make this infra guide reusable, consider switching to repo-relative or placeholder paths (e.g.,cd path/to/archon,cd path/to/archon/supabase) and clarifying that users should adapt them to their environment.Also applies to: 125-128, 189-191, 345-347, 455-459, 465-472
python/src/server/services/llm_provider_service.py (1)
415-431: Ollama auth-token selection looks correct; consider DRYing it and tightening logging.The new logic to:
- Fetch
rag_strategysettings,- Select
OLLAMA_EMBEDDING_AUTH_TOKENvsOLLAMA_CHAT_AUTH_TOKEN, and- Fall back to
"required-but-ignored"when unset,is consistent between the OpenAI→Ollama fallback and the direct Ollama path, and avoids logging raw tokens (only lengths), which is good.
Two small follow-ups you might consider:
Factor out token resolution
Extract a helper like_get_ollama_auth_token(use_embedding: bool, instance_type: str | None)that encapsulates:
- Cache lookup / fetch of
rag_strategy,- Key selection (
EMBEDDINGvsCHAT),- Defaulting + warning.
That removes duplication between the fallback branch and the main
provider_name == "ollama"branch.Sanitize or downgrade embedding URL logging
logger.info(f"Embedding URL from settings: {embedding_url}")andlogger.info(f"Resolved embedding URL: {final_url}")will log whatever is configured, including potential credentials if someone encodes them in the URL. Using_sanitize_for_log(...)or moving these todebugwould reduce the chance of leaking sensitive connection details into logs.These are polish items; core behavior and error handling look solid.
Also applies to: 455-475, 618-622
python/src/server/services/embeddings/embedding_service.py (1)
107-172: NativeOllamaEmbeddingAdapter behavior is reasonable, but consider concurrency and dimensions semanticsThe adapter’s per-text
asyncio.gatherover the batch and strict validation of theembeddingfield look good. Two small points to keep in mind:
dimensionsis currently ignored; that’s fine given Ollama’s fixed-dimension embeddings, but it may be worth a brief comment so future readers don’t assume down‑projection happens here.- With default
EMBEDDING_BATCH_SIZEyou’ll fire up to that many concurrent/api/embeddingscalls per batch. If users push batch size much higher, you might want an internal concurrency cap or a per‑adapter semaphore to avoid overloading smaller Ollama instances.python/src/server/services/ollama/model_discovery_service.py (1)
122-135: Auth token support is good for/api/tags, but capability/detail calls still ignore itThe new
auth_tokenplumbing fordiscover_models,check_instance_health, anddiscover_models_from_multiple_instanceslooks correct: headers are built as:headers = {} if (auth_token): headers["Authorization"] = f"Bearer {auth_token}"and passed into
client.get(...)for/api/tags.Two follow‑ups to consider:
Capability and details endpoints still unauthenticated. Methods like
_test_embedding_capability_fast,_test_embedding_capability,_get_model_details, and the various chat/structured‑output tests (_test_chat_capability_fast,_test_chat_capability, etc.) still hit/api/embeddings,/api/show, and chat completions without any Authorization header. On an Ollama instance that enforces auth globally, model listing and health will now work, but enrichment and capability detection will likely return 401s and silently fall back to defaults.If you intend “protected instance” support end‑to‑end, you probably want to thread
auth_tokenthrough_enrich_model_capabilitiesand the downstream helpers (or at least_get_model_detailsand both embedding tests) and attach the same header there.Health cache is keyed only by URL.
check_instance_healthcaches byhealth_{instance_url}regardless of token. If a user fixes a bad token and immediately re‑checks health for the same URL, they can get a stale cached error for up to 30s. Not critical, but you might want to either skip caching when a request fails with 401, or include the token (or a hash of it) in the cache key.Both are incremental and keep backward compatibility, but they’ll make the new auth support behave more predictably on locked‑down Ollama deployments.
Also applies to: 192-199, 235-243, 964-999, 1021-1024, 1026-1060
archon-ui-main/test-results/tests-e2e-ollama-api-mode--98290-t-to-Native-Ollama-API-mode/error-context.md (1)
1-638: Large UI snapshot is fine, but confirm it’s meant to live in version controlThis file is purely a captured UI tree for an e2e failure context and doesn’t affect runtime behavior. If it’s generated by tests, consider whether it should be checked in or ignored via
.gitignoreto avoid churn; otherwise, it’s fine as documentation of the failing state.python/src/server/api_routes/ollama_api.py (1)
204-215: Consider reducing log verbosity for token selection.The health check logs token selection decisions at
infolevel for every instance check. This creates verbose logs in production. Consider moving lines 210, 213, and 215 todebuglevel to reduce noise while keeping the information available for troubleshooting.Apply this diff to reduce log verbosity:
if url == llm_base_url and chat_auth_token: auth_token = chat_auth_token - logger.info(f"Using chat auth token for {url}") + logger.debug(f"Using chat auth token for {url}") elif url == embedding_base_url and embedding_auth_token: auth_token = embedding_auth_token - logger.info(f"Using embedding auth token for {url}") + logger.debug(f"Using embedding auth token for {url}") else: - logger.warning(f"No matching auth token found for {url}") + logger.debug(f"No auth token configured for {url}")archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (1)
1-192: Replace hardcoded timeouts with Playwright's auto-waiting mechanisms.The tests use
page.waitForTimeout()extensively (30+ occurrences). Playwright's auto-waiting is more reliable and faster. Most timeouts can be removed since Playwright automatically waits for elements before interacting.Example refactor for lines 9-26:
test('should display API mode radio buttons when Ollama is selected', async ({ page }) => { // Navigate to RAG Settings tab await page.click('text=RAG Settings'); - await page.waitForTimeout(500); // Select Ollama as embedding provider await page.click('text=Embedding'); - await page.waitForTimeout(300); // Click on Ollama provider card const ollamaCard = page.locator('button:has-text("Ollama")').first(); await ollamaCard.click(); - await page.waitForTimeout(500); // Open Ollama configuration const configButton = page.locator('button:has-text("Config")').first(); await configButton.click(); - await page.waitForTimeout(500); // Verify API mode section is visible (auto-waits) await expect(page.locator('text=Ollama API Mode')).toBeVisible();Playwright's
click()andexpect().toBeVisible()already include auto-waiting, making explicit timeouts unnecessary.archon-ui-main/src/components/settings/RAGSettings.tsx (2)
1595-1652: Consider clarifying that API mode is a global setting.The Ollama API Mode selection appears within the context-aware configuration panel (LLM or Embedding), but it's actually a global setting that affects both. Users might expect it to be per-instance.
Consider adding a helper text indicating this applies to all Ollama instances:
<div className="mb-4 pb-4 border-b border-green-500/20"> <label className="block text-sm font-medium text-gray-300 mb-3"> Ollama API Mode </label> + <p className="text-xs text-gray-400 mb-2"> + This setting applies to all Ollama instances (both LLM and Embedding) + </p> <div className="flex gap-3">
2363-2387: Extract authentication UI into a reusable component.The authentication sections in both LLM and Embedding modals are nearly identical (lines 2363-2387 and 2454-2477), violating the DRY principle.
Consider creating a reusable component:
interface AuthenticationSectionProps { id: string; useAuth: boolean; authToken: string; onUseAuthChange: (checked: boolean) => void; onTokenChange: (token: string) => void; accentColor?: 'green' | 'purple'; } const AuthenticationSection: React.FC<AuthenticationSectionProps> = ({ id, useAuth, authToken, onUseAuthChange, onTokenChange, accentColor = 'green' }) => ( <div className="space-y-2 mt-4"> <div className="flex items-center gap-2"> <input type="checkbox" id={id} checked={useAuth} onChange={(e) => onUseAuthChange(e.target.checked)} className={`w-4 h-4 text-${accentColor}-600 ...`} /> <label htmlFor={id} className="text-sm font-medium text-gray-700 dark:text-gray-300"> Use Authentication </label> </div> {useAuth && ( <Input type="password" placeholder="Auth Token" value={authToken} onChange={(e) => onTokenChange(e.target.value)} className="text-sm" /> )} </div> );Then use it in both modals to reduce duplication.
Also applies to: 2454-2477
| interface RAGSettingsProps { | ||
| ragSettings: { | ||
| MODEL_CHOICE: string; | ||
| CHAT_MODEL?: string; |
There was a problem hiding this comment.
Add missing auth token fields to the RAGSettingsProps interface.
The code accesses OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN via type assertion (ragSettings as any), which bypasses TypeScript's type checking and violates the strict mode requirement.
Apply this diff to add the missing fields to the interface:
interface RAGSettingsProps {
ragSettings: {
MODEL_CHOICE: string;
CHAT_MODEL?: string;
USE_CONTEXTUAL_EMBEDDINGS: boolean;
CONTEXTUAL_EMBEDDINGS_MAX_WORKERS: number;
USE_HYBRID_SEARCH: boolean;
USE_AGENTIC_RAG: boolean;
USE_RERANKING: boolean;
LLM_PROVIDER?: string;
LLM_BASE_URL?: string;
LLM_INSTANCE_NAME?: string;
EMBEDDING_MODEL?: string;
EMBEDDING_PROVIDER?: string;
OLLAMA_EMBEDDING_URL?: string;
OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+ OLLAMA_CHAT_AUTH_TOKEN?: string;
+ OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+ OLLAMA_API_MODE?: 'native' | 'openai-compatible';
// Crawling Performance Settings
CRAWL_BATCH_SIZE?: number;
// ... rest of fields
};
setRagSettings: (settings: any) => void;
}Then remove the type assertions on lines 230, 248, 253, and 271:
-const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || '';
+const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || '';As per coding guidelines.
Also applies to: 230-230, 248-248, 253-253, 271-271
🤖 Prompt for AI Agents
In archon-ui-main/src/components/settings/RAGSettings.tsx around line 144, the
RAGSettingsProps interface is missing OLLAMA_CHAT_AUTH_TOKEN and
OLLAMA_EMBEDDING_AUTH_TOKEN fields; add them (both optional strings) to the
interface definition, then remove the unsafe (ragSettings as any) type
assertions at lines 230, 248, 253 and 271 and instead access those newly typed
properties directly from ragSettings so TypeScript strict mode is satisfied.
Security fixes: - Remove hardcoded JWT token from INFRASTRUCTURE.md (replace with placeholder) - Remove real Ollama auth tokens from PLAN.md (replace with placeholders) - Fix secrets leak: Change RAG settings logging from info to debug level, log only keys Functional fixes: - Fix URL normalization for auth-token mapping (both incoming and configured URLs now use same normalization) Best practices: - Add test-results/ to .gitignore - Remove test-results from version control All 22 Python tests pass, server healthy. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
f668f36 to
80ca26d
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
PR coleam00#875: Ollama Support with Local Model Discovery - Security fixes: JWT tokens, auth tokens, RAG settings logging - Bug fixes: URL normalization, None-handling - Best practices: test-results gitignore All 7 items marked as ERLEDIGT (2025-11-21) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit d7b5ea6.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
PRPs/todo.md (2)
1-107: Clarify document purpose and placement.This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository, or should it be maintained elsewhere (e.g., PR description, internal wiki)? If it should remain in the repo, add a brief header explaining its purpose and intended audience.
99-99: Use a heading instead of emphasis.Line 99 uses bold text (
**Gesamt: 7 actionable Items**) where a markdown heading would be more appropriate for document structure.- **Gesamt: 7 actionable Items** + ## Gesamt: 7 actionable Items
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
PRPs/todo.md(1 hunks)
🧰 Additional context used
🪛 Gitleaks (8.29.0)
PRPs/todo.md
[high] 21-21: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 22-22: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 LanguageTool
PRPs/todo.md
[grammar] ~1-~1: Ersetze das Satzzeichen
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review ...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~1-~1: Hier könnte ein Fehler sein.
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review vom 21. Nove...
(QB_NEW_DE)
[grammar] ~3-~3: Ergänze ein Wort
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review vom 21. November 20...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)
[grammar] ~4-~4: Hier könnte ein Fehler sein.
Context: ...ith Local Model Discovery and Embeddings --- ## KRITISCH (Sicherheit) ### 1. JWT Token ...
(QB_NEW_DE)
[grammar] ~8-~8: Hier könnte ein Fehler sein.
Context: ...mbeddings --- ## KRITISCH (Sicherheit) ### 1. JWT Token aus INFRASTRUCTURE.md entfe...
(QB_NEW_DE)
[grammar] ~10-~10: Ergänze ein Satzzeichen
Context: ...-- ## KRITISCH (Sicherheit) ### 1. JWT Token aus INFRASTRUCTURE.md entfernen - ...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_JWTDASHTOKEN)
[grammar] ~12-~12: Korrigiere das Wort
Context: ...RUCTURE.md` (Zeile ~113) - Problem: Hardcoded JWT Token mit 10-Jahres-Gültigkeit comm...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_SPELLING)
[grammar] ~12-~12: Korrigiere das Wort
Context: ...Zeile ~113) - Problem: Hardcoded JWT Token mit 10-Jahres-Gültigkeit committed...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_SPELLING)
[grammar] ~14-~14: Ersetze das Satzzeichen
Context: ...s auf Token-Generierungs-Script (Zeilen 142-157) hinzufügen - Status: [x] ERLEDIGT ...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~15-~15: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 2. Auth Tokens aus PLAN.md entfernen - *...
(QB_NEW_DE)
[grammar] ~17-~17: Ergänze ein Satzzeichen
Context: ...* [x] ERLEDIGT (2025-11-21) ### 2. Auth Tokens aus PLAN.md entfernen - **Datei:*...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_AUTHDASHTOKENS)
[grammar] ~17-~17: Hier könnte ein Fehler sein.
Context: ...### 2. Auth Tokens aus PLAN.md entfernen - Datei: PLAN.md (Zeilen 111-261) - **Proble...
(QB_NEW_DE)
[grammar] ~18-~18: Ergänze ein Leerzeichen
Context: ...okens aus PLAN.md entfernen - Datei: PLAN.md (Zeilen 111-261) - **Problem:*...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~18-~18: Ersetze das Satzzeichen
Context: ...ntfernen - Datei: PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Token...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~18-~18: Hier könnte ein Fehler sein.
Context: ... - Datei: PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Tokens und Producti...
(QB_NEW_DE)
[grammar] ~19-~19: Passe das Symbol an
Context: ... PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Tokens und Production...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)
[grammar] ~20-~20: Hier könnte ein Fehler sein.
Context: ...URL im DB-Snapshot - Tokens betroffen: - OLLAMA_CHAT_AUTH_TOKEN: ollama_13107e338aa16a6a8295592ce050f6cb - OLLAMA_EMBEDDING_AUTH_TOKEN: ollama_13107e338aa16a6a8295592ce050f6cb - Lösung: - Tokens durch Placeholder e...
(QB_NEW_DE)
[grammar] ~26-~26: Hier könnte ein Fehler sein.
Context: ...Server-seitig Tokens rotieren/revoken! - Status: [x] ERLEDIGT (2025-11-21) ###...
(QB_NEW_DE)
[grammar] ~27-~27: Ergänze ein Leerzeichen
Context: ...Tokens rotieren/revoken!** - Status: [x] ERLEDIGT (2025-11-21) ### 3. RAG Se...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~27-~27: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secre...
(QB_NEW_DE)
[grammar] ~29-~29: Ergänze ein Satzzeichen
Context: ...** [x] ERLEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secrets Leak...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_RAGDASHSETTINGSDASHLOGGING)
[grammar] ~29-~29: Ergänze ein Satzzeichen
Context: ...LEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secrets Leak) - **Dat...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_RAGDASHSETTINGSDASHLOGGING)
[grammar] ~29-~29: Hier könnte ein Fehler sein.
Context: ...ettings Logging entfernen (Secrets Leak) - Datei: `python/src/server/api_routes/ollama_...
(QB_NEW_DE)
[grammar] ~30-~30: Ergänze ein Leerzeichen
Context: ...ng entfernen (Secrets Leak) - Datei: `python/src/server/api_routes/ollama_api...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~30-~30: Ersetze das Satzzeichen
Context: ...erver/api_routes/ollama_api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG setti...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~30-~30: Hier könnte ein Fehler sein.
Context: ...i_routes/ollama_api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_set...
(QB_NEW_DE)
[grammar] ~31-~31: Ergänze ein Leerzeichen
Context: ..._api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_settin...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[typographical] ~31-~31: Zeichen ohne sein Gegenstück: ‚"‘ scheint zu fehlen
Context: ...api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_settings}")` logged a...
(DE_UNPAIRED_QUOTES)
[grammar] ~31-~31: Wähle ein passenderes Wort
Context: ...settings: {rag_settings}")` logged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Key...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_NOUN)
[grammar] ~31-~31: Ergänze ein Wort
Context: ...}")` logged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBU...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)
[grammar] ~31-~31: Hier könnte ein Fehler sein.
Context: ...gged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBUG Level: `...
(QB_NEW_DE)
[grammar] ~32-~32: Ergänze ein Leerzeichen
Context: ...settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBUG Level: ```p...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~32-~32: Korrigiere die Fehler
Context: ...- Lösung: Nur Keys loggen, auf DEBUG Level: ```python logger.debug(f"RAG ...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~32-~32: Hier könnte ein Fehler sein.
Context: ...ung:** Nur Keys loggen, auf DEBUG Level: python logger.debug(f"RAG settings keys: {list(rag_settings.keys())}") - Status: [x] ERLEDIGT (2025-11-21) ---...
(QB_NEW_DE)
[grammar] ~36-~36: Ergänze ein Leerzeichen
Context: ..._settings.keys())}") ``` - Status: [x] ERLEDIGT (2025-11-21) --- ## MAJOR...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~36-~36: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) --- ## MAJOR (Funktionale Probleme) ### 4. URL...
(QB_NEW_DE)
[grammar] ~40-~40: Passe die Groß- und Kleinschreibung an
Context: ...] ERLEDIGT (2025-11-21) --- ## MAJOR (Funktionale Probleme) ### 4. URL-Normalisierung fü...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)
[grammar] ~40-~40: Hier könnte ein Fehler sein.
Context: ...1) --- ## MAJOR (Funktionale Probleme) ### 4. URL-Normalisierung für Auth-Token Map...
(QB_NEW_DE)
[grammar] ~42-~42: Entferne ein Leerzeichen
Context: ...### 4. URL-Normalisierung für Auth-Token Mapping fixen - Datei: `python/src/s...
(QB_NEW_DE_OTHER_ERROR_IDS_UNNECESSARY_ORTHOGRAPHY_SPACE)
[grammar] ~43-~43: Hier könnte ein Fehler sein.
Context: ...ierung für Auth-Token Mapping fixen - Datei: `python/src/server/api_routes/ollama_...
(QB_NEW_DE)
[grammar] ~43-~43: Ergänze ein Leerzeichen
Context: ...ür Auth-Token Mapping fixen - Datei: `python/src/server/api_routes/ollama_api...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~43-~43: Ergänze ein Wort
Context: ...r Auth-Token Mapping fixen - Datei: python/src/server/api_routes/ollama_api.py (Zeilen 99-109, 119-131, 137-142, 188-1...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)
[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...erver/api_routes/ollama_api.py` (Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - ...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...i_routes/ollama_api.py` (Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...ollama_api.py(Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem:**/v1`...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...i.py(Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem:**/v1` wird von...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...ilen 99-109, 119-131, 137-142, 188-194, 200-217) - Problem: /v1 wird von konfigur...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~44-~44: Hier könnte ein Fehler sein.
Context: ...119-131, 137-142, 188-194, 200-217) - Problem: /v1 wird von konfigurierten URLs en...
(QB_NEW_DE)
[grammar] ~44-~44: Ergänze ein Leerzeichen
Context: ...37-142, 188-194, 200-217) - Problem: /v1 wird von konfigurierten URLs entfe...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~44-~44: Korrigiere die Fehler
Context: ...n URLs → Token-Lookup schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)
[grammar] ~45-~45: Hier könnte ein Fehler sein.
Context: ...en-Lookup schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich normalisieren: ...
(QB_NEW_DE)
[grammar] ~45-~45: Ergänze ein Leerzeichen
Context: ... schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich normalisieren: ```...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~45-~45: Hier könnte ein Fehler sein.
Context: ...ng:** Beide Seiten gleich normalisieren: python normalized_url = url.replace("/v1", "").rstrip("/") - Status: [x] ERLEDIGT (2025-11-21) ###...
(QB_NEW_DE)
[grammar] ~49-~49: Ergänze ein Leerzeichen
Context: ...v1", "").rstrip("/") ``` - Status: [x] ERLEDIGT (2025-11-21) ### 5. Encryp...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~49-~49: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 5. Encrypted Token Placeholder Problem -...
(QB_NEW_DE)
[grammar] ~57-~57: Hier könnte ein Fehler sein.
Context: ...ng:** Backend-only Decryption verwenden: python ollama_auth_token = await credential_service.get_credential( "OLLAMA_EMBEDDING_AUTH_TOKEN", default="", decrypt=True ) - Status: [x] ERLEDIGT (2025-11-21) - Au...
(QB_NEW_DE)
[grammar] ~63-~63: Ergänze ein Leerzeichen
Context: ..."", decrypt=True ) ``` - Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens ...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~63-~63: Hier könnte ein Fehler sein.
Context: ...``` - Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens sind nicht verschlüsselt,...
(QB_NEW_DE)
[grammar] ~63-~63: Passe das Symbol an
Context: ...- Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens sind nicht verschlüsselt, d...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)
[grammar] ~63-~63: Hier könnte ein Fehler sein.
Context: ...espeichert werden. Funktioniert korrekt. ### 6. OLLAMA_EMBEDDING_URL None-Handling - ...
(QB_NEW_DE)
[grammar] ~65-~65: Hier könnte ein Fehler sein.
Context: ...## 6. OLLAMA_EMBEDDING_URL None-Handling - Datei: `python/src/server/services/embedding...
(QB_NEW_DE)
[grammar] ~66-~66: Ergänze ein Leerzeichen
Context: ...EMBEDDING_URL None-Handling - Datei: `python/src/server/services/embeddings/e...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~66-~66: Ergänze ein Wort
Context: ...MBEDDING_URL None-Handling - Datei: python/src/server/services/embeddings/embedding_service.py (Zeilen 287-303, 433-469) - **Problem:*...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)
[grammar] ~66-~66: Ersetze das Satzzeichen
Context: ...mbeddings/embedding_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` au...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~66-~66: Ersetze das Satzzeichen
Context: .../embedding_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` auf None-We...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~66-~66: Hier könnte ein Fehler sein.
Context: ...ng_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` auf None-Wert wirft Excep...
(QB_NEW_DE)
[grammar] ~67-~67: Ergänze ein Leerzeichen
Context: ...(Zeilen 287-303, 433-469) - Problem: .rstrip() auf None-Wert wirft Exceptio...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~67-~67: Ergänze ein Wort
Context: ...blem:** .rstrip() auf None-Wert wirft Exception - Lösung: ```python ollama_base...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)
[grammar] ~68-~68: Passe das Symbol an
Context: ...ip()` auf None-Wert wirft Exception - Lösung: python ollama_base_url_raw = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip() ollama_base_url = ollama_base_url_raw.rstrip("/v1").rstrip("/") if ollama_base_url_raw else "" ollama_api_mode = (rag_settings.get("OLLAMA_API_MODE") or "native").lower() - Zusatz: Ollama-spezifische Werte nur b...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)
[grammar] ~74-~74: Hier könnte ein Fehler sein.
Context: ...I_MODE") or "native").lower() ``` - Zusatz: Ollama-spezifische Werte nur berechne...
(QB_NEW_DE)
[grammar] ~74-~74: Ergänze ein Leerzeichen
Context: ...or "native").lower() ``` - Zusatz: Ollama-spezifische Werte nur berechnen w...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~74-~74: Ergänze ein Satzzeichen
Context: ...Zusatz:* Ollama-spezifische Werte nur berechnen wenn `embedding_provider.lower() == "ol...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_COMMA)
[grammar] ~75-~75: Ersetze das Satzzeichen
Context: ...- Status: [x] ERLEDIGT (2025-11-21) - Code verwendet bereits (... or "") Pa...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)
[grammar] ~75-~75: Ergänze ein Satzzeichen
Context: ...) - Code verwendet bereits (... or "") Pattern für None-Safety --- ## MINOR (...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_DASHPATTERN)
[grammar] ~75-~75: Hier könnte ein Fehler sein.
Context: ...ts (... or "") Pattern für None-Safety --- ## MINOR (Best Practices) ### 7. Test-Resu...
(QB_NEW_DE)
[grammar] ~79-~79: Hier könnte ein Fehler sein.
Context: ...e-Safety --- ## MINOR (Best Practices) ### 7. Test-Results aus Version Control entf...
(QB_NEW_DE)
[grammar] ~85-~85: Hier könnte ein Fehler sein.
Context: ...ung:** 1. Zu .gitignore hinzufügen: test-results/ 2. git rm --cached archon-ui-main/test-results/ - Status: [x] ERLEDIGT (2025-11-21) ---...
(QB_NEW_DE)
[grammar] ~87-~87: Ergänze ein Leerzeichen
Context: ...hon-ui-main/test-results/` - Status: [x] ERLEDIGT (2025-11-21) --- ## Zusam...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)
[grammar] ~87-~87: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) --- ## Zusammenfassung | Priorität | Anzahl | ...
(QB_NEW_DE)
[grammar] ~91-~91: Hier könnte ein Fehler sein.
Context: ...GT (2025-11-21) --- ## Zusammenfassung | Priorität | Anzahl | Beschreibung | |-...
(QB_NEW_DE)
[grammar] ~97-~97: Hier könnte ein Fehler sein.
Context: ... | 1 | Best Practices (gitignore) | Gesamt: 7 actionable Items --- ## No...
(QB_NEW_DE)
[grammar] ~99-~99: Hier könnte ein Fehler sein.
Context: ...tignore) | Gesamt: 7 actionable Items --- ## Notizen - Die Grammar/Language-Tool War...
(QB_NEW_DE)
[grammar] ~103-~103: Hier könnte ein Fehler sein.
Context: ...t: 7 actionable Items** --- ## Notizen - Die Grammar/Language-Tool Warnungen (deu...
(QB_NEW_DE)
[grammar] ~105-~105: Entferne ein Leerzeichen
Context: ... ## Notizen - Die Grammar/Language-Tool Warnungen (deutsche Rechtschreibung in P...
(QB_NEW_DE_OTHER_ERROR_IDS_UNNECESSARY_ORTHOGRAPHY_SPACE)
[grammar] ~105-~105: Füge ein passenderes Wort ein
Context: ...n (deutsche Rechtschreibung in PLAN.md) sind niedrige Priorität - Nach Behebung der ...
(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_VERB)
[grammar] ~105-~105: Ergänze ein Satzzeichen
Context: ...chtschreibung in PLAN.md) sind niedrige Priorität - Nach Behebung der kritischen Issues: ...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_PERIOD)
[grammar] ~106-~106: Ergänze ein Satzzeichen
Context: ...: Server-seitig alle exponierten Tokens rotieren - Repo-weite Suche nach ähnlichen Token...
(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_PERIOD)
[grammar] ~107-~107: Hier könnte ein Fehler sein.
Context: ...ch ähnlichen Token/URL-Leaks durchführen
(QB_NEW_DE)
🪛 markdownlint-cli2 (0.18.1)
PRPs/todo.md
99-99: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (1)
PRPs/todo.md (1)
21-22: No security risk—this document is tracking completed remediation work, not storing active credentials.The token values shown on lines 21–22 are historical documentation of a security issue that has already been fixed. Verification confirms:
- The token does not appear anywhere else in the repository
- PLAN.md (the file where tokens were originally exposed) has already been sanitized with
ollama_xxx_placeholder_tokenplaceholders- Git history shows the fix was completed: "fix(security): Address CodeRabbit review findings"
- PRPs/todo.md is a German-language retrospective tracking document describing problems found and their solutions
This file documents that the issue was remediated server-side (as noted: "Server-seitig Tokens rotieren/revoken!"), not that secrets are currently exposed.
Likely an incorrect or invalid review comment.
|
Sorry, I forgot to clean the todo.md. This was just for the implementaion of this ollama-fix /-overhaul.
Am 23. Nov. 2025, 20:12 +0100 schrieb:
…
placement.
This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository
|
- Add auth_token parameter to validate_provider_instance() function - Load auth tokens from RAG settings in /validate endpoint - Match incoming URLs against configured LLM_BASE_URL and OLLAMA_EMBEDDING_URL - Apply correct auth token (chat or embedding) based on URL matching - Add tests for auth token flow through all layers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
When "Use same host for embedding instance" is checked, now also copies useAuth and authToken from LLM config to embedding config, not just the URL and name. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Extract syncEmbeddingFromLLM() into testable utility function - Add 6 tests covering: full sync, default name, custom default, auth settings copy, empty token handling, immutability - Refactor RAGSettings to use the utility function 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
python/src/server/api_routes/ollama_api.py (3)
254-256: Bug:timestampreturns module name instead of actual timestamp.
model_discovery_service.check_instance_health.__module__returns the module path string (e.g.,"src.server.services.ollama.model_discovery_service"), not a timestamp.- "timestamp": model_discovery_service.check_instance_health.__module__ # Use current timestamp + "timestamp": datetime.now().isoformat()
510-517: Missing auth token support indiscover_and_store_models_endpoint.Unlike
discover_models_endpointandvalidate_instance_endpoint, this endpoint doesn't extract auth tokens from RAG settings. Calls todiscover_models()at line 517 will fail on protected Ollama instances.Consider adding auth token extraction similar to other endpoints:
# Get auth tokens from RAG settings from ..services.credential_service import credential_service rag_settings = await credential_service.get_credentials_by_category("rag_strategy") llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/") embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/") chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or "" embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or "" # Then pass auth_token to discover_models based on URL matching
1056-1065: Missing auth token support indiscover_models_with_real_details.The direct
httpx.AsyncClientrequest to/api/tagsdoesn't include Authorization headers. This will fail on protected Ollama instances.+ # Get auth token for this instance + normalized_url = base_url.replace("/v1", "").rstrip("/") + auth_token = None + if normalized_url == llm_base_url and chat_auth_token: + auth_token = chat_auth_token + elif normalized_url == embedding_base_url and embedding_auth_token: + auth_token = embedding_auth_token + + headers = {} + if auth_token: + headers["Authorization"] = f"Bearer {auth_token}" + async with httpx.AsyncClient(timeout=httpx.Timeout(5.0)) as client: - tags_response = await client.get(f"{base_url}/api/tags") + tags_response = await client.get(f"{base_url}/api/tags", headers=headers)
♻️ Duplicate comments (4)
python/tests/test_ollama_auth_token.py (1)
170-180: Same patch target issue as above.This test has the same patching concern.
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
142-176: Add missing Ollama auth/API fields toRAGSettingsPropsand removeas anycasts
ragSettingsis accessed with properties that are not declared on its type:
OLLAMA_CHAT_AUTH_TOKEN(lines 231, 233–235, 243, 249)OLLAMA_EMBEDDING_AUTH_TOKEN(lines 254, 256–258, 266, 272)OLLAMA_API_MODE(lines 1605–1607, 1613–1618, 1631–1633, 1638–1643)This forces usage of
(ragSettings as any)and violates the strict-typing guideline. It also makes it easy to accidentally break these flows without a type error.Define the fields on
ragSettingsand then remove theas anycasts and expression dependencies:interface RAGSettingsProps { ragSettings: { MODEL_CHOICE: string; CHAT_MODEL?: string; + OLLAMA_CHAT_AUTH_TOKEN?: string; + OLLAMA_EMBEDDING_AUTH_TOKEN?: string; + OLLAMA_API_MODE?: 'native' | 'openai-compatible'; USE_CONTEXTUAL_EMBEDDINGS: boolean; @@ - const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' }); - const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' }); + const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' }); + const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' }); @@ - const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || ''; @@ - }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]); + }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]); @@ - const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || ''; @@ - }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]); + }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);This keeps the Ollama auth/API-mode surface properly typed and removes the need for
any.Also applies to: 225-227, 231-235, 243-244, 249-249, 254-258, 266-272, 1596-1653
682-692: Remove hardcoded default URL guard so models load for the default instanceThe auto-load effects currently skip model loading for the default Ollama URL:
if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { loadAvailableModels(llmInstanceConfig.url, 'chat'); }Users running the default instance will not see available models auto-populated until they change the URL, which is confusing.
You can safely load models for any non-empty URL:
- useEffect(() => { - if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(llmInstanceConfig.url, 'chat'); - } - }, [llmInstanceConfig.url]); + useEffect(() => { + if (llmInstanceConfig.url?.trim()) { + void loadAvailableModels(llmInstanceConfig.url, 'chat'); + } + }, [llmInstanceConfig.url]); @@ - useEffect(() => { - if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); - } - }, [embeddingInstanceConfig.url]); + useEffect(() => { + if (embeddingInstanceConfig.url?.trim()) { + void loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); + } + }, [embeddingInstanceConfig.url]);This aligns with earlier feedback and improves the out-of-the-box experience.
2360-2383: Validate auth tokens before saving whenuseAuthis enabledRight now, when
useAuthis checked but the token is empty or whitespace, the Save handlers still persist an empty token:
- LLM modal Save (lines 2395–2401)
- Embedding modal Save (lines 2485–2490)
This leads to a configuration that looks authenticated in the UI but always fails at runtime, without immediate feedback.
Add a simple validation gate before building
updatedSettingsin both handlers, e.g.:// LLM modal Save const Button onClick={async () => { - const updatedSettings = { - ...ragSettings, - LLM_BASE_URL: llmInstanceConfig.url, - OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' - }; + if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } + + const updatedSettings = { + ...ragSettings, + LLM_BASE_URL: llmInstanceConfig.url, + OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken.trim() : '' + }; @@ // Embedding modal Save - const updatedSettings = { - ...ragSettings, - OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url, - OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : '' - }; + if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } + + const updatedSettings = { + ...ragSettings, + OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url, + OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken.trim() : '' + };This prevents “silent failure” configs and gives users actionable feedback.
Also applies to: 2395-2402, 2450-2473, 2485-2491
🧹 Nitpick comments (3)
python/tests/test_ollama_auth_token.py (2)
30-40: Unused fixture:mock_credential_serviceThis fixture is defined but never used in any of the tests in this file. Either remove it or add tests that utilize credential service mocking for more comprehensive coverage.
- @pytest.fixture - def mock_credential_service(self): - """Mock credential service with auth token settings.""" - mock_service = MagicMock() - mock_service.get_credentials_by_category = AsyncMock(return_value={ - "LLM_BASE_URL": "http://localhost:11434", - "OLLAMA_EMBEDDING_URL": "http://localhost:11434", - "OLLAMA_CHAT_AUTH_TOKEN": "test-chat-token", - "OLLAMA_EMBEDDING_AUTH_TOKEN": "test-embedding-token", - }) - return mock_service -
62-64: Consider a clearer assertion forauth_tokenverification.The
orlogic works but makes debugging harder when tests fail. Consider checking which argument style was actually used:- # Check that auth_token was passed as keyword argument - assert call_args.kwargs.get("auth_token") == "test-auth-token" or \ - (len(call_args.args) > 1 and call_args.args[1] == "test-auth-token") + # Check that auth_token was passed (either as kwarg or positional) + auth_token_in_kwargs = call_args.kwargs.get("auth_token") == "test-auth-token" + auth_token_in_args = len(call_args.args) > 1 and call_args.args[1] == "test-auth-token" + assert auth_token_in_kwargs or auth_token_in_args, ( + f"auth_token not found in call. kwargs={call_args.kwargs}, args={call_args.args}" + )archon-ui-main/src/components/settings/RAGSettings.tsx (1)
212-221: Optionally DRY up the default Ollama URL usageThe default Ollama URL literal
'http://host.docker.internal:11434/v1'appears in several places despite the presence ofDEFAULT_OLLAMA_URL:
- Initial
llmInstanceConfigandembeddingInstanceConfig(lines 212, 218)- Auto-load effects (to be refactored per earlier comment)
- Page-load testing fallbacks (lines 1253–1263, 1285–1293)
To keep things consistent and easier to change later, you could centralize on
DEFAULT_OLLAMA_URL:const [llmInstanceConfig, setLLMInstanceConfig] = useState({ name: '', - url: ragSettings.LLM_BASE_URL || 'http://host.docker.internal:11434/v1', + url: ragSettings.LLM_BASE_URL || DEFAULT_OLLAMA_URL, useAuth: false, authToken: '' }); @@ const [embeddingInstanceConfig, setEmbeddingInstanceConfig] = useState({ name: '', - url: ragSettings.OLLAMA_EMBEDDING_URL || 'http://host.docker.internal:11434/v1', + url: ragSettings.OLLAMA_EMBEDDING_URL || DEFAULT_OLLAMA_URL, useAuth: false, authToken: '' });Not critical, but it avoids subtle drift if the default host ever needs to change.
Also applies to: 682-692, 1253-1263, 1285-1293
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
archon-ui-main/src/components/settings/RAGSettings.tsx(17 hunks)archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts(1 hunks)archon-ui-main/src/components/settings/utils/instanceConfigSync.ts(1 hunks)python/src/server/api_routes/ollama_api.py(4 hunks)python/src/server/services/llm_provider_service.py(5 hunks)python/tests/test_ollama_auth_token.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/components/settings/utils/instanceConfigSync.tsarchon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/components/settings/utils/instanceConfigSync.tsarchon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/components/settings/utils/instanceConfigSync.tsarchon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data
Files:
python/src/server/api_routes/ollama_api.pypython/src/server/services/llm_provider_service.py
python/src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration
python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety
Files:
python/src/server/api_routes/ollama_api.pypython/src/server/services/llm_provider_service.py
python/src/server/api_routes/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/api_routes/**/*.py: Follow the API route → Service → Database pattern for new API endpoints
Use database values directly without frontend mapping to maintain type-safe end-to-end communication from backend upward
Files:
python/src/server/api_routes/ollama_api.py
python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations
Files:
python/src/server/api_routes/ollama_api.pypython/tests/test_ollama_auth_token.pypython/src/server/services/llm_provider_service.py
archon-ui-main/src/**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Vitest with React Testing Library for frontend testing
Files:
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
archon-ui-main/**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use Vitest with React Testing Library for frontend testing
Files:
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
python/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use Pytest for testing with async support in Python backend
Use Pytest for testing Python code with async support
Files:
python/tests/test_ollama_auth_token.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations
Files:
python/src/server/services/llm_provider_service.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: zaksnet
Repo: coleam00/Archon PR: 473
File: python/src/providers_clean/api/routes/post_migrate_model_config.py:7-9
Timestamp: 2025-09-10T06:45:10.853Z
Learning: User zaksnet prefers to handle authentication/authorization in separate PRs rather than mixing security infrastructure with feature development.
📚 Learning: 2025-11-29T19:25:01.813Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: python/src/agent_work_orders/CLAUDE.md:0-0
Timestamp: 2025-11-29T19:25:01.813Z
Learning: Applies to python/src/agent_work_orders/**/*.py : Do not log sensitive data such as passwords, API keys, or tokens; mask them using format like `api_key[:8] + "..."`
Applied to files:
python/src/server/api_routes/ollama_api.py
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/**/*.test.{ts,tsx} : Use Vitest with React Testing Library for frontend testing
Applied to files:
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{test,spec}.{ts,tsx} : Use Vitest with React Testing Library for frontend testing
Applied to files:
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (4)
python/src/server/api_routes/ollama_api.py (2)
python/src/server/services/credential_service.py (1)
get_credentials_by_category(323-369)python/src/server/services/ollama/model_discovery_service.py (3)
discover_models_from_multiple_instances(1026-1140)check_instance_health(964-1024)discover_models(122-243)
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (2)
InstanceConfig(5-10)syncEmbeddingFromLLM(18-28)
python/src/server/services/llm_provider_service.py (3)
python/src/server/services/credential_service.py (1)
get_credentials_by_category(323-369)python/src/server/api_routes/settings_api.py (1)
get_credentials_by_category(78-93)python/src/server/services/ollama/model_discovery_service.py (1)
check_instance_health(964-1024)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM(18-28)archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input(8-29)
🔇 Additional comments (14)
python/src/server/api_routes/ollama_api.py (3)
99-143: Auth token propagation looks correct.The token extraction from RAG settings, URL normalization, and mapping to
auth_tokens_mapis implemented correctly. The normalized URL comparison ensures consistent token matching.
181-218: Health check auth token handling is correct.The implementation properly extracts tokens from RAG settings, normalizes URLs for consistent matching, and passes the appropriate token to
check_instance_health(). Debug logging only outputs keys/URLs, not secret values.
277-308: Validation endpoint correctly propagates auth tokens.The auth token is properly extracted, matched to the appropriate URL, and passed through both
validate_provider_instance()anddiscover_models()calls.python/src/server/services/llm_provider_service.py (4)
415-441: Ollama fallback auth token handling looks correct.The token selection based on
use_embedding_provideris appropriate, and the "required-but-ignored" fallback handles Ollama instances without auth enabled. Token values are not logged (only length would be logged in the main Ollama path).
455-480: Ollama auth token handling is well implemented.Token selection based on operation type (
use_embedding_providerorinstance_type == "embedding") is correct. Logging token length (not the token itself) is appropriate for debugging while maintaining security. Based on learnings, this follows the guideline to "mask [tokens] using format likeapi_key[:8] + \"...\"" — logging only length is similarly safe.
616-622: Good addition of debugging logs for URL resolution.The logging at lines 618 and 621-622 will help diagnose embedding URL configuration issues without exposing sensitive data.
1211-1235:validate_provider_instancecorrectly extended withauth_tokensupport.The function signature, docstring, and implementation properly handle the optional
auth_tokenparameter and pass it through tocheck_instance_health().python/tests/test_ollama_auth_token.py (1)
133-144: Patch target may not intercept the actualhttpx.AsyncClientusage.The patch should target where
httpx.AsyncClientis used (in the service module), not where it's defined. Verify the import style inmodel_discovery_service.py—if it usesimport httpx, the patch target should be the full qualified path:src.server.services.ollama.model_discovery_service.httpx.AsyncClient.archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
5-28: Utility and typing look solid for config mirroring
InstanceConfigandsyncEmbeddingFromLLMare minimal, type-safe, and behave as expected (copy + default name fallback, no mutation). They fit the intended “single source of truth” pattern for LLM/Embedding instance configs.archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)
1-90: Good, comprehensive tests aligned with utility behaviorThese Vitest tests exercise the main paths (field copy, default/custom names, auth edge cases, immutability) and match
syncEmbeddingFromLLM’s contract. They also follow the Vitest testing guideline for frontend code.archon-ui-main/src/components/settings/RAGSettings.tsx (4)
571-585: Ollama metrics and model loading wiring looks goodThe expanded
ollamaMetricsstate (per-instance counts + available model lists) andloadAvailableModelshelper are cleanly structured and reused from both auto-load effects andmanualTestConnection. Using instance URL normalization and separate chat/embedding contexts should make the UI’s “available models” views reliable.Also applies to: 641-680, 885-903
189-221: Propagating auth settings and usingsyncEmbeddingFromLLMis a nice UX improvementThe new
useAuth/authTokenfields onllmInstanceConfigandembeddingInstanceConfig, coupled with:
- the “Use same host for embedding instance” checkbox wired to
syncEmbeddingFromLLM(llmInstanceConfig)(line 2350), and- per-modal auth sections with password inputs (lines 2360–2383, 2450–2473), and
- persistence into
ragSettingsviaOLLAMA_CHAT_AUTH_TOKEN/OLLAMA_EMBEDDING_AUTH_TOKEN(lines 2396–2400, 2486–2490),make the Ollama auth story much clearer and consistent across chat and embeddings.
Once the typings for those ragSettings fields are added (see earlier comment), this whole flow will be in good shape.
Also applies to: 2350-2351, 2450-2473, 2485-2491
1596-1653: API mode toggle UX for Ollama is clear and self-containedThe new “Ollama API Mode” section with explicit “Native Ollama API” vs “OpenAI-Compatible” buttons is a nice addition:
- State is stored on
ragSettings.OLLAMA_API_MODE.- Visual selection and radio-circle affordance make the current mode unambiguous.
- Helper text (“Uses /api/embeddings endpoint” vs “Uses /v1/embeddings endpoint”) sets expectations well.
Once the
OLLAMA_API_MODEfield is added toRAGSettingsProps, this section is in good shape.
145-145: The specific rendering issue described does not appear in the current codeThe exact code patterns referenced in this review comment—
ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span>rendering,activeSelection === 'chat'logic, and thegetDisplayedChatModel()function—do not exist in the current RAGSettings.tsx implementation. Line 145 only contains the field definitionCHAT_MODEL?: string;as part of a type, not active rendering logic.If there is actual misalignment between
CHAT_MODELandMODEL_CHOICEfield usage in the component, it should be verified against the actual current implementation and resubmitted with specifics about where in the code the issue manifests.Likely an incorrect or invalid review comment.
|
|
||
|
|
||
| async def validate_provider_instance(provider: str, instance_url: str | None = None) -> dict[str, any]: | ||
| async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]: |
There was a problem hiding this comment.
Type hint uses any instead of Any.
The return type annotation dict[str, any] uses lowercase any, which refers to the built-in any() function, not the Any type. This may cause type checking issues with MyPy.
-async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]:
+async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]: | |
| async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]: |
🤖 Prompt for AI Agents
In python/src/server/services/llm_provider_service.py around line 1211, the
return type annotation uses the builtin any instead of the typing Any; change
the annotation from dict[str, any] to dict[str, Any] and ensure you import Any
(e.g. add "from typing import Any" at the top of the file if it's not already
imported) so static type checkers like MyPy recognize the correct type.
Merges OpenRouter embedding provider feature from upstream. Resolves conflict in llm_provider_service.py by keeping auth_token parameter. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Addresses CodeRabbit review: CHAT_MODEL was undefined because MODEL_CHOICE is the actual key used. Now uses helper functions that correctly resolve the displayed model name. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
16-20: Remove OpenRouter from EMBEDDING_CAPABLE_PROVIDERS
EMBEDDING_CAPABLE_PROVIDERSincludes'openrouter', but OpenRouter does not provide embedding models—it only acts as a proxy for LLM chat completions. This inclusion allows users to select OpenRouter as an embedding provider in the UI, which the backend cannot serve.Revert to:
-const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama']; +const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];If OpenRouter embedding support is needed in the future, implement it with a feature flag once backend support is confirmed.
♻️ Duplicate comments (5)
python/src/server/services/llm_provider_service.py (1)
1211-1211: Type hint usesanyinstead ofAny(previously flagged).This issue was already identified in a previous review. The return type annotation uses lowercase
any(the built-in function) instead ofAnyfrom the typing module, which will cause MyPy type checking issues.Apply the previously suggested fix:
-async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]: +async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]:Ensure
Anyis imported at the top of the file (it's already imported on line 11).archon-ui-main/src/components/settings/RAGSettings.tsx (4)
142-175: Type Ollama auth/API-mode fields inRAGSettingsPropsand drop(ragSettings as any)castsYou’re now reading/writing
ragSettings.OLLAMA_CHAT_AUTH_TOKEN,ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN, andragSettings.OLLAMA_API_MODE, but:
- These keys are not declared in
RAGSettingsProps['ragSettings'].- The effects rely on
(ragSettings as any).OLLAMA_*_AUTH_TOKENboth in the body and dependency arrays, which breaks strict typing and defeats the “no implicit any” goal.This is essentially the same issue the previous review already flagged for the auth tokens. Suggest tightening the types and removing the
anycasts:interface RAGSettingsProps { ragSettings: { MODEL_CHOICE: string; CHAT_MODEL?: string; @@ EMBEDDING_MODEL?: string; EMBEDDING_PROVIDER?: string; OLLAMA_EMBEDDING_URL?: string; OLLAMA_EMBEDDING_INSTANCE_NAME?: string; + OLLAMA_CHAT_AUTH_TOKEN?: string; + OLLAMA_EMBEDDING_AUTH_TOKEN?: string; + OLLAMA_API_MODE?: 'native' | 'openai-compatible'; @@ }; setRagSettings: (settings: any) => void; } @@ - const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' }); - const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' }); + const lastLLMConfigRef = useRef<{ url: string; name: string; authToken: string }>({ + url: '', + name: '', + authToken: '', + }); + const lastEmbeddingConfigRef = useRef<{ url: string; name: string; authToken: string }>({ + url: '', + name: '', + authToken: '', + }); @@ useEffect(() => { const newLLMUrl = ragSettings.LLM_BASE_URL || ''; const newLLMName = ragSettings.LLM_INSTANCE_NAME || ''; - const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || ''; @@ - }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]); + }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]); @@ useEffect(() => { const newEmbeddingUrl = ragSettings.OLLAMA_EMBEDDING_URL || ''; const newEmbeddingName = ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME || ''; - const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || ''; @@ - }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]); + }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);This keeps the state-sync logic the same but restores type safety and satisfies strict TS in this file.
Also applies to: 225-227, 231-244, 249-272
641-679: Auto-load Ollama models should not special‑case the default URL
loadAvailableModelsitself looks good, but the auto-load effects still skip loading when the URL equals the hardcoded default'http://host.docker.internal:11434/v1'. That means users running a working default Ollama instance won’t see any models until they tweak the URL.You already have
DEFAULT_OLLAMA_URL; I’d drop the special-case and only require a non-empty, trimmed URL:// Auto-load available models when URLs are configured useEffect(() => { - if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(llmInstanceConfig.url, 'chat'); + if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) { + void loadAvailableModels(llmInstanceConfig.url, 'chat'); } }, [llmInstanceConfig.url]); useEffect(() => { - if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); + if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) { + void loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); } }, [embeddingInstanceConfig.url]);This also aligns with the earlier review suggestion to avoid hardcoding that URL.
Also applies to: 682-692
787-807: Keep instance config shape consistent and clear auth fields when deleting instances
llmInstanceConfig/embeddingInstanceConfignow includeuseAuthandauthToken, but some paths still construct configs with{ name, url }only and deletion doesn’t clear the persisted auth fields:
handleDeleteLLMInstanceandhandleDeleteEmbeddingInstancereset state to{ name: '', url: '' }, droppinguseAuthandauthToken.- Deletion only deletes
LLM_BASE_URL,MODEL_CHOICE,OLLAMA_EMBEDDING_URL, andEMBEDDING_MODELfromragSettings; instance names and auth tokens remain.- The “auto-populate embedding instance” block on LLM URL change also only sets
{ name, url }, ignoring auth flags, while you already havesyncEmbeddingFromLLMfor shape-safe syncing.This can lead to inconsistent state shape and “ghost” tokens surviving after delete. Consider:
const handleDeleteLLMInstance = () => { @@ - setLLMInstanceConfig({ - name: '', - url: '' - }); + setLLMInstanceConfig({ + name: '', + url: '', + useAuth: false, + authToken: '' + }); @@ const updatedSettings = { ...ragSettings }; delete updatedSettings.LLM_BASE_URL; delete updatedSettings.MODEL_CHOICE; + delete updatedSettings.LLM_INSTANCE_NAME; + delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN; @@ const handleDeleteEmbeddingInstance = () => { @@ - setEmbeddingInstanceConfig({ - name: '', - url: '' - }); + setEmbeddingInstanceConfig({ + name: '', + url: '', + useAuth: false, + authToken: '' + }); @@ const updatedSettings = { ...ragSettings }; delete updatedSettings.OLLAMA_EMBEDDING_URL; delete updatedSettings.EMBEDDING_MODEL; + delete updatedSettings.OLLAMA_EMBEDDING_INSTANCE_NAME; + delete updatedSettings.OLLAMA_EMBEDDING_AUTH_TOKEN; @@ - // Auto-populate embedding instance if it's empty (convenience for single-host users) - if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) { - setEmbeddingInstanceConfig({ - name: llmInstanceConfig.name || 'Default Ollama', - url: newUrl - }); - } + // Auto-populate embedding instance if it's empty (convenience for single-host users) + if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) { + // Reuse shared helper to keep shape (including auth flags) consistent + setEmbeddingInstanceConfig(syncEmbeddingFromLLM({ ...llmInstanceConfig, url: newUrl })); + } @@ - // Sync embedding instance with LLM instance (including auth settings) - setEmbeddingInstanceConfig(syncEmbeddingFromLLM(llmInstanceConfig)); + // Sync embedding instance with LLM instance (including auth settings) + setEmbeddingInstanceConfig(syncEmbeddingFromLLM(llmInstanceConfig));This keeps config state structurally consistent and ensures deleting an instance actually clears its auth settings as well.
[highlight: this overlaps with the earlier review about instance config shape.]Also applies to: 809-829, 2331-2335, 2349-2351
2360-2383: Validate auth tokens before saving whenUse Authenticationis enabledThe LLM and Embedding edit modals now let users toggle “Use Authentication” and enter tokens, and you persist them into
OLLAMA_CHAT_AUTH_TOKEN/OLLAMA_EMBEDDING_AUTH_TOKEN. However, ifuseAuthis true and the token input is empty/whitespace, you still write an empty token toragSettings. This matches an earlier review concern and can lead to confusing failures when the backend attempts authenticated calls with an empty header.Add a small guard before constructing
updatedSettingsin both modals:<Button onClick={async () => { + if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } const updatedSettings = { ...ragSettings, LLM_BASE_URL: llmInstanceConfig.url, OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' }; @@ <Button onClick={async () => { + if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } const updatedSettings = { ...ragSettings, OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url, OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : '' };This keeps the UX tight and avoids persisting obviously invalid credentials.
Also applies to: 2395-2401, 2485-2491
🧹 Nitpick comments (4)
python/src/server/services/llm_provider_service.py (2)
415-441: Consider extracting token resolution logic to reduce duplication.The token resolution pattern here is duplicated at lines 455-480. Both segments fetch RAG settings, select the appropriate token (embedding vs chat), and apply the same "required-but-ignored" fallback logic.
Consider extracting this into a helper function:
+async def _resolve_ollama_auth_token(use_embedding_provider: bool, instance_type: str | None = None) -> str: + """Resolve Ollama auth token based on operation type.""" + cache_key = "rag_strategy_settings" + rag_settings = _get_cached_settings(cache_key) + if rag_settings is None: + rag_settings = await credential_service.get_credentials_by_category("rag_strategy") + _set_cached_settings(cache_key, rag_settings) + + # Select token based on operation type + if use_embedding_provider or instance_type == "embedding": + token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") + logger.info(f"Using OLLAMA_EMBEDDING_AUTH_TOKEN (length: {len(token) if token else 0})") + else: + token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "") + logger.info(f"Using OLLAMA_CHAT_AUTH_TOKEN (length: {len(token) if token else 0})") + + # Use placeholder if no token is set + if not token: + token = "required-but-ignored" + logger.warning("No Ollama auth token found, using placeholder 'required-but-ignored'") + + return tokenThen simplify both call sites:
- # Check for auth token in RAG settings for fallback - cache_key = "rag_strategy_settings" - rag_settings = _get_cached_settings(cache_key) - if rag_settings is None: - rag_settings = await credential_service.get_credentials_by_category("rag_strategy") - _set_cached_settings(cache_key, rag_settings) - - # Get correct auth token based on operation type - if use_embedding_provider: - ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") - else: - ollama_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "") - - # Use "required-but-ignored" as default if no token is set (Ollama doesn't validate when auth is disabled) - if not ollama_auth_token: - ollama_auth_token = "required-but-ignored" + ollama_auth_token = await _resolve_ollama_auth_token(use_embedding_provider)
618-622: Consider consolidating the logging statements.Lines 618 and 621 log very similar information about the embedding URL resolution. This could be simplified to a single log statement for clarity.
embedding_url = rag_settings.get("OLLAMA_EMBEDDING_URL") - logger.info(f"Embedding URL from settings: {embedding_url}") if embedding_url: final_url = embedding_url if embedding_url.endswith('/v1') else f"{embedding_url}/v1" - logger.info(f"Resolved embedding URL: {final_url}") + logger.info(f"Resolved embedding URL: {embedding_url} → {final_url}") return final_urlarchon-ui-main/src/components/settings/RAGSettings.tsx (2)
572-584: Ollama per-instance metrics + available model lists look consistentThe extended
ollamaMetricsshape (per-instance counts for both LLM and embedding plusllmAvailableModels/embeddingAvailableModels) and the corresponding updates infetchOllamaMetricsare internally consistent and line up with how you’re rendering:
- Summary table rows use
embeddingInstanceModelssafely.- The “Available on this instance” row pulls counts by context, which matches the per-instance split.
If you don’t end up wiring
llmAvailableModels/embeddingAvailableModelsinto the selection modals, you could later trim those from state to reduce surface area, but as-is this structure is fine and future‑proof.Also applies to: 885-903
731-740: Toast model counts inmanualTestConnectionmay be stale vs per-call instance statusInside
manualTestConnection, the success toast usesollamaMetrics.llmInstanceModels/embeddingInstanceModelsto derivemodelCount, but those metrics are only refreshed later viafetchOllamaMetrics()and may still be at their initial or previous values. This can yield messages like “0 chat models available” even when the instance has models and the backend returned a model count.Since
instanceStatusfrom the health endpoint already containsmodels_available, consider basing the toast count on that and only using the metrics for more detailed UIs:- // Context-aware model count display - let modelCount = instanceStatus.models_available || 0; - let modelType = 'models'; - - if (context === 'chat') { - modelCount = ollamaMetrics.llmInstanceModels?.chat || 0; - modelType = 'chat models'; - } else if (context === 'embedding') { - modelCount = ollamaMetrics.embeddingInstanceModels?.embedding || 0; - modelType = 'embedding models'; - } + // Context-aware model count display + let modelCount = instanceStatus.models_available || 0; + let modelType = 'models'; + if (context === 'chat') { + modelType = 'chat models'; + } else if (context === 'embedding') { + modelType = 'embedding models'; + } @@ - showToast(`${instanceName} connection successful: ${modelCount} ${modelType} available (${responseTime}ms)`, 'success'); + showToast( + `${instanceName} connection successful: ${modelCount} ${modelType} available (${responseTime}ms)`, + 'success' + ); @@ - // Load available models after successful health check - if (context) { - await loadAvailableModels(url, context); - } + // Load available models after successful health check + if (context) { + await loadAvailableModels(url, context); + }This keeps the metrics fetch for richer views while making the immediate toast reflect the actual response.
Also applies to: 752-755
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
archon-ui-main/src/components/settings/RAGSettings.tsx(17 hunks)python/src/server/services/llm_provider_service.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data
Files:
python/src/server/services/llm_provider_service.py
python/src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration
python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety
Files:
python/src/server/services/llm_provider_service.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations
Files:
python/src/server/services/llm_provider_service.py
python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations
Files:
python/src/server/services/llm_provider_service.py
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (3)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (2)
python/src/server/services/llm_provider_service.py (3)
python/src/server/services/credential_service.py (1)
get_credentials_by_category(323-369)python/src/server/api_routes/settings_api.py (1)
get_credentials_by_category(78-93)python/src/server/services/ollama/model_discovery_service.py (1)
check_instance_health(964-1024)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM(18-28)archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input(8-29)
🔇 Additional comments (4)
python/src/server/services/llm_provider_service.py (2)
455-480: Token resolution logic is correct.The token selection correctly handles both
use_embedding_providerandinstance_type == "embedding"cases, ensuring the appropriate token is used for each operation type. The fallback to "required-but-ignored" is reasonable for Ollama instances without authentication enabled.Note: The logging on lines 465 and 468 reveals which specific token type (embedding vs chat) is being used. While this doesn't expose the token value itself, consider whether this information disclosure is acceptable in your security model.
1218-1218: LGTM! Authentication token parameter properly documented and propagated.The
auth_tokenparameter is correctly documented in the function docstring and properly passed through to thecheck_instance_healthservice method on line 1235.Also applies to: 1235-1235
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
1594-1653: API mode UI is solid; just ensureOLLAMA_API_MODEis typed and persisted consistentlyThe “Ollama API Mode” toggle is a nice UX touch and correctly treats
undefinedas “native” by default. With the earlier type additions toRAGSettingsProps, this section becomes type-safe.Nothing to change here beyond making sure
OLLAMA_API_MODE?: 'native' | 'openai-compatible'is added to theragSettingstype (see previous comment) so these reads/writes don’t rely on structural typing.
1846-1873: Authentication and selected-model rows in the summary table are well designedThe new “Authentication” row correctly displays only a boolean status (“Token configured” vs “No authentication”) and avoids leaking token contents. Likewise, the “Selected Model” row now gracefully falls back to a “Not selected” placeholder for both chat and embeddings, which will help users debug misconfigurations.
No code changes needed here.
Also applies to: 1890-1899
- Add useAuth/authToken to instance config resets (RAGSettings.tsx) - Clear auth tokens from ragSettings when deleting instances - Fix type hint: dict[str, any] -> dict[str, Any] (llm_provider_service.py) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- ArchonChatPanel: Remove BookOpen, Search - BugReportModal: Remove ExternalLink - ErrorBoundaryWithBugReport: Remove React (use named imports) - CodeViewerModal: Remove FileText - DisconnectScreenOverlay: Remove Wifi, WifiOff - APIKeysSection: Remove Key, Input 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
142-176: Add Ollama auth/API mode fields to RAGSettingsProps and removeas anycastsThe component now reads/writes
OLLAMA_CHAT_AUTH_TOKEN,OLLAMA_EMBEDDING_AUTH_TOKEN, andOLLAMA_API_MODEviaragSettings, but these keys are not declared onRAGSettingsProps["ragSettings"]. Accessing them via(ragSettings as any)both weakens type safety (violating strict mode) and makes it easy to desync UI from persisted settings. This also meansragSettings.OLLAMA_API_MODEis currently a type error.You can fix this by extending the interface and wiring the effects to the typed fields, eliminating the
as anyusage:interface RAGSettingsProps { ragSettings: { MODEL_CHOICE: string; - CHAT_MODEL?: string; + CHAT_MODEL?: string; USE_CONTEXTUAL_EMBEDDINGS: boolean; @@ EMBEDDING_MODEL?: string; EMBEDDING_PROVIDER?: string; OLLAMA_EMBEDDING_URL?: string; OLLAMA_EMBEDDING_INSTANCE_NAME?: string; + OLLAMA_CHAT_AUTH_TOKEN?: string; + OLLAMA_EMBEDDING_AUTH_TOKEN?: string; + OLLAMA_API_MODE?: 'native' | 'openai-compatible'; @@ }; setRagSettings: (settings: any) => void; } @@ useEffect(() => { const newLLMUrl = ragSettings.LLM_BASE_URL || ''; const newLLMName = ragSettings.LLM_INSTANCE_NAME || ''; - const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || ''; @@ - }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]); + }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]); @@ useEffect(() => { const newEmbeddingUrl = ragSettings.OLLAMA_EMBEDDING_URL || ''; const newEmbeddingName = ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME || ''; - const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || ''; + const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || ''; @@ - }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]); + }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);This keeps the props contract honest and satisfies strict TypeScript without resorting to
any.Also applies to: 223-227, 228-249, 251-272, 1600-1659, 2404-2410, 2493-2499
641-693: Auto‑loading available models still skips the default Ollama URL
loadAvailableModelsis a nice addition and the metrics wiring (llmAvailableModels/embeddingAvailableModels) looks good, but the auto‑load effects still short‑circuit when the URL equals the defaulthttp://host.docker.internal:11434/v1. This means users running Ollama on the default host won’t see model lists until they manually tweak the URL or hit “Test Connection”.You can address this and simplify the condition by loading models whenever a non‑empty URL is present:
// Auto-load available models when URLs are configured useEffect(() => { - if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(llmInstanceConfig.url, 'chat'); - } + if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) { + void loadAvailableModels(llmInstanceConfig.url, 'chat'); + } }, [llmInstanceConfig.url]); useEffect(() => { - if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); - } + if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) { + void loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); + } }, [embeddingInstanceConfig.url]);This preserves the debounce behavior (hook still keys off URL changes) and makes the default local instance behave like any other.
Also applies to: 752-755, 572-584, 891-909
2368-2391: Require non‑empty auth tokens when authentication is enabled for LLM/embedding instancesThe new auth controls in both edit modals correctly wire
useAuth/authTokeninto local state and persistOLLAMA_CHAT_AUTH_TOKEN/OLLAMA_EMBEDDING_AUTH_TOKEN. However, if a user enables “Use Authentication” but leaves the token empty, the code still proceeds to “save” and pushes an empty string intoragSettings(effectively disabling auth) without any feedback.This is the same edge case highlighted in the prior review; adding lightweight validation before constructing
updatedSettingswill prevent confusing failures:// LLM instance modal save - <Button - onClick={async () => { - const updatedSettings = { + <Button + onClick={async () => { + if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } + const updatedSettings = { ...ragSettings, LLM_BASE_URL: llmInstanceConfig.url, OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' }; setRagSettings(updatedSettings); @@ // Embedding instance modal save - <Button - onClick={async () => { - const updatedSettings = { + <Button + onClick={async () => { + if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } + const updatedSettings = { ...ragSettings, OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url, OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : '' }; setRagSettings(updatedSettings);That keeps the UX aligned with the toggles: “Use Authentication” can’t be turned on with an empty token.
Also applies to: 2403-2432, 2458-2481, 2493-2521
🧹 Nitpick comments (4)
archon-ui-main/src/components/bug-report/BugReportModal.tsx (1)
230-235: Consider removing theas anytype assertion.Line 233 uses
as anywhich bypasses TypeScript's type checking. Theseverityfield should have a proper union type matching the dropdown options.As per coding guidelines, TypeScript strict mode with no implicit
anyshould be enforced.You can define a proper type for severity and use it instead:
type BugSeverity = "low" | "medium" | "high" | "critical";Then update the cast to:
onChange={(e) => setReport((r) => ({ ...r, - severity: e.target.value as any, + severity: e.target.value as BugSeverity, })) }python/src/server/services/llm_provider_service.py (2)
399-445: Ollama auth token resolution and fallback behavior look consistentThe new logic to:
- Resolve
OLLAMA_CHAT_AUTH_TOKEN/OLLAMA_EMBEDDING_AUTH_TOKENfrom cachedrag_strategysettings,- Fall back to a
"required-but-ignored"placeholder when no token is configured, and- Use that value as
api_keyfor both the OpenAI→Ollama fallback and direct"ollama"provider paths,is coherent and avoids logging the actual token (only lengths are logged). This should work whether auth is disabled (placeholder ignored by Ollama) or enabled (real token present).
If you find this pattern spreading, consider extracting a small helper like
get_ollama_auth_token(use_embedding_provider: bool, rag_settings: dict) -> strto keep the token selection and placeholder rules in one place, but it’s not strictly necessary right now.Also applies to: 447-481
1211-1235: validate_provider_instance auth_token plumbing is correct; consider richer error logsExtending
validate_provider_instanceto acceptauth_tokenand forwarding it intomodel_discovery_service.check_instance_health(instance_url, auth_token=auth_token)correctly enables per‑instance authenticated health checks for Ollama. The return shape and non‑Ollama branches are preserved, and the return type now usesAny(fixing the earlieranytypo).To better align with the backend logging guidelines, consider adding
exc_info=Trueto the error log in theexceptblock so health‑check failures preserve full stack traces:- except Exception as e: - logger.error(f"Error validating provider {provider}: {e}") + except Exception as e: + logger.error(f"Error validating provider {provider}: {e}", exc_info=True)Also applies to: 1275-1285
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
1852-1879: Align “Authentication” summary row withuseAuthstate, not just token presenceThe “Authentication” row in the Ollama configuration summary currently reports “Token configured” purely based on whether
authTokenis non‑empty, ignoringuseAuth. Because you intentionally keep the token around in local state even whenuseAuthis toggled off, this can show “Token configured” while authentication is actually disabled.You can make the summary reflect the real effective state by incorporating
useAuth:- <tr> - <td className="py-2 text-gray-400">Authentication</td> - <td className="py-2"> - {activeSelection === 'chat' ? ( - llmInstanceConfig.authToken ? ( + <tr> + <td className="py-2 text-gray-400">Authentication</td> + <td className="py-2"> + {activeSelection === 'chat' ? ( + llmInstanceConfig.useAuth && llmInstanceConfig.authToken ? ( <span className="text-teal-400 flex items-center"> @@ ) : ( <span className="text-gray-500 italic">No authentication</span> ) ) : ( - embeddingInstanceConfig.authToken ? ( + embeddingInstanceConfig.useAuth && embeddingInstanceConfig.authToken ? ( <span className="text-teal-400 flex items-center"> @@ ) : ( <span className="text-gray-500 italic">No authentication</span> ) )} </td> </tr>This keeps “Token configured” in sync with the toggle while still letting you cache the token client‑side for convenience.
Also applies to: 2368-2390, 2458-2480
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
archon-ui-main/src/components/DisconnectScreenOverlay.tsx(1 hunks)archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx(1 hunks)archon-ui-main/src/components/bug-report/BugReportModal.tsx(1 hunks)archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx(1 hunks)archon-ui-main/src/components/code/CodeViewerModal.tsx(0 hunks)archon-ui-main/src/components/settings/APIKeysSection.tsx(1 hunks)archon-ui-main/src/components/settings/RAGSettings.tsx(20 hunks)python/src/server/services/llm_provider_service.py(5 hunks)
💤 Files with no reviewable changes (1)
- archon-ui-main/src/components/code/CodeViewerModal.tsx
🧰 Additional context used
📓 Path-based instructions (7)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/DisconnectScreenOverlay.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsxarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/DisconnectScreenOverlay.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsxarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/DisconnectScreenOverlay.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsxarchon-ui-main/src/components/settings/RAGSettings.tsx
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data
Files:
python/src/server/services/llm_provider_service.py
python/src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration
python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety
Files:
python/src/server/services/llm_provider_service.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations
Files:
python/src/server/services/llm_provider_service.py
python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations
Files:
python/src/server/services/llm_provider_service.py
🧠 Learnings (12)
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsxarchon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Use Radix UI primitives from src/features/ui/primitives/ for new UI components in features directory
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/!(features)/**/*.{ts,tsx} : Use ESLint with standard React rules for legacy code in the frontend
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/settings/APIKeysSection.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsxarchon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx,js,jsx} : Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsxarchon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Apply Tron-inspired glassmorphism styling with Tailwind and follow responsive design patterns with mobile-first approach
Applied to files:
archon-ui-main/src/components/bug-report/BugReportModal.tsx
📚 Learning: 2025-09-04T16:30:05.227Z
Learnt from: stevepresley
Repo: coleam00/Archon PR: 573
File: archon-ui-main/src/config/api.ts:15-25
Timestamp: 2025-09-04T16:30:05.227Z
Learning: Archon UI API config: Prefer lazy getters getApiFullUrl() and getWsUrl() over module-load constants to avoid SSR/test crashes. Avoid CommonJS exports patterns (Object.defineProperty(exports,…)) in ESM. Add typeof window guards with VITE_API_URL fallback inside getApiUrl()/getWebSocketUrl() when SSR safety is required.
Applied to files:
archon-ui-main/src/components/settings/APIKeysSection.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Apply Tron-inspired glassmorphism styling with Tailwind in features directory UI components
Applied to files:
archon-ui-main/src/components/settings/APIKeysSection.tsx
📚 Learning: 2025-08-28T12:03:56.388Z
Learnt from: Wirasm
Repo: coleam00/Archon PR: 514
File: archon-ui-main/src/components/layouts/ArchonChatPanel.tsx:71-81
Timestamp: 2025-08-28T12:03:56.388Z
Learning: The agent chat service is disabled in the Archon codebase, so optimizations or fixes related to agent chat functionality may not be necessary to implement.
Applied to files:
archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Use TanStack Query for all data fetching with smart HTTP polling where appropriate
Applied to files:
archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM(18-28)
🔇 Additional comments (7)
archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx (1)
2-2: LGTM! Clean removal of unused imports.The
BookOpenandSearchicons are not referenced anywhere in the component, so removing them from the import statement is appropriate and improves code cleanliness.archon-ui-main/src/components/DisconnectScreenOverlay.tsx (1)
2-2: LGTM! Cleanup of unused imports.The removal of
WifiandWifiOffimports is correct since only theXicon is used in the component (line 40). This cleanup improves code maintainability.archon-ui-main/src/components/bug-report/BugReportModal.tsx (1)
3-3: LGTM! Clean removal of unused import.The
ExternalLinkicon was not referenced anywhere in the component, so removing it from the import is appropriate cleanup.archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx (1)
1-1: LGTM! Modern import pattern aligns with new JSX transform.The refactoring to named imports is correct and aligns with the new JSX runtime. All necessary imports (Component, ErrorInfo, ReactNode) are present and properly used throughout the file.
archon-ui-main/src/components/settings/APIKeysSection.tsx (1)
2-2: Import surface reduction looks goodRemoving unused icons from the
lucide-reactimport keeps the bundle lean and avoids unused‑symbol lint noise. No behavioral changes introduced.python/src/server/services/llm_provider_service.py (1)
612-622: Embedding URL logging is helpful and safeThe additional logging of
OLLAMA_EMBEDDING_URLand the resolved/v1URL in_get_optimal_ollama_instanceimproves observability for routing issues without exposing secrets. Behavior for chat vs embedding URLs and the fallback toLLM_BASE_URLremains unchanged.archon-ui-main/src/components/settings/RAGSettings.tsx (1)
1600-1659: Ollama API mode toggle and instance auth state wiring look solidThe new “Ollama API Mode” radio group correctly writes
OLLAMA_API_MODEas either'native'or'openai-compatible'intoragSettings, and the top‑level “Save Settings” path will persist it viacredentialsService.updateRagSettings(updatedSettings).Similarly, extending
llmInstanceConfig/embeddingInstanceConfigwithuseAuthandauthToken, ensuring deletion resets include those fields, and usingsyncEmbeddingFromLLM(llmInstanceConfig)when “Use same host for embedding instance” is checked all keep the instance configuration shape consistent and address the earlier state‑shape/reset concerns.No changes needed here beyond the type additions already suggested.
Also applies to: 210-221, 217-221, 2357-2358, 787-821
- Add pymupdf4llm>=0.2.5 as primary PDF extraction library - Produces clean Markdown with proper word separation - Falls back to pdfplumber, then PyPDF2 if needed - Validated: pymupdf4llm extracts "Since 1996, Bloomberg" vs pdfplumber's "Since1996,Bloomberg" (no spaces) Phase0 validated, Phase1 complete. Phase2 (OCR) pending. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…de block protection - Add heading boundary (\n#) as highest priority breakpoint - Protect code blocks from being split mid-block - Add is_inside_code_block() helper to check if position is within ```...``` - Only break at paragraphs/sentences when outside code blocks This improves RAG quality for Markdown documents (especially from pymupdf4llm) by preserving semantic structure during chunking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…dels BREAKING: Search now supports all embedding dimensions (384, 768, 1024, 1536, 3072) Previously search was hardcoded to use 1536-dim functions, which meant: - snowflake-arctic-embed2 (1024-dim) → NO RESULTS - nomic-embed-text (768-dim) → NO RESULTS - Only OpenAI text-embedding-3-small (1536-dim) worked Fix: - Detect dimension from query_embedding: len(query_embedding) - Use _multi RPC functions with embedding_dimension parameter - base_search_strategy.py: match_archon_*_multi - hybrid_search_strategy.py: hybrid_search_*_multi 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Adds automatic OCR extraction for scanned/image-based PDFs that don't contain extractable text layers. New files: - ocr_processing.py: Tesseract OCR engine wrapper - extract_text_with_ocr(): Convert PDF pages to images, run OCR - is_ocr_available(): Check for pytesseract/pdf2image - get_supported_languages(): List installed Tesseract languages Changes: - document_processing.py: OCR fallback when text extraction fails - pyproject.toml: Add pytesseract>=0.3.10, pdf2image>=1.17.0 Flow: pymupdf4llm → pdfplumber → PyPDF2 → Tesseract OCR Requires system dependencies: - macOS: brew install tesseract poppler - Linux: apt install tesseract-ocr poppler-utils 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@Milofax, We already supported Ollama and have all the capabilities listed in this PR except for the Ollama API selection capability but don't see a need for this as we work with the native Ollama method and I've never heard of someone using Ollama via the OpenAI specification when the native one is available. |
|
Hey @tazmon95, thanks for the feedback! Let me provide some context on why this PR exists. To be honest, I wouldn't have touched the Ollama implementation at all if it had worked for my use case. The changes
If the toggle adds unnecessary complexity and nobody uses the OpenAI-compatible mode for Ollama, I'm totally fine with Happy to discuss or simplify where it makes sense! 🙂 |
|
Thanks @Milofax ,
|
…issues Phase 3 Summary Feature: - Beautiful View with react-markdown + MermaidRenderer - Edit View with textarea + live preview - Backend summary update support - Regenerate warning dialog - E2E tests (14/14 passed) CodeRabbit Fixes (PR coleam00#875): - Fix authToken persistence in credentialsService - Remove unnecessary 'as any' casts in RAGSettings - Add authToken to OllamaInstance load/save 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…stence - Add authToken persistence to setOllamaInstances in credentialsService.ts - Remove 'as any' type casts for OLLAMA_*_AUTH_TOKEN in RAGSettings.tsx - Types are already defined in RAGSettings interface (lines 171-172) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (8)
python/src/server/services/storage/base_storage_service.py (1)
78-80: Move helper function outside the loop to avoid repeated redefinition.
is_inside_code_blockis redefined on every iteration of thewhileloop (line 64), which is inefficient. Since it doesn't capture any loop-local state, extract it to module level or as a static/class method.+def _is_inside_code_block(text_segment: str, pos: int) -> bool: + """Check if position is inside a code block.""" + return text_segment[:pos].count("```") % 2 == 1 + + class BaseStorageService(ABC):Then remove the nested definition and reference
_is_inside_code_blockdirectly insidesmart_chunk_text.python/src/server/utils/document_processing.py (3)
253-254: Consider moving imports to module level.
tempfileandosare standard library modules with negligible import overhead. Moving them to the top of the file improves clarity and avoids repeated import resolution on each function call.import io +import os +import tempfile # Removed direct logging import - using unified configThen remove the inline imports at lines 253-254.
264-268: Extract minimum text thresholds to named constants.The magic numbers 100 (lines 264, 289) and 50 (line 325) control fallback behavior but their purpose isn't immediately clear. Named constants would improve maintainability and make tuning easier.
# At module level MIN_TEXT_CHARS_PRIMARY = 100 # Minimum chars for pymupdf4llm/pdfplumber success MIN_TEXT_CHARS_OCR = 50 # Lower threshold for OCR (typically yields less text)
265-273: Inconsistent logging betweenloggerandlogfire.Success messages use
logger.info()(line 265) while failure/warning messages uselogfire.warning()(lines 268, 273). This pattern continues throughout the function. Consider using one consistently for warnings/fallbacks to simplify log aggregation—typicallylogger.warning()for operational messages andlogfirefor structured observability events.python/src/server/utils/ocr_processing.py (3)
109-111: Includeexc_info=Truefor exception logging.Per coding guidelines, exception handlers should preserve full stack traces. This aids debugging when OCR fails on specific pages.
except Exception as e: - logger.warning(f"OCR failed on page {page_num}: {e}") + logger.warning(f"OCR failed on page {page_num}: {e}", exc_info=True) continueApply the same pattern at lines 122-124 and 141-143.
122-124: Add stack trace to the error log.The exception is logged but without the full traceback. For external API calls (Tesseract subprocess), detailed traces help diagnose issues like missing binaries or corrupted PDFs.
except Exception as e: - logfire.error(f"OCR extraction failed: {e}") + logfire.error(f"OCR extraction failed: {e}", exc_info=True) return None
93-111: Consider validating language availability upfront.If an invalid
languageis passed, Tesseract will fail on every page. While the per-page exception handler prevents crashes, it's inefficient. An upfront check could provide a clearer error:if language not in get_supported_languages(): logger.warning(f"Language '{language}' may not be installed, falling back to 'eng'") language = "eng"This is optional since the current behavior is still correct.
python/src/server/services/search/hybrid_search_strategy.py (1)
143-163: Copy filter dict before mutation and align error logging in code example hybrid searchThe multi‑dimensional code example hybrid search wiring looks consistent with the documents path. Two small tweaks will improve safety and observability:
- Avoid mutating
filter_metadataSame pattern as above:
filter_json = filter_metadata or {}followed byfilter_json.pop("source")mutates the caller’s dict. Copying is cheap and avoids surprises:- filter_json = filter_metadata or {} + filter_json = dict(filter_metadata or {}) @@ - if not final_source_filter and "source" in filter_json: - final_source_filter = filter_json.pop("source") + if not final_source_filter and "source" in filter_json: + final_source_filter = filter_json.pop("source")(Here only the first line changes; the
popstill acts on a local copy.)
- Upgrade error logging to capture stack traces and context
In the
exceptat the end of this method, mirroring the pattern suggested forvector_search(usingexc_info=Trueand includingmatch_count/source_idorfinal_source_filterin the log message) would bring this in line with the stated logging guidelines and make Supabase/DB issues much easier to troubleshoot.Also applies to: 201-204
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
python/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
python/pyproject.toml(1 hunks)python/src/server/services/search/base_search_strategy.py(2 hunks)python/src/server/services/search/hybrid_search_strategy.py(2 hunks)python/src/server/services/storage/base_storage_service.py(1 hunks)python/src/server/utils/document_processing.py(5 hunks)python/src/server/utils/ocr_processing.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
python/pyproject.toml
📄 CodeRabbit inference engine (AGENTS.md)
Use Python 3.12 for backend code
Files:
python/pyproject.toml
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data
Files:
python/src/server/utils/document_processing.pypython/src/server/utils/ocr_processing.pypython/src/server/services/search/hybrid_search_strategy.pypython/src/server/services/storage/base_storage_service.pypython/src/server/services/search/base_search_strategy.py
python/src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration
python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety
Files:
python/src/server/utils/document_processing.pypython/src/server/utils/ocr_processing.pypython/src/server/services/search/hybrid_search_strategy.pypython/src/server/services/storage/base_storage_service.pypython/src/server/services/search/base_search_strategy.py
python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations
Files:
python/src/server/utils/document_processing.pypython/src/server/utils/ocr_processing.pypython/src/server/services/search/hybrid_search_strategy.pypython/src/server/services/storage/base_storage_service.pypython/src/server/services/search/base_search_strategy.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations
Files:
python/src/server/services/search/hybrid_search_strategy.pypython/src/server/services/storage/base_storage_service.pypython/src/server/services/search/base_search_strategy.py
🧬 Code graph analysis (1)
python/src/server/utils/document_processing.py (1)
python/src/server/utils/ocr_processing.py (2)
is_ocr_available(36-38)extract_text_with_ocr(41-124)
🔇 Additional comments (10)
python/src/server/services/storage/base_storage_service.py (3)
82-89: Heading boundary logic looks correct.Good addition of heading-based break points. The 30% threshold prevents excessively small chunks, and the code block awareness is properly implemented.
91-95: Paragraph break logic correctly enhanced with code-block awareness.The cascading check (
end == original_end) ensures proper priority ordering, and the code block guard prevents breaking mid-code.
97-101: Sentence break fallback correctly updated with code-block awareness.The priority cascading and code block guard are properly implemented. Including the period (
+ 1) in the current chunk is correct behavior.python/src/server/utils/document_processing.py (3)
13-18: LGTM on the availability check pattern.The try/except pattern for optional dependency detection is consistent with the existing patterns for PyPDF2, pdfplumber, and docx. This gracefully handles environments where pymupdf4llm isn't installed.
321-338: Good error handling with actionable messages.The OCR fallback path and error messages provide clear guidance for users, including installation instructions for missing dependencies. This aligns with the guideline to fail fast with descriptive errors when extraction completely fails.
47-83: LGTM on the code block preservation logic.The regex-based approach correctly handles code blocks split by page markers. Good use of
reversed()to maintain string positions during in-place replacements. The docstring clearly explains the problem being solved.python/src/server/utils/ocr_processing.py (2)
127-143: LGTM on language discovery logic.Good defensive implementation with a sensible fallback to
["eng"]when Tesseract's language list can't be retrieved. Filtering outosdis correct as it's a detection mode, not a language.
36-38: LGTM.Simple and effective availability check.
python/src/server/services/search/base_search_strategy.py (1)
48-76: Add guard for empty embeddings and enrich error logging contextThe multi-dimensional wiring (computing
embedding_dimension, usingf"{table_rpc}_multi", and logging the dimension) looks good, but two robustness gaps need attention per backend guidelines:
- Guard against empty/invalid
query_embeddingbefore deriving dimensionIf
query_embeddingisNoneor empty,len(query_embedding)will fail downstream. Add an early validation check that logs and returns cleanly:with safe_span("base_vector_search", table=table_rpc, match_count=match_count) as span: try: + if not query_embedding: + logger.error("Vector search received empty query_embedding") + span.set_attribute("error", "empty_query_embedding") + return [] + # Detect embedding dimension from the query vector embedding_dimension = len(query_embedding)
- Include stack trace and structured context in error logs
Update the exception handler to include
exc_info=Trueand contextual fields (table_rpc,match_count) as required by backend guidelines for production debuggability:except Exception as e: - logger.error(f"Vector search failed: {e}") + logger.error( + "Vector search failed for table_rpc=%s with match_count=%s", + table_rpc, + match_count, + exc_info=True, + )python/src/server/services/search/hybrid_search_strategy.py (1)
54-67: Avoid mutating caller filters and guard/document embedding assumptionsThe switch to
hybrid_search_archon_crawled_pages_multiwith an explicitembedding_dimensionis solid. A couple of small improvements would make this more robust and side‑effect free:
- Do not mutate
filter_metadataviapop
filter_json = filter_metadata or {}creates an alias; the subsequentpop("source", ...)mutates the original dict passed by the caller. That can be surprising if the same filter dict is reused elsewhere.You can cheaply avoid this by copying:
- filter_json = filter_metadata or {} + filter_json = dict(filter_metadata or {}) source_filter = filter_json.pop("source", None) if "source" in filter_json else None
- Explicitly handle empty/None
query_embeddingbefore computing lengthEven if callers are expected to always supply a valid embedding, a lightweight guard will prevent obscure failures when that assumption is violated:
- # Detect embedding dimension from the query vector - embedding_dimension = len(query_embedding) + if not query_embedding: + logger.error("Hybrid document search received empty query_embedding") + span.set_attribute("error", "empty_query_embedding") + return [] + + # Detect embedding dimension from the query vector + embedding_dimension = len(query_embedding)
- Align error logging with backend guidelines
As in
BaseSearchStrategy.vector_search, consider switching theexceptblock here to useexc_info=Trueand include relevant context (e.g.,match_count, whether asource_filterwas applied), so operational debugging is easier.Also applies to: 108-111
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
archon-ui-main/src/services/credentialsService.ts (1)
62-75: Persisting Ollama auth tokens as plaintext credentials
authTokenis now part ofOllamaInstanceand is persisted viasetOllamaInstanceswithis_encrypted: false, meaning these per‑instance tokens will be stored in clear text like non‑secret config fields. For remote/authenticated Ollama hosts these tokens are effectively API credentials and should get the same at‑rest protections as other API keys.Consider treating
authTokenas a secret when creating credentials, e.g. marking it encrypted (and aligning backend handling) while leaving non‑secret fields unchanged:- instances.forEach(instance => { + instances.forEach(instance => { const fields: Record<string, any> = { name: instance.name, baseUrl: instance.baseUrl, isEnabled: instance.isEnabled, isPrimary: instance.isPrimary, instanceType: instance.instanceType || 'both', loadBalancingWeight: instance.loadBalancingWeight || 100 }; @@ - if (instance.authToken) { - fields.authToken = instance.authToken; - } + if (instance.authToken) { + fields.authToken = instance.authToken; + } @@ - Object.entries(fields).forEach(([field, value]) => { + Object.entries(fields).forEach(([field, value]) => { + const isSecret = field === 'authToken'; promises.push( this.createCredential({ key: `ollama_instance_${instance.id}_${field}`, value: value.toString(), - is_encrypted: false, + is_encrypted: isSecret, category: 'ollama_instances' }) ); });You may need a matching backend adjustment so
ollama_instancescredentials withis_encrypted: trueare stored/decrypted consistently withapi_keys.Also applies to: 438-485, 496-551
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
18-20: Remove OpenRouter from embedding-capable providers
EMBEDDING_CAPABLE_PROVIDERSincorrectly includes'openrouter'. OpenRouter only proxies chat/completions endpoints and does not provide embedding models. Including it here misrepresents its capabilities to users and can cause confusion, as embeddings would actually come from the underlying routed provider (OpenAI, etc.), not OpenRouter itself. This was correctly excluded in the prior version.-const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama']; +const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];
♻️ Duplicate comments (3)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)
142-175: Add missing Ollama auth/API-mode fields to RAGSettingsProps.ragSettingsThe component reads and writes
ragSettings.OLLAMA_CHAT_AUTH_TOKEN,ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN, andragSettings.OLLAMA_API_MODEin multiple places (effects, modals, API-mode toggle), but these fields are not declared on theragSettingstype. Under strict TypeScript this will be a type error and prevents catching typos on these keys.Extend the inline type to include the missing fields:
interface RAGSettingsProps { ragSettings: { MODEL_CHOICE: string; CHAT_MODEL?: string; @@ EMBEDDING_PROVIDER?: string; OLLAMA_EMBEDDING_URL?: string; OLLAMA_EMBEDDING_INSTANCE_NAME?: string; + OLLAMA_CHAT_AUTH_TOKEN?: string; + OLLAMA_EMBEDDING_AUTH_TOKEN?: string; + OLLAMA_API_MODE?: 'native' | 'openai-compatible'; // Crawling Performance SettingsAfter this, you can continue to access those properties directly without any
as anyworkarounds.As per coding guidelines.
571-584: Model loading behavior and default Ollama URL handlingThe new
loadAvailableModelshelper andollamaMetricsextensions are wired cleanly, and callingloadAvailableModelsafter a successfulmanualTestConnectionis a nice touch to keep model lists in sync.One lingering issue is the hardcoded default URL check in the auto‑load effects:
useEffect(() => { if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { loadAvailableModels(llmInstanceConfig.url, 'chat'); } }, [llmInstanceConfig.url]); useEffect(() => { if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); } }, [embeddingInstanceConfig.url]);This still skips auto‑loading models for users who keep the default local Ollama URL, forcing them to rely on the page‑load tests or manual “Test Connection” to populate model lists.
You can remove the special‑case while still guarding against empty values:
- useEffect(() => { - if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(llmInstanceConfig.url, 'chat'); - } - }, [llmInstanceConfig.url]); + useEffect(() => { + if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) { + loadAvailableModels(llmInstanceConfig.url, 'chat'); + } + }, [llmInstanceConfig.url]); @@ - useEffect(() => { - if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') { - loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); - } - }, [embeddingInstanceConfig.url]); + useEffect(() => { + if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) { + loadAvailableModels(embeddingInstanceConfig.url, 'embedding'); + } + }, [embeddingInstanceConfig.url]);This matches prior guidance and makes the behavior consistent for default vs custom URLs.
Also applies to: 641-692, 752-755, 891-909
2350-2366: Validate authentication tokens before saving Ollama instance settingsThe new auth sections in the LLM and Embedding modals let users enable
useAuthand enter anauthToken, and the save handlers persist tokens intoragSettings:LLM: OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' Embedding: OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''However, if
useAuthis enabled while the token field is empty or whitespace, the code still saves an empty token. That will cause authenticated calls to fail without immediate feedback.Add a simple guard before constructing
updatedSettingsin both handlers:// LLM Save Changes <Button onClick={async () => { + if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } const updatedSettings = { ...ragSettings, LLM_BASE_URL: llmInstanceConfig.url, OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' }; @@ // Embedding Save Changes <Button onClick={async () => { + if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) { + showToast('Authentication token is required when authentication is enabled', 'error'); + return; + } const updatedSettings = { ...ragSettings, OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url, OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : '' };This keeps the UX tight and avoids silently persisting unusable auth configurations.
Also applies to: 2368-2391, 2403-2411, 2443-2456, 2458-2481, 2493-2501
🧹 Nitpick comments (2)
archon-ui-main/src/services/credentialsService.ts (1)
13-45: RagSettings additions for CHAT_MODEL and OLLAMA_API_MODE look consistentDefaults plus string-field mapping for
CHAT_MODELandOLLAMA_API_MODEare wired correctly intogetRagSettings, so these values will survive round‑trips through credentials. Just ensure all consumers (e.g.,RAGSettings.tsx’s props type) declare the same fields to keep TypeScript happy; I’ve flagged that separately in the UI file.Also applies to: 191-281
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
225-272: Align auth token state, persistence, and summary displayThe new refs/effects correctly sync
OLLAMA_*_AUTH_TOKENfromragSettingsintollmInstanceConfig/embeddingInstanceConfig, but there’s a subtle mismatch with how the “Authentication” summary row is rendered:
- Effects set
useAuth: !!newAuthTokenbut keepauthToken: newAuthToken || prev.authToken.- The summary row at lines 1852–1879 uses only
authTokentruthiness to decide between “Token configured” and “No authentication”.That means if a user disables auth (useAuth = false) but the previous
authTokenstring remains in state (or whenragSettingsclears the token but local state still hasprev.authToken), the summary can still show “Token configured” even though auth is off and no token will be sent.You can make the UI match actual behavior by keying off
useAuth && authToken:- {activeSelection === 'chat' ? ( - llmInstanceConfig.authToken ? ( + {activeSelection === 'chat' ? ( + llmInstanceConfig.useAuth && llmInstanceConfig.authToken ? ( <span className="text-teal-400 flex items-center"> @@ - ) : ( - <span className="text-gray-500 italic">No authentication</span> - ) - ) : ( - embeddingInstanceConfig.authToken ? ( + ) : ( + <span className="text-gray-500 italic">No authentication</span> + ) + ) : ( + embeddingInstanceConfig.useAuth && embeddingInstanceConfig.authToken ? ( <span className="text-teal-400 flex items-center"> @@ - ) : ( - <span className="text-gray-500 italic">No authentication</span> - ) - )} + ) : ( + <span className="text-gray-500 italic">No authentication</span> + ) + )}Optionally, you could also drop the
|| prev.authTokenfallback in the effects if you want state to always mirror what’s actually persisted inragSettings.Also applies to: 1852-1879
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
archon-ui-main/src/components/settings/RAGSettings.tsx(20 hunks)archon-ui-main/src/services/credentialsService.ts(6 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/services/credentialsService.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/services/credentialsService.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/services/credentialsService.tsarchon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (3)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM(18-28)archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input(8-29)
🔇 Additional comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
1600-1659: Ollama API mode toggle wiring looks correctThe new “Ollama API Mode” selector cleanly toggles
ragSettings.OLLAMA_API_MODEbetween'native'and'openai-compatible', with sensible defaulting to native when the value is missing:(!ragSettings.OLLAMA_API_MODE || ragSettings.OLLAMA_API_MODE === 'native')and
onClick={() => setRagSettings({ ...ragSettings, OLLAMA_API_MODE: 'native' })} onClick={() => setRagSettings({ ...ragSettings, OLLAMA_API_MODE: 'openai-compatible' })}Once the props type is updated to declare
OLLAMA_API_MODE, this should integrate smoothly with the backend’s mode selection logic.
- Remove secret logging: replace logger.info(rag_settings) with keys-only debug logging - Add auth state reset and token validation: add OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN to RAGSettingsProps interface; validate auth token is non-empty before persisting when useAuth is enabled for LLM and Embedding instances - Replace hardcoded URL checks with configurable values: remove hardcoded http://host.docker.internal:11434/v1 exclusion from auto-load model effects - Handle None OLLAMA_EMBEDDING_URL safely: use (value or "") pattern instead of .get(key, "") to avoid AttributeError when value is None - Make api_mode comparison case-insensitive in _get_embedding_adapter and create_embeddings_batch to handle "Native"/"NATIVE" config values - Add missing OCR dependencies (pymupdf4llm, pytesseract, pdf2image) to the all dependency group in pyproject.toml - Remove unused imports (io, tempfile, os) from ocr_processing.py - Extract long line in state_reconciliation.py to satisfy 120-char limit Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
2405-2431:⚠️ Potential issue | 🟠 MajorAvoid auto-testing protected instances before the token is persisted.
These modal saves only update local React state, then immediately call backend health/model endpoints that are keyed by
instance_urlsonly. For a newly entered Ollama token, the backend still has no persisted credential yet, so authenticated remote instances can show a false failure until the page-level “Save Settings” runs.Either persist
updatedSettingsbefore the follow-up test, or skip the automatic test/model refresh here and leave it to the main save flow.Also applies to: 2499-2525
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 2405 - 2431, The save handler currently updates local state (setRagSettings) then unconditionally runs manualTestConnection and fetchOllamaMetrics, causing false failures for authenticated Ollama instances because the token isn't persisted; modify the onClick flow in the LLM edit modal so that if llmInstanceConfig.useAuth is true you either (A) persist updatedSettings to the backend (call the existing settings save API / trigger the parent-level save flow) and await success before invoking manualTestConnection and fetchOllamaMetrics, or (B) skip the automatic test/metrics refresh entirely and leave testing to the main "Save Settings" action; ensure this change targets the onClick handler where updatedSettings is built and references manualTestConnection, fetchOllamaMetrics, setRagSettings, and setShowEditLLMModal so the automatic test only runs when credentials are actually persisted.
♻️ Duplicate comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
142-176:⚠️ Potential issue | 🟠 MajorAdd
OLLAMA_API_MODEto the localragSettingstype.Lines 1612-1658 read
ragSettings.OLLAMA_API_MODE, butRAGSettingsProps["ragSettings"]still omits that field. In strict TS this is an unsafe property access and typically a compile-time error. As per coding guidelines,archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code.#!/bin/bash set -euo pipefail echo "RAGSettingsProps block:" sed -n '142,176p' archon-ui-main/src/components/settings/RAGSettings.tsx echo echo "OLLAMA_API_MODE references:" rg -n "OLLAMA_API_MODE" \ archon-ui-main/src/components/settings/RAGSettings.tsx \ archon-ui-main/src/services/credentialsService.ts🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 142 - 176, The ragSettings type inside RAGSettingsProps is missing the OLLAMA_API_MODE field referenced elsewhere (ragSettings.OLLAMA_API_MODE); update the RAGSettingsProps["ragSettings"] interface to include OLLAMA_API_MODE with the correct type (e.g., string | undefined or a stricter enum/union if applicable) so TypeScript strict mode no longer flags the property access in functions/components that read ragSettings.OLLAMA_API_MODE (e.g., places using that symbol in this file and in credentialsService).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 800-805: The handlers that delete Ollama config keys (they clone
ragSettings into updatedSettings, delete LLM_BASE_URL / MODEL_CHOICE /
OLLAMA_CHAT_AUTH_TOKEN and call setRagSettings) must also remove the persisted
instance name keys to avoid them being repopulated; update those blocks to
delete LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME from updatedSettings
before calling setRagSettings, and apply the same change in the other handler
referenced (the similar block around the second delete at lines 826-830).
- Around line 158-159: The RAG token keys OLLAMA_CHAT_AUTH_TOKEN and
OLLAMA_EMBEDDING_AUTH_TOKEN are missing from the
credentialsService.getRagSettings() shape/defaults and hydration logic; update
credentialsService.getRagSettings() to add these two keys to the RagSettings
type/defaults and include them in the string-field hydration path so values
persisted by the UI are restored on reload. Specifically, extend the RagSettings
object/return to include OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN
with appropriate default (empty string/null) and add those exact keys to any
code that iterates/string-hydrates token fields so they aren’t dropped. Ensure
references to RagSettings, getRagSettings, and the string-field hydration code
are updated together.
- Around line 238-245: The code builds newConfig using `newAuthToken ||
prev.authToken` which preserves a prior secret when the user clears the saved
token; change the logic in the `newConfig` construction so `authToken` is set to
`newAuthToken` if it is a non-empty value, otherwise set it to an empty string
(or null/undefined) when the user cleared the field and/or `useAuth` is false,
and ensure `useAuth` is properly computed from `newAuthToken`/the auth toggle;
update the same pattern elsewhere in this component where `newAuthToken ||
prev.authToken` is used (e.g., the second occurrence) so clearing the token
truly clears the stored secret and the summary no longer shows “Token
configured.”
In `@python/src/server/services/embeddings/embedding_service.py`:
- Line 459: The URL normalization for ollama_base_url currently uses
.replace("/v1", "") which can remove occurrences anywhere in the string; change
the logic that builds ollama_base_url (the expression using
rag_settings.get("OLLAMA_EMBEDDING_URL")) to use .removesuffix("/v1") followed
by .rstrip("/") so only a trailing "/v1" is removed and the URL is still trimmed
of trailing slashes.
- Around line 120-128: The current use of asyncio.gather with
self._fetch_single_embedding and httpx.AsyncClient launches all requests for
texts concurrently and can overwhelm the Ollama server or OS limits; change to
bounded concurrency by creating an asyncio.Semaphore (or
asyncio.BoundedSemaphore) with a reasonable max (e.g., 10–20), wrap calls to
self._fetch_single_embedding in a small async worker that acquires/releases the
semaphore, and then gather those bounded tasks instead of launching one task per
text unbounded; keep using the same AsyncClient instance inside the
semaphore-controlled tasks to reuse connections and return the collected
embeddings.
In `@python/src/server/utils/ocr_processing.py`:
- Around line 119-121: The except block catching Exception as e in the OCR
extraction flow currently logs with logfire.error(f"OCR extraction failed: {e}")
and returns None; update that call to include exc_info=True so the full
traceback is preserved in logs (i.e., change the logfire.error invocation in the
OCR extraction exception handler to pass exc_info=True while keeping the
descriptive message), ensuring this occurs in the function handling OCR
extraction where logfire is used.
- Around line 106-108: The except block that currently calls
logger.warning(f"OCR failed on page {page_num}: {e}") should preserve the full
stack trace; update the logging call in the OCR error handler (the except
handling OCR failures where logger.warning is used and page_num is referenced)
to include exc_info=True (or replace with logger.exception) so the full
exception stack is recorded (e.g., logger.warning(..., exc_info=True) or
logger.exception(...)).
- Around line 138-140: The except block in ocr_processing.py currently logs the
error with logger.warning(f"Could not get Tesseract languages: {e}") and returns
a default; update that logger call to pass exc_info=True (e.g.,
logger.warning("Could not get Tesseract languages", exc_info=True)) so the full
stack trace is preserved while still returning the fallback ["eng"]; locate the
logger.warning call in the function that retrieves Tesseract languages and
modify it accordingly.
---
Outside diff comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 2405-2431: The save handler currently updates local state
(setRagSettings) then unconditionally runs manualTestConnection and
fetchOllamaMetrics, causing false failures for authenticated Ollama instances
because the token isn't persisted; modify the onClick flow in the LLM edit modal
so that if llmInstanceConfig.useAuth is true you either (A) persist
updatedSettings to the backend (call the existing settings save API / trigger
the parent-level save flow) and await success before invoking
manualTestConnection and fetchOllamaMetrics, or (B) skip the automatic
test/metrics refresh entirely and leave testing to the main "Save Settings"
action; ensure this change targets the onClick handler where updatedSettings is
built and references manualTestConnection, fetchOllamaMetrics, setRagSettings,
and setShowEditLLMModal so the automatic test only runs when credentials are
actually persisted.
---
Duplicate comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 142-176: The ragSettings type inside RAGSettingsProps is missing
the OLLAMA_API_MODE field referenced elsewhere (ragSettings.OLLAMA_API_MODE);
update the RAGSettingsProps["ragSettings"] interface to include OLLAMA_API_MODE
with the correct type (e.g., string | undefined or a stricter enum/union if
applicable) so TypeScript strict mode no longer flags the property access in
functions/components that read ragSettings.OLLAMA_API_MODE (e.g., places using
that symbol in this file and in credentialsService).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0bb9ebf8-4d93-479c-ad40-fae14b0ebb4d
📒 Files selected for processing (5)
archon-ui-main/src/components/settings/RAGSettings.tsxpython/pyproject.tomlpython/src/agent_work_orders/utils/state_reconciliation.pypython/src/server/services/embeddings/embedding_service.pypython/src/server/utils/ocr_processing.py
✅ Files skipped from review due to trivial changes (1)
- python/src/agent_work_orders/utils/state_reconciliation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- python/pyproject.toml
| OLLAMA_CHAT_AUTH_TOKEN?: string; | ||
| OLLAMA_EMBEDDING_AUTH_TOKEN?: string; |
There was a problem hiding this comment.
Mirror these new token keys in credentialsService.getRagSettings().
The UI now writes OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN, but archon-ui-main/src/services/credentialsService.ts still doesn't include them in its RagSettings shape/defaults or its string-field hydration path. Those values will be silently dropped on reload.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 158 -
159, The RAG token keys OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN
are missing from the credentialsService.getRagSettings() shape/defaults and
hydration logic; update credentialsService.getRagSettings() to add these two
keys to the RagSettings type/defaults and include them in the string-field
hydration path so values persisted by the UI are restored on reload.
Specifically, extend the RagSettings object/return to include
OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN with appropriate default
(empty string/null) and add those exact keys to any code that
iterates/string-hydrates token fields so they aren’t dropped. Ensure references
to RagSettings, getRagSettings, and the string-field hydration code are updated
together.
| const newConfig = { | ||
| url: newLLMUrl || prev.url, | ||
| name: newLLMName || prev.name | ||
| name: newLLMName || prev.name, | ||
| useAuth: !!newAuthToken, | ||
| authToken: newAuthToken || prev.authToken | ||
| }; | ||
| // Only update if actually different to prevent loops | ||
| if (newConfig.url !== prev.url || newConfig.name !== prev.name) { | ||
| if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.authToken !== prev.authToken) { |
There was a problem hiding this comment.
Don't retain the previous auth token when the saved value is empty.
Using newAuthToken || prev.authToken means disabling auth never clears the local secret. The summary will still show “Token configured”, and re-enabling auth can silently reuse the stale token.
Suggested fix
const newConfig = {
url: newLLMUrl || prev.url,
name: newLLMName || prev.name,
useAuth: !!newAuthToken,
- authToken: newAuthToken || prev.authToken
+ authToken: newAuthToken
};
@@
const newConfig = {
url: newEmbeddingUrl || prev.url,
name: newEmbeddingName || prev.name,
useAuth: !!newAuthToken,
- authToken: newAuthToken || prev.authToken
+ authToken: newAuthToken
};Also applies to: 261-268
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 238 -
245, The code builds newConfig using `newAuthToken || prev.authToken` which
preserves a prior secret when the user clears the saved token; change the logic
in the `newConfig` construction so `authToken` is set to `newAuthToken` if it is
a non-empty value, otherwise set it to an empty string (or null/undefined) when
the user cleared the field and/or `useAuth` is false, and ensure `useAuth` is
properly computed from `newAuthToken`/the auth toggle; update the same pattern
elsewhere in this component where `newAuthToken || prev.authToken` is used
(e.g., the second occurrence) so clearing the token truly clears the stored
secret and the summary no longer shows “Token configured.”
| // Clear related RAG settings | ||
| const updatedSettings = { ...ragSettings }; | ||
| delete updatedSettings.LLM_BASE_URL; | ||
| delete updatedSettings.MODEL_CHOICE; | ||
| delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN; | ||
| setRagSettings(updatedSettings); |
There was a problem hiding this comment.
Clear the persisted instance names when deleting an Ollama config.
These handlers remove the URL/model/token, but they leave LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME behind. The sync effects at Lines 230-274 can then repopulate the “deleted” instance with its old name on the next render.
Suggested fix
const updatedSettings = { ...ragSettings };
delete updatedSettings.LLM_BASE_URL;
+ delete updatedSettings.LLM_INSTANCE_NAME;
delete updatedSettings.MODEL_CHOICE;
delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN;
@@
const updatedSettings = { ...ragSettings };
delete updatedSettings.OLLAMA_EMBEDDING_URL;
+ delete updatedSettings.OLLAMA_EMBEDDING_INSTANCE_NAME;
delete updatedSettings.EMBEDDING_MODEL;
delete updatedSettings.OLLAMA_EMBEDDING_AUTH_TOKEN;Also applies to: 826-830
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 800 -
805, The handlers that delete Ollama config keys (they clone ragSettings into
updatedSettings, delete LLM_BASE_URL / MODEL_CHOICE / OLLAMA_CHAT_AUTH_TOKEN and
call setRagSettings) must also remove the persisted instance name keys to avoid
them being repopulated; update those blocks to delete LLM_INSTANCE_NAME and
OLLAMA_EMBEDDING_INSTANCE_NAME from updatedSettings before calling
setRagSettings, and apply the same change in the other handler referenced (the
similar block around the second delete at lines 826-830).
| try: | ||
| async with httpx.AsyncClient(timeout=60.0) as http_client: | ||
| embeddings = await asyncio.gather( | ||
| *( | ||
| self._fetch_single_embedding(http_client, model, text) | ||
| for text in texts | ||
| ) | ||
| ) | ||
| return embeddings |
There was a problem hiding this comment.
Unbounded parallelism may overwhelm the Ollama server.
asyncio.gather launches all requests simultaneously without concurrency limits. For large batches (e.g., 100+ texts), this creates many simultaneous HTTP connections that could exhaust Ollama server resources or hit OS connection limits.
Consider limiting concurrency with a semaphore:
🛠️ Proposed fix
+ _MAX_CONCURRENT_REQUESTS = 10 # class-level constant
async def create_embeddings(
self,
texts: list[str],
model: str,
dimensions: int | None = None,
) -> list[list[float]]:
try:
+ semaphore = asyncio.Semaphore(self._MAX_CONCURRENT_REQUESTS)
+
+ async def limited_fetch(text: str) -> list[float]:
+ async with semaphore:
+ return await self._fetch_single_embedding(http_client, model, text)
+
async with httpx.AsyncClient(timeout=60.0) as http_client:
embeddings = await asyncio.gather(
- *(
- self._fetch_single_embedding(http_client, model, text)
- for text in texts
- )
+ *(limited_fetch(text) for text in texts)
)
return embeddings🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/services/embeddings/embedding_service.py` around lines 120
- 128, The current use of asyncio.gather with self._fetch_single_embedding and
httpx.AsyncClient launches all requests for texts concurrently and can overwhelm
the Ollama server or OS limits; change to bounded concurrency by creating an
asyncio.Semaphore (or asyncio.BoundedSemaphore) with a reasonable max (e.g.,
10–20), wrap calls to self._fetch_single_embedding in a small async worker that
acquires/releases the semaphore, and then gather those bounded tasks instead of
launching one task per text unbounded; keep using the same AsyncClient instance
inside the semaphore-controlled tasks to reuse connections and return the
collected embeddings.
| embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536")) | ||
|
|
||
| # For Ollama, get native API URL, auth token, and API mode | ||
| ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/") |
There was a problem hiding this comment.
Use removesuffix instead of replace for URL normalization.
.replace("/v1", "") removes all occurrences of /v1 anywhere in the URL, not just the trailing suffix. While unlikely in practice, URLs like http://v1-proxy.example.com/ollama/v1 would be incorrectly modified.
🔧 Proposed fix
- ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")
+ ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").rstrip("/").removesuffix("/v1")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/services/embeddings/embedding_service.py` at line 459, The
URL normalization for ollama_base_url currently uses .replace("/v1", "") which
can remove occurrences anywhere in the string; change the logic that builds
ollama_base_url (the expression using rag_settings.get("OLLAMA_EMBEDDING_URL"))
to use .removesuffix("/v1") followed by .rstrip("/") so only a trailing "/v1" is
removed and the URL is still trimmed of trailing slashes.
| except Exception as e: | ||
| logger.warning(f"OCR failed on page {page_num}: {e}") | ||
| continue |
There was a problem hiding this comment.
Add exc_info=True to preserve stack traces in error logging.
Per coding guidelines, error logging should include full stack traces for debugging. The current warning loses valuable diagnostic information.
🔧 Proposed fix
except Exception as e:
- logger.warning(f"OCR failed on page {page_num}: {e}")
+ logger.warning(f"OCR failed on page {page_num}: {e}", exc_info=True)
continueAs per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/utils/ocr_processing.py` around lines 106 - 108, The except
block that currently calls logger.warning(f"OCR failed on page {page_num}: {e}")
should preserve the full stack trace; update the logging call in the OCR error
handler (the except handling OCR failures where logger.warning is used and
page_num is referenced) to include exc_info=True (or replace with
logger.exception) so the full exception stack is recorded (e.g.,
logger.warning(..., exc_info=True) or logger.exception(...)).
| except Exception as e: | ||
| logfire.error(f"OCR extraction failed: {e}") | ||
| return None |
There was a problem hiding this comment.
Add exc_info=True to the error log for complete stack traces.
The outer exception handler should preserve the full stack trace for debugging OCR failures. This is especially important since the function returns None on failure, making post-mortem debugging reliant on logs.
🔧 Proposed fix
except Exception as e:
- logfire.error(f"OCR extraction failed: {e}")
+ logfire.error(f"OCR extraction failed: {e}", exc_info=True)
return NoneAs per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| logfire.error(f"OCR extraction failed: {e}") | |
| return None | |
| except Exception as e: | |
| logfire.error(f"OCR extraction failed: {e}", exc_info=True) | |
| return None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/utils/ocr_processing.py` around lines 119 - 121, The except
block catching Exception as e in the OCR extraction flow currently logs with
logfire.error(f"OCR extraction failed: {e}") and returns None; update that call
to include exc_info=True so the full traceback is preserved in logs (i.e.,
change the logfire.error invocation in the OCR extraction exception handler to
pass exc_info=True while keeping the descriptive message), ensuring this occurs
in the function handling OCR extraction where logfire is used.
| except Exception as e: | ||
| logger.warning(f"Could not get Tesseract languages: {e}") | ||
| return ["eng"] # Default fallback |
There was a problem hiding this comment.
Add exc_info=True for consistency with error logging guidelines.
🔧 Proposed fix
except Exception as e:
- logger.warning(f"Could not get Tesseract languages: {e}")
+ logger.warning(f"Could not get Tesseract languages: {e}", exc_info=True)
return ["eng"] # Default fallbackAs per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/utils/ocr_processing.py` around lines 138 - 140, The except
block in ocr_processing.py currently logs the error with logger.warning(f"Could
not get Tesseract languages: {e}") and returns a default; update that logger
call to pass exc_info=True (e.g., logger.warning("Could not get Tesseract
languages", exc_info=True)) so the full stack trace is preserved while still
returning the fallback ["eng"]; locate the logger.warning call in the function
that retrieves Tesseract languages and modify it accordingly.
|
🔄 This repository is being replaced by a new version of Archon. The original Python/MCP codebase is being archived to the This PR is being closed as part of the migration. Thank you for your contribution! |
* fix: auto-select first DAG node on workflow run load (#781) The workflow output panel showed "No output available" on initial load because no DAG node was selected by default. Users had to manually click a node in the Logs tab to see output. Changes: - Add useEffect that auto-selects the first DAG node when workflow data loads and no node is selected - Prefer the currently running node for live workflows Fixes #781 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: formatting and test fixes from validation - Fix formatting in workflows.test.ts and api.ts (prettier) - Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook }) in api.conversations.test.ts to match route registration requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: extract selectInitialNode to testable pure function Extract DAG node auto-selection logic from useEffect into a pure function with 5 unit tests covering all branches (undefined, empty, no running, prefer running, multiple running). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (#871) * feat: add interrupted to WorkflowRunStatus schema Implements US-001 from PRD. Changes: - Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts - Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts - Add interrupted: z.number() to dashboardRunsResponseSchema counts object - Add 'interrupted' to dashboardValidStatuses in API handler - Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts * feat: update IWorkflowStore interface & DB query implementations Implements US-002 from PRD. Changes: - IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath) - IWorkflowStore: drop conversationId from findResumableRun signature - IWorkflowStore: add interruptOrphanedRuns() method - db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted') - db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status - db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running' - store-adapter: wire all three new/modified methods - executor: update call sites to use renamed methods (type-check requirement) - tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns * feat: replace staleness guard with path-based lifecycle Implements US-003 from PRD. Changes: - executor.ts: remove STALE_MINUTES staleness auto-kill; replace with status-based guard — 'running' blocks, 'interrupted' offers resume/abandon - server/src/index.ts: replace failStaleWorkflowRuns() with createWorkflowStore().interruptOrphanedRuns() on startup - executor-preamble.test.ts: replace staleness detection tests with concurrent run guard tests covering 'running' and 'interrupted' cases * feat: command handler — /workflow status, resume, and abandon Implements US-004. Replaces time-based stale heuristics with explicit lifecycle commands for workflow management. Changes: - Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants - Replace /workflow status with global view: lists all running+interrupted runs across all worktrees (ID, name, working path, status, started-at) - Add /workflow resume <id>: validates state then calls resumeWorkflowRun - Add /workflow abandon <id>: validates state then calls failWorkflowRun - Add statuses[] filter to listWorkflowRuns for IN (...) queries - Update /workflow help text and default case usage string - Update /status command to remove stale warning that referenced removed constants - Replace old /workflow status tests with new behavior coverage - Add /workflow resume and /workflow abandon test coverage * feat: CLI workflow status, resume, and abandon subcommands Implements US-005 from PRD. Changes: - Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag - Add workflowResumeCommand: validates run state then calls resumeWorkflowRun - Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user') - Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd) - Wire resume/abandon subcommands in cli.ts - Update tests: replace "not implemented" test with status/resume/abandon coverage * feat: Web UI interrupted status badge and dashboard support Implements US-006 from PRD. Changes: - api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts - api.ts: add interrupted field to DashboardCounts interface - WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge - WorkflowRunCard.tsx: add amber dot and badge for interrupted status - StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list - DashboardPage.tsx: include interrupted in activeRuns filter and counts default * refactor: remove dead timer-based workflow staleness code Implements US-007 from PRD. Changes: - Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005) - Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002) - Remove IDatabase import from db/workflows.ts (no longer needed) - Remove failStaleWorkflowRuns tests from db/workflows.test.ts grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and grep -r 'failStaleWorkflowRuns' all return zero matches. * fix: address review feedback — truncated IDs, resume semantics, type safety - Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8)) - Add completed_at to interruptOrphanedRuns for correct duration metrics - Fix resume commands: mark interrupted→failed to unblock path guard, let next workflow invocation auto-resume via findResumableRun - Collapse dual status/statuses fields into status?: T | T[] - Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants - Add explicit else-if for interrupted status in executor path guard - Add structured logging to CLI workflow commands - Restore conversationId to cmd.workflow_status_failed log - Add tests: listWorkflowRuns statuses filter, interrupted auto-resume, DB error handling for resume/abandon - Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md * refactor: simplify CLI commands and status filter logic - Extract getRunOrThrow helper for shared run lookup pattern - Use WorkflowRun[] instead of Awaited<ReturnType<...>> - Remove single-item special case in listWorkflowRuns (IN works for all) - Use ?? instead of || for null-coalescing consistency - Remove unused ACTIVE_WORKFLOW_STATUSES constant - Add inline comment on completed_at for interrupted runs * fix: handle SQLite string dates in formatAge SQLite returns started_at as a string, not a Date object. formatAge now accepts Date | string and converts accordingly. Found during E2E testing against real SQLite database. * feat: UX improvements — real resume, dashboard actions, cleanup command - CLI resume now actually re-executes the workflow (calls workflowRunCommand with --resume internally instead of just flipping DB status) - Remove truncated IDs from executor guard messages (full ID in commands only) - Add Resume/Abandon/Delete buttons to dashboard workflow run cards - Add Delete button to history table rows - Add API endpoints: POST resume, POST abandon, DELETE workflow run - Add CLI workflow cleanup command (deletes terminal runs older than N days) - Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions * refactor: simplify API handlers, dashboard actions, and log conventions - Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers (was inline string checks diverging from CLI/command-handler) - Extract makeRunAction helper in DashboardPage (4 identical handlers → 1) - Fix log event names to use domain prefix convention (api.workflow_run_*) - Use Ban icon for Abandon to distinguish from Cancel's XCircle - Use instanceof Date guard in formatAge for clarity - Add comment on delete handler's active-status guard * refactor: simplify workflow lifecycle — remove interrupted, single resume path Rework the primitives for a clean foundation: Status model: 5 statuses (pending, running, completed, failed, cancelled). Remove 'interrupted' entirely — server restart now marks orphaned runs as 'failed' directly (with metadata.failure_reason = 'server_restart'). Resume model: one path. The executor's implicit findResumableRun detects prior failed runs and skips completed nodes. The CLI --resume flag reuses the prior run's worktree but lets the executor handle node-skipping (no more preCreatedRun bypass). Chat /workflow resume tells the user to re-invoke (auto-resume kicks in). Path guard: only blocks 'running' status (was running + interrupted). Guards always run regardless of preCreatedRun. Cancellation: generalized from status === 'cancelled' to status !== 'running' at all 3 check points (streaming, loop iterations, DAG layers). Ready for future 'paused' status with zero changes. - interruptOrphanedRuns → failOrphanedRuns - Remove preCreatedRun bypass from CLI --resume path - Generalize 3 cancellation check points in dag-executor - Update all API endpoints, command handlers, UI components - Update all tests and documentation * fix: cancel/complete race, abandon semantics, UTC dates - Fix cancel/complete race condition: dag-executor now checks DB status before calling completeWorkflowRun or failWorkflowRun, preventing a cancel during the final layer from being overwritten to completed - Abandon uses cancelWorkflowRun instead of failWorkflowRun, so abandoned runs don't get auto-resumed by findResumableRun - Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC * fix: address PR review — SQL safety, transactions, error handling, docs, tests - Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns - Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun) - Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors) - Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort) - Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files - Fix default mock getWorkflowRunStatus to return 'running' instead of null - Add NaN guard to formatAge (returns 'unknown' on unparseable dates) - Fix stale 'interrupted' references in route summary and delete comment - Include working path in /workflow resume response - Align deleteOldWorkflowRuns return type to { count } for consistency - Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference - Document new API endpoints (resume, abandon, delete) in CLAUDE.md - Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand - Fix workflowAbandonCommand test to assert cancelWorkflowRun call * refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards - Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor - Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601) - Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard - Rename cancelError → actionError in DashboardPage (covers 4 actions now) - Fix merge conflict: add IDatabase import, getRunningWorkflows from dev - Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono * fix: address review findings — double-rollback, missing guards, log context, tests - Fix double-rollback in deleteWorkflowRun by removing inner rollback() call and letting the outer catch handle it - Add terminal-status guard inside deleteWorkflowRun itself, not just in the route handler, to prevent deletion of running workflows - Add rollback failure logging to the rollback() helper - Add runId to error logs in resume/abandon/delete API route handlers - Add workingPath to getActiveWorkflowRunByPath error log - Add workflowRunId to dag-executor status-check warn logs - Wrap workflowRunCommand in try/catch in workflowResumeCommand with structured logging and null guard for user_message - Clean up stale 'interrupted' references in JSDoc - Fix missing / prefix on workflow cleanup in commands-reference.md - Add API route tests for POST /resume, POST /abandon, DELETE /:runId * refactor: apply code simplifications from review - Replace fragile startsWith string matching in deleteWorkflowRun catch with a typed WorkflowRunGuardError class - Reorder listWorkflowRuns placeholder generation: capture startIdx before pushing values for clarity - Replace curried makeRunAction factory in DashboardPage with a plain runAction helper function - Move skipIfStatusChanged helper definition before its call sites in dag-executor to match reading order * fix(workflows): idle timeout too aggressive on DAG nodes (#854) (#886) * fix(workflows): idle timeout too aggressive — break after result, reset on all messages (#854) The idle timeout (5 min default) caused two problems: (1) after a node's AI finished (result message), the loop waited for the subprocess to exit, wasting 5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls (tests, builds) triggered false timeouts on actively working nodes. Changes: - Break out of the for-await loop immediately after receiving the result message in both command/prompt and loop node paths — no more post-completion waste - Remove shouldResetTimer predicate so all message types (including tool) reset the timer — timeout only fires on complete silence - Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message resetting the timer, this is a deadlock detector, not a work limiter Fixes #854 * fix(workflows): update withIdleTimeout JSDoc to match new timer behavior Remove the tool-exclusion example from the shouldResetTimer docs since that pattern was just removed from all call sites. Clarify that most callers should omit the parameter. * fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs - Log generator cleanup errors in withIdleTimeout instead of silently swallowing - Add behavioral tests for break-after-result in both command/prompt and loop nodes - Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes) - Clarify shouldResetTimer test names and comments (utility API, not executor behavior) - Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern) - Remove redundant iterResult alias in withIdleTimeout * feat: add approval gate node type for human-in-the-loop workflows (#888) * feat: add approval gate node type for human-in-the-loop workflows Add `approval` as a fifth DAG node type that pauses workflow execution until a human approves or rejects. Built on existing resume infrastructure. - New `approval` node type in YAML workflows with `message` field - `paused` workflow status (non-terminal, resumable) - Approve/reject via REST API, CLI, chat commands, and Web UI - Approval comment available as `$node.output` in downstream nodes - Dashboard shows amber pulsing badge with approval message for paused runs - Path guard blocks new workflows on paused worktrees * fix: resolve 7 bugs in approval gate implementation Critical: - Parse metadata JSON on SQLite (returned as TEXT string, not object) - Suppress spurious "Workflow stopped (paused)" message after approval gates Medium: - CLI exits 0 on pause (not error code 1) - workflow status shows paused runs alongside running - Docs clarify auto-resume is CLI-only; API/chat mark resumable Low: - Skip completed_at when transitioning to failed for approval resume - Warn on AI fields (model, hooks, etc.) set on approval nodes - Add rowCount check to updateWorkflowRun Also: extract ApprovalContext type, add .min(1) to approval.message, fix stale "Only failed runs" error messages, fix log domain prefix, add recovery instructions to CLI approve error message. * feat(workflows): add interactive PRD workflow with approval gates (#891) Guided PRD creation through 3 approval gates: 1. Foundation questions (who/what/why/why now) 2. Deep dive questions (vision/users/JTBD/constraints) 3. Scope questions (MVP/hypothesis/exclusions) AI researches market + codebase between gates. Each gate carries the user's answers via $gate.output to downstream nodes. Final node generates the PRD from accumulated context. * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#890) * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#889) When workflow resume or approve is called, the worktree path is passed as cwd, but the codebase was registered under the main checkout path. This caused findCodebaseByDefaultCwd to return null, triggering auto-registration which tried to create a source symlink that already existed, crashing the command. Changes: - Add codebaseId option to WorkflowRunOptions - Look up codebase by stored ID before falling back to auto-registration - Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand - Add test verifying getCodebase is called with stored codebase_id on resume Fixes #889 * fix(cli): address review findings for PR #890 - Add domain prefix to 5 log events in codebase-lookup block (cli.codebase_lookup_failed, cli.db_connection_hint, cli.codebase_id_lookup_failed, cli.codebase_auto_registered, cli.codebase_auto_registration_failed) - Add inline comment clarifying intentional fallthrough in catch block - Add test for workflowApproveCommand codebase_id forwarding - Add test for error fallback path when getCodebase throws - Update pre-existing test assertion to match renamed log event * fix(workflows): improve interactive PRD — first-principles prompting + validation node (#893) Two improvements to the interactive PRD workflow: 1. First-principles prompting: All research/technical/generate nodes now instruct the AI to start from what exists, verify claims by reading actual files, prefer extending over creating, and cite exact file:line. 2. Validation node: New final node reads the generated PRD and checks every technical claim against the codebase — file paths, endpoint names, DB schemas, event types, component names. Corrects inline and documents what was fixed. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
* fix: auto-select first DAG node on workflow run load (coleam00#781) The workflow output panel showed "No output available" on initial load because no DAG node was selected by default. Users had to manually click a node in the Logs tab to see output. Changes: - Add useEffect that auto-selects the first DAG node when workflow data loads and no node is selected - Prefer the currently running node for live workflows Fixes coleam00#781 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: formatting and test fixes from validation - Fix formatting in workflows.test.ts and api.ts (prettier) - Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook }) in api.conversations.test.ts to match route registration requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: extract selectInitialNode to testable pure function Extract DAG node auto-selection logic from useEffect into a pure function with 5 unit tests covering all branches (undefined, empty, no running, prefer running, multiple running). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871) * feat: add interrupted to WorkflowRunStatus schema Implements US-001 from PRD. Changes: - Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts - Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts - Add interrupted: z.number() to dashboardRunsResponseSchema counts object - Add 'interrupted' to dashboardValidStatuses in API handler - Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts * feat: update IWorkflowStore interface & DB query implementations Implements US-002 from PRD. Changes: - IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath) - IWorkflowStore: drop conversationId from findResumableRun signature - IWorkflowStore: add interruptOrphanedRuns() method - db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted') - db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status - db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running' - store-adapter: wire all three new/modified methods - executor: update call sites to use renamed methods (type-check requirement) - tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns * feat: replace staleness guard with path-based lifecycle Implements US-003 from PRD. Changes: - executor.ts: remove STALE_MINUTES staleness auto-kill; replace with status-based guard — 'running' blocks, 'interrupted' offers resume/abandon - server/src/index.ts: replace failStaleWorkflowRuns() with createWorkflowStore().interruptOrphanedRuns() on startup - executor-preamble.test.ts: replace staleness detection tests with concurrent run guard tests covering 'running' and 'interrupted' cases * feat: command handler — /workflow status, resume, and abandon Implements US-004. Replaces time-based stale heuristics with explicit lifecycle commands for workflow management. Changes: - Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants - Replace /workflow status with global view: lists all running+interrupted runs across all worktrees (ID, name, working path, status, started-at) - Add /workflow resume <id>: validates state then calls resumeWorkflowRun - Add /workflow abandon <id>: validates state then calls failWorkflowRun - Add statuses[] filter to listWorkflowRuns for IN (...) queries - Update /workflow help text and default case usage string - Update /status command to remove stale warning that referenced removed constants - Replace old /workflow status tests with new behavior coverage - Add /workflow resume and /workflow abandon test coverage * feat: CLI workflow status, resume, and abandon subcommands Implements US-005 from PRD. Changes: - Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag - Add workflowResumeCommand: validates run state then calls resumeWorkflowRun - Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user') - Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd) - Wire resume/abandon subcommands in cli.ts - Update tests: replace "not implemented" test with status/resume/abandon coverage * feat: Web UI interrupted status badge and dashboard support Implements US-006 from PRD. Changes: - api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts - api.ts: add interrupted field to DashboardCounts interface - WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge - WorkflowRunCard.tsx: add amber dot and badge for interrupted status - StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list - DashboardPage.tsx: include interrupted in activeRuns filter and counts default * refactor: remove dead timer-based workflow staleness code Implements US-007 from PRD. Changes: - Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005) - Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002) - Remove IDatabase import from db/workflows.ts (no longer needed) - Remove failStaleWorkflowRuns tests from db/workflows.test.ts grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and grep -r 'failStaleWorkflowRuns' all return zero matches. * fix: address review feedback — truncated IDs, resume semantics, type safety - Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8)) - Add completed_at to interruptOrphanedRuns for correct duration metrics - Fix resume commands: mark interrupted→failed to unblock path guard, let next workflow invocation auto-resume via findResumableRun - Collapse dual status/statuses fields into status?: T | T[] - Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants - Add explicit else-if for interrupted status in executor path guard - Add structured logging to CLI workflow commands - Restore conversationId to cmd.workflow_status_failed log - Add tests: listWorkflowRuns statuses filter, interrupted auto-resume, DB error handling for resume/abandon - Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md * refactor: simplify CLI commands and status filter logic - Extract getRunOrThrow helper for shared run lookup pattern - Use WorkflowRun[] instead of Awaited<ReturnType<...>> - Remove single-item special case in listWorkflowRuns (IN works for all) - Use ?? instead of || for null-coalescing consistency - Remove unused ACTIVE_WORKFLOW_STATUSES constant - Add inline comment on completed_at for interrupted runs * fix: handle SQLite string dates in formatAge SQLite returns started_at as a string, not a Date object. formatAge now accepts Date | string and converts accordingly. Found during E2E testing against real SQLite database. * feat: UX improvements — real resume, dashboard actions, cleanup command - CLI resume now actually re-executes the workflow (calls workflowRunCommand with --resume internally instead of just flipping DB status) - Remove truncated IDs from executor guard messages (full ID in commands only) - Add Resume/Abandon/Delete buttons to dashboard workflow run cards - Add Delete button to history table rows - Add API endpoints: POST resume, POST abandon, DELETE workflow run - Add CLI workflow cleanup command (deletes terminal runs older than N days) - Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions * refactor: simplify API handlers, dashboard actions, and log conventions - Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers (was inline string checks diverging from CLI/command-handler) - Extract makeRunAction helper in DashboardPage (4 identical handlers → 1) - Fix log event names to use domain prefix convention (api.workflow_run_*) - Use Ban icon for Abandon to distinguish from Cancel's XCircle - Use instanceof Date guard in formatAge for clarity - Add comment on delete handler's active-status guard * refactor: simplify workflow lifecycle — remove interrupted, single resume path Rework the primitives for a clean foundation: Status model: 5 statuses (pending, running, completed, failed, cancelled). Remove 'interrupted' entirely — server restart now marks orphaned runs as 'failed' directly (with metadata.failure_reason = 'server_restart'). Resume model: one path. The executor's implicit findResumableRun detects prior failed runs and skips completed nodes. The CLI --resume flag reuses the prior run's worktree but lets the executor handle node-skipping (no more preCreatedRun bypass). Chat /workflow resume tells the user to re-invoke (auto-resume kicks in). Path guard: only blocks 'running' status (was running + interrupted). Guards always run regardless of preCreatedRun. Cancellation: generalized from status === 'cancelled' to status !== 'running' at all 3 check points (streaming, loop iterations, DAG layers). Ready for future 'paused' status with zero changes. - interruptOrphanedRuns → failOrphanedRuns - Remove preCreatedRun bypass from CLI --resume path - Generalize 3 cancellation check points in dag-executor - Update all API endpoints, command handlers, UI components - Update all tests and documentation * fix: cancel/complete race, abandon semantics, UTC dates - Fix cancel/complete race condition: dag-executor now checks DB status before calling completeWorkflowRun or failWorkflowRun, preventing a cancel during the final layer from being overwritten to completed - Abandon uses cancelWorkflowRun instead of failWorkflowRun, so abandoned runs don't get auto-resumed by findResumableRun - Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC * fix: address PR review — SQL safety, transactions, error handling, docs, tests - Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns - Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun) - Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors) - Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort) - Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files - Fix default mock getWorkflowRunStatus to return 'running' instead of null - Add NaN guard to formatAge (returns 'unknown' on unparseable dates) - Fix stale 'interrupted' references in route summary and delete comment - Include working path in /workflow resume response - Align deleteOldWorkflowRuns return type to { count } for consistency - Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference - Document new API endpoints (resume, abandon, delete) in CLAUDE.md - Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand - Fix workflowAbandonCommand test to assert cancelWorkflowRun call * refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards - Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor - Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601) - Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard - Rename cancelError → actionError in DashboardPage (covers 4 actions now) - Fix merge conflict: add IDatabase import, getRunningWorkflows from dev - Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono * fix: address review findings — double-rollback, missing guards, log context, tests - Fix double-rollback in deleteWorkflowRun by removing inner rollback() call and letting the outer catch handle it - Add terminal-status guard inside deleteWorkflowRun itself, not just in the route handler, to prevent deletion of running workflows - Add rollback failure logging to the rollback() helper - Add runId to error logs in resume/abandon/delete API route handlers - Add workingPath to getActiveWorkflowRunByPath error log - Add workflowRunId to dag-executor status-check warn logs - Wrap workflowRunCommand in try/catch in workflowResumeCommand with structured logging and null guard for user_message - Clean up stale 'interrupted' references in JSDoc - Fix missing / prefix on workflow cleanup in commands-reference.md - Add API route tests for POST /resume, POST /abandon, DELETE /:runId * refactor: apply code simplifications from review - Replace fragile startsWith string matching in deleteWorkflowRun catch with a typed WorkflowRunGuardError class - Reorder listWorkflowRuns placeholder generation: capture startIdx before pushing values for clarity - Replace curried makeRunAction factory in DashboardPage with a plain runAction helper function - Move skipIfStatusChanged helper definition before its call sites in dag-executor to match reading order * fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886) * fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854) The idle timeout (5 min default) caused two problems: (1) after a node's AI finished (result message), the loop waited for the subprocess to exit, wasting 5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls (tests, builds) triggered false timeouts on actively working nodes. Changes: - Break out of the for-await loop immediately after receiving the result message in both command/prompt and loop node paths — no more post-completion waste - Remove shouldResetTimer predicate so all message types (including tool) reset the timer — timeout only fires on complete silence - Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message resetting the timer, this is a deadlock detector, not a work limiter Fixes coleam00#854 * fix(workflows): update withIdleTimeout JSDoc to match new timer behavior Remove the tool-exclusion example from the shouldResetTimer docs since that pattern was just removed from all call sites. Clarify that most callers should omit the parameter. * fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs - Log generator cleanup errors in withIdleTimeout instead of silently swallowing - Add behavioral tests for break-after-result in both command/prompt and loop nodes - Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes) - Clarify shouldResetTimer test names and comments (utility API, not executor behavior) - Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern) - Remove redundant iterResult alias in withIdleTimeout * feat: add approval gate node type for human-in-the-loop workflows (coleam00#888) * feat: add approval gate node type for human-in-the-loop workflows Add `approval` as a fifth DAG node type that pauses workflow execution until a human approves or rejects. Built on existing resume infrastructure. - New `approval` node type in YAML workflows with `message` field - `paused` workflow status (non-terminal, resumable) - Approve/reject via REST API, CLI, chat commands, and Web UI - Approval comment available as `$node.output` in downstream nodes - Dashboard shows amber pulsing badge with approval message for paused runs - Path guard blocks new workflows on paused worktrees * fix: resolve 7 bugs in approval gate implementation Critical: - Parse metadata JSON on SQLite (returned as TEXT string, not object) - Suppress spurious "Workflow stopped (paused)" message after approval gates Medium: - CLI exits 0 on pause (not error code 1) - workflow status shows paused runs alongside running - Docs clarify auto-resume is CLI-only; API/chat mark resumable Low: - Skip completed_at when transitioning to failed for approval resume - Warn on AI fields (model, hooks, etc.) set on approval nodes - Add rowCount check to updateWorkflowRun Also: extract ApprovalContext type, add .min(1) to approval.message, fix stale "Only failed runs" error messages, fix log domain prefix, add recovery instructions to CLI approve error message. * feat(workflows): add interactive PRD workflow with approval gates (coleam00#891) Guided PRD creation through 3 approval gates: 1. Foundation questions (who/what/why/why now) 2. Deep dive questions (vision/users/JTBD/constraints) 3. Scope questions (MVP/hypothesis/exclusions) AI researches market + codebase between gates. Each gate carries the user's answers via $gate.output to downstream nodes. Final node generates the PRD from accumulated context. * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890) * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889) When workflow resume or approve is called, the worktree path is passed as cwd, but the codebase was registered under the main checkout path. This caused findCodebaseByDefaultCwd to return null, triggering auto-registration which tried to create a source symlink that already existed, crashing the command. Changes: - Add codebaseId option to WorkflowRunOptions - Look up codebase by stored ID before falling back to auto-registration - Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand - Add test verifying getCodebase is called with stored codebase_id on resume Fixes coleam00#889 * fix(cli): address review findings for PR coleam00#890 - Add domain prefix to 5 log events in codebase-lookup block (cli.codebase_lookup_failed, cli.db_connection_hint, cli.codebase_id_lookup_failed, cli.codebase_auto_registered, cli.codebase_auto_registration_failed) - Add inline comment clarifying intentional fallthrough in catch block - Add test for workflowApproveCommand codebase_id forwarding - Add test for error fallback path when getCodebase throws - Update pre-existing test assertion to match renamed log event * fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893) Two improvements to the interactive PRD workflow: 1. First-principles prompting: All research/technical/generate nodes now instruct the AI to start from what exists, verify claims by reading actual files, prefer extending over creating, and cite exact file:line. 2. Validation node: New final node reads the generated PRD and checks every technical claim against the codebase — file paths, endpoint names, DB schemas, event types, component names. Corrects inline and documents what was fixed. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
* fix: auto-select first DAG node on workflow run load (coleam00#781) The workflow output panel showed "No output available" on initial load because no DAG node was selected by default. Users had to manually click a node in the Logs tab to see output. Changes: - Add useEffect that auto-selects the first DAG node when workflow data loads and no node is selected - Prefer the currently running node for live workflows Fixes coleam00#781 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: formatting and test fixes from validation - Fix formatting in workflows.test.ts and api.ts (prettier) - Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook }) in api.conversations.test.ts to match route registration requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: extract selectInitialNode to testable pure function Extract DAG node auto-selection logic from useEffect into a pure function with 5 unit tests covering all branches (undefined, empty, no running, prefer running, multiple running). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871) * feat: add interrupted to WorkflowRunStatus schema Implements US-001 from PRD. Changes: - Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts - Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts - Add interrupted: z.number() to dashboardRunsResponseSchema counts object - Add 'interrupted' to dashboardValidStatuses in API handler - Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts * feat: update IWorkflowStore interface & DB query implementations Implements US-002 from PRD. Changes: - IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath) - IWorkflowStore: drop conversationId from findResumableRun signature - IWorkflowStore: add interruptOrphanedRuns() method - db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted') - db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status - db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running' - store-adapter: wire all three new/modified methods - executor: update call sites to use renamed methods (type-check requirement) - tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns * feat: replace staleness guard with path-based lifecycle Implements US-003 from PRD. Changes: - executor.ts: remove STALE_MINUTES staleness auto-kill; replace with status-based guard — 'running' blocks, 'interrupted' offers resume/abandon - server/src/index.ts: replace failStaleWorkflowRuns() with createWorkflowStore().interruptOrphanedRuns() on startup - executor-preamble.test.ts: replace staleness detection tests with concurrent run guard tests covering 'running' and 'interrupted' cases * feat: command handler — /workflow status, resume, and abandon Implements US-004. Replaces time-based stale heuristics with explicit lifecycle commands for workflow management. Changes: - Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants - Replace /workflow status with global view: lists all running+interrupted runs across all worktrees (ID, name, working path, status, started-at) - Add /workflow resume <id>: validates state then calls resumeWorkflowRun - Add /workflow abandon <id>: validates state then calls failWorkflowRun - Add statuses[] filter to listWorkflowRuns for IN (...) queries - Update /workflow help text and default case usage string - Update /status command to remove stale warning that referenced removed constants - Replace old /workflow status tests with new behavior coverage - Add /workflow resume and /workflow abandon test coverage * feat: CLI workflow status, resume, and abandon subcommands Implements US-005 from PRD. Changes: - Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag - Add workflowResumeCommand: validates run state then calls resumeWorkflowRun - Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user') - Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd) - Wire resume/abandon subcommands in cli.ts - Update tests: replace "not implemented" test with status/resume/abandon coverage * feat: Web UI interrupted status badge and dashboard support Implements US-006 from PRD. Changes: - api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts - api.ts: add interrupted field to DashboardCounts interface - WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge - WorkflowRunCard.tsx: add amber dot and badge for interrupted status - StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list - DashboardPage.tsx: include interrupted in activeRuns filter and counts default * refactor: remove dead timer-based workflow staleness code Implements US-007 from PRD. Changes: - Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005) - Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002) - Remove IDatabase import from db/workflows.ts (no longer needed) - Remove failStaleWorkflowRuns tests from db/workflows.test.ts grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and grep -r 'failStaleWorkflowRuns' all return zero matches. * fix: address review feedback — truncated IDs, resume semantics, type safety - Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8)) - Add completed_at to interruptOrphanedRuns for correct duration metrics - Fix resume commands: mark interrupted→failed to unblock path guard, let next workflow invocation auto-resume via findResumableRun - Collapse dual status/statuses fields into status?: T | T[] - Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants - Add explicit else-if for interrupted status in executor path guard - Add structured logging to CLI workflow commands - Restore conversationId to cmd.workflow_status_failed log - Add tests: listWorkflowRuns statuses filter, interrupted auto-resume, DB error handling for resume/abandon - Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md * refactor: simplify CLI commands and status filter logic - Extract getRunOrThrow helper for shared run lookup pattern - Use WorkflowRun[] instead of Awaited<ReturnType<...>> - Remove single-item special case in listWorkflowRuns (IN works for all) - Use ?? instead of || for null-coalescing consistency - Remove unused ACTIVE_WORKFLOW_STATUSES constant - Add inline comment on completed_at for interrupted runs * fix: handle SQLite string dates in formatAge SQLite returns started_at as a string, not a Date object. formatAge now accepts Date | string and converts accordingly. Found during E2E testing against real SQLite database. * feat: UX improvements — real resume, dashboard actions, cleanup command - CLI resume now actually re-executes the workflow (calls workflowRunCommand with --resume internally instead of just flipping DB status) - Remove truncated IDs from executor guard messages (full ID in commands only) - Add Resume/Abandon/Delete buttons to dashboard workflow run cards - Add Delete button to history table rows - Add API endpoints: POST resume, POST abandon, DELETE workflow run - Add CLI workflow cleanup command (deletes terminal runs older than N days) - Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions * refactor: simplify API handlers, dashboard actions, and log conventions - Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers (was inline string checks diverging from CLI/command-handler) - Extract makeRunAction helper in DashboardPage (4 identical handlers → 1) - Fix log event names to use domain prefix convention (api.workflow_run_*) - Use Ban icon for Abandon to distinguish from Cancel's XCircle - Use instanceof Date guard in formatAge for clarity - Add comment on delete handler's active-status guard * refactor: simplify workflow lifecycle — remove interrupted, single resume path Rework the primitives for a clean foundation: Status model: 5 statuses (pending, running, completed, failed, cancelled). Remove 'interrupted' entirely — server restart now marks orphaned runs as 'failed' directly (with metadata.failure_reason = 'server_restart'). Resume model: one path. The executor's implicit findResumableRun detects prior failed runs and skips completed nodes. The CLI --resume flag reuses the prior run's worktree but lets the executor handle node-skipping (no more preCreatedRun bypass). Chat /workflow resume tells the user to re-invoke (auto-resume kicks in). Path guard: only blocks 'running' status (was running + interrupted). Guards always run regardless of preCreatedRun. Cancellation: generalized from status === 'cancelled' to status !== 'running' at all 3 check points (streaming, loop iterations, DAG layers). Ready for future 'paused' status with zero changes. - interruptOrphanedRuns → failOrphanedRuns - Remove preCreatedRun bypass from CLI --resume path - Generalize 3 cancellation check points in dag-executor - Update all API endpoints, command handlers, UI components - Update all tests and documentation * fix: cancel/complete race, abandon semantics, UTC dates - Fix cancel/complete race condition: dag-executor now checks DB status before calling completeWorkflowRun or failWorkflowRun, preventing a cancel during the final layer from being overwritten to completed - Abandon uses cancelWorkflowRun instead of failWorkflowRun, so abandoned runs don't get auto-resumed by findResumableRun - Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC * fix: address PR review — SQL safety, transactions, error handling, docs, tests - Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns - Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun) - Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors) - Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort) - Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files - Fix default mock getWorkflowRunStatus to return 'running' instead of null - Add NaN guard to formatAge (returns 'unknown' on unparseable dates) - Fix stale 'interrupted' references in route summary and delete comment - Include working path in /workflow resume response - Align deleteOldWorkflowRuns return type to { count } for consistency - Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference - Document new API endpoints (resume, abandon, delete) in CLAUDE.md - Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand - Fix workflowAbandonCommand test to assert cancelWorkflowRun call * refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards - Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor - Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601) - Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard - Rename cancelError → actionError in DashboardPage (covers 4 actions now) - Fix merge conflict: add IDatabase import, getRunningWorkflows from dev - Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono * fix: address review findings — double-rollback, missing guards, log context, tests - Fix double-rollback in deleteWorkflowRun by removing inner rollback() call and letting the outer catch handle it - Add terminal-status guard inside deleteWorkflowRun itself, not just in the route handler, to prevent deletion of running workflows - Add rollback failure logging to the rollback() helper - Add runId to error logs in resume/abandon/delete API route handlers - Add workingPath to getActiveWorkflowRunByPath error log - Add workflowRunId to dag-executor status-check warn logs - Wrap workflowRunCommand in try/catch in workflowResumeCommand with structured logging and null guard for user_message - Clean up stale 'interrupted' references in JSDoc - Fix missing / prefix on workflow cleanup in commands-reference.md - Add API route tests for POST /resume, POST /abandon, DELETE /:runId * refactor: apply code simplifications from review - Replace fragile startsWith string matching in deleteWorkflowRun catch with a typed WorkflowRunGuardError class - Reorder listWorkflowRuns placeholder generation: capture startIdx before pushing values for clarity - Replace curried makeRunAction factory in DashboardPage with a plain runAction helper function - Move skipIfStatusChanged helper definition before its call sites in dag-executor to match reading order * fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886) * fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854) The idle timeout (5 min default) caused two problems: (1) after a node's AI finished (result message), the loop waited for the subprocess to exit, wasting 5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls (tests, builds) triggered false timeouts on actively working nodes. Changes: - Break out of the for-await loop immediately after receiving the result message in both command/prompt and loop node paths — no more post-completion waste - Remove shouldResetTimer predicate so all message types (including tool) reset the timer — timeout only fires on complete silence - Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message resetting the timer, this is a deadlock detector, not a work limiter Fixes coleam00#854 * fix(workflows): update withIdleTimeout JSDoc to match new timer behavior Remove the tool-exclusion example from the shouldResetTimer docs since that pattern was just removed from all call sites. Clarify that most callers should omit the parameter. * fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs - Log generator cleanup errors in withIdleTimeout instead of silently swallowing - Add behavioral tests for break-after-result in both command/prompt and loop nodes - Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes) - Clarify shouldResetTimer test names and comments (utility API, not executor behavior) - Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern) - Remove redundant iterResult alias in withIdleTimeout * feat: add approval gate node type for human-in-the-loop workflows (coleam00#888) * feat: add approval gate node type for human-in-the-loop workflows Add `approval` as a fifth DAG node type that pauses workflow execution until a human approves or rejects. Built on existing resume infrastructure. - New `approval` node type in YAML workflows with `message` field - `paused` workflow status (non-terminal, resumable) - Approve/reject via REST API, CLI, chat commands, and Web UI - Approval comment available as `$node.output` in downstream nodes - Dashboard shows amber pulsing badge with approval message for paused runs - Path guard blocks new workflows on paused worktrees * fix: resolve 7 bugs in approval gate implementation Critical: - Parse metadata JSON on SQLite (returned as TEXT string, not object) - Suppress spurious "Workflow stopped (paused)" message after approval gates Medium: - CLI exits 0 on pause (not error code 1) - workflow status shows paused runs alongside running - Docs clarify auto-resume is CLI-only; API/chat mark resumable Low: - Skip completed_at when transitioning to failed for approval resume - Warn on AI fields (model, hooks, etc.) set on approval nodes - Add rowCount check to updateWorkflowRun Also: extract ApprovalContext type, add .min(1) to approval.message, fix stale "Only failed runs" error messages, fix log domain prefix, add recovery instructions to CLI approve error message. * feat(workflows): add interactive PRD workflow with approval gates (coleam00#891) Guided PRD creation through 3 approval gates: 1. Foundation questions (who/what/why/why now) 2. Deep dive questions (vision/users/JTBD/constraints) 3. Scope questions (MVP/hypothesis/exclusions) AI researches market + codebase between gates. Each gate carries the user's answers via $gate.output to downstream nodes. Final node generates the PRD from accumulated context. * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890) * fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889) When workflow resume or approve is called, the worktree path is passed as cwd, but the codebase was registered under the main checkout path. This caused findCodebaseByDefaultCwd to return null, triggering auto-registration which tried to create a source symlink that already existed, crashing the command. Changes: - Add codebaseId option to WorkflowRunOptions - Look up codebase by stored ID before falling back to auto-registration - Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand - Add test verifying getCodebase is called with stored codebase_id on resume Fixes coleam00#889 * fix(cli): address review findings for PR coleam00#890 - Add domain prefix to 5 log events in codebase-lookup block (cli.codebase_lookup_failed, cli.db_connection_hint, cli.codebase_id_lookup_failed, cli.codebase_auto_registered, cli.codebase_auto_registration_failed) - Add inline comment clarifying intentional fallthrough in catch block - Add test for workflowApproveCommand codebase_id forwarding - Add test for error fallback path when getCodebase throws - Update pre-existing test assertion to match renamed log event * fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893) Two improvements to the interactive PRD workflow: 1. First-principles prompting: All research/technical/generate nodes now instruct the AI to start from what exists, verify claims by reading actual files, prefer extending over creating, and cite exact file:line. 2. Validation node: New final node reads the generated PRD and checks every technical claim against the codebase — file paths, endpoint names, DB schemas, event types, component names. Corrects inline and documents what was fixed. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Summary
This PR adds comprehensive Ollama support to Archon, enabling users to use local LLM models for both chat and embeddings.
Features Added:
Technical Changes:
python/src/server/api_routes/ollama_api.py- Extended API endpointspython/src/server/services/embeddings/embedding_service.py- Ollama embedding supportpython/src/server/services/llm_provider_service.py- Ollama LLM providerpython/src/server/services/ollama/model_discovery_service.py- Model discoveryarchon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx- UI panelarchon-ui-main/src/components/settings/RAGSettings.tsx- Updated settingsConfiguration:
Users can configure Ollama via the Settings page:
Test plan
archon-ui-main/tests/e2e/ollama-api-mode.spec.ts🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Improvements
Tests
Chores
Style