Skip to content

Add Ollama Support with Local Model Discovery and Embeddings#875

Closed
Milofax wants to merge 20 commits into
coleam00:mainfrom
Milofax:feature/ollama-support
Closed

Add Ollama Support with Local Model Discovery and Embeddings#875
Milofax wants to merge 20 commits into
coleam00:mainfrom
Milofax:feature/ollama-support

Conversation

@Milofax

@Milofax Milofax commented Nov 21, 2025

Copy link
Copy Markdown

Summary

This PR adds comprehensive Ollama support to Archon, enabling users to use local LLM models for both chat and embeddings.

Features Added:

  • Ollama Integration: Full support for Ollama as LLM and embedding provider
  • Model Discovery: Automatic detection of available Ollama models via API
  • UI Configuration: New OllamaConfigurationPanel for easy setup in Settings
  • RAG Settings Update: Support for Ollama embeddings in RAG pipeline
  • API Mode Selection: Native Ollama API or OpenAI-compatible mode
  • Documentation: Added INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md

Technical Changes:

  • python/src/server/api_routes/ollama_api.py - Extended API endpoints
  • python/src/server/services/embeddings/embedding_service.py - Ollama embedding support
  • python/src/server/services/llm_provider_service.py - Ollama LLM provider
  • python/src/server/services/ollama/model_discovery_service.py - Model discovery
  • archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx - UI panel
  • archon-ui-main/src/components/settings/RAGSettings.tsx - Updated settings

Configuration:

Users can configure Ollama via the Settings page:

  • Ollama Base URL (local or remote)
  • API Mode (native or OpenAI-compatible)
  • Embedding model selection
  • Chat model selection

Test plan

  • Tested with local Ollama instance
  • Tested with remote Ollama server (with auth token)
  • Verified model discovery works correctly
  • Verified embeddings generation with Ollama models
  • E2E tests included in archon-ui-main/tests/e2e/ollama-api-mode.spec.ts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-instance Ollama authentication (chat & embeddings), API mode toggle, native Ollama embeddings, and automatic model discovery.
  • Documentation

    • Added comprehensive INFRASTRUCTURE guide and a bilingual QUICKSTART.
  • Improvements

    • Enhanced PDF extraction with OCR fallback and smarter document chunking; multi-dimensional hybrid search/embedding support.
  • Tests

    • New E2E Playwright and expanded Python/unit tests for API modes and auth flows.
  • Chores

    • Containers set to restart automatically; Playwright added; .gitignore ignores Supabase local data and test-results.
  • Style

    • Removed unused icon imports and minor UI refinements.

- Implement Ollama integration for LLM and embeddings
- Add OllamaConfigurationPanel for UI configuration
- Add model discovery service for automatic model detection
- Update RAG settings to support Ollama embeddings
- Add documentation (INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md)
- Add E2E tests for Ollama functionality

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Nov 21, 2025

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-instance Ollama authentication end-to-end (UI, persistence, and backend propagation), a native Ollama embeddings adapter, auth-aware model discovery and health checks, Supabase infra docs/quickstart, Playwright tests, Docker restart policies, multi-dimensional search RPCs, and PDF/OCR extraction improvements.

Changes

Cohort / File(s) Summary
Docs & Repo config
/.gitignore, archon-ui-main/.gitignore, INFRASTRUCTURE.md, QUICKSTART.md, PLAN.md
Add Supabase ignore patterns; new INFRASTRUCTURE and QUICKSTART docs; document Ollama auth, deployment and health-check guidance.
Frontend: RAG & Ollama UI / types
archon-ui-main/src/components/settings/RAGSettings.tsx, archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx, archon-ui-main/src/components/settings/types/OllamaTypes.ts, archon-ui-main/src/services/credentialsService.ts
Add per-instance useAuth/authToken, UI controls, persist tokens into ragSettings (OLLAMA_CHAT_AUTH_TOKEN / OLLAMA_EMBEDDING_AUTH_TOKEN), and update related types (CHAT_MODEL, OLLAMA_API_MODE).
Frontend: utilities & unit tests
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts, archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
New sync utility for embedding-from-LLM configs and comprehensive unit tests covering name/url/useAuth/authToken sync and edge cases.
Frontend: e2e & build deps
archon-ui-main/package.json, archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
Add @playwright/test devDependency and Playwright e2e tests validating Ollama API mode UI flows and persistence.
Frontend: import cleanup & ignores
archon-ui-main/src/components/*, archon-ui-main/.gitignore
Remove unused lucide-react imports across multiple components; add test-results/ to .gitignore.
Backend: API routes & token mapping
python/src/server/api_routes/ollama_api.py
Derive per-URL tokens from rag_strategy, normalize instance URLs, map tokens per-instance, and pass tokens into discovery/health/validate endpoints with logging.
Backend: model discovery & health checks
python/src/server/services/ollama/model_discovery_service.py
Add optional auth_token to discovery and health methods; accept an auth_tokens map for multi-instance discovery and attach Authorization headers when provided.
Backend: LLM provider & validation
python/src/server/services/llm_provider_service.py
Select Ollama auth token by operation (chat vs embedding), fallback to "required-but-ignored" if absent, resolve embedding URL /v1 as needed, and extend validate_provider_instance to accept auth_token.
Backend: Embeddings adapter
python/src/server/services/embeddings/embedding_service.py
Add NativeOllamaEmbeddingAdapter for Ollama native /api/embeddings (per-text parallel fetchs, optional Bearer auth) and select adapter based on OLLAMA_API_MODE (native vs OpenAI-compatible).
Backend: search & storage
python/src/server/services/search/..., python/src/server/services/storage/base_storage_service.py
Switch hybrid search RPCs to multi-dimensional variants (include embedding_dimension), and improve chunk-splitting to avoid splitting code blocks and prefer headings/paragraphs.
Backend: PDF/OCR extraction & deps
python/src/server/utils/document_processing.py, python/src/server/utils/ocr_processing.py, python/pyproject.toml
Add pymupdf4llm-first PDF→Markdown path, pdfplumber/PyPDF2 fallbacks, OCR fallback via pytesseract/pdf2image, and add related pyproject dependencies.
Backend: tests & small changes
python/tests/*, python/src/agent_work_orders/utils/state_reconciliation.py, python/tests/test_ollama_auth_token.py, python/tests/test_async_llm_provider_service.py
Add tests for Ollama auth propagation and llm-provider scenarios; whitespace-only reflow in state_reconciliation.
Docker
docker-compose.yml
Add restart: unless-stopped to archon-server, archon-mcp, archon-agents, and archon-frontend services.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant UI as Frontend (RAGSettings / OllamaConfig)
  participant API as Archon API (ollama_api)
  participant Discovery as ModelDiscoveryService
  participant Ollama as Ollama Instance

  User->>UI: Configure Ollama URL + useAuth + authToken
  UI->>API: Save ragSettings (includes per-instance tokens)
  User->>UI: Trigger health check / discover models
  UI->>API: GET /health or /discover with instance URLs
  API->>API: Normalize URLs and map per-URL tokens from rag_strategy
  API->>Discovery: check_instance_health/discover_models(url, auth_token)
  Discovery->>Ollama: HTTP request (Authorization: Bearer token if provided)
  Ollama-->>Discovery: health/models response
  Discovery-->>API: aggregated result
  API-->>UI: health/discovery result
  UI-->>User: display status and available models
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • tazmon95
  • leex279
  • Wirasm

Poem

🐰 I found a token and tucked it away,

Rag settings hum with a brighter day.
From checkbox click to a backend ping,
Ollama answers when the bell does ring.
A hopping rabbit cheers—code stitched and gay. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main feature being added: Ollama support with model discovery and embeddings capabilities.
Description check ✅ Passed The description covers the main features, technical changes, and test plan, but lacks detailed information on specific test commands, evidence of test results, and checklist items.
Docstring Coverage ✅ Passed Docstring coverage is 86.05% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Milofax Milofax mentioned this pull request Nov 21, 2025
Milofax and others added 2 commits November 21, 2025 20:02
…if block

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Add test for Ollama with auth token
- Add test for Ollama without auth token (local setup)
- Update fallback test to expect 'required-but-ignored' placeholder
- Add RAG settings mock to multi-provider sequence test

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
archon-ui-main/src/services/credentialsService.ts (1)

495-547: Critical: Authentication tokens are not persisted to the database.

The authToken field is added to the OllamaInstance interface (line 68) and collected in the UI, but it is never saved in the setOllamaInstances method. Lines 507-528 define the fields object that gets persisted, and authToken is not included.

Impact: Users can enter authentication tokens in the UI, but these tokens will be silently discarded and not saved to the database. This breaks the per-instance authentication feature entirely.

Fix: Add authToken to the persisted fields:

       const fields: Record<string, any> = {
         name: instance.name,
         baseUrl: instance.baseUrl,
         isEnabled: instance.isEnabled,
         isPrimary: instance.isPrimary,
         instanceType: instance.instanceType || 'both',
         loadBalancingWeight: instance.loadBalancingWeight || 100
       };
       
+      // Add optional authToken if present
+      if (instance.authToken) {
+        fields.authToken = instance.authToken;
+      }
+      
       // Add optional health-related fields
       if (instance.isHealthy !== undefined) {

Also update the loading logic in getOllamaInstances (around line 465) to handle the authToken field when reconstructing instances from credentials.

python/src/server/services/llm_provider_service.py (1)

1211-1245: Ollama instance validation doesn’t pass auth_token to health check

validate_provider_instance still calls:

health_status = await model_discovery_service.check_instance_health(instance_url)

with no auth_token, while check_instance_health now supports token-based auth and /api/ollama/instances/health already uses it. For instances with authentication enabled, /api/ollama/validate will incorrectly report them as unavailable.

Consider resolving an auth token the same way as in health_check_endpoint and passing it through:

from .credential_service import credential_service

rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/")
embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")

chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or ""
embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or ""

normalized = instance_url.replace("/v1", "").rstrip("/")
auth_token = None
if normalized == llm_base_url:
    auth_token = chat_auth_token or None
elif normalized == embedding_base_url:
    auth_token = embedding_auth_token or None

health_status = await model_discovery_service.check_instance_health(instance_url, auth_token=auth_token)

(ideally sharing a helper with the API route to avoid duplication).

archon-ui-main/src/components/settings/RAGSettings.tsx (2)

786-805: Resetting instance configs drops useAuth / authToken fields

In several places the instance config state is reset or auto‑populated without including the new auth fields:

  • handleDeleteLLMInstance and handleDeleteEmbeddingInstance reset configs to { name: '', url: '' }.
  • The “auto‑populate embedding instance if it’s empty” path in the LLM URL change handler constructs an embedding config with just name and url.

Given the initial state includes useAuth and authToken, this:

  • Conflicts with the inferred state type under strict TypeScript.
  • Can cause React to flip checkboxes between controlled and uncontrolled (checked={undefined}).
  • Silently clears any auth state when users delete or auto‑populate instances.

A simple fix is to always provide the full shape when setting these configs, e.g.:

// LLM delete
- setLLMInstanceConfig({
-   name: '',
-   url: ''
- });
+ setLLMInstanceConfig({
+   name: '',
+   url: '',
+   useAuth: false,
+   authToken: ''
+ });

// Embedding delete
- setEmbeddingInstanceConfig({
-   name: '',
-   url: ''
- });
+ setEmbeddingInstanceConfig({
+   name: '',
+   url: '',
+   useAuth: false,
+   authToken: ''
+ });

// Auto-populate embedding from LLM URL
- if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) {
-   setEmbeddingInstanceConfig({
-     name: llmInstanceConfig.name || 'Default Ollama',
-     url: newUrl
-   });
- }
+ if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) {
+   setEmbeddingInstanceConfig({
+     name: llmInstanceConfig.name || 'Default Ollama',
+     url: newUrl,
+     useAuth: embeddingInstanceConfig.useAuth ?? false,
+     authToken: embeddingInstanceConfig.authToken ?? '',
+   });
+ }

This keeps state shape consistent and avoids subtle UI glitches.

Also applies to: 809-827, 2329-2354


1845-1899: Selected model summary uses CHAT_MODEL, which is never written here

The new summary row shows:

{activeSelection === 'chat' ? (
  <span className="text-white">
    {ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span>}
  </span>
) : (
  ...
)}

But throughout this component the chat model is managed via MODEL_CHOICE:

  • Provider buttons set MODEL_CHOICE.
  • OllamaModelSelectionModal’s onSelectModel updates MODEL_CHOICE.
  • getDisplayedChatModel reads MODEL_CHOICE and the provider to display the active chat model.

CHAT_MODEL is defined in the props type but never written here, so the summary will often show “Not selected” even when a chat model is configured.

Consider either:

  • Reusing getDisplayedChatModel(ragSettings) in this row, or
  • Keeping CHAT_MODEL as the canonical key and updating it whenever MODEL_CHOICE changes.

For example:

- {ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span>}
+ {getDisplayedChatModel(ragSettings) || <span className="text-gray-500 italic">Not selected</span>}

The authentication status row itself looks good and does not leak token values.

🧹 Nitpick comments (12)
QUICKSTART.md (1)

1-47: LGTM! Clear and concise quickstart guide.

The quickstart guide provides a focused, step-by-step workflow for local development setup. The structure is clean and the instructions are easy to follow.

Optional: For improved markdown formatting, consider wrapping bare URLs in angle brackets per markdown best practices (lines 25-28):

-| **Archon UI** | http://localhost:3737 | Hauptanwendung |
+| **Archon UI** | <http://localhost:3737> | Hauptanwendung |

This is a minor style preference and does not affect functionality.

python/src/server/api_routes/ollama_api.py (1)

83-142: Align all Ollama health/model flows with auth-token support

You’ve wired per-instance auth tokens into /models and /instances/health, but other Ollama flows in this module still call discovery/health without tokens:

  • /validatevalidate_provider_instance("ollama", instance_url) (service currently calls check_instance_health with no auth_token).
  • /models/discover-and-store and /models/discover-with-details still hit Ollama without any Authorization header.

For auth-protected instances these will now systematically report failures even though /instances/health can succeed with tokens.

Recommend:

  • Updating validate_provider_instance (in llm_provider_service.py) to resolve an auth_token exactly the way health_check_endpoint does and pass it through to check_instance_health.
  • Planning a follow-up to add token support to the discover-and-store / discover-with-details endpoints as well, or explicitly document that they currently require unauthenticated instances.

Also applies to: 166-217, 262-305

archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (1)

1-191: Reduce hard-coded timeouts and brittle selectors in Playwright tests

The flows are clear, but they’re quite fragile:

  • Almost every step uses waitForTimeout(...) instead of waiting on concrete UI conditions (waitForSelector, expect(...).toBeVisible()/toBeEnabled()). This slows tests and risks flakiness.
  • Many selectors are generic text queries like text=Embedding or button:has-text("Config"), which can easily break on copy changes or multiple matches.
  • Navigation to Settings → RAG Settings → Embedding → Ollama Config is duplicated across tests.

Suggested improvements (can be incremental):

  • Replace sleeps with condition-based waits around key elements.
  • Prefer role-/test-id–based locators (getByRole, getByTestId) over raw text where possible.
  • Extract the repeated “open Ollama config in RAG Settings” flow into a helper to DRY the suite.
python/src/server/services/embeddings/embedding_service.py (2)

107-172: NativeOllamaEmbeddingAdapter looks correct but has unbounded per-batch concurrency

The adapter correctly targets /api/embeddings with { "model": ..., "prompt": ... } and wraps HTTP/JSON failures in EmbeddingAPIError, which is consistent with the rest of the service.

Two things to consider:

  • asyncio.gather(*...) will fire one HTTP request per text concurrently for each batch. With large EMBEDDING_BATCH_SIZE values this could mean dozens or hundreds of parallel requests to a single Ollama instance. You may want to cap per-batch concurrency via a semaphore or chunking to avoid overloading local/remote instances.
  • The dimensions argument is currently ignored for native Ollama. That’s fine (Ollama controls embedding size), but it may be worth documenting in the docstring or via a comment to avoid confusion later.

287-303: Adapter selection for Ollama depends on api_mode string value – normalize it

_get_embedding_adapter checks:

if provider_name == "ollama" and base_url:
    if (api_mode or "native") == "native":
        ...

If OLLAMA_API_MODE ever comes through as "Native" / "NATIVE" or with whitespace, this will silently fall into the OpenAI-compatible branch even though the intent is “native”.

Given you already read/derive ollama_api_mode above, consider normalizing it once (e.g., .lower().strip()) before passing into _get_embedding_adapter, and/or do the normalization inside _get_embedding_adapter:

-        if (api_mode or "native") == "native":
+        mode = (api_mode or "native").strip().lower()
+        if mode == "native":

This makes the behavior robust to minor config variations.

python/src/server/services/ollama/model_discovery_service.py (3)

122-133: Auth token support for /api/tags is correct but not propagated to detail/capability calls

The new auth_token parameter is correctly wired into the /api/tags discovery call and keeps behavior unchanged when no token is provided. However, follow‑up calls such as _get_model_details() and the capability tests still hit Ollama endpoints without any auth header. On protected instances this will silently 401 and you’ll lose detailed metadata and accurate capability detection, even though listing itself succeeded with a token.

If you want full parity for authenticated instances, consider threading auth_token through the enrichment pipeline, e.g.:

-    async def discover_models(self, instance_url: str, fetch_details: bool = False, auth_token: str | None = None) -> list[OllamaModel]:
+    async def discover_models(
+        self,
+        instance_url: str,
+        fetch_details: bool = False,
+        auth_token: str | None = None,
+    ) -> list[OllamaModel]:

@@
-                enriched_models = await self._enrich_model_capabilities(models, instance_url, fetch_details=fetch_details)
+                enriched_models = await self._enrich_model_capabilities(
+                    models,
+                    instance_url,
+                    fetch_details=fetch_details,
+                    auth_token=auth_token,
+                )

and then update _enrich_model_capabilities, _get_model_details, and any capability helpers that hit Ollama directly to accept and forward auth_token in their HTTP headers.

This keeps non‑auth flows identical while making “fetch_details=True” actually usable with locked‑down Ollama instances.

Also applies to: 192-199


964-1004: Health check auth header wiring looks good; cache key ignores token changes

Passing auth_token and conditionally adding a Bearer header here is straightforward and matches the new health‑check API behavior. One nuance: the health_cache key is only health_{instance_url}, so a failure without a token and a success with a token (or vice versa) will be cached identically for 30 seconds.

If you expect users to toggle/authenticate instances and immediately rely on the health UI, consider including the auth presence in the cache key, e.g. health_{instance_url}_{bool(auth_token)} or similar, so that “with token” and “without token” statuses don’t mask each other in the short term.


1026-1060: Per-instance auth_tokens map depends on exact URL string matching

The auth_tokens dict is plumbed correctly into discover_models_from_multiple_instances, and using auth_tokens.get(url) is nice and simple. This does assume that the keys in auth_tokens are in the same normalized form as the instance_urls passed here (e.g. either both with /v1 stripped and no trailing slash, or both unnormalized).

If the router that constructs auth_tokens is normalizing URLs (as in the health‑check endpoint), it would be safer to normalize url before doing auth_tokens.get(...):

-        tasks = [
-            self.discover_models(url, fetch_details=fetch_details, auth_token=auth_tokens.get(url))
-            for url in instance_urls
-        ]
+        tasks = [
+            self.discover_models(
+                url,
+                fetch_details=fetch_details,
+                auth_token=auth_tokens.get(url.rstrip("/").replace("/v1", "")),
+            )
+            for url in instance_urls
+        ]

so a mismatch in /v1 or trailing slashes doesn’t quietly drop the token.

archon-ui-main/src/components/settings/RAGSettings.tsx (4)

208-221: Instance config + token sync logic is solid; consider typing RAG token fields

The new llmInstanceConfig/embeddingInstanceConfig shapes and the useEffect blocks that sync from LLM_BASE_URL / OLLAMA_* fields into these configs look logically correct:

  • Refs (lastLLMConfigRef / lastEmbeddingConfigRef) prevent infinite loops.
  • useAuth is derived from token presence and authToken is updated only when the incoming value changes.

The only minor concern is the reliance on (ragSettings as any).OLLAMA_*_AUTH_TOKEN to satisfy TypeScript. Since these keys are now part of your configuration surface, it would be cleaner to add them to RAGSettingsProps["ragSettings"] explicitly:

   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
@@
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;

so you can drop the as any casts and keep strict typing end‑to‑end.

Also applies to: 224-271


570-583: Model discovery fetches work but bypass TanStack Query and may over-fetch

ollamaMetrics now tracks llmAvailableModels / embeddingAvailableModels, and loadAvailableModels() hits /api/ollama/models both:

  • Automatically when the instance URLs change, and
  • After a successful manualTestConnection when a context is provided.

This is functionally fine and the error handling is intentionally soft, but two points:

  1. Project guidelines say “Use TanStack Query for all data fetching”. These new fetches bypass that layer, so if you already have React Query plumbing for Ollama endpoints, consider converting loadAvailableModels into a query (with enabled flags keyed on URLs) instead of a bespoke fetch. That would centralize retries, caching, and error behavior.

  2. The URL‑change effects fire regardless of the active provider; if a user temporarily switches away from Ollama, you still ping /api/ollama/models whenever the URL state changes. You may want to guard them:

-  useEffect(() => {
-    if (llmInstanceConfig.url && llmInstanceConfig.url !== DEFAULT_OLLAMA_URL) {
+  useEffect(() => {
+    if (
+      chatProvider === 'ollama' &&
+      llmInstanceConfig.url &&
+      llmInstanceConfig.url !== DEFAULT_OLLAMA_URL
+    ) {
       loadAvailableModels(llmInstanceConfig.url, 'chat');
     }
-  }, [llmInstanceConfig.url]);
+  }, [chatProvider, llmInstanceConfig.url]);

Same idea for the embedding instance. This reduces unnecessary calls when Ollama isn’t currently in use.

Also applies to: 640-691, 751-755


1595-1652: OLLAMA_API_MODE UI is clear; consider typing the setting key

The API mode toggle between “Native Ollama API” and “OpenAI‑Compatible” is well‑designed and the defaulting logic (native when unset) is clear. The only gap is that OLLAMA_API_MODE isn’t declared in RAGSettingsProps["ragSettings"], so all setRagSettings({ ...ragSettings, OLLAMA_API_MODE: ... }) calls rely on structural typing and the setRagSettings: any escape hatch.

For better static safety, add:

   ragSettings: {
@@
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';

so backend and frontend stay in sync as this option evolves.


2351-2387: Auth settings in modals are wired correctly; consider clearing tokens on delete

The new auth sections in the LLM and Embedding edit modals behave as expected:

  • useAuth toggles a password input.
  • Tokens are stored in llmInstanceConfig.authToken / embeddingInstanceConfig.authToken.
  • Saving the modal pushes tokens into ragSettings under OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN.

Two small follow‑ups:

  1. When deleting an instance (LLM or Embedding), you clear LLM_BASE_URL / OLLAMA_EMBEDDING_URL but leave the corresponding OLLAMA_*_AUTH_TOKEN in ragSettings. It might be more intuitive (and safer) to also blank those keys so “delete instance” truly removes all associated configuration.

  2. The modal “Save Changes” handlers currently only update local ragSettings; persistence happens when the user later clicks the top‑level “Save Settings” button. That matches existing behavior for other fields, but the success toast (“instance updated successfully”) might be interpreted as “persisted to server”. If you’ve had user confusion around this, consider calling credentialsService.updateRagSettings(updatedSettings) here as well.

Functionally, the core token wiring looks good and doesn’t expose secrets in the UI.

Also applies to: 2400-2405, 2446-2477, 2490-2495

Comment thread archon-ui-main/test-results/.last-run.json Outdated
Comment thread INFRASTRUCTURE.md Outdated
Comment thread PLAN.md
Comment thread python/src/server/api_routes/ollama_api.py
Comment thread python/src/server/api_routes/ollama_api.py
Comment on lines +287 to 303
def _get_embedding_adapter(
provider: str, client: Any, base_url: str | None = None, auth_token: str | None = None, api_mode: str | None = None
) -> EmbeddingProviderAdapter:
provider_name = (provider or "").lower()
if provider_name == "google":
search_logger.info("Using Google native embedding adapter")
return GoogleEmbeddingAdapter()
if provider_name == "ollama" and base_url:
# Check API mode - default to native if not specified or if set to native
if (api_mode or "native") == "native":
search_logger.info(f"Using Ollama native API adapter with base URL: {base_url}")
return NativeOllamaEmbeddingAdapter(base_url, auth_token)
else:
search_logger.info(f"Using OpenAI-compatible adapter for Ollama with base URL: {getattr(client, 'base_url', 'N/A')}")
return OpenAICompatibleEmbeddingAdapter(client)
search_logger.info(f"Using OpenAI-compatible embedding adapter for provider: {provider_name}")
return OpenAICompatibleEmbeddingAdapter(client)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle missing OLLAMA_EMBEDDING_URL more cleanly and avoid unnecessary Ollama config work for other providers

In create_embeddings_batch you do:

ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")

If OLLAMA_EMBEDDING_URL is present but set to None, the .rstrip call will raise and force the except path, even though this is just “no embedding URL configured”. You also compute these Ollama-specific values even when embedding_provider is not "ollama".

You could make this more robust and targeted as:

-                    ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
-                    ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
-                    ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")
+                    ollama_base_url_raw = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip()
+                    ollama_base_url = ollama_base_url_raw.rstrip("/v1").rstrip("/") if ollama_base_url_raw else ""
+                    ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "") or ""
+                    ollama_api_mode = (rag_settings.get("OLLAMA_API_MODE") or "native").lower()

and only pass ollama_base_url/ollama_auth_token into _get_embedding_adapter when embedding_provider.lower() == "ollama".

This avoids noisy warnings from benign None values and keeps non-Ollama providers from incurring extra config work. Normalizing ollama_api_mode to lowercase also makes _get_embedding_adapter resilient to casing differences.

Also applies to: 433-469

🤖 Prompt for AI Agents
python/src/server/services/embeddings/embedding_service.py around lines 287-303
(also apply same change to 433-469): the code currently computes
ollama_base_url/auth_token/api_mode unconditionally and calls .rstrip on a value
that may be None, which can raise and causes unnecessary Ollama handling for
other providers; change the call sites to only compute and sanitize
OLLAMA_EMBEDDING_URL/AUTH_TOKEN/API_MODE when embedding_provider.lower() ==
"ollama" (guard against None before calling .rstrip), normalize api_mode to
lowercase, and only pass ollama_base_url and ollama_auth_token into
_get_embedding_adapter when the provider is Ollama so non-Ollama providers avoid
extra config work and you won't call .rstrip on None.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Ensure OLLAMA_*_AUTH_TOKEN values from rag_settings are actual tokens, not “[ENCRYPTED]” placeholders

Both here and in llm_provider_service.get_llm_client you read:

ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
# or
ollama_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "")

The provided credential_service.get_credentials_by_category("rag_strategy") implementation returns a dict with {"value": "[ENCRYPTED]", "is_encrypted": True, ...} for encrypted keys, not the real secret value. If the Ollama auth tokens are stored as encrypted rag_strategy entries, these lookups will hand [ENCRYPTED] placeholders (or dicts) to the HTTP layer, and auth will consistently fail.

Please double-check:

  • How OLLAMA_*_AUTH_TOKEN is stored (encrypted vs plaintext) in archon_settings for the rag_strategy category.
  • Whether there is a backend-only path for retrieving decrypted values (while still redacting them in the UI) and use that instead of the generic get_credentials_by_category.

If tokens are currently encrypted and only exposed through [ENCRYPTED] placeholders, this retrieval logic will need to be adjusted for Ollama auth to actually work.

Also applies to: 433-469


I'll verify the review comment by examining the codebase to understand how OLLAMA auth tokens are retrieved and whether they may be "[ENCRYPTED]" placeholders.


---

🏁 Script executed:

```shell
#!/bin/bash
# Locate key files
fd embedding_service.py
fd credential_service.py
fd llm_provider_service.py

Length of output: 444


🏁 Script executed:

#!/bin/bash
# Examine the embedding_service.py lines around 287-303 and 433-469
cat python/src/server/services/embeddings/embedding_service.py | sed -n '280,310p'
cat python/src/server/services/embeddings/embedding_service.py | sed -n '430,475p'

Length of output: 4073


🏁 Script executed:

#!/bin/bash
# Find how rag_settings are obtained and where OLLAMA_EMBEDDING_AUTH_TOKEN is used
rg -n "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN" --type py -B 2 -A 2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find the credential_service implementation
fd credential_service.py -type f

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Search for get_credentials_by_category and how it returns encrypted values
rg -n "get_credentials_by_category\|ENCRYPTED" --type py -B 2 -A 2

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Examine credential_service.py to see get_credentials_by_category implementation
cat python/src/server/services/credential_service.py

Length of output: 25935


🏁 Script executed:

#!/bin/bash
# Search for OLLAMA token references using simpler patterns
rg "OLLAMA_EMBEDDING_AUTH_TOKEN" --type py -B 3 -A 3
rg "OLLAMA_CHAT_AUTH_TOKEN" --type py -B 3 -A 3

Length of output: 7033


🏁 Script executed:

#!/bin/bash
# Examine llm_provider_service.py to find the get_llm_client function and token usage
cat python/src/server/services/llm_provider_service.py | head -500

Length of output: 20625


🏁 Script executed:

#!/bin/bash
# Search for ENCRYPTED placeholder or decryption logic
rg "ENCRYPTED" --type py -B 2 -A 2
rg "decrypt\|is_encrypted" --type py -B 2 -A 2

Length of output: 1368



---

🏁 Script executed:

```shell
#!/bin/bash
# Check ollama_api.py to see if it has the same issue with auth tokens
cat python/src/server/api_routes/ollama_api.py | grep -n -B 5 -A 5 "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN"

Length of output: 1237


🏁 Script executed:

#!/bin/bash
# Verify if OLLAMA auth tokens are actually stored as encrypted in the database schema or init
rg "OLLAMA_EMBEDDING_AUTH_TOKEN\|OLLAMA_CHAT_AUTH_TOKEN" --type py -B 5 -A 5 | grep -i "encrypt\|set_credential"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check how settings are initialized - are auth tokens marked encrypted?
fd "init\|seed\|migration" --type py | head -20
cat python/src/server/services/settings_service.py 2>/dev/null || echo "File not found"

Length of output: 184


Replace category-based credential retrieval with backend-only decryption for Ollama auth tokens

The identified issue is confirmed: credential_service.get_credentials_by_category("rag_strategy") returns {"value": "[ENCRYPTED]", "is_encrypted": True} for encrypted entries, causing the literal string "[ENCRYPTED]" to be passed as auth tokens instead of actual decrypted values.

This affects multiple locations:

  1. embedding_service.py (line ~453): ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
  2. llm_provider_service.py (lines ~290, ~340): Similar pattern for both OLLAMA_EMBEDDING_AUTH_TOKEN and OLLAMA_CHAT_AUTH_TOKEN
  3. ollama_api.py (lines ~107, ~192): Same problematic pattern

Replace with backend-only credential retrieval:

ollama_auth_token = await credential_service.get_credential("OLLAMA_EMBEDDING_AUTH_TOKEN", default="", decrypt=True)

The get_credential(key, decrypt=True) method already exists in credential_service and properly handles decryption while keeping UI redaction intact.

🤖 Prompt for AI Agents
python/src/server/services/embeddings/embedding_service.py around lines 287-303,
the code currently pulls Ollama auth tokens from settings/credential categories
which can yield the literal "[ENCRYPTED]" string; replace those lookups with a
backend-only decrypted retrieval by calling
credential_service.get_credential("OLLAMA_EMBEDDING_AUTH_TOKEN", default="",
decrypt=True) (awaiting it if async) and use that returned value when
constructing Ollama adapters so the actual decrypted token is passed instead of
the encrypted placeholder.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
python/src/agent_work_orders/utils/state_reconciliation.py (1)

159-161: Fix Windows root directory detection logic.

The check len(path_obj.parts) == 2 and path_obj.parts[1] == "" will never match Windows drive roots. On Windows, Path("C:\\").parts returns ("C:\\",) (1 element, not 2). This allows dangerous paths like "C:" to bypass the root directory safety check.

Apply this diff to correctly detect Windows drive roots:

-                    is_not_root = not (
-                        orphan_path_resolved in ("/", "\\") or
-                        (os.name == "nt" and len(path_obj.parts) == 2 and path_obj.parts[1] == "")
-                    )
+                    is_not_root = not (
+                        orphan_path_resolved in ("/", "\\") or
+                        (os.name == "nt" and len(path_obj.parts) == 1)
+                    )
archon-ui-main/src/services/credentialsService.ts (1)

62-75: OllamaInstance.authToken is never persisted or reloaded, breaking per-instance auth

Verification confirms the issue. In getOllamaInstances (lines 437-485), the credential parsing correctly extracts all fields into instanceMap, but the final instance object construction (lines 477-487) explicitly omits authToken from the returned array. Similarly, in setOllamaInstances (lines 495-543), the fields map excludes authToken entirely, so any token set in the UI is never persisted to credentials storage.

To fix this:

  1. Include authToken in the returned instance object (line ~487):
instances.push({
  id: instance.id,
  name: instance.name,
  baseUrl: instance.baseUrl,
  isEnabled: instance.isEnabled ?? true,
  isPrimary: instance.isPrimary ?? false,
  instanceType: instance.instanceType ?? 'both',
  loadBalancingWeight: instance.loadBalancingWeight ?? 100,
  isHealthy: instance.isHealthy,
  responseTimeMs: instance.responseTimeMs,
  modelsAvailable: instance.modelsAvailable,
  lastHealthCheck: instance.lastHealthCheck,
+ authToken: instance.authToken,
});
  1. Add authToken to the persisted fields in setOllamaInstances (after line 514):
if (instance.lastHealthCheck) {
  fields.lastHealthCheck = instance.lastHealthCheck;
}
+
+if (instance.authToken) {
+  fields.authToken = instance.authToken;
+}

Alternatively, if tokens should live only in global credentials (e.g., OLLAMA_CHAT_AUTH_TOKEN), remove authToken from the OllamaInstance interface entirely to avoid the false expectation of per-instance storage.

archon-ui-main/src/components/settings/RAGSettings.tsx (3)

2346-2354: Sync authentication settings when using same host for both instances.

When the "Use same host for embedding instance" checkbox is enabled, only the url and name fields are synced. If the LLM instance has authentication enabled, the embedding instance should inherit those settings to avoid configuration mismatch.

Apply this diff:

 onChange={(e) => {
   if (e.target.checked) {
     // Sync embedding instance with LLM instance
     setEmbeddingInstanceConfig({
       name: llmInstanceConfig.name || 'Default Ollama',
       url: llmInstanceConfig.url,
-      useAuth: embeddingInstanceConfig.useAuth,
-      authToken: embeddingInstanceConfig.authToken
+      useAuth: llmInstanceConfig.useAuth,
+      authToken: llmInstanceConfig.authToken
     });
   }
 }}

787-828: Clear authentication tokens when deleting instance configurations.

The instance deletion handlers reset the instance config objects but don't remove the corresponding auth token keys from ragSettings. This leaves orphaned authentication data.

Apply these diffs:

 const handleDeleteLLMInstance = () => {
   if (window.confirm('Are you sure you want to delete the current LLM instance configuration?')) {
     // Reset LLM instance configuration
     setLLMInstanceConfig({
       name: '',
-      url: ''
+      url: '',
+      useAuth: false,
+      authToken: ''
     });
     
     // Clear related RAG settings
     const updatedSettings = { ...ragSettings };
     delete updatedSettings.LLM_BASE_URL;
+    delete updatedSettings.LLM_INSTANCE_NAME;
     delete updatedSettings.MODEL_CHOICE;
+    delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN;
     setRagSettings(updatedSettings);

Apply similar changes to handleDeleteEmbeddingInstance (lines 809-828).


1845-1899: Frontend will display "Not selected" for CHAT_MODEL due to field name mismatch.

The concern in the original review is valid. Backend stores the Ollama chat model selection as OLLAMA_CHAT_MODEL in the database, but frontend line 1892 attempts to display ragSettings.CHAT_MODEL, which doesn't exist.

Backend flow: Database stores OLLAMA_CHAT_MODELcredential_service.get_credentials_by_category("rag_strategy") returns raw dict with OLLAMA_CHAT_MODEL key → Frontend receives dict without CHAT_MODEL field.

Fix required: Either update the backend to map OLLAMA_CHAT_MODEL to CHAT_MODEL in the response, or update frontend line 1892 to display ragSettings.OLLAMA_CHAT_MODEL instead.

♻️ Duplicate comments (6)
archon-ui-main/test-results/.last-run.json (1)

1-6: Do not commit .last-run.json; ignore test-results in VCS.

This file is a transient Playwright/Vitest artifact and will churn frequently. Please:

  • Add test-results/ (or at least test-results/.last-run.json) to archon-ui-main/.gitignore.
  • Remove the tracked file via git rm --cached archon-ui-main/test-results/.last-run.json.

This matches the earlier review feedback.

INFRASTRUCTURE.md (1)

108-121: Remove hard-coded Supabase JWT from docs and use a placeholder instead.

The SUPABASE_SERVICE_KEY here is a real JWT with a long expiry. Even if intended for local use, committing it is a credential leak and has already been flagged by gitleaks and a prior review.

Recommend:

-# JWT Service Role Key (generiert mit lokalem JWT_SECRET)
-SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoic2VydmljZV9yb2xlIiwiaXNzIjoic3VwYWJhc2UiLCJpYXQiOjE3NjM2NTg1MDYsImV4cCI6MjA3OTAxODUwNn0.HVH5TgwW70JZtiGdnjU4RGexDDVbGnI3mXt-diQhVy8
+# JWT Service Role Key (mit lokalem JWT_SECRET generieren – nicht committen)
+SUPABASE_SERVICE_KEY=<generate-using-script-below>

…and explicitly instruct readers to run the Python script in this doc to generate their own key locally, add it to their .env, and never commit it.

python/src/server/services/embeddings/embedding_service.py (1)

447-473: Ollama embedding config still fragile and likely passing “[ENCRYPTED]” placeholders instead of real auth tokens

This block reintroduces both issues previously flagged in earlier reviews:

rag_settings = await _maybe_await(
    credential_service.get_credentials_by_category("rag_strategy")
)
...
ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")
...
adapter = _get_embedding_adapter(
    embedding_provider,
    client,
    base_url=ollama_base_url,
    auth_token=ollama_auth_token,
    api_mode=ollama_api_mode,
)
  1. .rstrip on possibly non‑string values: get_credentials_by_category("rag_strategy") returns objects like {"value": "[ENCRYPTED]", "is_encrypted": True, ...} for encrypted rows. If OLLAMA_EMBEDDING_URL or related keys are missing or malformed, you can end up calling .rstrip on None or on a dict, which will raise and force the generic “Failed to load embedding settings” path even though this is just misconfiguration. Guard and normalize the raw value before stripping.

  2. Auth token is almost certainly the [ENCRYPTED] placeholder, not the decrypted secret: Because get_credentials_by_category intentionally returns masked values for encrypted entries, OLLAMA_EMBEDDING_AUTH_TOKEN here will be either a placeholder dict or the literal string "[ENCRYPTED]". That value is then used as the Bearer token in NativeOllamaEmbeddingAdapter, so auth will consistently fail even when a valid token is stored.

To address both, you can:

  • Only compute Ollama‑specific settings when embedding_provider.lower() == "ollama".
  • Pull the real token via the backend‑only decryption API instead of from the category map.
  • Normalize the base URL defensively.

For example:

-            rag_settings = await _maybe_await(
-                credential_service.get_credentials_by_category("rag_strategy")
-            )
-            batch_size = int(rag_settings.get("EMBEDDING_BATCH_SIZE", "100"))
-            embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536"))
-
-            # For Ollama, get native API URL, auth token, and API mode
-            ollama_base_url = rag_settings.get("OLLAMA_EMBEDDING_URL", "").rstrip("/v1").rstrip("/")
-            ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
-            ollama_api_mode = rag_settings.get("OLLAMA_API_MODE", "native")
+            rag_settings = await _maybe_await(
+                credential_service.get_credentials_by_category("rag_strategy")
+            )
+            batch_size = int(rag_settings.get("EMBEDDING_BATCH_SIZE", "100"))
+            embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536"))
+
+            ollama_base_url = ""
+            ollama_auth_token = ""
+            ollama_api_mode = "native"
+
+            if (embedding_provider or "").lower() == "ollama":
+                raw_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip()
+                if isinstance(raw_url, str) and raw_url:
+                    ollama_base_url = raw_url.rstrip("/v1").rstrip("/")
+
+                # Use backend-only decrypted credential for the token
+                try:
+                    ollama_auth_token = await credential_service.get_credential(
+                        "OLLAMA_EMBEDDING_AUTH_TOKEN",
+                        default="",
+                        decrypt=True,
+                    )
+                except TypeError:
+                    # Fallback if get_credential signature differs; keep empty token
+                    pass
+
+                mode_raw = rag_settings.get("OLLAMA_API_MODE") or "native"
+                if isinstance(mode_raw, str):
+                    ollama_api_mode = mode_raw.strip().lower()

and keep the default values in the except block as you already do.

This avoids noisy errors from benign misconfig, and—more importantly—ensures the adapter receives a real decrypted Bearer token instead of masked placeholders.

python/src/server/api_routes/ollama_api.py (2)

99-131: [Duplicate] Normalize instance URLs consistently when mapping Ollama auth tokens

This issue has already been flagged in a previous review. The configured base URLs strip /v1 and then trailing /, but incoming instance URLs only strip trailing /. When callers pass URLs with /v1 (common from UI), the equality checks fail and no auth token is attached, causing 401 errors.


180-187: [Duplicate] Do not log full rag_settings – it may contain secrets

This issue has already been flagged in a previous review. Logging the entire rag_settings dictionary exposes sensitive data including Ollama auth tokens and other credentials. This is a security/compliance risk.

PLAN.md (1)

253-261: [Duplicate] Remove real auth tokens from documentation and rotate them

This critical issue has already been flagged in a previous review. The DB snapshot contains what appears to be a real Ollama auth token (ollama_13107e338aa16a6a8295592ce050f6cb) that should be immediately rotated and replaced with a placeholder.

🧹 Nitpick comments (12)
python/src/agent_work_orders/utils/state_reconciliation.py (1)

184-186: Catch specific exception types where possible.

Catching bare Exception during file operations masks specific errors. Consider catching OSError and its subclasses (e.g., PermissionError, FileNotFoundError) for more precise diagnostics, while allowing unexpected errors to propagate.

-            except Exception as e:
+            except OSError as e:

This improves observability by distinguishing expected file-system errors from unexpected runtime failures, making it easier to diagnose issues during reconciliation.

archon-ui-main/package.json (1)

64-67: Playwright devDependency looks good; consider adding an e2e script.

Adding @playwright/test as a devDependency is appropriate for the new e2e tests. Optionally, add a "test:e2e": "playwright test" script so CI and devs have a clear entrypoint for those tests.

QUICKSTART.md (1)

7-19: Replace machine-specific absolute paths with repo-relative or placeholder paths.

The commands hard-code /Volumes/DATEN/Coding/archon/..., which is specific to one machine and will not work for most contributors. Consider using repo-relative paths or a placeholder like <ARCHON_REPO_ROOT> instead, for example:

- cd /Volumes/DATEN/Coding/archon/supabase
+ cd path/to/archon/supabase

- cd /Volumes/DATEN/Coding/archon
+ cd path/to/archon

This keeps the quick-start copy‑pasteable for other environments.

Also applies to: 78-89, 125-128

INFRASTRUCTURE.md (1)

78-89: Generalize hard-coded local filesystem paths for broader reuse.

Multiple snippets use absolute paths like /Volumes/DATEN/Coding/archon/..., which are specific to one machine. To make this infra guide reusable, consider switching to repo-relative or placeholder paths (e.g., cd path/to/archon, cd path/to/archon/supabase) and clarifying that users should adapt them to their environment.

Also applies to: 125-128, 189-191, 345-347, 455-459, 465-472

python/src/server/services/llm_provider_service.py (1)

415-431: Ollama auth-token selection looks correct; consider DRYing it and tightening logging.

The new logic to:

  • Fetch rag_strategy settings,
  • Select OLLAMA_EMBEDDING_AUTH_TOKEN vs OLLAMA_CHAT_AUTH_TOKEN, and
  • Fall back to "required-but-ignored" when unset,

is consistent between the OpenAI→Ollama fallback and the direct Ollama path, and avoids logging raw tokens (only lengths), which is good.

Two small follow-ups you might consider:

  1. Factor out token resolution
    Extract a helper like _get_ollama_auth_token(use_embedding: bool, instance_type: str | None) that encapsulates:

    • Cache lookup / fetch of rag_strategy,
    • Key selection (EMBEDDING vs CHAT),
    • Defaulting + warning.

    That removes duplication between the fallback branch and the main provider_name == "ollama" branch.

  2. Sanitize or downgrade embedding URL logging
    logger.info(f"Embedding URL from settings: {embedding_url}") and logger.info(f"Resolved embedding URL: {final_url}") will log whatever is configured, including potential credentials if someone encodes them in the URL. Using _sanitize_for_log(...) or moving these to debug would reduce the chance of leaking sensitive connection details into logs.

These are polish items; core behavior and error handling look solid.

Also applies to: 455-475, 618-622

python/src/server/services/embeddings/embedding_service.py (1)

107-172: NativeOllamaEmbeddingAdapter behavior is reasonable, but consider concurrency and dimensions semantics

The adapter’s per-text asyncio.gather over the batch and strict validation of the embedding field look good. Two small points to keep in mind:

  • dimensions is currently ignored; that’s fine given Ollama’s fixed-dimension embeddings, but it may be worth a brief comment so future readers don’t assume down‑projection happens here.
  • With default EMBEDDING_BATCH_SIZE you’ll fire up to that many concurrent /api/embeddings calls per batch. If users push batch size much higher, you might want an internal concurrency cap or a per‑adapter semaphore to avoid overloading smaller Ollama instances.
python/src/server/services/ollama/model_discovery_service.py (1)

122-135: Auth token support is good for /api/tags, but capability/detail calls still ignore it

The new auth_token plumbing for discover_models, check_instance_health, and discover_models_from_multiple_instances looks correct: headers are built as:

headers = {}
if (auth_token):
    headers["Authorization"] = f"Bearer {auth_token}"

and passed into client.get(...) for /api/tags.

Two follow‑ups to consider:

  1. Capability and details endpoints still unauthenticated. Methods like _test_embedding_capability_fast, _test_embedding_capability, _get_model_details, and the various chat/structured‑output tests (_test_chat_capability_fast, _test_chat_capability, etc.) still hit /api/embeddings, /api/show, and chat completions without any Authorization header. On an Ollama instance that enforces auth globally, model listing and health will now work, but enrichment and capability detection will likely return 401s and silently fall back to defaults.

    If you intend “protected instance” support end‑to‑end, you probably want to thread auth_token through _enrich_model_capabilities and the downstream helpers (or at least _get_model_details and both embedding tests) and attach the same header there.

  2. Health cache is keyed only by URL. check_instance_health caches by health_{instance_url} regardless of token. If a user fixes a bad token and immediately re‑checks health for the same URL, they can get a stale cached error for up to 30s. Not critical, but you might want to either skip caching when a request fails with 401, or include the token (or a hash of it) in the cache key.

Both are incremental and keep backward compatibility, but they’ll make the new auth support behave more predictably on locked‑down Ollama deployments.

Also applies to: 192-199, 235-243, 964-999, 1021-1024, 1026-1060

archon-ui-main/test-results/tests-e2e-ollama-api-mode--98290-t-to-Native-Ollama-API-mode/error-context.md (1)

1-638: Large UI snapshot is fine, but confirm it’s meant to live in version control

This file is purely a captured UI tree for an e2e failure context and doesn’t affect runtime behavior. If it’s generated by tests, consider whether it should be checked in or ignored via .gitignore to avoid churn; otherwise, it’s fine as documentation of the failing state.

python/src/server/api_routes/ollama_api.py (1)

204-215: Consider reducing log verbosity for token selection.

The health check logs token selection decisions at info level for every instance check. This creates verbose logs in production. Consider moving lines 210, 213, and 215 to debug level to reduce noise while keeping the information available for troubleshooting.

Apply this diff to reduce log verbosity:

                 if url == llm_base_url and chat_auth_token:
                     auth_token = chat_auth_token
-                    logger.info(f"Using chat auth token for {url}")
+                    logger.debug(f"Using chat auth token for {url}")
                 elif url == embedding_base_url and embedding_auth_token:
                     auth_token = embedding_auth_token
-                    logger.info(f"Using embedding auth token for {url}")
+                    logger.debug(f"Using embedding auth token for {url}")
                 else:
-                    logger.warning(f"No matching auth token found for {url}")
+                    logger.debug(f"No auth token configured for {url}")
archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (1)

1-192: Replace hardcoded timeouts with Playwright's auto-waiting mechanisms.

The tests use page.waitForTimeout() extensively (30+ occurrences). Playwright's auto-waiting is more reliable and faster. Most timeouts can be removed since Playwright automatically waits for elements before interacting.

Example refactor for lines 9-26:

 test('should display API mode radio buttons when Ollama is selected', async ({ page }) => {
   // Navigate to RAG Settings tab
   await page.click('text=RAG Settings');
-  await page.waitForTimeout(500);

   // Select Ollama as embedding provider
   await page.click('text=Embedding');
-  await page.waitForTimeout(300);

   // Click on Ollama provider card
   const ollamaCard = page.locator('button:has-text("Ollama")').first();
   await ollamaCard.click();
-  await page.waitForTimeout(500);

   // Open Ollama configuration
   const configButton = page.locator('button:has-text("Config")').first();
   await configButton.click();
-  await page.waitForTimeout(500);

   // Verify API mode section is visible (auto-waits)
   await expect(page.locator('text=Ollama API Mode')).toBeVisible();

Playwright's click() and expect().toBeVisible() already include auto-waiting, making explicit timeouts unnecessary.

archon-ui-main/src/components/settings/RAGSettings.tsx (2)

1595-1652: Consider clarifying that API mode is a global setting.

The Ollama API Mode selection appears within the context-aware configuration panel (LLM or Embedding), but it's actually a global setting that affects both. Users might expect it to be per-instance.

Consider adding a helper text indicating this applies to all Ollama instances:

 <div className="mb-4 pb-4 border-b border-green-500/20">
   <label className="block text-sm font-medium text-gray-300 mb-3">
     Ollama API Mode
   </label>
+  <p className="text-xs text-gray-400 mb-2">
+    This setting applies to all Ollama instances (both LLM and Embedding)
+  </p>
   <div className="flex gap-3">

2363-2387: Extract authentication UI into a reusable component.

The authentication sections in both LLM and Embedding modals are nearly identical (lines 2363-2387 and 2454-2477), violating the DRY principle.

Consider creating a reusable component:

interface AuthenticationSectionProps {
  id: string;
  useAuth: boolean;
  authToken: string;
  onUseAuthChange: (checked: boolean) => void;
  onTokenChange: (token: string) => void;
  accentColor?: 'green' | 'purple';
}

const AuthenticationSection: React.FC<AuthenticationSectionProps> = ({
  id,
  useAuth,
  authToken,
  onUseAuthChange,
  onTokenChange,
  accentColor = 'green'
}) => (
  <div className="space-y-2 mt-4">
    <div className="flex items-center gap-2">
      <input
        type="checkbox"
        id={id}
        checked={useAuth}
        onChange={(e) => onUseAuthChange(e.target.checked)}
        className={`w-4 h-4 text-${accentColor}-600 ...`}
      />
      <label htmlFor={id} className="text-sm font-medium text-gray-700 dark:text-gray-300">
        Use Authentication
      </label>
    </div>
    {useAuth && (
      <Input
        type="password"
        placeholder="Auth Token"
        value={authToken}
        onChange={(e) => onTokenChange(e.target.value)}
        className="text-sm"
      />
    )}
  </div>
);

Then use it in both modals to reduce duplication.

Also applies to: 2454-2477

interface RAGSettingsProps {
ragSettings: {
MODEL_CHOICE: string;
CHAT_MODEL?: string;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add missing auth token fields to the RAGSettingsProps interface.

The code accesses OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN via type assertion (ragSettings as any), which bypasses TypeScript's type checking and violates the strict mode requirement.

Apply this diff to add the missing fields to the interface:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
     USE_CONTEXTUAL_EMBEDDINGS: boolean;
     CONTEXTUAL_EMBEDDINGS_MAX_WORKERS: number;
     USE_HYBRID_SEARCH: boolean;
     USE_AGENTIC_RAG: boolean;
     USE_RERANKING: boolean;
     LLM_PROVIDER?: string;
     LLM_BASE_URL?: string;
     LLM_INSTANCE_NAME?: string;
     EMBEDDING_MODEL?: string;
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
     // Crawling Performance Settings
     CRAWL_BATCH_SIZE?: number;
     // ... rest of fields
   };
   setRagSettings: (settings: any) => void;
 }

Then remove the type assertions on lines 230, 248, 253, and 271:

-const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || '';
+const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || '';

As per coding guidelines.

Also applies to: 230-230, 248-248, 253-253, 271-271

🤖 Prompt for AI Agents
In archon-ui-main/src/components/settings/RAGSettings.tsx around line 144, the
RAGSettingsProps interface is missing OLLAMA_CHAT_AUTH_TOKEN and
OLLAMA_EMBEDDING_AUTH_TOKEN fields; add them (both optional strings) to the
interface definition, then remove the unsafe (ragSettings as any) type
assertions at lines 230, 248, 253 and 271 and instead access those newly typed
properties directly from ragSettings so TypeScript strict mode is satisfied.

Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx
Comment thread python/src/agent_work_orders/utils/state_reconciliation.py Outdated
Comment thread python/src/server/services/embeddings/embedding_service.py
Security fixes:
- Remove hardcoded JWT token from INFRASTRUCTURE.md (replace with placeholder)
- Remove real Ollama auth tokens from PLAN.md (replace with placeholders)
- Fix secrets leak: Change RAG settings logging from info to debug level, log only keys

Functional fixes:
- Fix URL normalization for auth-token mapping (both incoming and configured URLs now use same normalization)

Best practices:
- Add test-results/ to .gitignore
- Remove test-results from version control

All 22 Python tests pass, server healthy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@Milofax Milofax force-pushed the feature/ollama-support branch from f668f36 to 80ca26d Compare November 21, 2025 19:26
@Milofax

Milofax commented Nov 21, 2025

Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Nov 21, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Milofax and others added 2 commits November 23, 2025 04:20
PR coleam00#875: Ollama Support with Local Model Discovery
- Security fixes: JWT tokens, auth tokens, RAG settings logging
- Bug fixes: URL normalization, None-handling
- Best practices: test-results gitignore

All 7 items marked as ERLEDIGT (2025-11-21)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
PRPs/todo.md (2)

1-107: Clarify document purpose and placement.

This appears to be a retrospective CodeRabbit review tracking document. Please clarify: is this intended to be a committed artifact in the repository, or should it be maintained elsewhere (e.g., PR description, internal wiki)? If it should remain in the repo, add a brief header explaining its purpose and intended audience.


99-99: Use a heading instead of emphasis.

Line 99 uses bold text (**Gesamt: 7 actionable Items**) where a markdown heading would be more appropriate for document structure.

- **Gesamt: 7 actionable Items**
+ ## Gesamt: 7 actionable Items
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80ca26d and d7b5ea6.

📒 Files selected for processing (1)
  • PRPs/todo.md (1 hunks)
🧰 Additional context used
🪛 Gitleaks (8.29.0)
PRPs/todo.md

[high] 21-21: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 22-22: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 LanguageTool
PRPs/todo.md

[grammar] ~1-~1: Ersetze das Satzzeichen
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review ...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~1-~1: Hier könnte ein Fehler sein.
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review vom 21. Nove...

(QB_NEW_DE)


[grammar] ~3-~3: Ergänze ein Wort
Context: # CodeRabbit Review ToDos - PR #875 Quelle: CodeRabbit Review vom 21. November 20...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)


[grammar] ~4-~4: Hier könnte ein Fehler sein.
Context: ...ith Local Model Discovery and Embeddings --- ## KRITISCH (Sicherheit) ### 1. JWT Token ...

(QB_NEW_DE)


[grammar] ~8-~8: Hier könnte ein Fehler sein.
Context: ...mbeddings --- ## KRITISCH (Sicherheit) ### 1. JWT Token aus INFRASTRUCTURE.md entfe...

(QB_NEW_DE)


[grammar] ~10-~10: Ergänze ein Satzzeichen
Context: ...-- ## KRITISCH (Sicherheit) ### 1. JWT Token aus INFRASTRUCTURE.md entfernen - ...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_JWTDASHTOKEN)


[grammar] ~12-~12: Korrigiere das Wort
Context: ...RUCTURE.md` (Zeile ~113) - Problem: Hardcoded JWT Token mit 10-Jahres-Gültigkeit comm...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_SPELLING)


[grammar] ~12-~12: Korrigiere das Wort
Context: ...Zeile ~113) - Problem: Hardcoded JWT Token mit 10-Jahres-Gültigkeit committed...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_SPELLING)


[grammar] ~14-~14: Ersetze das Satzzeichen
Context: ...s auf Token-Generierungs-Script (Zeilen 142-157) hinzufügen - Status: [x] ERLEDIGT ...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~15-~15: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 2. Auth Tokens aus PLAN.md entfernen - *...

(QB_NEW_DE)


[grammar] ~17-~17: Ergänze ein Satzzeichen
Context: ...* [x] ERLEDIGT (2025-11-21) ### 2. Auth Tokens aus PLAN.md entfernen - **Datei:*...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_AUTHDASHTOKENS)


[grammar] ~17-~17: Hier könnte ein Fehler sein.
Context: ...### 2. Auth Tokens aus PLAN.md entfernen - Datei: PLAN.md (Zeilen 111-261) - **Proble...

(QB_NEW_DE)


[grammar] ~18-~18: Ergänze ein Leerzeichen
Context: ...okens aus PLAN.md entfernen - Datei: PLAN.md (Zeilen 111-261) - **Problem:*...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~18-~18: Ersetze das Satzzeichen
Context: ...ntfernen - Datei: PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Token...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~18-~18: Hier könnte ein Fehler sein.
Context: ... - Datei: PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Tokens und Producti...

(QB_NEW_DE)


[grammar] ~19-~19: Passe das Symbol an
Context: ... PLAN.md (Zeilen 111-261) - Problem: Echte Ollama Auth-Tokens und Production...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)


[grammar] ~20-~20: Hier könnte ein Fehler sein.
Context: ...URL im DB-Snapshot - Tokens betroffen: - OLLAMA_CHAT_AUTH_TOKEN: ollama_13107e338aa16a6a8295592ce050f6cb - OLLAMA_EMBEDDING_AUTH_TOKEN: ollama_13107e338aa16a6a8295592ce050f6cb - Lösung: - Tokens durch Placeholder e...

(QB_NEW_DE)


[grammar] ~26-~26: Hier könnte ein Fehler sein.
Context: ...Server-seitig Tokens rotieren/revoken! - Status: [x] ERLEDIGT (2025-11-21) ###...

(QB_NEW_DE)


[grammar] ~27-~27: Ergänze ein Leerzeichen
Context: ...Tokens rotieren/revoken!** - Status: [x] ERLEDIGT (2025-11-21) ### 3. RAG Se...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~27-~27: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secre...

(QB_NEW_DE)


[grammar] ~29-~29: Ergänze ein Satzzeichen
Context: ...** [x] ERLEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secrets Leak...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_RAGDASHSETTINGSDASHLOGGING)


[grammar] ~29-~29: Ergänze ein Satzzeichen
Context: ...LEDIGT (2025-11-21) ### 3. RAG Settings Logging entfernen (Secrets Leak) - **Dat...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_RAGDASHSETTINGSDASHLOGGING)


[grammar] ~29-~29: Hier könnte ein Fehler sein.
Context: ...ettings Logging entfernen (Secrets Leak) - Datei: `python/src/server/api_routes/ollama_...

(QB_NEW_DE)


[grammar] ~30-~30: Ergänze ein Leerzeichen
Context: ...ng entfernen (Secrets Leak) - Datei: `python/src/server/api_routes/ollama_api...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~30-~30: Ersetze das Satzzeichen
Context: ...erver/api_routes/ollama_api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG setti...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~30-~30: Hier könnte ein Fehler sein.
Context: ...i_routes/ollama_api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_set...

(QB_NEW_DE)


[grammar] ~31-~31: Ergänze ein Leerzeichen
Context: ..._api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_settin...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[typographical] ~31-~31: Zeichen ohne sein Gegenstück: ‚"‘ scheint zu fehlen
Context: ...api.py(Zeilen 180-187) - **Problem:**logger.info(f"RAG settings: {rag_settings}")` logged a...

(DE_UNPAIRED_QUOTES)


[grammar] ~31-~31: Wähle ein passenderes Wort
Context: ...settings: {rag_settings}")` logged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Key...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_NOUN)


[grammar] ~31-~31: Ergänze ein Wort
Context: ...}")` logged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBU...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)


[grammar] ~31-~31: Hier könnte ein Fehler sein.
Context: ...gged alle RAG settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBUG Level: `...

(QB_NEW_DE)


[grammar] ~32-~32: Ergänze ein Leerzeichen
Context: ...settings inkl. Auth-Tokens - Lösung: Nur Keys loggen, auf DEBUG Level: ```p...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~32-~32: Korrigiere die Fehler
Context: ...- Lösung: Nur Keys loggen, auf DEBUG Level: ```python logger.debug(f"RAG ...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~32-~32: Hier könnte ein Fehler sein.
Context: ...ung:** Nur Keys loggen, auf DEBUG Level: python logger.debug(f"RAG settings keys: {list(rag_settings.keys())}") - Status: [x] ERLEDIGT (2025-11-21) ---...

(QB_NEW_DE)


[grammar] ~36-~36: Ergänze ein Leerzeichen
Context: ..._settings.keys())}") ``` - Status: [x] ERLEDIGT (2025-11-21) --- ## MAJOR...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~36-~36: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) --- ## MAJOR (Funktionale Probleme) ### 4. URL...

(QB_NEW_DE)


[grammar] ~40-~40: Passe die Groß- und Kleinschreibung an
Context: ...] ERLEDIGT (2025-11-21) --- ## MAJOR (Funktionale Probleme) ### 4. URL-Normalisierung fü...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_ORTHOGRAPHY_UPPERCASE)


[grammar] ~40-~40: Hier könnte ein Fehler sein.
Context: ...1) --- ## MAJOR (Funktionale Probleme) ### 4. URL-Normalisierung für Auth-Token Map...

(QB_NEW_DE)


[grammar] ~42-~42: Entferne ein Leerzeichen
Context: ...### 4. URL-Normalisierung für Auth-Token Mapping fixen - Datei: `python/src/s...

(QB_NEW_DE_OTHER_ERROR_IDS_UNNECESSARY_ORTHOGRAPHY_SPACE)


[grammar] ~43-~43: Hier könnte ein Fehler sein.
Context: ...ierung für Auth-Token Mapping fixen - Datei: `python/src/server/api_routes/ollama_...

(QB_NEW_DE)


[grammar] ~43-~43: Ergänze ein Leerzeichen
Context: ...ür Auth-Token Mapping fixen - Datei: `python/src/server/api_routes/ollama_api...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~43-~43: Ergänze ein Wort
Context: ...r Auth-Token Mapping fixen - Datei: python/src/server/api_routes/ollama_api.py (Zeilen 99-109, 119-131, 137-142, 188-1...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)


[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...erver/api_routes/ollama_api.py` (Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - ...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...i_routes/ollama_api.py` (Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...ollama_api.py(Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem:**/v1`...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...i.py(Zeilen 99-109, 119-131, 137-142, 188-194, 200-217) - **Problem:**/v1` wird von...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~43-~43: Ersetze das Satzzeichen
Context: ...ilen 99-109, 119-131, 137-142, 188-194, 200-217) - Problem: /v1 wird von konfigur...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~44-~44: Hier könnte ein Fehler sein.
Context: ...119-131, 137-142, 188-194, 200-217) - Problem: /v1 wird von konfigurierten URLs en...

(QB_NEW_DE)


[grammar] ~44-~44: Ergänze ein Leerzeichen
Context: ...37-142, 188-194, 200-217) - Problem: /v1 wird von konfigurierten URLs entfe...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~44-~44: Korrigiere die Fehler
Context: ...n URLs → Token-Lookup schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_MULTITOKEN)


[grammar] ~45-~45: Hier könnte ein Fehler sein.
Context: ...en-Lookup schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich normalisieren: ...

(QB_NEW_DE)


[grammar] ~45-~45: Ergänze ein Leerzeichen
Context: ... schlägt fehl → 401 Fehler - Lösung: Beide Seiten gleich normalisieren: ```...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~45-~45: Hier könnte ein Fehler sein.
Context: ...ng:** Beide Seiten gleich normalisieren: python normalized_url = url.replace("/v1", "").rstrip("/") - Status: [x] ERLEDIGT (2025-11-21) ###...

(QB_NEW_DE)


[grammar] ~49-~49: Ergänze ein Leerzeichen
Context: ...v1", "").rstrip("/") ``` - Status: [x] ERLEDIGT (2025-11-21) ### 5. Encryp...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~49-~49: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) ### 5. Encrypted Token Placeholder Problem -...

(QB_NEW_DE)


[grammar] ~57-~57: Hier könnte ein Fehler sein.
Context: ...ng:** Backend-only Decryption verwenden: python ollama_auth_token = await credential_service.get_credential( "OLLAMA_EMBEDDING_AUTH_TOKEN", default="", decrypt=True ) - Status: [x] ERLEDIGT (2025-11-21) - Au...

(QB_NEW_DE)


[grammar] ~63-~63: Ergänze ein Leerzeichen
Context: ..."", decrypt=True ) ``` - Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens ...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~63-~63: Hier könnte ein Fehler sein.
Context: ...``` - Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens sind nicht verschlüsselt,...

(QB_NEW_DE)


[grammar] ~63-~63: Passe das Symbol an
Context: ...- Status: [x] ERLEDIGT (2025-11-21) - Auth-Tokens sind nicht verschlüsselt, d...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)


[grammar] ~63-~63: Hier könnte ein Fehler sein.
Context: ...espeichert werden. Funktioniert korrekt. ### 6. OLLAMA_EMBEDDING_URL None-Handling - ...

(QB_NEW_DE)


[grammar] ~65-~65: Hier könnte ein Fehler sein.
Context: ...## 6. OLLAMA_EMBEDDING_URL None-Handling - Datei: `python/src/server/services/embedding...

(QB_NEW_DE)


[grammar] ~66-~66: Ergänze ein Leerzeichen
Context: ...EMBEDDING_URL None-Handling - Datei: `python/src/server/services/embeddings/e...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~66-~66: Ergänze ein Wort
Context: ...MBEDDING_URL None-Handling - Datei: python/src/server/services/embeddings/embedding_service.py (Zeilen 287-303, 433-469) - **Problem:*...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)


[grammar] ~66-~66: Ersetze das Satzzeichen
Context: ...mbeddings/embedding_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` au...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~66-~66: Ersetze das Satzzeichen
Context: .../embedding_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` auf None-We...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~66-~66: Hier könnte ein Fehler sein.
Context: ...ng_service.py(Zeilen 287-303, 433-469) - **Problem:**.rstrip()` auf None-Wert wirft Excep...

(QB_NEW_DE)


[grammar] ~67-~67: Ergänze ein Leerzeichen
Context: ...(Zeilen 287-303, 433-469) - Problem: .rstrip() auf None-Wert wirft Exceptio...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~67-~67: Ergänze ein Wort
Context: ...blem:** .rstrip() auf None-Wert wirft Exception - Lösung: ```python ollama_base...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_NOUN)


[grammar] ~68-~68: Passe das Symbol an
Context: ...ip()` auf None-Wert wirft Exception - Lösung: python ollama_base_url_raw = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").strip() ollama_base_url = ollama_base_url_raw.rstrip("/v1").rstrip("/") if ollama_base_url_raw else "" ollama_api_mode = (rag_settings.get("OLLAMA_API_MODE") or "native").lower() - Zusatz: Ollama-spezifische Werte nur b...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_OTHER)


[grammar] ~74-~74: Hier könnte ein Fehler sein.
Context: ...I_MODE") or "native").lower() ``` - Zusatz: Ollama-spezifische Werte nur berechne...

(QB_NEW_DE)


[grammar] ~74-~74: Ergänze ein Leerzeichen
Context: ...or "native").lower() ``` - Zusatz: Ollama-spezifische Werte nur berechnen w...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~74-~74: Ergänze ein Satzzeichen
Context: ...Zusatz:* Ollama-spezifische Werte nur berechnen wenn `embedding_provider.lower() == "ol...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_COMMA)


[grammar] ~75-~75: Ersetze das Satzzeichen
Context: ...- Status: [x] ERLEDIGT (2025-11-21) - Code verwendet bereits (... or "") Pa...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_PUNCTUATION_DASH_–)


[grammar] ~75-~75: Ergänze ein Satzzeichen
Context: ...) - Code verwendet bereits (... or "") Pattern für None-Safety --- ## MINOR (...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_DASHPATTERN)


[grammar] ~75-~75: Hier könnte ein Fehler sein.
Context: ...ts (... or "") Pattern für None-Safety --- ## MINOR (Best Practices) ### 7. Test-Resu...

(QB_NEW_DE)


[grammar] ~79-~79: Hier könnte ein Fehler sein.
Context: ...e-Safety --- ## MINOR (Best Practices) ### 7. Test-Results aus Version Control entf...

(QB_NEW_DE)


[grammar] ~85-~85: Hier könnte ein Fehler sein.
Context: ...ung:** 1. Zu .gitignore hinzufügen: test-results/ 2. git rm --cached archon-ui-main/test-results/ - Status: [x] ERLEDIGT (2025-11-21) ---...

(QB_NEW_DE)


[grammar] ~87-~87: Ergänze ein Leerzeichen
Context: ...hon-ui-main/test-results/` - Status: [x] ERLEDIGT (2025-11-21) --- ## Zusam...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_ORTHOGRAPHY_SPACE)


[grammar] ~87-~87: Hier könnte ein Fehler sein.
Context: ... - Status: [x] ERLEDIGT (2025-11-21) --- ## Zusammenfassung | Priorität | Anzahl | ...

(QB_NEW_DE)


[grammar] ~91-~91: Hier könnte ein Fehler sein.
Context: ...GT (2025-11-21) --- ## Zusammenfassung | Priorität | Anzahl | Beschreibung | |-...

(QB_NEW_DE)


[grammar] ~97-~97: Hier könnte ein Fehler sein.
Context: ... | 1 | Best Practices (gitignore) | Gesamt: 7 actionable Items --- ## No...

(QB_NEW_DE)


[grammar] ~99-~99: Hier könnte ein Fehler sein.
Context: ...tignore) | Gesamt: 7 actionable Items --- ## Notizen - Die Grammar/Language-Tool War...

(QB_NEW_DE)


[grammar] ~103-~103: Hier könnte ein Fehler sein.
Context: ...t: 7 actionable Items** --- ## Notizen - Die Grammar/Language-Tool Warnungen (deu...

(QB_NEW_DE)


[grammar] ~105-~105: Entferne ein Leerzeichen
Context: ... ## Notizen - Die Grammar/Language-Tool Warnungen (deutsche Rechtschreibung in P...

(QB_NEW_DE_OTHER_ERROR_IDS_UNNECESSARY_ORTHOGRAPHY_SPACE)


[grammar] ~105-~105: Füge ein passenderes Wort ein
Context: ...n (deutsche Rechtschreibung in PLAN.md) sind niedrige Priorität - Nach Behebung der ...

(QB_NEW_DE_OTHER_ERROR_IDS_REPLACEMENT_VERB)


[grammar] ~105-~105: Ergänze ein Satzzeichen
Context: ...chtschreibung in PLAN.md) sind niedrige Priorität - Nach Behebung der kritischen Issues: ...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_PERIOD)


[grammar] ~106-~106: Ergänze ein Satzzeichen
Context: ...: Server-seitig alle exponierten Tokens rotieren - Repo-weite Suche nach ähnlichen Token...

(QB_NEW_DE_OTHER_ERROR_IDS_MISSING_PUNCTUATION_PERIOD)


[grammar] ~107-~107: Hier könnte ein Fehler sein.
Context: ...ch ähnlichen Token/URL-Leaks durchführen

(QB_NEW_DE)

🪛 markdownlint-cli2 (0.18.1)
PRPs/todo.md

99-99: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (1)
PRPs/todo.md (1)

21-22: No security risk—this document is tracking completed remediation work, not storing active credentials.

The token values shown on lines 21–22 are historical documentation of a security issue that has already been fixed. Verification confirms:

  • The token does not appear anywhere else in the repository
  • PLAN.md (the file where tokens were originally exposed) has already been sanitized with ollama_xxx_placeholder_token placeholders
  • Git history shows the fix was completed: "fix(security): Address CodeRabbit review findings"
  • PRPs/todo.md is a German-language retrospective tracking document describing problems found and their solutions

This file documents that the issue was remediated server-side (as noted: "Server-seitig Tokens rotieren/revoken!"), not that secrets are currently exposed.

Likely an incorrect or invalid review comment.

@Milofax

Milofax commented Nov 23, 2025 via email

Copy link
Copy Markdown
Author

@Wirasm Wirasm requested a review from tazmon95 November 24, 2025 08:34
@Wirasm

Wirasm commented Nov 24, 2025

Copy link
Copy Markdown
Collaborator

#870

Milofax and others added 3 commits December 1, 2025 02:47
- Add auth_token parameter to validate_provider_instance() function
- Load auth tokens from RAG settings in /validate endpoint
- Match incoming URLs against configured LLM_BASE_URL and OLLAMA_EMBEDDING_URL
- Apply correct auth token (chat or embedding) based on URL matching
- Add tests for auth token flow through all layers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When "Use same host for embedding instance" is checked, now also copies
useAuth and authToken from LLM config to embedding config, not just the
URL and name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract syncEmbeddingFromLLM() into testable utility function
- Add 6 tests covering: full sync, default name, custom default,
  auth settings copy, empty token handling, immutability
- Refactor RAGSettings to use the utility function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
python/src/server/api_routes/ollama_api.py (3)

254-256: Bug: timestamp returns module name instead of actual timestamp.

model_discovery_service.check_instance_health.__module__ returns the module path string (e.g., "src.server.services.ollama.model_discovery_service"), not a timestamp.

-            "timestamp": model_discovery_service.check_instance_health.__module__  # Use current timestamp
+            "timestamp": datetime.now().isoformat()

510-517: Missing auth token support in discover_and_store_models_endpoint.

Unlike discover_models_endpoint and validate_instance_endpoint, this endpoint doesn't extract auth tokens from RAG settings. Calls to discover_models() at line 517 will fail on protected Ollama instances.

Consider adding auth token extraction similar to other endpoints:

# Get auth tokens from RAG settings
from ..services.credential_service import credential_service
rag_settings = await credential_service.get_credentials_by_category("rag_strategy")

llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/")
embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")
chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or ""
embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or ""

# Then pass auth_token to discover_models based on URL matching

1056-1065: Missing auth token support in discover_models_with_real_details.

The direct httpx.AsyncClient request to /api/tags doesn't include Authorization headers. This will fail on protected Ollama instances.

+                # Get auth token for this instance
+                normalized_url = base_url.replace("/v1", "").rstrip("/")
+                auth_token = None
+                if normalized_url == llm_base_url and chat_auth_token:
+                    auth_token = chat_auth_token
+                elif normalized_url == embedding_base_url and embedding_auth_token:
+                    auth_token = embedding_auth_token
+                
+                headers = {}
+                if auth_token:
+                    headers["Authorization"] = f"Bearer {auth_token}"
+
                 async with httpx.AsyncClient(timeout=httpx.Timeout(5.0)) as client:
-                    tags_response = await client.get(f"{base_url}/api/tags")
+                    tags_response = await client.get(f"{base_url}/api/tags", headers=headers)
♻️ Duplicate comments (4)
python/tests/test_ollama_auth_token.py (1)

170-180: Same patch target issue as above.

This test has the same patching concern.

archon-ui-main/src/components/settings/RAGSettings.tsx (3)

142-176: Add missing Ollama auth/API fields to RAGSettingsProps and remove as any casts

ragSettings is accessed with properties that are not declared on its type:

  • OLLAMA_CHAT_AUTH_TOKEN (lines 231, 233–235, 243, 249)
  • OLLAMA_EMBEDDING_AUTH_TOKEN (lines 254, 256–258, 266, 272)
  • OLLAMA_API_MODE (lines 1605–1607, 1613–1618, 1631–1633, 1638–1643)

This forces usage of (ragSettings as any) and violates the strict-typing guideline. It also makes it easy to accidentally break these flows without a type error.

Define the fields on ragSettings and then remove the as any casts and expression dependencies:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
     USE_CONTEXTUAL_EMBEDDINGS: boolean;
@@
-  const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' });
-  const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' });
+  const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' });
+  const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' });
@@
-    const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || '';
@@
-  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]);
+  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]);
@@
-    const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || '';
@@
-  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]);
+  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);

This keeps the Ollama auth/API-mode surface properly typed and removes the need for any.

Also applies to: 225-227, 231-235, 243-244, 249-249, 254-258, 266-272, 1596-1653


682-692: Remove hardcoded default URL guard so models load for the default instance

The auto-load effects currently skip model loading for the default Ollama URL:

if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
  loadAvailableModels(llmInstanceConfig.url, 'chat');
}

Users running the default instance will not see available models auto-populated until they change the URL, which is confusing.

You can safely load models for any non-empty URL:

-  useEffect(() => {
-    if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(llmInstanceConfig.url, 'chat');
-    }
-  }, [llmInstanceConfig.url]);
+  useEffect(() => {
+    if (llmInstanceConfig.url?.trim()) {
+      void loadAvailableModels(llmInstanceConfig.url, 'chat');
+    }
+  }, [llmInstanceConfig.url]);
@@
-  useEffect(() => {
-    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
-    }
-  }, [embeddingInstanceConfig.url]);
+  useEffect(() => {
+    if (embeddingInstanceConfig.url?.trim()) {
+      void loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
+    }
+  }, [embeddingInstanceConfig.url]);

This aligns with earlier feedback and improves the out-of-the-box experience.


2360-2383: Validate auth tokens before saving when useAuth is enabled

Right now, when useAuth is checked but the token is empty or whitespace, the Save handlers still persist an empty token:

  • LLM modal Save (lines 2395–2401)
  • Embedding modal Save (lines 2485–2490)

This leads to a configuration that looks authenticated in the UI but always fails at runtime, without immediate feedback.

Add a simple validation gate before building updatedSettings in both handlers, e.g.:

// LLM modal Save
 const Button
   onClick={async () => {
-    const updatedSettings = {
-      ...ragSettings,
-      LLM_BASE_URL: llmInstanceConfig.url,
-      OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : ''
-    };
+    if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) {
+      showToast('Authentication token is required when authentication is enabled', 'error');
+      return;
+    }
+
+    const updatedSettings = {
+      ...ragSettings,
+      LLM_BASE_URL: llmInstanceConfig.url,
+      OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken.trim() : ''
+    };
@@

// Embedding modal Save
-    const updatedSettings = {
-      ...ragSettings,
-      OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url,
-      OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''
-    };
+    if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) {
+      showToast('Authentication token is required when authentication is enabled', 'error');
+      return;
+    }
+
+    const updatedSettings = {
+      ...ragSettings,
+      OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url,
+      OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken.trim() : ''
+    };

This prevents “silent failure” configs and gives users actionable feedback.

Also applies to: 2395-2402, 2450-2473, 2485-2491

🧹 Nitpick comments (3)
python/tests/test_ollama_auth_token.py (2)

30-40: Unused fixture: mock_credential_service

This fixture is defined but never used in any of the tests in this file. Either remove it or add tests that utilize credential service mocking for more comprehensive coverage.

-    @pytest.fixture
-    def mock_credential_service(self):
-        """Mock credential service with auth token settings."""
-        mock_service = MagicMock()
-        mock_service.get_credentials_by_category = AsyncMock(return_value={
-            "LLM_BASE_URL": "http://localhost:11434",
-            "OLLAMA_EMBEDDING_URL": "http://localhost:11434",
-            "OLLAMA_CHAT_AUTH_TOKEN": "test-chat-token",
-            "OLLAMA_EMBEDDING_AUTH_TOKEN": "test-embedding-token",
-        })
-        return mock_service
-

62-64: Consider a clearer assertion for auth_token verification.

The or logic works but makes debugging harder when tests fail. Consider checking which argument style was actually used:

-            # Check that auth_token was passed as keyword argument
-            assert call_args.kwargs.get("auth_token") == "test-auth-token" or \
-                   (len(call_args.args) > 1 and call_args.args[1] == "test-auth-token")
+            # Check that auth_token was passed (either as kwarg or positional)
+            auth_token_in_kwargs = call_args.kwargs.get("auth_token") == "test-auth-token"
+            auth_token_in_args = len(call_args.args) > 1 and call_args.args[1] == "test-auth-token"
+            assert auth_token_in_kwargs or auth_token_in_args, (
+                f"auth_token not found in call. kwargs={call_args.kwargs}, args={call_args.args}"
+            )
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

212-221: Optionally DRY up the default Ollama URL usage

The default Ollama URL literal 'http://host.docker.internal:11434/v1' appears in several places despite the presence of DEFAULT_OLLAMA_URL:

  • Initial llmInstanceConfig and embeddingInstanceConfig (lines 212, 218)
  • Auto-load effects (to be refactored per earlier comment)
  • Page-load testing fallbacks (lines 1253–1263, 1285–1293)

To keep things consistent and easier to change later, you could centralize on DEFAULT_OLLAMA_URL:

const [llmInstanceConfig, setLLMInstanceConfig] = useState({
  name: '',
-  url: ragSettings.LLM_BASE_URL || 'http://host.docker.internal:11434/v1',
+  url: ragSettings.LLM_BASE_URL || DEFAULT_OLLAMA_URL,
  useAuth: false,
  authToken: ''
});
@@
const [embeddingInstanceConfig, setEmbeddingInstanceConfig] = useState({
  name: '',
-  url: ragSettings.OLLAMA_EMBEDDING_URL || 'http://host.docker.internal:11434/v1',
+  url: ragSettings.OLLAMA_EMBEDDING_URL || DEFAULT_OLLAMA_URL,
  useAuth: false,
  authToken: ''
});

Not critical, but it avoids subtle drift if the default host ever needs to change.

Also applies to: 682-692, 1253-1263, 1285-1293

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7b5ea6 and 3f13eb0.

📒 Files selected for processing (6)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (17 hunks)
  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1 hunks)
  • archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1 hunks)
  • python/src/server/api_routes/ollama_api.py (4 hunks)
  • python/src/server/services/llm_provider_service.py (5 hunks)
  • python/tests/test_ollama_auth_token.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/components/settings/utils/instanceConfigSync.ts
  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/components/settings/utils/instanceConfigSync.ts
  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/components/settings/utils/instanceConfigSync.ts
  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data

Files:

  • python/src/server/api_routes/ollama_api.py
  • python/src/server/services/llm_provider_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/server/api_routes/ollama_api.py
  • python/src/server/services/llm_provider_service.py
python/src/server/api_routes/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/api_routes/**/*.py: Follow the API route → Service → Database pattern for new API endpoints
Use database values directly without frontend mapping to maintain type-safe end-to-end communication from backend upward

Files:

  • python/src/server/api_routes/ollama_api.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/server/api_routes/ollama_api.py
  • python/tests/test_ollama_auth_token.py
  • python/src/server/services/llm_provider_service.py
archon-ui-main/src/**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use Vitest with React Testing Library for frontend testing

Files:

  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
archon-ui-main/**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest with React Testing Library for frontend testing

Files:

  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
python/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use Pytest for testing with async support in Python backend

Use Pytest for testing Python code with async support

Files:

  • python/tests/test_ollama_auth_token.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations

Files:

  • python/src/server/services/llm_provider_service.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: zaksnet
Repo: coleam00/Archon PR: 473
File: python/src/providers_clean/api/routes/post_migrate_model_config.py:7-9
Timestamp: 2025-09-10T06:45:10.853Z
Learning: User zaksnet prefers to handle authentication/authorization in separate PRs rather than mixing security infrastructure with feature development.
📚 Learning: 2025-11-29T19:25:01.813Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: python/src/agent_work_orders/CLAUDE.md:0-0
Timestamp: 2025-11-29T19:25:01.813Z
Learning: Applies to python/src/agent_work_orders/**/*.py : Do not log sensitive data such as passwords, API keys, or tokens; mask them using format like `api_key[:8] + "..."`

Applied to files:

  • python/src/server/api_routes/ollama_api.py
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/**/*.test.{ts,tsx} : Use Vitest with React Testing Library for frontend testing

Applied to files:

  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{test,spec}.{ts,tsx} : Use Vitest with React Testing Library for frontend testing

Applied to files:

  • archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (4)
python/src/server/api_routes/ollama_api.py (2)
python/src/server/services/credential_service.py (1)
  • get_credentials_by_category (323-369)
python/src/server/services/ollama/model_discovery_service.py (3)
  • discover_models_from_multiple_instances (1026-1140)
  • check_instance_health (964-1024)
  • discover_models (122-243)
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (2)
  • InstanceConfig (5-10)
  • syncEmbeddingFromLLM (18-28)
python/src/server/services/llm_provider_service.py (3)
python/src/server/services/credential_service.py (1)
  • get_credentials_by_category (323-369)
python/src/server/api_routes/settings_api.py (1)
  • get_credentials_by_category (78-93)
python/src/server/services/ollama/model_discovery_service.py (1)
  • check_instance_health (964-1024)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (14)
python/src/server/api_routes/ollama_api.py (3)

99-143: Auth token propagation looks correct.

The token extraction from RAG settings, URL normalization, and mapping to auth_tokens_map is implemented correctly. The normalized URL comparison ensures consistent token matching.


181-218: Health check auth token handling is correct.

The implementation properly extracts tokens from RAG settings, normalizes URLs for consistent matching, and passes the appropriate token to check_instance_health(). Debug logging only outputs keys/URLs, not secret values.


277-308: Validation endpoint correctly propagates auth tokens.

The auth token is properly extracted, matched to the appropriate URL, and passed through both validate_provider_instance() and discover_models() calls.

python/src/server/services/llm_provider_service.py (4)

415-441: Ollama fallback auth token handling looks correct.

The token selection based on use_embedding_provider is appropriate, and the "required-but-ignored" fallback handles Ollama instances without auth enabled. Token values are not logged (only length would be logged in the main Ollama path).


455-480: Ollama auth token handling is well implemented.

Token selection based on operation type (use_embedding_provider or instance_type == "embedding") is correct. Logging token length (not the token itself) is appropriate for debugging while maintaining security. Based on learnings, this follows the guideline to "mask [tokens] using format like api_key[:8] + \"...\"" — logging only length is similarly safe.


616-622: Good addition of debugging logs for URL resolution.

The logging at lines 618 and 621-622 will help diagnose embedding URL configuration issues without exposing sensitive data.


1211-1235: validate_provider_instance correctly extended with auth_token support.

The function signature, docstring, and implementation properly handle the optional auth_token parameter and pass it through to check_instance_health().

python/tests/test_ollama_auth_token.py (1)

133-144: Patch target may not intercept the actual httpx.AsyncClient usage.

The patch should target where httpx.AsyncClient is used (in the service module), not where it's defined. Verify the import style in model_discovery_service.py—if it uses import httpx, the patch target should be the full qualified path: src.server.services.ollama.model_discovery_service.httpx.AsyncClient.

archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)

5-28: Utility and typing look solid for config mirroring

InstanceConfig and syncEmbeddingFromLLM are minimal, type-safe, and behave as expected (copy + default name fallback, no mutation). They fit the intended “single source of truth” pattern for LLM/Embedding instance configs.

archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)

1-90: Good, comprehensive tests aligned with utility behavior

These Vitest tests exercise the main paths (field copy, default/custom names, auth edge cases, immutability) and match syncEmbeddingFromLLM’s contract. They also follow the Vitest testing guideline for frontend code.

archon-ui-main/src/components/settings/RAGSettings.tsx (4)

571-585: Ollama metrics and model loading wiring looks good

The expanded ollamaMetrics state (per-instance counts + available model lists) and loadAvailableModels helper are cleanly structured and reused from both auto-load effects and manualTestConnection. Using instance URL normalization and separate chat/embedding contexts should make the UI’s “available models” views reliable.

Also applies to: 641-680, 885-903


189-221: Propagating auth settings and using syncEmbeddingFromLLM is a nice UX improvement

The new useAuth/authToken fields on llmInstanceConfig and embeddingInstanceConfig, coupled with:

  • the “Use same host for embedding instance” checkbox wired to syncEmbeddingFromLLM(llmInstanceConfig) (line 2350), and
  • per-modal auth sections with password inputs (lines 2360–2383, 2450–2473), and
  • persistence into ragSettings via OLLAMA_CHAT_AUTH_TOKEN / OLLAMA_EMBEDDING_AUTH_TOKEN (lines 2396–2400, 2486–2490),

make the Ollama auth story much clearer and consistent across chat and embeddings.

Once the typings for those ragSettings fields are added (see earlier comment), this whole flow will be in good shape.

Also applies to: 2350-2351, 2450-2473, 2485-2491


1596-1653: API mode toggle UX for Ollama is clear and self-contained

The new “Ollama API Mode” section with explicit “Native Ollama API” vs “OpenAI-Compatible” buttons is a nice addition:

  • State is stored on ragSettings.OLLAMA_API_MODE.
  • Visual selection and radio-circle affordance make the current mode unambiguous.
  • Helper text (“Uses /api/embeddings endpoint” vs “Uses /v1/embeddings endpoint”) sets expectations well.

Once the OLLAMA_API_MODE field is added to RAGSettingsProps, this section is in good shape.


145-145: The specific rendering issue described does not appear in the current code

The exact code patterns referenced in this review comment—ragSettings.CHAT_MODEL || <span className="text-gray-500 italic">Not selected</span> rendering, activeSelection === 'chat' logic, and the getDisplayedChatModel() function—do not exist in the current RAGSettings.tsx implementation. Line 145 only contains the field definition CHAT_MODEL?: string; as part of a type, not active rendering logic.

If there is actual misalignment between CHAT_MODEL and MODEL_CHOICE field usage in the component, it should be verified against the actual current implementation and resubmitted with specifics about where in the code the issue manifests.

Likely an incorrect or invalid review comment.

Comment thread archon-ui-main/src/components/settings/RAGSettings.tsx


async def validate_provider_instance(provider: str, instance_url: str | None = None) -> dict[str, any]:
async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type hint uses any instead of Any.

The return type annotation dict[str, any] uses lowercase any, which refers to the built-in any() function, not the Any type. This may cause type checking issues with MyPy.

-async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]:
+async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]:
async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]:
🤖 Prompt for AI Agents
In python/src/server/services/llm_provider_service.py around line 1211, the
return type annotation uses the builtin any instead of the typing Any; change
the annotation from dict[str, any] to dict[str, Any] and ensure you import Any
(e.g. add "from typing import Any" at the top of the file if it's not already
imported) so static type checkers like MyPy recognize the correct type.

Milofax and others added 2 commits December 1, 2025 11:17
Merges OpenRouter embedding provider feature from upstream.
Resolves conflict in llm_provider_service.py by keeping auth_token parameter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses CodeRabbit review: CHAT_MODEL was undefined because
MODEL_CHOICE is the actual key used. Now uses helper functions
that correctly resolve the displayed model name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

16-20: Remove OpenRouter from EMBEDDING_CAPABLE_PROVIDERS

EMBEDDING_CAPABLE_PROVIDERS includes 'openrouter', but OpenRouter does not provide embedding models—it only acts as a proxy for LLM chat completions. This inclusion allows users to select OpenRouter as an embedding provider in the UI, which the backend cannot serve.

Revert to:

-const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama'];
+const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];

If OpenRouter embedding support is needed in the future, implement it with a feature flag once backend support is confirmed.

♻️ Duplicate comments (5)
python/src/server/services/llm_provider_service.py (1)

1211-1211: Type hint uses any instead of Any (previously flagged).

This issue was already identified in a previous review. The return type annotation uses lowercase any (the built-in function) instead of Any from the typing module, which will cause MyPy type checking issues.

Apply the previously suggested fix:

-async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, any]:
+async def validate_provider_instance(provider: str, instance_url: str | None = None, auth_token: str | None = None) -> dict[str, Any]:

Ensure Any is imported at the top of the file (it's already imported on line 11).

archon-ui-main/src/components/settings/RAGSettings.tsx (4)

142-175: Type Ollama auth/API-mode fields in RAGSettingsProps and drop (ragSettings as any) casts

You’re now reading/writing ragSettings.OLLAMA_CHAT_AUTH_TOKEN, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN, and ragSettings.OLLAMA_API_MODE, but:

  • These keys are not declared in RAGSettingsProps['ragSettings'].
  • The effects rely on (ragSettings as any).OLLAMA_*_AUTH_TOKEN both in the body and dependency arrays, which breaks strict typing and defeats the “no implicit any” goal.

This is essentially the same issue the previous review already flagged for the auth tokens. Suggest tightening the types and removing the any casts:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
@@
     EMBEDDING_MODEL?: string;
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
@@
   };
   setRagSettings: (settings: any) => void;
 }
@@
-  const lastLLMConfigRef = useRef({ url: '', name: '', authToken: '' });
-  const lastEmbeddingConfigRef = useRef({ url: '', name: '', authToken: '' });
+  const lastLLMConfigRef = useRef<{ url: string; name: string; authToken: string }>({
+    url: '',
+    name: '',
+    authToken: '',
+  });
+  const lastEmbeddingConfigRef = useRef<{ url: string; name: string; authToken: string }>({
+    url: '',
+    name: '',
+    authToken: '',
+  });
@@
   useEffect(() => {
     const newLLMUrl = ragSettings.LLM_BASE_URL || '';
     const newLLMName = ragSettings.LLM_INSTANCE_NAME || '';
-    const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || '';
@@
-  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]);
+  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]);
@@
   useEffect(() => {
     const newEmbeddingUrl = ragSettings.OLLAMA_EMBEDDING_URL || '';
     const newEmbeddingName = ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME || '';
-    const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || '';
@@
-  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]);
+  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);

This keeps the state-sync logic the same but restores type safety and satisfies strict TS in this file.

Also applies to: 225-227, 231-244, 249-272


641-679: Auto-load Ollama models should not special‑case the default URL

loadAvailableModels itself looks good, but the auto-load effects still skip loading when the URL equals the hardcoded default 'http://host.docker.internal:11434/v1'. That means users running a working default Ollama instance won’t see any models until they tweak the URL.

You already have DEFAULT_OLLAMA_URL; I’d drop the special-case and only require a non-empty, trimmed URL:

  // Auto-load available models when URLs are configured
  useEffect(() => {
-    if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(llmInstanceConfig.url, 'chat');
+    if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) {
+      void loadAvailableModels(llmInstanceConfig.url, 'chat');
     }
   }, [llmInstanceConfig.url]);

   useEffect(() => {
-    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
+    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) {
+      void loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
     }
   }, [embeddingInstanceConfig.url]);

This also aligns with the earlier review suggestion to avoid hardcoding that URL.

Also applies to: 682-692


787-807: Keep instance config shape consistent and clear auth fields when deleting instances

llmInstanceConfig / embeddingInstanceConfig now include useAuth and authToken, but some paths still construct configs with { name, url } only and deletion doesn’t clear the persisted auth fields:

  • handleDeleteLLMInstance and handleDeleteEmbeddingInstance reset state to { name: '', url: '' }, dropping useAuth and authToken.
  • Deletion only deletes LLM_BASE_URL, MODEL_CHOICE, OLLAMA_EMBEDDING_URL, and EMBEDDING_MODEL from ragSettings; instance names and auth tokens remain.
  • The “auto-populate embedding instance” block on LLM URL change also only sets { name, url }, ignoring auth flags, while you already have syncEmbeddingFromLLM for shape-safe syncing.

This can lead to inconsistent state shape and “ghost” tokens surviving after delete. Consider:

  const handleDeleteLLMInstance = () => {
@@
-      setLLMInstanceConfig({
-        name: '',
-        url: ''
-      });
+      setLLMInstanceConfig({
+        name: '',
+        url: '',
+        useAuth: false,
+        authToken: ''
+      });
@@
       const updatedSettings = { ...ragSettings };
       delete updatedSettings.LLM_BASE_URL;
       delete updatedSettings.MODEL_CHOICE;
+      delete updatedSettings.LLM_INSTANCE_NAME;
+      delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN;
@@
  const handleDeleteEmbeddingInstance = () => {
@@
-      setEmbeddingInstanceConfig({
-        name: '',
-        url: ''
-      });
+      setEmbeddingInstanceConfig({
+        name: '',
+        url: '',
+        useAuth: false,
+        authToken: ''
+      });
@@
       const updatedSettings = { ...ragSettings };
       delete updatedSettings.OLLAMA_EMBEDDING_URL;
       delete updatedSettings.EMBEDDING_MODEL;
+      delete updatedSettings.OLLAMA_EMBEDDING_INSTANCE_NAME;
+      delete updatedSettings.OLLAMA_EMBEDDING_AUTH_TOKEN;
@@
-                    // Auto-populate embedding instance if it's empty (convenience for single-host users)
-                    if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) {
-                      setEmbeddingInstanceConfig({
-                        name: llmInstanceConfig.name || 'Default Ollama',
-                        url: newUrl
-                      });
-                    }
+                    // Auto-populate embedding instance if it's empty (convenience for single-host users)
+                    if (!embeddingInstanceConfig.url || !embeddingInstanceConfig.name) {
+                      // Reuse shared helper to keep shape (including auth flags) consistent
+                      setEmbeddingInstanceConfig(syncEmbeddingFromLLM({ ...llmInstanceConfig, url: newUrl }));
+                    }
@@
-                        // Sync embedding instance with LLM instance (including auth settings)
-                        setEmbeddingInstanceConfig(syncEmbeddingFromLLM(llmInstanceConfig));
+                        // Sync embedding instance with LLM instance (including auth settings)
+                        setEmbeddingInstanceConfig(syncEmbeddingFromLLM(llmInstanceConfig));

This keeps config state structurally consistent and ensures deleting an instance actually clears its auth settings as well.
[highlight: this overlaps with the earlier review about instance config shape.]

Also applies to: 809-829, 2331-2335, 2349-2351


2360-2383: Validate auth tokens before saving when Use Authentication is enabled

The LLM and Embedding edit modals now let users toggle “Use Authentication” and enter tokens, and you persist them into OLLAMA_CHAT_AUTH_TOKEN / OLLAMA_EMBEDDING_AUTH_TOKEN. However, if useAuth is true and the token input is empty/whitespace, you still write an empty token to ragSettings. This matches an earlier review concern and can lead to confusing failures when the backend attempts authenticated calls with an empty header.

Add a small guard before constructing updatedSettings in both modals:

                <Button
                  onClick={async () => {
+                   if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) {
+                     showToast('Authentication token is required when authentication is enabled', 'error');
+                     return;
+                   }
                     const updatedSettings = {
                       ...ragSettings,
                       LLM_BASE_URL: llmInstanceConfig.url,
                       OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : ''
                     };
@@
                <Button
                  onClick={async () => {
+                   if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) {
+                     showToast('Authentication token is required when authentication is enabled', 'error');
+                     return;
+                   }
                     const updatedSettings = {
                       ...ragSettings,
                       OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url,
                       OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''
                     };

This keeps the UX tight and avoids persisting obviously invalid credentials.

Also applies to: 2395-2401, 2485-2491

🧹 Nitpick comments (4)
python/src/server/services/llm_provider_service.py (2)

415-441: Consider extracting token resolution logic to reduce duplication.

The token resolution pattern here is duplicated at lines 455-480. Both segments fetch RAG settings, select the appropriate token (embedding vs chat), and apply the same "required-but-ignored" fallback logic.

Consider extracting this into a helper function:

+async def _resolve_ollama_auth_token(use_embedding_provider: bool, instance_type: str | None = None) -> str:
+    """Resolve Ollama auth token based on operation type."""
+    cache_key = "rag_strategy_settings"
+    rag_settings = _get_cached_settings(cache_key)
+    if rag_settings is None:
+        rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
+        _set_cached_settings(cache_key, rag_settings)
+    
+    # Select token based on operation type
+    if use_embedding_provider or instance_type == "embedding":
+        token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
+        logger.info(f"Using OLLAMA_EMBEDDING_AUTH_TOKEN (length: {len(token) if token else 0})")
+    else:
+        token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "")
+        logger.info(f"Using OLLAMA_CHAT_AUTH_TOKEN (length: {len(token) if token else 0})")
+    
+    # Use placeholder if no token is set
+    if not token:
+        token = "required-but-ignored"
+        logger.warning("No Ollama auth token found, using placeholder 'required-but-ignored'")
+    
+    return token

Then simplify both call sites:

-                    # Check for auth token in RAG settings for fallback
-                    cache_key = "rag_strategy_settings"
-                    rag_settings = _get_cached_settings(cache_key)
-                    if rag_settings is None:
-                        rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
-                        _set_cached_settings(cache_key, rag_settings)
-
-                    # Get correct auth token based on operation type
-                    if use_embedding_provider:
-                        ollama_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN", "")
-                    else:
-                        ollama_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN", "")
-
-                    # Use "required-but-ignored" as default if no token is set (Ollama doesn't validate when auth is disabled)
-                    if not ollama_auth_token:
-                        ollama_auth_token = "required-but-ignored"
+                    ollama_auth_token = await _resolve_ollama_auth_token(use_embedding_provider)

618-622: Consider consolidating the logging statements.

Lines 618 and 621 log very similar information about the embedding URL resolution. This could be simplified to a single log statement for clarity.

             embedding_url = rag_settings.get("OLLAMA_EMBEDDING_URL")
-            logger.info(f"Embedding URL from settings: {embedding_url}")
             if embedding_url:
                 final_url = embedding_url if embedding_url.endswith('/v1') else f"{embedding_url}/v1"
-                logger.info(f"Resolved embedding URL: {final_url}")
+                logger.info(f"Resolved embedding URL: {embedding_url} → {final_url}")
                 return final_url
archon-ui-main/src/components/settings/RAGSettings.tsx (2)

572-584: Ollama per-instance metrics + available model lists look consistent

The extended ollamaMetrics shape (per-instance counts for both LLM and embedding plus llmAvailableModels / embeddingAvailableModels) and the corresponding updates in fetchOllamaMetrics are internally consistent and line up with how you’re rendering:

  • Summary table rows use embeddingInstanceModels safely.
  • The “Available on this instance” row pulls counts by context, which matches the per-instance split.

If you don’t end up wiring llmAvailableModels / embeddingAvailableModels into the selection modals, you could later trim those from state to reduce surface area, but as-is this structure is fine and future‑proof.

Also applies to: 885-903


731-740: Toast model counts in manualTestConnection may be stale vs per-call instance status

Inside manualTestConnection, the success toast uses ollamaMetrics.llmInstanceModels / embeddingInstanceModels to derive modelCount, but those metrics are only refreshed later via fetchOllamaMetrics() and may still be at their initial or previous values. This can yield messages like “0 chat models available” even when the instance has models and the backend returned a model count.

Since instanceStatus from the health endpoint already contains models_available, consider basing the toast count on that and only using the metrics for more detailed UIs:

-          // Context-aware model count display
-          let modelCount = instanceStatus.models_available || 0;
-          let modelType = 'models';
-
-          if (context === 'chat') {
-            modelCount = ollamaMetrics.llmInstanceModels?.chat || 0;
-            modelType = 'chat models';
-          } else if (context === 'embedding') {
-            modelCount = ollamaMetrics.embeddingInstanceModels?.embedding || 0;
-            modelType = 'embedding models';
-          }
+          // Context-aware model count display
+          let modelCount = instanceStatus.models_available || 0;
+          let modelType = 'models';
+          if (context === 'chat') {
+            modelType = 'chat models';
+          } else if (context === 'embedding') {
+            modelType = 'embedding models';
+          }
@@
-            showToast(`${instanceName} connection successful: ${modelCount} ${modelType} available (${responseTime}ms)`, 'success');
+            showToast(
+              `${instanceName} connection successful: ${modelCount} ${modelType} available (${responseTime}ms)`,
+              'success'
+            );
@@
-          // Load available models after successful health check
-          if (context) {
-            await loadAvailableModels(url, context);
-          }
+          // Load available models after successful health check
+          if (context) {
+            await loadAvailableModels(url, context);
+          }

This keeps the metrics fetch for richer views while making the immediate toast reflect the actual response.

Also applies to: 752-755

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f13eb0 and 5ebcf11.

📒 Files selected for processing (2)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (17 hunks)
  • python/src/server/services/llm_provider_service.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data

Files:

  • python/src/server/services/llm_provider_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/server/services/llm_provider_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations

Files:

  • python/src/server/services/llm_provider_service.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/server/services/llm_provider_service.py
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (3)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (2)
python/src/server/services/llm_provider_service.py (3)
python/src/server/services/credential_service.py (1)
  • get_credentials_by_category (323-369)
python/src/server/api_routes/settings_api.py (1)
  • get_credentials_by_category (78-93)
python/src/server/services/ollama/model_discovery_service.py (1)
  • check_instance_health (964-1024)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (4)
python/src/server/services/llm_provider_service.py (2)

455-480: Token resolution logic is correct.

The token selection correctly handles both use_embedding_provider and instance_type == "embedding" cases, ensuring the appropriate token is used for each operation type. The fallback to "required-but-ignored" is reasonable for Ollama instances without authentication enabled.

Note: The logging on lines 465 and 468 reveals which specific token type (embedding vs chat) is being used. While this doesn't expose the token value itself, consider whether this information disclosure is acceptable in your security model.


1218-1218: LGTM! Authentication token parameter properly documented and propagated.

The auth_token parameter is correctly documented in the function docstring and properly passed through to the check_instance_health service method on line 1235.

Also applies to: 1235-1235

archon-ui-main/src/components/settings/RAGSettings.tsx (2)

1594-1653: API mode UI is solid; just ensure OLLAMA_API_MODE is typed and persisted consistently

The “Ollama API Mode” toggle is a nice UX touch and correctly treats undefined as “native” by default. With the earlier type additions to RAGSettingsProps, this section becomes type-safe.

Nothing to change here beyond making sure OLLAMA_API_MODE?: 'native' | 'openai-compatible' is added to the ragSettings type (see previous comment) so these reads/writes don’t rely on structural typing.


1846-1873: Authentication and selected-model rows in the summary table are well designed

The new “Authentication” row correctly displays only a boolean status (“Token configured” vs “No authentication”) and avoids leaking token contents. Likewise, the “Selected Model” row now gracefully falls back to a “Not selected” placeholder for both chat and embeddings, which will help users debug misconfigurations.

No code changes needed here.

Also applies to: 1890-1899

Milofax and others added 2 commits December 1, 2025 13:53
- Add useAuth/authToken to instance config resets (RAGSettings.tsx)
- Clear auth tokens from ragSettings when deleting instances
- Fix type hint: dict[str, any] -> dict[str, Any] (llm_provider_service.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ArchonChatPanel: Remove BookOpen, Search
- BugReportModal: Remove ExternalLink
- ErrorBoundaryWithBugReport: Remove React (use named imports)
- CodeViewerModal: Remove FileText
- DisconnectScreenOverlay: Remove Wifi, WifiOff
- APIKeysSection: Remove Key, Input

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)

142-176: Add Ollama auth/API mode fields to RAGSettingsProps and remove as any casts

The component now reads/writes OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN, and OLLAMA_API_MODE via ragSettings, but these keys are not declared on RAGSettingsProps["ragSettings"]. Accessing them via (ragSettings as any) both weakens type safety (violating strict mode) and makes it easy to desync UI from persisted settings. This also means ragSettings.OLLAMA_API_MODE is currently a type error.

You can fix this by extending the interface and wiring the effects to the typed fields, eliminating the as any usage:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
-    CHAT_MODEL?: string;
+    CHAT_MODEL?: string;
     USE_CONTEXTUAL_EMBEDDINGS: boolean;
@@
     EMBEDDING_MODEL?: string;
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
@@
   };
   setRagSettings: (settings: any) => void;
 }
@@
   useEffect(() => {
     const newLLMUrl = ragSettings.LLM_BASE_URL || '';
     const newLLMName = ragSettings.LLM_INSTANCE_NAME || '';
-    const newAuthToken = (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_CHAT_AUTH_TOKEN || '';
@@
-  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN]);
+  }, [ragSettings.LLM_BASE_URL, ragSettings.LLM_INSTANCE_NAME, ragSettings.OLLAMA_CHAT_AUTH_TOKEN]);
@@
   useEffect(() => {
     const newEmbeddingUrl = ragSettings.OLLAMA_EMBEDDING_URL || '';
     const newEmbeddingName = ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME || '';
-    const newAuthToken = (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN || '';
+    const newAuthToken = ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN || '';
@@
-  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, (ragSettings as any).OLLAMA_EMBEDDING_AUTH_TOKEN]);
+  }, [ragSettings.OLLAMA_EMBEDDING_URL, ragSettings.OLLAMA_EMBEDDING_INSTANCE_NAME, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN]);

This keeps the props contract honest and satisfies strict TypeScript without resorting to any.

Also applies to: 223-227, 228-249, 251-272, 1600-1659, 2404-2410, 2493-2499


641-693: Auto‑loading available models still skips the default Ollama URL

loadAvailableModels is a nice addition and the metrics wiring (llmAvailableModels / embeddingAvailableModels) looks good, but the auto‑load effects still short‑circuit when the URL equals the default http://host.docker.internal:11434/v1. This means users running Ollama on the default host won’t see model lists until they manually tweak the URL or hit “Test Connection”.

You can address this and simplify the condition by loading models whenever a non‑empty URL is present:

  // Auto-load available models when URLs are configured
  useEffect(() => {
-    if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(llmInstanceConfig.url, 'chat');
-    }
+    if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) {
+      void loadAvailableModels(llmInstanceConfig.url, 'chat');
+    }
   }, [llmInstanceConfig.url]);

   useEffect(() => {
-    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
-    }
+    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) {
+      void loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
+    }
   }, [embeddingInstanceConfig.url]);

This preserves the debounce behavior (hook still keys off URL changes) and makes the default local instance behave like any other.

Also applies to: 752-755, 572-584, 891-909


2368-2391: Require non‑empty auth tokens when authentication is enabled for LLM/embedding instances

The new auth controls in both edit modals correctly wire useAuth/authToken into local state and persist OLLAMA_CHAT_AUTH_TOKEN / OLLAMA_EMBEDDING_AUTH_TOKEN. However, if a user enables “Use Authentication” but leaves the token empty, the code still proceeds to “save” and pushes an empty string into ragSettings (effectively disabling auth) without any feedback.

This is the same edge case highlighted in the prior review; adding lightweight validation before constructing updatedSettings will prevent confusing failures:

// LLM instance modal save
-                <Button
-                  onClick={async () => {
-                    const updatedSettings = {
+                <Button
+                  onClick={async () => {
+                    if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) {
+                      showToast('Authentication token is required when authentication is enabled', 'error');
+                      return;
+                    }
+                    const updatedSettings = {
                       ...ragSettings,
                       LLM_BASE_URL: llmInstanceConfig.url,
                       OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : ''
                     };
                     setRagSettings(updatedSettings);
@@
// Embedding instance modal save
-                <Button
-                  onClick={async () => {
-                    const updatedSettings = {
+                <Button
+                  onClick={async () => {
+                    if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) {
+                      showToast('Authentication token is required when authentication is enabled', 'error');
+                      return;
+                    }
+                    const updatedSettings = {
                       ...ragSettings,
                       OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url,
                       OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''
                     };
                     setRagSettings(updatedSettings);

That keeps the UX aligned with the toggles: “Use Authentication” can’t be turned on with an empty token.

Also applies to: 2403-2432, 2458-2481, 2493-2521

🧹 Nitpick comments (4)
archon-ui-main/src/components/bug-report/BugReportModal.tsx (1)

230-235: Consider removing the as any type assertion.

Line 233 uses as any which bypasses TypeScript's type checking. The severity field should have a proper union type matching the dropdown options.

As per coding guidelines, TypeScript strict mode with no implicit any should be enforced.

You can define a proper type for severity and use it instead:

type BugSeverity = "low" | "medium" | "high" | "critical";

Then update the cast to:

 onChange={(e) =>
   setReport((r) => ({
     ...r,
-    severity: e.target.value as any,
+    severity: e.target.value as BugSeverity,
   }))
 }
python/src/server/services/llm_provider_service.py (2)

399-445: Ollama auth token resolution and fallback behavior look consistent

The new logic to:

  • Resolve OLLAMA_CHAT_AUTH_TOKEN / OLLAMA_EMBEDDING_AUTH_TOKEN from cached rag_strategy settings,
  • Fall back to a "required-but-ignored" placeholder when no token is configured, and
  • Use that value as api_key for both the OpenAI→Ollama fallback and direct "ollama" provider paths,

is coherent and avoids logging the actual token (only lengths are logged). This should work whether auth is disabled (placeholder ignored by Ollama) or enabled (real token present).

If you find this pattern spreading, consider extracting a small helper like get_ollama_auth_token(use_embedding_provider: bool, rag_settings: dict) -> str to keep the token selection and placeholder rules in one place, but it’s not strictly necessary right now.

Also applies to: 447-481


1211-1235: validate_provider_instance auth_token plumbing is correct; consider richer error logs

Extending validate_provider_instance to accept auth_token and forwarding it into model_discovery_service.check_instance_health(instance_url, auth_token=auth_token) correctly enables per‑instance authenticated health checks for Ollama. The return shape and non‑Ollama branches are preserved, and the return type now uses Any (fixing the earlier any typo).

To better align with the backend logging guidelines, consider adding exc_info=True to the error log in the except block so health‑check failures preserve full stack traces:

-    except Exception as e:
-        logger.error(f"Error validating provider {provider}: {e}")
+    except Exception as e:
+        logger.error(f"Error validating provider {provider}: {e}", exc_info=True)

Also applies to: 1275-1285

archon-ui-main/src/components/settings/RAGSettings.tsx (1)

1852-1879: Align “Authentication” summary row with useAuth state, not just token presence

The “Authentication” row in the Ollama configuration summary currently reports “Token configured” purely based on whether authToken is non‑empty, ignoring useAuth. Because you intentionally keep the token around in local state even when useAuth is toggled off, this can show “Token configured” while authentication is actually disabled.

You can make the summary reflect the real effective state by incorporating useAuth:

-                      <tr>
-                        <td className="py-2 text-gray-400">Authentication</td>
-                        <td className="py-2">
-                          {activeSelection === 'chat' ? (
-                            llmInstanceConfig.authToken ? (
+                      <tr>
+                        <td className="py-2 text-gray-400">Authentication</td>
+                        <td className="py-2">
+                          {activeSelection === 'chat' ? (
+                            llmInstanceConfig.useAuth && llmInstanceConfig.authToken ? (
                               <span className="text-teal-400 flex items-center">
@@
                             ) : (
                               <span className="text-gray-500 italic">No authentication</span>
                             )
                           ) : (
-                            embeddingInstanceConfig.authToken ? (
+                            embeddingInstanceConfig.useAuth && embeddingInstanceConfig.authToken ? (
                               <span className="text-teal-400 flex items-center">
@@
                             ) : (
                               <span className="text-gray-500 italic">No authentication</span>
                             )
                           )}
                        </td>
                      </tr>

This keeps “Token configured” in sync with the toggle while still letting you cache the token client‑side for convenience.

Also applies to: 2368-2390, 2458-2480

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ebcf11 and 7d4c702.

📒 Files selected for processing (8)
  • archon-ui-main/src/components/DisconnectScreenOverlay.tsx (1 hunks)
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx (1 hunks)
  • archon-ui-main/src/components/bug-report/BugReportModal.tsx (1 hunks)
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx (1 hunks)
  • archon-ui-main/src/components/code/CodeViewerModal.tsx (0 hunks)
  • archon-ui-main/src/components/settings/APIKeysSection.tsx (1 hunks)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (20 hunks)
  • python/src/server/services/llm_provider_service.py (5 hunks)
💤 Files with no reviewable changes (1)
  • archon-ui-main/src/components/code/CodeViewerModal.tsx
🧰 Additional context used
📓 Path-based instructions (7)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/DisconnectScreenOverlay.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/DisconnectScreenOverlay.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/DisconnectScreenOverlay.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
  • archon-ui-main/src/components/settings/RAGSettings.tsx
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data

Files:

  • python/src/server/services/llm_provider_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/server/services/llm_provider_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations

Files:

  • python/src/server/services/llm_provider_service.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/server/services/llm_provider_service.py
🧠 Learnings (12)
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Use Radix UI primitives from src/features/ui/primitives/ for new UI components in features directory

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/!(features)/**/*.{ts,tsx} : Use ESLint with standard React rules for legacy code in the frontend

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/settings/APIKeysSection.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx,js,jsx} : Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
  • archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Apply Tron-inspired glassmorphism styling with Tailwind and follow responsive design patterns with mobile-first approach

Applied to files:

  • archon-ui-main/src/components/bug-report/BugReportModal.tsx
📚 Learning: 2025-09-04T16:30:05.227Z
Learnt from: stevepresley
Repo: coleam00/Archon PR: 573
File: archon-ui-main/src/config/api.ts:15-25
Timestamp: 2025-09-04T16:30:05.227Z
Learning: Archon UI API config: Prefer lazy getters getApiFullUrl() and getWsUrl() over module-load constants to avoid SSR/test crashes. Avoid CommonJS exports patterns (Object.defineProperty(exports,…)) in ESM. Add typeof window guards with VITE_API_URL fallback inside getApiUrl()/getWebSocketUrl() when SSR safety is required.

Applied to files:

  • archon-ui-main/src/components/settings/APIKeysSection.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Apply Tron-inspired glassmorphism styling with Tailwind in features directory UI components

Applied to files:

  • archon-ui-main/src/components/settings/APIKeysSection.tsx
📚 Learning: 2025-08-28T12:03:56.388Z
Learnt from: Wirasm
Repo: coleam00/Archon PR: 514
File: archon-ui-main/src/components/layouts/ArchonChatPanel.tsx:71-81
Timestamp: 2025-08-28T12:03:56.388Z
Learning: The agent chat service is disabled in the Archon codebase, so optimizations or fixes related to agent chat functionality may not be necessary to implement.

Applied to files:

  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Use TanStack Query for all data fetching with smart HTTP polling where appropriate

Applied to files:

  • archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
🔇 Additional comments (7)
archon-ui-main/src/components/agent-chat/ArchonChatPanel.tsx (1)

2-2: LGTM! Clean removal of unused imports.

The BookOpen and Search icons are not referenced anywhere in the component, so removing them from the import statement is appropriate and improves code cleanliness.

archon-ui-main/src/components/DisconnectScreenOverlay.tsx (1)

2-2: LGTM! Cleanup of unused imports.

The removal of Wifi and WifiOff imports is correct since only the X icon is used in the component (line 40). This cleanup improves code maintainability.

archon-ui-main/src/components/bug-report/BugReportModal.tsx (1)

3-3: LGTM! Clean removal of unused import.

The ExternalLink icon was not referenced anywhere in the component, so removing it from the import is appropriate cleanup.

archon-ui-main/src/components/bug-report/ErrorBoundaryWithBugReport.tsx (1)

1-1: LGTM! Modern import pattern aligns with new JSX transform.

The refactoring to named imports is correct and aligns with the new JSX runtime. All necessary imports (Component, ErrorInfo, ReactNode) are present and properly used throughout the file.

archon-ui-main/src/components/settings/APIKeysSection.tsx (1)

2-2: Import surface reduction looks good

Removing unused icons from the lucide-react import keeps the bundle lean and avoids unused‑symbol lint noise. No behavioral changes introduced.

python/src/server/services/llm_provider_service.py (1)

612-622: Embedding URL logging is helpful and safe

The additional logging of OLLAMA_EMBEDDING_URL and the resolved /v1 URL in _get_optimal_ollama_instance improves observability for routing issues without exposing secrets. Behavior for chat vs embedding URLs and the fallback to LLM_BASE_URL remains unchanged.

archon-ui-main/src/components/settings/RAGSettings.tsx (1)

1600-1659: Ollama API mode toggle and instance auth state wiring look solid

The new “Ollama API Mode” radio group correctly writes OLLAMA_API_MODE as either 'native' or 'openai-compatible' into ragSettings, and the top‑level “Save Settings” path will persist it via credentialsService.updateRagSettings(updatedSettings).

Similarly, extending llmInstanceConfig / embeddingInstanceConfig with useAuth and authToken, ensuring deletion resets include those fields, and using syncEmbeddingFromLLM(llmInstanceConfig) when “Use same host for embedding instance” is checked all keep the instance configuration shape consistent and address the earlier state‑shape/reset concerns.

No changes needed here beyond the type additions already suggested.

Also applies to: 210-221, 217-221, 2357-2358, 787-821

Milofax and others added 2 commits December 1, 2025 14:26
- Add pymupdf4llm>=0.2.5 as primary PDF extraction library
- Produces clean Markdown with proper word separation
- Falls back to pdfplumber, then PyPDF2 if needed
- Validated: pymupdf4llm extracts "Since 1996, Bloomberg" vs
  pdfplumber's "Since1996,Bloomberg" (no spaces)

Phase0 validated, Phase1 complete. Phase2 (OCR) pending.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…de block protection

- Add heading boundary (\n#) as highest priority breakpoint
- Protect code blocks from being split mid-block
- Add is_inside_code_block() helper to check if position is within ```...```
- Only break at paragraphs/sentences when outside code blocks

This improves RAG quality for Markdown documents (especially from pymupdf4llm)
by preserving semantic structure during chunking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Milofax and others added 3 commits December 1, 2025 15:52
…dels

BREAKING: Search now supports all embedding dimensions (384, 768, 1024, 1536, 3072)

Previously search was hardcoded to use 1536-dim functions, which meant:
- snowflake-arctic-embed2 (1024-dim) → NO RESULTS
- nomic-embed-text (768-dim) → NO RESULTS
- Only OpenAI text-embedding-3-small (1536-dim) worked

Fix:
- Detect dimension from query_embedding: len(query_embedding)
- Use _multi RPC functions with embedding_dimension parameter
- base_search_strategy.py: match_archon_*_multi
- hybrid_search_strategy.py: hybrid_search_*_multi

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds automatic OCR extraction for scanned/image-based PDFs that don't
contain extractable text layers.

New files:
- ocr_processing.py: Tesseract OCR engine wrapper
  - extract_text_with_ocr(): Convert PDF pages to images, run OCR
  - is_ocr_available(): Check for pytesseract/pdf2image
  - get_supported_languages(): List installed Tesseract languages

Changes:
- document_processing.py: OCR fallback when text extraction fails
- pyproject.toml: Add pytesseract>=0.3.10, pdf2image>=1.17.0

Flow: pymupdf4llm → pdfplumber → PyPDF2 → Tesseract OCR

Requires system dependencies:
- macOS: brew install tesseract poppler
- Linux: apt install tesseract-ocr poppler-utils

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@tazmon95

tazmon95 commented Dec 3, 2025

Copy link
Copy Markdown
Collaborator

@Milofax, We already supported Ollama and have all the capabilities listed in this PR except for the Ollama API selection capability but don't see a need for this as we work with the native Ollama method and I've never heard of someone using Ollama via the OpenAI specification when the native one is available.

@Milofax

Milofax commented Dec 3, 2025

Copy link
Copy Markdown
Author

Hey @tazmon95, thanks for the feedback! Let me provide some context on why this PR exists.

To be honest, I wouldn't have touched the Ollama implementation at all if it had worked for my use case. The changes
Here weren't about adding features for the sake of it—they came from real issues I encountered:

  1. Remote Ollama instances weren't properly supported
    The original implementation was essentially hardcoded to localhost host.docker.internal. For anyone running Ollama on
    a separate server (which is common for GPU offloading or shared infrastructure), this didn't work reliably.

  2. Auth token support was missing
    Protected Ollama instances (behind reverse proxies with authentication) had no way to pass credentials through the
    embedding and health-check flows.

  3. The API mode toggle exists because the original code used OpenAI-compatible mode
    I didn't add this toggle because I wanted it—it's there for backwards compatibility. The existing codebase was calling
    Ollama via the OpenAI-compatible /v1/embeddings endpoint. We added the native /api/embeddings adapter (which is more reliable for Ollama) but kept the old approach as a fallback option.

If the toggle adds unnecessary complexity and nobody uses the OpenAI-compatible mode for Ollama, I'm totally fine with
removing it and just defaulting to native. The core value of this PR is really about making remote/authenticated Ollama
setups work properly.

Happy to discuss or simplify where it makes sense! 🙂

@tazmon95

tazmon95 commented Dec 3, 2025

Copy link
Copy Markdown
Collaborator

Thanks @Milofax ,

  1. I've always used Ollama with remote hosts, neve with the host.docker.internal address. This has worked fine since the beginning.

  2. This is a good addition, I didn't try it before.

  3. Didn't realize that, I must be mistaken then. I'll take a look, it makes more sense to me to use the native method.

Milofax added a commit to Milofax/Archon that referenced this pull request Dec 3, 2025
…issues

Phase 3 Summary Feature:
- Beautiful View with react-markdown + MermaidRenderer
- Edit View with textarea + live preview
- Backend summary update support
- Regenerate warning dialog
- E2E tests (14/14 passed)

CodeRabbit Fixes (PR coleam00#875):
- Fix authToken persistence in credentialsService
- Remove unnecessary 'as any' casts in RAGSettings
- Add authToken to OllamaInstance load/save

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…stence

- Add authToken persistence to setOllamaInstances in credentialsService.ts
- Remove 'as any' type casts for OLLAMA_*_AUTH_TOKEN in RAGSettings.tsx
- Types are already defined in RAGSettings interface (lines 171-172)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
python/src/server/services/storage/base_storage_service.py (1)

78-80: Move helper function outside the loop to avoid repeated redefinition.

is_inside_code_block is redefined on every iteration of the while loop (line 64), which is inefficient. Since it doesn't capture any loop-local state, extract it to module level or as a static/class method.

+def _is_inside_code_block(text_segment: str, pos: int) -> bool:
+    """Check if position is inside a code block."""
+    return text_segment[:pos].count("```") % 2 == 1
+
+
 class BaseStorageService(ABC):

Then remove the nested definition and reference _is_inside_code_block directly inside smart_chunk_text.

python/src/server/utils/document_processing.py (3)

253-254: Consider moving imports to module level.

tempfile and os are standard library modules with negligible import overhead. Moving them to the top of the file improves clarity and avoids repeated import resolution on each function call.

 import io
+import os
+import tempfile
 
 # Removed direct logging import - using unified config

Then remove the inline imports at lines 253-254.


264-268: Extract minimum text thresholds to named constants.

The magic numbers 100 (lines 264, 289) and 50 (line 325) control fallback behavior but their purpose isn't immediately clear. Named constants would improve maintainability and make tuning easier.

# At module level
MIN_TEXT_CHARS_PRIMARY = 100  # Minimum chars for pymupdf4llm/pdfplumber success
MIN_TEXT_CHARS_OCR = 50       # Lower threshold for OCR (typically yields less text)

265-273: Inconsistent logging between logger and logfire.

Success messages use logger.info() (line 265) while failure/warning messages use logfire.warning() (lines 268, 273). This pattern continues throughout the function. Consider using one consistently for warnings/fallbacks to simplify log aggregation—typically logger.warning() for operational messages and logfire for structured observability events.

python/src/server/utils/ocr_processing.py (3)

109-111: Include exc_info=True for exception logging.

Per coding guidelines, exception handlers should preserve full stack traces. This aids debugging when OCR fails on specific pages.

             except Exception as e:
-                logger.warning(f"OCR failed on page {page_num}: {e}")
+                logger.warning(f"OCR failed on page {page_num}: {e}", exc_info=True)
                 continue

Apply the same pattern at lines 122-124 and 141-143.


122-124: Add stack trace to the error log.

The exception is logged but without the full traceback. For external API calls (Tesseract subprocess), detailed traces help diagnose issues like missing binaries or corrupted PDFs.

     except Exception as e:
-        logfire.error(f"OCR extraction failed: {e}")
+        logfire.error(f"OCR extraction failed: {e}", exc_info=True)
         return None

93-111: Consider validating language availability upfront.

If an invalid language is passed, Tesseract will fail on every page. While the per-page exception handler prevents crashes, it's inefficient. An upfront check could provide a clearer error:

if language not in get_supported_languages():
    logger.warning(f"Language '{language}' may not be installed, falling back to 'eng'")
    language = "eng"

This is optional since the current behavior is still correct.

python/src/server/services/search/hybrid_search_strategy.py (1)

143-163: Copy filter dict before mutation and align error logging in code example hybrid search

The multi‑dimensional code example hybrid search wiring looks consistent with the documents path. Two small tweaks will improve safety and observability:

  1. Avoid mutating filter_metadata

Same pattern as above: filter_json = filter_metadata or {} followed by filter_json.pop("source") mutates the caller’s dict. Copying is cheap and avoids surprises:

-                filter_json = filter_metadata or {}
+                filter_json = dict(filter_metadata or {})
@@
-                if not final_source_filter and "source" in filter_json:
-                    final_source_filter = filter_json.pop("source")
+                if not final_source_filter and "source" in filter_json:
+                    final_source_filter = filter_json.pop("source")

(Here only the first line changes; the pop still acts on a local copy.)

  1. Upgrade error logging to capture stack traces and context

In the except at the end of this method, mirroring the pattern suggested for vector_search (using exc_info=True and including match_count / source_id or final_source_filter in the log message) would bring this in line with the stated logging guidelines and make Supabase/DB issues much easier to troubleshoot.

Also applies to: 201-204

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d4c702 and 85e6276.

⛔ Files ignored due to path filters (1)
  • python/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • python/pyproject.toml (1 hunks)
  • python/src/server/services/search/base_search_strategy.py (2 hunks)
  • python/src/server/services/search/hybrid_search_strategy.py (2 hunks)
  • python/src/server/services/storage/base_storage_service.py (1 hunks)
  • python/src/server/utils/document_processing.py (5 hunks)
  • python/src/server/utils/ocr_processing.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
python/pyproject.toml

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.12 for backend code

Files:

  • python/pyproject.toml
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data

Files:

  • python/src/server/utils/document_processing.py
  • python/src/server/utils/ocr_processing.py
  • python/src/server/services/search/hybrid_search_strategy.py
  • python/src/server/services/storage/base_storage_service.py
  • python/src/server/services/search/base_search_strategy.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/server/utils/document_processing.py
  • python/src/server/utils/ocr_processing.py
  • python/src/server/services/search/hybrid_search_strategy.py
  • python/src/server/services/storage/base_storage_service.py
  • python/src/server/services/search/base_search_strategy.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/server/utils/document_processing.py
  • python/src/server/utils/ocr_processing.py
  • python/src/server/services/search/hybrid_search_strategy.py
  • python/src/server/services/storage/base_storage_service.py
  • python/src/server/services/search/base_search_strategy.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations

Files:

  • python/src/server/services/search/hybrid_search_strategy.py
  • python/src/server/services/storage/base_storage_service.py
  • python/src/server/services/search/base_search_strategy.py
🧬 Code graph analysis (1)
python/src/server/utils/document_processing.py (1)
python/src/server/utils/ocr_processing.py (2)
  • is_ocr_available (36-38)
  • extract_text_with_ocr (41-124)
🔇 Additional comments (10)
python/src/server/services/storage/base_storage_service.py (3)

82-89: Heading boundary logic looks correct.

Good addition of heading-based break points. The 30% threshold prevents excessively small chunks, and the code block awareness is properly implemented.


91-95: Paragraph break logic correctly enhanced with code-block awareness.

The cascading check (end == original_end) ensures proper priority ordering, and the code block guard prevents breaking mid-code.


97-101: Sentence break fallback correctly updated with code-block awareness.

The priority cascading and code block guard are properly implemented. Including the period (+ 1) in the current chunk is correct behavior.

python/src/server/utils/document_processing.py (3)

13-18: LGTM on the availability check pattern.

The try/except pattern for optional dependency detection is consistent with the existing patterns for PyPDF2, pdfplumber, and docx. This gracefully handles environments where pymupdf4llm isn't installed.


321-338: Good error handling with actionable messages.

The OCR fallback path and error messages provide clear guidance for users, including installation instructions for missing dependencies. This aligns with the guideline to fail fast with descriptive errors when extraction completely fails.


47-83: LGTM on the code block preservation logic.

The regex-based approach correctly handles code blocks split by page markers. Good use of reversed() to maintain string positions during in-place replacements. The docstring clearly explains the problem being solved.

python/src/server/utils/ocr_processing.py (2)

127-143: LGTM on language discovery logic.

Good defensive implementation with a sensible fallback to ["eng"] when Tesseract's language list can't be retrieved. Filtering out osd is correct as it's a detection mode, not a language.


36-38: LGTM.

Simple and effective availability check.

python/src/server/services/search/base_search_strategy.py (1)

48-76: Add guard for empty embeddings and enrich error logging context

The multi-dimensional wiring (computing embedding_dimension, using f"{table_rpc}_multi", and logging the dimension) looks good, but two robustness gaps need attention per backend guidelines:

  1. Guard against empty/invalid query_embedding before deriving dimension

If query_embedding is None or empty, len(query_embedding) will fail downstream. Add an early validation check that logs and returns cleanly:

         with safe_span("base_vector_search", table=table_rpc, match_count=match_count) as span:
             try:
+                if not query_embedding:
+                    logger.error("Vector search received empty query_embedding")
+                    span.set_attribute("error", "empty_query_embedding")
+                    return []
+
                 # Detect embedding dimension from the query vector
                 embedding_dimension = len(query_embedding)
  1. Include stack trace and structured context in error logs

Update the exception handler to include exc_info=True and contextual fields (table_rpc, match_count) as required by backend guidelines for production debuggability:

             except Exception as e:
-                logger.error(f"Vector search failed: {e}")
+                logger.error(
+                    "Vector search failed for table_rpc=%s with match_count=%s",
+                    table_rpc,
+                    match_count,
+                    exc_info=True,
+                )
python/src/server/services/search/hybrid_search_strategy.py (1)

54-67: Avoid mutating caller filters and guard/document embedding assumptions

The switch to hybrid_search_archon_crawled_pages_multi with an explicit embedding_dimension is solid. A couple of small improvements would make this more robust and side‑effect free:

  1. Do not mutate filter_metadata via pop

filter_json = filter_metadata or {} creates an alias; the subsequent pop("source", ...) mutates the original dict passed by the caller. That can be surprising if the same filter dict is reused elsewhere.

You can cheaply avoid this by copying:

-                filter_json = filter_metadata or {}
+                filter_json = dict(filter_metadata or {})
                 source_filter = filter_json.pop("source", None) if "source" in filter_json else None
  1. Explicitly handle empty/None query_embedding before computing length

Even if callers are expected to always supply a valid embedding, a lightweight guard will prevent obscure failures when that assumption is violated:

-                # Detect embedding dimension from the query vector
-                embedding_dimension = len(query_embedding)
+                if not query_embedding:
+                    logger.error("Hybrid document search received empty query_embedding")
+                    span.set_attribute("error", "empty_query_embedding")
+                    return []
+
+                # Detect embedding dimension from the query vector
+                embedding_dimension = len(query_embedding)
  1. Align error logging with backend guidelines

As in BaseSearchStrategy.vector_search, consider switching the except block here to use exc_info=True and include relevant context (e.g., match_count, whether a source_filter was applied), so operational debugging is easier.

Also applies to: 108-111

Comment thread python/pyproject.toml
Comment thread python/src/server/utils/ocr_processing.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
archon-ui-main/src/services/credentialsService.ts (1)

62-75: Persisting Ollama auth tokens as plaintext credentials

authToken is now part of OllamaInstance and is persisted via setOllamaInstances with is_encrypted: false, meaning these per‑instance tokens will be stored in clear text like non‑secret config fields. For remote/authenticated Ollama hosts these tokens are effectively API credentials and should get the same at‑rest protections as other API keys.

Consider treating authToken as a secret when creating credentials, e.g. marking it encrypted (and aligning backend handling) while leaving non‑secret fields unchanged:

-      instances.forEach(instance => {
+      instances.forEach(instance => {
         const fields: Record<string, any> = {
           name: instance.name,
           baseUrl: instance.baseUrl,
           isEnabled: instance.isEnabled,
           isPrimary: instance.isPrimary,
           instanceType: instance.instanceType || 'both',
           loadBalancingWeight: instance.loadBalancingWeight || 100
         };
@@
-        if (instance.authToken) {
-          fields.authToken = instance.authToken;
-        }
+        if (instance.authToken) {
+          fields.authToken = instance.authToken;
+        }
@@
-        Object.entries(fields).forEach(([field, value]) => {
+        Object.entries(fields).forEach(([field, value]) => {
+          const isSecret = field === 'authToken';
           promises.push(
             this.createCredential({
               key: `ollama_instance_${instance.id}_${field}`,
               value: value.toString(),
-              is_encrypted: false,
+              is_encrypted: isSecret,
               category: 'ollama_instances'
             })
           );
         });

You may need a matching backend adjustment so ollama_instances credentials with is_encrypted: true are stored/decrypted consistently with api_keys.

Also applies to: 438-485, 496-551

archon-ui-main/src/components/settings/RAGSettings.tsx (1)

18-20: Remove OpenRouter from embedding-capable providers

EMBEDDING_CAPABLE_PROVIDERS incorrectly includes 'openrouter'. OpenRouter only proxies chat/completions endpoints and does not provide embedding models. Including it here misrepresents its capabilities to users and can cause confusion, as embeddings would actually come from the underlying routed provider (OpenAI, etc.), not OpenRouter itself. This was correctly excluded in the prior version.

-const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'openrouter', 'ollama'];
+const EMBEDDING_CAPABLE_PROVIDERS: ProviderKey[] = ['openai', 'google', 'ollama'];
♻️ Duplicate comments (3)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)

142-175: Add missing Ollama auth/API-mode fields to RAGSettingsProps.ragSettings

The component reads and writes ragSettings.OLLAMA_CHAT_AUTH_TOKEN, ragSettings.OLLAMA_EMBEDDING_AUTH_TOKEN, and ragSettings.OLLAMA_API_MODE in multiple places (effects, modals, API-mode toggle), but these fields are not declared on the ragSettings type. Under strict TypeScript this will be a type error and prevents catching typos on these keys.

Extend the inline type to include the missing fields:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
@@
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
     // Crawling Performance Settings

After this, you can continue to access those properties directly without any as any workarounds.

As per coding guidelines.


571-584: Model loading behavior and default Ollama URL handling

The new loadAvailableModels helper and ollamaMetrics extensions are wired cleanly, and calling loadAvailableModels after a successful manualTestConnection is a nice touch to keep model lists in sync.

One lingering issue is the hardcoded default URL check in the auto‑load effects:

useEffect(() => {
  if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
    loadAvailableModels(llmInstanceConfig.url, 'chat');
  }
}, [llmInstanceConfig.url]);

useEffect(() => {
  if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
    loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
  }
}, [embeddingInstanceConfig.url]);

This still skips auto‑loading models for users who keep the default local Ollama URL, forcing them to rely on the page‑load tests or manual “Test Connection” to populate model lists.

You can remove the special‑case while still guarding against empty values:

-  useEffect(() => {
-    if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(llmInstanceConfig.url, 'chat');
-    }
-  }, [llmInstanceConfig.url]);
+  useEffect(() => {
+    if (llmInstanceConfig.url && llmInstanceConfig.url.trim()) {
+      loadAvailableModels(llmInstanceConfig.url, 'chat');
+    }
+  }, [llmInstanceConfig.url]);
@@
-  useEffect(() => {
-    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
-      loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
-    }
-  }, [embeddingInstanceConfig.url]);
+  useEffect(() => {
+    if (embeddingInstanceConfig.url && embeddingInstanceConfig.url.trim()) {
+      loadAvailableModels(embeddingInstanceConfig.url, 'embedding');
+    }
+  }, [embeddingInstanceConfig.url]);

This matches prior guidance and makes the behavior consistent for default vs custom URLs.

Also applies to: 641-692, 752-755, 891-909


2350-2366: Validate authentication tokens before saving Ollama instance settings

The new auth sections in the LLM and Embedding modals let users enable useAuth and enter an authToken, and the save handlers persist tokens into ragSettings:

LLM:
  OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : ''

Embedding:
  OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''

However, if useAuth is enabled while the token field is empty or whitespace, the code still saves an empty token. That will cause authenticated calls to fail without immediate feedback.

Add a simple guard before constructing updatedSettings in both handlers:

// LLM Save Changes
<Button
  onClick={async () => {
+   if (llmInstanceConfig.useAuth && !llmInstanceConfig.authToken.trim()) {
+     showToast('Authentication token is required when authentication is enabled', 'error');
+     return;
+   }
    const updatedSettings = {
      ...ragSettings,
      LLM_BASE_URL: llmInstanceConfig.url,
      OLLAMA_CHAT_AUTH_TOKEN: llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : ''
    };
@@

// Embedding Save Changes
<Button
  onClick={async () => {
+   if (embeddingInstanceConfig.useAuth && !embeddingInstanceConfig.authToken.trim()) {
+     showToast('Authentication token is required when authentication is enabled', 'error');
+     return;
+   }
    const updatedSettings = {
      ...ragSettings,
      OLLAMA_EMBEDDING_URL: embeddingInstanceConfig.url,
      OLLAMA_EMBEDDING_AUTH_TOKEN: embeddingInstanceConfig.useAuth ? embeddingInstanceConfig.authToken : ''
    };

This keeps the UX tight and avoids silently persisting unusable auth configurations.

Also applies to: 2368-2391, 2403-2411, 2443-2456, 2458-2481, 2493-2501

🧹 Nitpick comments (2)
archon-ui-main/src/services/credentialsService.ts (1)

13-45: RagSettings additions for CHAT_MODEL and OLLAMA_API_MODE look consistent

Defaults plus string-field mapping for CHAT_MODEL and OLLAMA_API_MODE are wired correctly into getRagSettings, so these values will survive round‑trips through credentials. Just ensure all consumers (e.g., RAGSettings.tsx’s props type) declare the same fields to keep TypeScript happy; I’ve flagged that separately in the UI file.

Also applies to: 191-281

archon-ui-main/src/components/settings/RAGSettings.tsx (1)

225-272: Align auth token state, persistence, and summary display

The new refs/effects correctly sync OLLAMA_*_AUTH_TOKEN from ragSettings into llmInstanceConfig / embeddingInstanceConfig, but there’s a subtle mismatch with how the “Authentication” summary row is rendered:

  • Effects set useAuth: !!newAuthToken but keep authToken: newAuthToken || prev.authToken.
  • The summary row at lines 1852–1879 uses only authToken truthiness to decide between “Token configured” and “No authentication”.

That means if a user disables auth (useAuth = false) but the previous authToken string remains in state (or when ragSettings clears the token but local state still has prev.authToken), the summary can still show “Token configured” even though auth is off and no token will be sent.

You can make the UI match actual behavior by keying off useAuth && authToken:

-  {activeSelection === 'chat' ? (
-    llmInstanceConfig.authToken ? (
+  {activeSelection === 'chat' ? (
+    llmInstanceConfig.useAuth && llmInstanceConfig.authToken ? (
       <span className="text-teal-400 flex items-center">
@@
-    ) : (
-      <span className="text-gray-500 italic">No authentication</span>
-    )
-  ) : (
-    embeddingInstanceConfig.authToken ? (
+    ) : (
+      <span className="text-gray-500 italic">No authentication</span>
+    )
+  ) : (
+    embeddingInstanceConfig.useAuth && embeddingInstanceConfig.authToken ? (
       <span className="text-teal-400 flex items-center">
@@
-    ) : (
-      <span className="text-gray-500 italic">No authentication</span>
-    )
-  )}
+    ) : (
+      <span className="text-gray-500 italic">No authentication</span>
+    )
+  )}

Optionally, you could also drop the || prev.authToken fallback in the effects if you want state to always mirror what’s actually persisted in ragSettings.

Also applies to: 1852-1879

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85e6276 and 8c109ed.

📒 Files selected for processing (2)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (20 hunks)
  • archon-ui-main/src/services/credentialsService.ts (6 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (3)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx} : Use TypeScript in strict mode with no implicit any in frontend code

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

1600-1659: Ollama API mode toggle wiring looks correct

The new “Ollama API Mode” selector cleanly toggles ragSettings.OLLAMA_API_MODE between 'native' and 'openai-compatible', with sensible defaulting to native when the value is missing:

(!ragSettings.OLLAMA_API_MODE || ragSettings.OLLAMA_API_MODE === 'native')

and

onClick={() => setRagSettings({ ...ragSettings, OLLAMA_API_MODE: 'native' })}
onClick={() => setRagSettings({ ...ragSettings, OLLAMA_API_MODE: 'openai-compatible' })}

Once the props type is updated to declare OLLAMA_API_MODE, this should integrate smoothly with the backend’s mode selection logic.

- Remove secret logging: replace logger.info(rag_settings) with keys-only debug logging
- Add auth state reset and token validation: add OLLAMA_CHAT_AUTH_TOKEN and
  OLLAMA_EMBEDDING_AUTH_TOKEN to RAGSettingsProps interface; validate auth token
  is non-empty before persisting when useAuth is enabled for LLM and Embedding instances
- Replace hardcoded URL checks with configurable values: remove hardcoded
  http://host.docker.internal:11434/v1 exclusion from auto-load model effects
- Handle None OLLAMA_EMBEDDING_URL safely: use (value or "") pattern instead
  of .get(key, "") to avoid AttributeError when value is None
- Make api_mode comparison case-insensitive in _get_embedding_adapter and
  create_embeddings_batch to handle "Native"/"NATIVE" config values
- Add missing OCR dependencies (pymupdf4llm, pytesseract, pdf2image) to the
  all dependency group in pyproject.toml
- Remove unused imports (io, tempfile, os) from ocr_processing.py
- Extract long line in state_reconciliation.py to satisfy 120-char limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

2405-2431: ⚠️ Potential issue | 🟠 Major

Avoid auto-testing protected instances before the token is persisted.

These modal saves only update local React state, then immediately call backend health/model endpoints that are keyed by instance_urls only. For a newly entered Ollama token, the backend still has no persisted credential yet, so authenticated remote instances can show a false failure until the page-level “Save Settings” runs.

Either persist updatedSettings before the follow-up test, or skip the automatic test/model refresh here and leave it to the main save flow.

Also applies to: 2499-2525

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 2405 -
2431, The save handler currently updates local state (setRagSettings) then
unconditionally runs manualTestConnection and fetchOllamaMetrics, causing false
failures for authenticated Ollama instances because the token isn't persisted;
modify the onClick flow in the LLM edit modal so that if
llmInstanceConfig.useAuth is true you either (A) persist updatedSettings to the
backend (call the existing settings save API / trigger the parent-level save
flow) and await success before invoking manualTestConnection and
fetchOllamaMetrics, or (B) skip the automatic test/metrics refresh entirely and
leave testing to the main "Save Settings" action; ensure this change targets the
onClick handler where updatedSettings is built and references
manualTestConnection, fetchOllamaMetrics, setRagSettings, and
setShowEditLLMModal so the automatic test only runs when credentials are
actually persisted.
♻️ Duplicate comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

142-176: ⚠️ Potential issue | 🟠 Major

Add OLLAMA_API_MODE to the local ragSettings type.

Lines 1612-1658 read ragSettings.OLLAMA_API_MODE, but RAGSettingsProps["ragSettings"] still omits that field. In strict TS this is an unsafe property access and typically a compile-time error. As per coding guidelines, archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code.

#!/bin/bash
set -euo pipefail

echo "RAGSettingsProps block:"
sed -n '142,176p' archon-ui-main/src/components/settings/RAGSettings.tsx

echo
echo "OLLAMA_API_MODE references:"
rg -n "OLLAMA_API_MODE" \
  archon-ui-main/src/components/settings/RAGSettings.tsx \
  archon-ui-main/src/services/credentialsService.ts
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 142 -
176, The ragSettings type inside RAGSettingsProps is missing the OLLAMA_API_MODE
field referenced elsewhere (ragSettings.OLLAMA_API_MODE); update the
RAGSettingsProps["ragSettings"] interface to include OLLAMA_API_MODE with the
correct type (e.g., string | undefined or a stricter enum/union if applicable)
so TypeScript strict mode no longer flags the property access in
functions/components that read ragSettings.OLLAMA_API_MODE (e.g., places using
that symbol in this file and in credentialsService).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 800-805: The handlers that delete Ollama config keys (they clone
ragSettings into updatedSettings, delete LLM_BASE_URL / MODEL_CHOICE /
OLLAMA_CHAT_AUTH_TOKEN and call setRagSettings) must also remove the persisted
instance name keys to avoid them being repopulated; update those blocks to
delete LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME from updatedSettings
before calling setRagSettings, and apply the same change in the other handler
referenced (the similar block around the second delete at lines 826-830).
- Around line 158-159: The RAG token keys OLLAMA_CHAT_AUTH_TOKEN and
OLLAMA_EMBEDDING_AUTH_TOKEN are missing from the
credentialsService.getRagSettings() shape/defaults and hydration logic; update
credentialsService.getRagSettings() to add these two keys to the RagSettings
type/defaults and include them in the string-field hydration path so values
persisted by the UI are restored on reload. Specifically, extend the RagSettings
object/return to include OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN
with appropriate default (empty string/null) and add those exact keys to any
code that iterates/string-hydrates token fields so they aren’t dropped. Ensure
references to RagSettings, getRagSettings, and the string-field hydration code
are updated together.
- Around line 238-245: The code builds newConfig using `newAuthToken ||
prev.authToken` which preserves a prior secret when the user clears the saved
token; change the logic in the `newConfig` construction so `authToken` is set to
`newAuthToken` if it is a non-empty value, otherwise set it to an empty string
(or null/undefined) when the user cleared the field and/or `useAuth` is false,
and ensure `useAuth` is properly computed from `newAuthToken`/the auth toggle;
update the same pattern elsewhere in this component where `newAuthToken ||
prev.authToken` is used (e.g., the second occurrence) so clearing the token
truly clears the stored secret and the summary no longer shows “Token
configured.”

In `@python/src/server/services/embeddings/embedding_service.py`:
- Line 459: The URL normalization for ollama_base_url currently uses
.replace("/v1", "") which can remove occurrences anywhere in the string; change
the logic that builds ollama_base_url (the expression using
rag_settings.get("OLLAMA_EMBEDDING_URL")) to use .removesuffix("/v1") followed
by .rstrip("/") so only a trailing "/v1" is removed and the URL is still trimmed
of trailing slashes.
- Around line 120-128: The current use of asyncio.gather with
self._fetch_single_embedding and httpx.AsyncClient launches all requests for
texts concurrently and can overwhelm the Ollama server or OS limits; change to
bounded concurrency by creating an asyncio.Semaphore (or
asyncio.BoundedSemaphore) with a reasonable max (e.g., 10–20), wrap calls to
self._fetch_single_embedding in a small async worker that acquires/releases the
semaphore, and then gather those bounded tasks instead of launching one task per
text unbounded; keep using the same AsyncClient instance inside the
semaphore-controlled tasks to reuse connections and return the collected
embeddings.

In `@python/src/server/utils/ocr_processing.py`:
- Around line 119-121: The except block catching Exception as e in the OCR
extraction flow currently logs with logfire.error(f"OCR extraction failed: {e}")
and returns None; update that call to include exc_info=True so the full
traceback is preserved in logs (i.e., change the logfire.error invocation in the
OCR extraction exception handler to pass exc_info=True while keeping the
descriptive message), ensuring this occurs in the function handling OCR
extraction where logfire is used.
- Around line 106-108: The except block that currently calls
logger.warning(f"OCR failed on page {page_num}: {e}") should preserve the full
stack trace; update the logging call in the OCR error handler (the except
handling OCR failures where logger.warning is used and page_num is referenced)
to include exc_info=True (or replace with logger.exception) so the full
exception stack is recorded (e.g., logger.warning(..., exc_info=True) or
logger.exception(...)).
- Around line 138-140: The except block in ocr_processing.py currently logs the
error with logger.warning(f"Could not get Tesseract languages: {e}") and returns
a default; update that logger call to pass exc_info=True (e.g.,
logger.warning("Could not get Tesseract languages", exc_info=True)) so the full
stack trace is preserved while still returning the fallback ["eng"]; locate the
logger.warning call in the function that retrieves Tesseract languages and
modify it accordingly.

---

Outside diff comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 2405-2431: The save handler currently updates local state
(setRagSettings) then unconditionally runs manualTestConnection and
fetchOllamaMetrics, causing false failures for authenticated Ollama instances
because the token isn't persisted; modify the onClick flow in the LLM edit modal
so that if llmInstanceConfig.useAuth is true you either (A) persist
updatedSettings to the backend (call the existing settings save API / trigger
the parent-level save flow) and await success before invoking
manualTestConnection and fetchOllamaMetrics, or (B) skip the automatic
test/metrics refresh entirely and leave testing to the main "Save Settings"
action; ensure this change targets the onClick handler where updatedSettings is
built and references manualTestConnection, fetchOllamaMetrics, setRagSettings,
and setShowEditLLMModal so the automatic test only runs when credentials are
actually persisted.

---

Duplicate comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 142-176: The ragSettings type inside RAGSettingsProps is missing
the OLLAMA_API_MODE field referenced elsewhere (ragSettings.OLLAMA_API_MODE);
update the RAGSettingsProps["ragSettings"] interface to include OLLAMA_API_MODE
with the correct type (e.g., string | undefined or a stricter enum/union if
applicable) so TypeScript strict mode no longer flags the property access in
functions/components that read ragSettings.OLLAMA_API_MODE (e.g., places using
that symbol in this file and in credentialsService).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0bb9ebf8-4d93-479c-ad40-fae14b0ebb4d

📥 Commits

Reviewing files that changed from the base of the PR and between 8c109ed and 83b3ef4.

📒 Files selected for processing (5)
  • archon-ui-main/src/components/settings/RAGSettings.tsx
  • python/pyproject.toml
  • python/src/agent_work_orders/utils/state_reconciliation.py
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/utils/ocr_processing.py
✅ Files skipped from review due to trivial changes (1)
  • python/src/agent_work_orders/utils/state_reconciliation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/pyproject.toml

Comment on lines +158 to +159
OLLAMA_CHAT_AUTH_TOKEN?: string;
OLLAMA_EMBEDDING_AUTH_TOKEN?: string;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Mirror these new token keys in credentialsService.getRagSettings().

The UI now writes OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN, but archon-ui-main/src/services/credentialsService.ts still doesn't include them in its RagSettings shape/defaults or its string-field hydration path. Those values will be silently dropped on reload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 158 -
159, The RAG token keys OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN
are missing from the credentialsService.getRagSettings() shape/defaults and
hydration logic; update credentialsService.getRagSettings() to add these two
keys to the RagSettings type/defaults and include them in the string-field
hydration path so values persisted by the UI are restored on reload.
Specifically, extend the RagSettings object/return to include
OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN with appropriate default
(empty string/null) and add those exact keys to any code that
iterates/string-hydrates token fields so they aren’t dropped. Ensure references
to RagSettings, getRagSettings, and the string-field hydration code are updated
together.

Comment on lines 238 to +245
const newConfig = {
url: newLLMUrl || prev.url,
name: newLLMName || prev.name
name: newLLMName || prev.name,
useAuth: !!newAuthToken,
authToken: newAuthToken || prev.authToken
};
// Only update if actually different to prevent loops
if (newConfig.url !== prev.url || newConfig.name !== prev.name) {
if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.authToken !== prev.authToken) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't retain the previous auth token when the saved value is empty.

Using newAuthToken || prev.authToken means disabling auth never clears the local secret. The summary will still show “Token configured”, and re-enabling auth can silently reuse the stale token.

Suggested fix
         const newConfig = {
           url: newLLMUrl || prev.url,
           name: newLLMName || prev.name,
           useAuth: !!newAuthToken,
-          authToken: newAuthToken || prev.authToken
+          authToken: newAuthToken
         };
@@
         const newConfig = {
           url: newEmbeddingUrl || prev.url,
           name: newEmbeddingName || prev.name,
           useAuth: !!newAuthToken,
-          authToken: newAuthToken || prev.authToken
+          authToken: newAuthToken
         };

Also applies to: 261-268

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 238 -
245, The code builds newConfig using `newAuthToken || prev.authToken` which
preserves a prior secret when the user clears the saved token; change the logic
in the `newConfig` construction so `authToken` is set to `newAuthToken` if it is
a non-empty value, otherwise set it to an empty string (or null/undefined) when
the user cleared the field and/or `useAuth` is false, and ensure `useAuth` is
properly computed from `newAuthToken`/the auth toggle; update the same pattern
elsewhere in this component where `newAuthToken || prev.authToken` is used
(e.g., the second occurrence) so clearing the token truly clears the stored
secret and the summary no longer shows “Token configured.”

Comment on lines 800 to 805
// Clear related RAG settings
const updatedSettings = { ...ragSettings };
delete updatedSettings.LLM_BASE_URL;
delete updatedSettings.MODEL_CHOICE;
delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN;
setRagSettings(updatedSettings);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clear the persisted instance names when deleting an Ollama config.

These handlers remove the URL/model/token, but they leave LLM_INSTANCE_NAME and OLLAMA_EMBEDDING_INSTANCE_NAME behind. The sync effects at Lines 230-274 can then repopulate the “deleted” instance with its old name on the next render.

Suggested fix
       const updatedSettings = { ...ragSettings };
       delete updatedSettings.LLM_BASE_URL;
+      delete updatedSettings.LLM_INSTANCE_NAME;
       delete updatedSettings.MODEL_CHOICE;
       delete updatedSettings.OLLAMA_CHAT_AUTH_TOKEN;
@@
       const updatedSettings = { ...ragSettings };
       delete updatedSettings.OLLAMA_EMBEDDING_URL;
+      delete updatedSettings.OLLAMA_EMBEDDING_INSTANCE_NAME;
       delete updatedSettings.EMBEDDING_MODEL;
       delete updatedSettings.OLLAMA_EMBEDDING_AUTH_TOKEN;

Also applies to: 826-830

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 800 -
805, The handlers that delete Ollama config keys (they clone ragSettings into
updatedSettings, delete LLM_BASE_URL / MODEL_CHOICE / OLLAMA_CHAT_AUTH_TOKEN and
call setRagSettings) must also remove the persisted instance name keys to avoid
them being repopulated; update those blocks to delete LLM_INSTANCE_NAME and
OLLAMA_EMBEDDING_INSTANCE_NAME from updatedSettings before calling
setRagSettings, and apply the same change in the other handler referenced (the
similar block around the second delete at lines 826-830).

Comment on lines +120 to +128
try:
async with httpx.AsyncClient(timeout=60.0) as http_client:
embeddings = await asyncio.gather(
*(
self._fetch_single_embedding(http_client, model, text)
for text in texts
)
)
return embeddings

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Unbounded parallelism may overwhelm the Ollama server.

asyncio.gather launches all requests simultaneously without concurrency limits. For large batches (e.g., 100+ texts), this creates many simultaneous HTTP connections that could exhaust Ollama server resources or hit OS connection limits.

Consider limiting concurrency with a semaphore:

🛠️ Proposed fix
+    _MAX_CONCURRENT_REQUESTS = 10  # class-level constant

     async def create_embeddings(
         self,
         texts: list[str],
         model: str,
         dimensions: int | None = None,
     ) -> list[list[float]]:
         try:
+            semaphore = asyncio.Semaphore(self._MAX_CONCURRENT_REQUESTS)
+
+            async def limited_fetch(text: str) -> list[float]:
+                async with semaphore:
+                    return await self._fetch_single_embedding(http_client, model, text)
+
             async with httpx.AsyncClient(timeout=60.0) as http_client:
                 embeddings = await asyncio.gather(
-                    *(
-                        self._fetch_single_embedding(http_client, model, text)
-                        for text in texts
-                    )
+                    *(limited_fetch(text) for text in texts)
                 )
             return embeddings
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/services/embeddings/embedding_service.py` around lines 120
- 128, The current use of asyncio.gather with self._fetch_single_embedding and
httpx.AsyncClient launches all requests for texts concurrently and can overwhelm
the Ollama server or OS limits; change to bounded concurrency by creating an
asyncio.Semaphore (or asyncio.BoundedSemaphore) with a reasonable max (e.g.,
10–20), wrap calls to self._fetch_single_embedding in a small async worker that
acquires/releases the semaphore, and then gather those bounded tasks instead of
launching one task per text unbounded; keep using the same AsyncClient instance
inside the semaphore-controlled tasks to reuse connections and return the
collected embeddings.

embedding_dimensions = int(rag_settings.get("EMBEDDING_DIMENSIONS", "1536"))

# For Ollama, get native API URL, auth token, and API mode
ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use removesuffix instead of replace for URL normalization.

.replace("/v1", "") removes all occurrences of /v1 anywhere in the URL, not just the trailing suffix. While unlikely in practice, URLs like http://v1-proxy.example.com/ollama/v1 would be incorrectly modified.

🔧 Proposed fix
-                    ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")
+                    ollama_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").rstrip("/").removesuffix("/v1")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/services/embeddings/embedding_service.py` at line 459, The
URL normalization for ollama_base_url currently uses .replace("/v1", "") which
can remove occurrences anywhere in the string; change the logic that builds
ollama_base_url (the expression using rag_settings.get("OLLAMA_EMBEDDING_URL"))
to use .removesuffix("/v1") followed by .rstrip("/") so only a trailing "/v1" is
removed and the URL is still trimmed of trailing slashes.

Comment on lines +106 to +108
except Exception as e:
logger.warning(f"OCR failed on page {page_num}: {e}")
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add exc_info=True to preserve stack traces in error logging.

Per coding guidelines, error logging should include full stack traces for debugging. The current warning loses valuable diagnostic information.

🔧 Proposed fix
             except Exception as e:
-                logger.warning(f"OCR failed on page {page_num}: {e}")
+                logger.warning(f"OCR failed on page {page_num}: {e}", exc_info=True)
                 continue

As per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/utils/ocr_processing.py` around lines 106 - 108, The except
block that currently calls logger.warning(f"OCR failed on page {page_num}: {e}")
should preserve the full stack trace; update the logging call in the OCR error
handler (the except handling OCR failures where logger.warning is used and
page_num is referenced) to include exc_info=True (or replace with
logger.exception) so the full exception stack is recorded (e.g.,
logger.warning(..., exc_info=True) or logger.exception(...)).

Comment on lines +119 to +121
except Exception as e:
logfire.error(f"OCR extraction failed: {e}")
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add exc_info=True to the error log for complete stack traces.

The outer exception handler should preserve the full stack trace for debugging OCR failures. This is especially important since the function returns None on failure, making post-mortem debugging reliant on logs.

🔧 Proposed fix
     except Exception as e:
-        logfire.error(f"OCR extraction failed: {e}")
+        logfire.error(f"OCR extraction failed: {e}", exc_info=True)
         return None

As per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
logfire.error(f"OCR extraction failed: {e}")
return None
except Exception as e:
logfire.error(f"OCR extraction failed: {e}", exc_info=True)
return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/utils/ocr_processing.py` around lines 119 - 121, The except
block catching Exception as e in the OCR extraction flow currently logs with
logfire.error(f"OCR extraction failed: {e}") and returns None; update that call
to include exc_info=True so the full traceback is preserved in logs (i.e.,
change the logfire.error invocation in the OCR extraction exception handler to
pass exc_info=True while keeping the descriptive message), ensuring this occurs
in the function handling OCR extraction where logfire is used.

Comment on lines +138 to +140
except Exception as e:
logger.warning(f"Could not get Tesseract languages: {e}")
return ["eng"] # Default fallback

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add exc_info=True for consistency with error logging guidelines.

🔧 Proposed fix
     except Exception as e:
-        logger.warning(f"Could not get Tesseract languages: {e}")
+        logger.warning(f"Could not get Tesseract languages: {e}", exc_info=True)
         return ["eng"]  # Default fallback

As per coding guidelines: "preserve full stack traces with exc_info=True in Python logging".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/utils/ocr_processing.py` around lines 138 - 140, The except
block in ocr_processing.py currently logs the error with logger.warning(f"Could
not get Tesseract languages: {e}") and returns a default; update that logger
call to pass exc_info=True (e.g., logger.warning("Could not get Tesseract
languages", exc_info=True)) so the full stack trace is preserved while still
returning the fallback ["eng"]; locate the logger.warning call in the function
that retrieves Tesseract languages and modify it accordingly.

@Wirasm

Wirasm commented Apr 6, 2026

Copy link
Copy Markdown
Collaborator

🔄 This repository is being replaced by a new version of Archon.

The original Python/MCP codebase is being archived to the archive/v1-python-mcp branch. The new Archon (TypeScript workflow engine for AI coding agents) is replacing it.

This PR is being closed as part of the migration. Thank you for your contribution!

@Wirasm Wirasm closed this Apr 6, 2026
coleam00 added a commit that referenced this pull request Apr 7, 2026
* fix: auto-select first DAG node on workflow run load (#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes #781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (#854) (#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes #854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes #889

* fix(cli): address review findings for PR #890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
* fix: auto-select first DAG node on workflow run load (coleam00#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes coleam00#781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes coleam00#854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (coleam00#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (coleam00#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes coleam00#889

* fix(cli): address review findings for PR coleam00#890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
* fix: auto-select first DAG node on workflow run load (coleam00#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes coleam00#781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes coleam00#854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (coleam00#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (coleam00#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes coleam00#889

* fix(cli): address review findings for PR coleam00#890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants