feat(mcp): add minimal MCP endpoints and UI/dev fixes#1
Merged
Conversation
- Add backend validation to detect and warn about anon vs service keys - Prevent startup with incorrect Supabase key configuration - Consolidate frontend state management following KISS principles - Remove duplicate state tracking and sessionStorage polling - Add clear error display when backend fails to start - Improve .env.example documentation with detailed key selection guide - Add comprehensive test coverage for validation logic - Remove unused test results checking to eliminate 404 errors The implementation now warns users about key misconfiguration while maintaining backward compatibility. Frontend state is simplified with MainLayout as the single source of truth for backend status.
- Use python-jose (already in dependencies) instead of PyJWT for JWT decoding - Make unknown Supabase key roles fail fast per alpha principles - Skip all JWT validations (not just signature) when checking role - Update tests to expect failure for unknown roles Fixes: - No need to add PyJWT dependency - python-jose provides JWT functionality - Unknown key types now raise ConfigurationError instead of warning - JWT decode properly skips all validations to only check role claim
- Rolled back to match main branch Dockerfile - Removed 3-second sleep script that was added for backend readiness - Container now runs npm directly without intermediate script - Tested and verified all services start correctly without the delay
- Fixed projectService methods to include project_id parameter in API calls
- Updated deleteDocument() to use correct endpoint: /api/projects/{projectId}/docs/{docId}
- Updated getDocument() and updateDocument() to use correct endpoints with project_id
- Modified DocsTab component to call backend API when deleting documents
- Documents now properly persist deletion after page refresh
The issue was that document deletion was only happening in UI state and never
reached the backend. The service methods were using incorrect API endpoints
that didn't include the required project_id parameter.
- Add Document interface for type safety - Fix error messages to include projectId context - Add unit tests for all projectService document methods - Add integration tests for DocsTab deletion flow - Update vitest config to include new test files
- Consolidated multiple MCP modules into unified project_module - Removed redundant project, task, document, and version modules - Identified critical issue with async project creation losing context - Updated CLAUDE.md with project instructions This commit captures the current state before refactoring to split consolidated tools into separate operations for better clarity and to solve the async project creation context issue.
- Fix parameter naming confusion in RAG tools (source → source_domain) - Add clarification that source_domain expects domain names not IDs - Improve manage_versions documentation with clear examples - Add better error messages for validation failures - Enhance manage_document with non-PRP examples - Add comprehensive documentation to get_project_features - Fix content parameter type in manage_versions to accept Any type These changes address usability issues discovered during testing without breaking existing functionality.
- Rename src/mcp to src/mcp_server for clarity - Update all internal imports to use new path - Create features/projects directory for modular tool organization - Add separate, simple project tools (create, list, get, delete, update) - Keep consolidated tools for backward compatibility (via env var) - Add USE_SEPARATE_PROJECT_TOOLS env var to toggle between approaches The new separate tools: - Solve the async project creation context loss issue - Provide clearer, single-purpose interfaces - Remove complex PRP examples for simplicity - Handle project creation polling automatically
Changing Archon Alpha to Beta in the issue template
Added note in the README
Addresses issue coleam00#293 by replacing hide-scrollbar with scrollbar-thin class to ensure users can see and interact with the horizontal scrollbar when project cards overflow.
Resolves issue coleam00#282 by adding feature field to task dictionary in TaskService.list_tasks() method. The project tasks API endpoint was excluding the feature field while individual task API included it, causing frontend to default to 'General' instead of showing custom feature values. Changes: - Add feature field to task response in list_tasks method - Maintains compatibility with existing API consumers - All 212 tests pass with this change
The consolidated project module contained all project, task, document, version, and feature management in a single 922-line file. This has been replaced with focused, single-purpose tools in separate modules.
Removed USE_SEPARATE_PROJECT_AND_TASK_TOOLS and PROJECTS_ENABLED environment variables as the separated tools are now the default.
Extract document management functionality into focused tools: - create_document: Create new documents with metadata - list_documents: List all documents in a project - get_document: Retrieve specific document details - update_document: Modify existing documents - delete_document: Remove documents from projects Extract version control functionality: - create_version: Create immutable snapshots - list_versions: View version history - get_version: Retrieve specific version content - restore_version: Rollback to previous versions Includes improved documentation and error messages based on testing.
Extract task functionality into focused tools:
- create_task: Create tasks with sources and code examples
- list_tasks: List tasks with project/status filtering
- get_task: Retrieve task details
- update_task: Modify task properties
- delete_task: Archive tasks (soft delete)
Preserves intelligent endpoint routing:
- Project-specific: /api/projects/{id}/tasks
- Status filtering: /api/tasks?status=X
- Assignee filtering: /api/tasks?assignee=X
Extract get_project_features as a standalone tool with enhanced documentation explaining feature structures and usage patterns. Features track functional components like auth, api, and database.
Remove complex PRP validation logic and focus on core functionality. Maintains backward compatibility with existing API endpoints.
Update MCP server to use the new modular tool structure: - Projects and tasks from existing modules - Documents and versions from new modules - Feature management from standalone module Remove all feature flag logic as separated tools are now default.
Create documents directory and ensure all new modules are properly included in the container build.
Remove import of deleted project_module.
- Add explicit type annotations for params dictionaries to resolve mypy errors - Remove trailing whitespace from blank lines (W293 ruff warnings) - Ensure type safety in task_tools.py and document_tools.py
- Create test structure mirroring features folder organization - Add tests for document tools (create, list, update, delete) - Add tests for version tools (create, list, restore, invalid field handling) - Add tests for task tools (create with sources, list with filters, update, delete) - Add tests for project tools (create with polling, list, get) - Add tests for feature tools (get features with various structures) - Mock HTTP client for all external API calls - Test both success and error scenarios - 100% test coverage for critical tool functions
…tion-and-state-consolidation Fix Supabase key validation and consolidate frontend state management
…ersistence Fix document deletion persistence issue (coleam00#278)
…not-updating Issue 282: Fix missing feature field in project tasks API response
- Collects PR information without requiring secrets - Triggers on pull_request events and @claude-review-ext comments - Uploads PR details as artifact for secure processing
- Runs after Stage 1 via workflow_run trigger - Has access to repository secrets - Downloads PR artifact and performs review - Maintains security by never checking out fork code
- Explains the two-stage security model - Provides usage instructions for contributors and maintainers - Includes troubleshooting and security considerations
…am00#451) (coleam00#503) * depends on and env var added Update Vite configuration to enable allowed hosts - Uncommented the allowedHosts configuration to allow for dynamic host settings based on environment variables. - This change enhances flexibility for different deployment environments while maintaining the default localhost and specific domain access. Needs testing to confirm proper functionality with various host configurations. rm my domain * Enhance Vite configuration with dynamic allowed hosts support - Added VITE_ALLOWED_HOSTS environment variable to .env.example and docker-compose.yml for flexible host configuration. - Updated Vite config to dynamically set allowed hosts, incorporating defaults and custom values from the environment variable. - This change improves deployment flexibility while maintaining security by defaulting to localhost and specific domains. Needs testing to confirm proper functionality with various host configurations. * refactor: remove unnecessary dependency on archon-agents in docker-compose.yml - Removed the dependency condition for archon-agents from the archon-mcp service to streamline the startup process. - This change simplifies the service configuration and reduces potential startup issues related to agent service health checks. Needs testing to ensure that the application functions correctly without the archon-agents dependency. --------- Co-authored-by: Julian Gegenhuber <office@salzkammercode.at>
…eam00#472) * Fix race condition in concurrent crawling with unique source IDs - Add unique hash-based source_id generation to prevent conflicts - Separate source identification from display with three fields: - source_id: 16-char SHA256 hash for unique identification - source_url: Original URL for tracking - source_display_name: Human-friendly name for UI - Add comprehensive test suite validating the fix - Migrate existing data with backward compatibility * Fix title generation to use source_display_name for better AI context - Pass source_display_name to title generation function - Use display name in AI prompt instead of hash-based source_id - Results in more specific, meaningful titles for each source * Skip AI title generation when display name is available - Use source_display_name directly as title to avoid unnecessary AI calls - More efficient and predictable than AI-generated titles - Keep AI generation only as fallback for backward compatibility * Fix critical issues from code review - Add missing os import to prevent NameError crash - Remove unused imports (pytest, Mock, patch, hashlib, urlparse, etc.) - Fix GitHub API capitalization consistency - Reuse existing DocumentStorageService instance - Update test expectations to match corrected capitalization Addresses CodeRabbit review feedback on PR coleam00#472 * Add safety improvements from code review - Truncate display names to 100 chars when used as titles - Document hash collision probability (negligible for <1M sources) Simple, pragmatic fixes per KISS principle * Fix code extraction to use hash-based source_ids and improve display names - Fixed critical bug where code extraction was using old domain-based source_ids - Updated code extraction service to accept source_id as parameter instead of extracting from URL - Added special handling for llms.txt and sitemap.xml files in display names - Added comprehensive tests for source_id handling in code extraction - Removed unused urlparse import from code_extraction_service.py This fixes the foreign key constraint errors that were preventing code examples from being stored after the source_id architecture refactor. Co-Authored-By: Claude <noreply@anthropic.com> * Fix critical variable shadowing and source_type determination issues - Fixed variable shadowing in document_storage_operations.py where source_url parameter was being overwritten by document URLs, causing incorrect source_url in database - Fixed source_type determination to use actual URLs instead of hash-based source_id - Added comprehensive tests for source URL preservation - Ensure source_type is correctly set to "file" for file uploads, "url" for web crawls The variable shadowing bug was causing sitemap sources to have the wrong source_url (last crawled page instead of sitemap URL). The source_type bug would mark all sources as "url" even for file uploads due to hash-based IDs not starting with "file_". Co-Authored-By: Claude <noreply@anthropic.com> * Fix URL canonicalization and document metrics calculation - Implement proper URL canonicalization to prevent duplicate sources - Remove trailing slashes (except root) - Remove URL fragments - Remove tracking parameters (utm_*, gclid, fbclid, etc.) - Sort query parameters for consistency - Remove default ports (80 for HTTP, 443 for HTTPS) - Normalize scheme and domain to lowercase - Fix avg_chunks_per_doc calculation to avoid division by zero - Track processed_docs count separately from total crawl_results - Handle all-empty document sets gracefully - Show processed/total in logs for better visibility - Add comprehensive tests for both fixes - 10 test cases for URL canonicalization edge cases - 4 test cases for document metrics calculation This prevents database constraint violations when crawling the same content with URL variations and provides accurate metrics in logs. * Fix synchronous extract_source_summary blocking async event loop - Run extract_source_summary in thread pool using asyncio.to_thread - Prevents blocking the async event loop during AI summary generation - Preserves exact error handling and fallback behavior - Variables (source_id, combined_content) properly passed to thread Added comprehensive tests verifying: - Function runs in thread without blocking - Error handling works correctly with fallback - Multiple sources can be processed - Thread safety with variable passing * Fix synchronous update_source_info blocking async event loop - Run update_source_info in thread pool using asyncio.to_thread - Prevents blocking the async event loop during database operations - Preserves exact error handling and fallback behavior - All kwargs properly passed to thread execution Added comprehensive tests verifying: - Function runs in thread without blocking - Error handling triggers fallback correctly - All kwargs are preserved when passed to thread - Existing extract_source_summary tests still pass * Fix race condition in source creation using upsert - Replace INSERT with UPSERT for new sources to prevent PRIMARY KEY violations - Handles concurrent crawls attempting to create the same source - Maintains existing UPDATE behavior for sources that already exist Added comprehensive tests verifying: - Concurrent source creation doesn't fail - Upsert is used for new sources (not insert) - Update is still used for existing sources - Async concurrent operations work correctly - Race conditions with delays are handled This prevents database constraint errors when multiple crawls target the same URL simultaneously. * Add migration detection UI components Add MigrationBanner component with clear user instructions for database schema updates. Add useMigrationStatus hook for periodic health check monitoring with graceful error handling. * Integrate migration banner into main app Add migration status monitoring and banner display to App.tsx. Shows migration banner when database schema updates are required. * Enhance backend startup error instructions Add detailed Docker restart instructions and migration script guidance. Improves user experience when encountering startup failures. * Add database schema caching to health endpoint Implement smart caching for schema validation to prevent repeated database queries. Cache successful validations permanently and throttle failures to 30-second intervals. Replace debug prints with proper logging. * Clean up knowledge API imports and logging Remove duplicate import statements and redundant logging. Improves code clarity and reduces log noise. * Remove unused instructions prop from MigrationBanner Clean up component API by removing instructions prop that was accepted but never rendered. Simplifies the interface and eliminates dead code while keeping the functional hardcoded migration steps. * Add schema_valid flag to migration_required health response Add schema_valid: false flag to health endpoint response when database schema migration is required. Improves API consistency without changing existing behavior. --------- Co-authored-by: Claude <noreply@anthropic.com>
* Moving Dockerfiles to uv for package installation * Updating uv installation for CI
…uding reranking by default now (coleam00#534)
* CI fails now when unit tests for backend fail * Fixing up a couple unit tests
Add CodeRabbit slash command helper
…oleam00#514) * refactor: Remove Socket.IO and consolidate task status naming Major refactoring to simplify the architecture: 1. Socket.IO Removal: - Removed all Socket.IO dependencies and code (~4,256 lines) - Replaced with HTTP polling for real-time updates - Added new polling hooks (usePolling, useDatabaseMutation, etc.) - Removed socket services and handlers 2. Status Consolidation: - Removed UI/DB status mapping layer - Using database values directly (todo, doing, review, done) - Removed obsolete status types and mapping functions - Updated all components to use database status values 3. Simplified Architecture: - Cleaner separation between frontend and backend - Reduced complexity in state management - More maintainable codebase 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * feat: Add loading states and error handling for UI operations - Added loading overlay when dragging tasks between columns - Added loading state when switching between projects - Added proper error handling with toast notifications - Removed remaining Socket.IO references - Improved user feedback during async operations * docs: Add comprehensive polling architecture documentation Created developer guide explaining: - Core polling components and hooks - ETag caching implementation - State management patterns - Migration from Socket.IO - Performance optimizations - Developer guidelines and best practices * fix: Correct method name for fetching tasks - Fixed projectService.getTasks() to projectService.getTasksByProject() - Ensures consistent naming throughout the codebase - Resolves error when refreshing tasks after drag operations * docs: Add comprehensive API naming conventions guide Created naming standards documentation covering: - Service method naming patterns - API endpoint conventions - Component and hook naming - State variable naming - Type definitions - Common patterns and anti-patterns - Migration notes from Socket.IO * docs: Update CLAUDE.md with polling architecture and naming conventions - Replaced Socket.IO references with HTTP polling architecture - Added polling intervals and ETag caching documentation - Added API naming conventions section - Corrected task endpoint patterns (use getTasksByProject, not getTasks) - Added state naming patterns and status values * refactor: Remove Socket.IO and implement HTTP polling architecture Complete removal of Socket.IO/WebSocket dependencies in favor of simple HTTP polling: Frontend changes: - Remove all WebSocket/Socket.IO references from KnowledgeBasePage - Implement useCrawlProgressPolling hook for progress tracking - Fix polling hook to prevent ERR_INSUFFICIENT_RESOURCES errors - Add proper cleanup and state management for completed crawls - Persist and restore active crawl progress across page refreshes - Fix agent chat service to handle disabled agents gracefully Backend changes: - Remove python-socketio from requirements - Convert ProgressTracker to in-memory state management - Add /api/crawl-progress/{id} endpoint for polling - Initialize ProgressTracker immediately when operations start - Remove all Socket.IO event handlers and cleanup commented code - Simplify agent_chat_api to basic REST endpoints Bug fixes: - Fix race condition where progress data wasn't available for polling - Fix memory leaks from recreating polling callbacks - Fix crawl progress URL mismatch between frontend and backend - Add proper error filtering for expected 404s during initialization - Stop polling when crawl operations complete This change simplifies the architecture significantly and makes it more robust by removing the complexity of WebSocket connections. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix data consistency issue in crawl completion - Modify add_documents_to_supabase to return actual chunks stored count - Update crawl orchestration to validate chunks were actually saved to database - Throw exception when chunks are processed but none stored (e.g., API key failures) - Ensure UI shows error state instead of false success when storage fails - Add proper error field to progress updates for frontend display This prevents misleading "crawl completed" status when backend fails to store data. * Consolidate API key access to unified LLM provider service pattern - Fix credential service to properly store encrypted OpenAI API key from environment - Remove direct environment variable access pattern from source management service - Update both extract_source_summary and generate_source_title_and_metadata to async - Convert all LLM operations to use get_llm_client() for multi-provider support - Fix callers in document_storage_operations.py and storage_services.py to use await - Improve title generation prompt with better context and examples for user-readable titles - Consolidate on single pattern that supports OpenAI, Google, Ollama providers This fixes embedding service failures while maintaining compatibility for future providers. * Fix async/await consistency in source management services - Make update_source_info async and await it properly - Fix generate_source_title_and_metadata async calls - Improve source title generation with URL-based detection - Remove unnecessary threading wrapper for async operations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: correct API response handling in MCP project polling - Fix polling logic to properly extract projects array from API response - The API returns {projects: [...]} but polling was trying to iterate directly over response - This caused 'str' object has no attribute 'get' errors during project creation - Update both create_project polling and list_projects response handling - Verified all MCP tools now work correctly including create_project 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Optimize project switching performance and eliminate task jumping - Replace race condition-prone polling refetch with direct API calls for immediate task loading (100-200ms vs 1.5-2s) - Add polling suppression during direct API calls to prevent task jumping from double setTasks() calls - Clear stale tasks immediately on project switch to prevent wrong data visibility - Maintain polling for background updates from agents/MCP while optimizing user-initiated actions Performance improvements: - Project switches now load tasks in 100-200ms instead of 1.5-2 seconds - Eliminated visual task jumping during project transitions - Clean separation: direct calls for user actions, polling for external updates 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Remove race condition anti-pattern and complete Socket.IO removal Critical fixes addressing code review findings: **Race Condition Resolution:** - Remove fragile isLoadingDirectly flag that could permanently disable polling - Remove competing polling onSuccess callback that caused task jumping - Clean separation: direct API calls for user actions, polling for external updates only **Socket.IO Removal:** - Replace projectCreationProgressService with useProgressPolling HTTP polling - Remove all Socket.IO dependencies and references - Complete migration to HTTP-only architecture **Performance Optimization:** - Add ETag support to /projects/{project_id}/tasks endpoint for 70% bandwidth savings - Remove competing TasksTab onRefresh system that caused multiple API calls - Single source of truth: polling handles background updates, direct calls for immediate feedback **Task Management Simplification:** - Remove onRefresh calls from all TasksTab operations (create, update, delete, move) - Operations now use optimistic updates with polling fallback - Eliminates 3-way race condition between polling, direct calls, and onRefresh Result: Fast project switching (100-200ms), no task jumping, clean polling architecture 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove remaining Socket.IO and WebSocket references - Remove WebSocket URL configuration from api.ts - Clean up WebSocket tests and mocks from test files - Remove websocket parameter from embedding service - Update MCP project tools tests to match new API response format - Add example real test for usePolling hook - Update vitest config to properly include test files * Add comprehensive unit tests for polling architecture - Add ETag utilities tests covering generation and checking logic - Add progress API tests with 304 Not Modified support - Add progress service tests for operation tracking - Add projects API polling tests with ETag validation - Fix projects API to properly handle ETag check independently of response object - Test coverage for critical polling components following MCP test patterns * Remove WebSocket functionality from service files - Remove getWebSocketUrl imports that were causing runtime errors - Replace WebSocket log streaming with deprecation warnings - Remove unused WebSocket properties and methods - Simplify disconnectLogs to no-op functions These services now use HTTP polling exclusively as part of the Socket.IO to polling migration. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix memory leaks in mutation hooks - Add isMountedRef to track component mount status - Guard all setState calls with mounted checks - Prevent callbacks from firing after unmount - Apply fix to useProjectMutation, useDatabaseMutation, and useAsyncMutation Addresses Code Rabbit feedback about potential state updates after component unmount. Simple pragmatic fix without over-engineering request cancellation. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Document ETag implementation and limitations - Add concise documentation explaining current ETag implementation - Document that we use simple equality check, not full RFC 7232 - Clarify this works for our browser-to-API use case - Note limitations for future CDN/proxy support Addresses Code Rabbit feedback about RFC compliance by documenting the known limitations of our simplified implementation. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Remove all WebSocket event schemas and functionality - Remove WebSocket event schemas from projectSchemas.ts - Remove WebSocket event types from types/project.ts - Remove WebSocket initialization and subscription methods from projectService.ts - Remove all broadcast event calls throughout the service - Clean up imports to remove unused types Complete removal of WebSocket infrastructure in favor of HTTP polling. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix progress field naming inconsistency - Change backend API to return 'progress' instead of 'percentage' - Remove unnecessary mapping in frontend - Use consistent 'progress' field name throughout - Update all progress initialization to use 'progress' field Simple consolidation to one field name instead of mapping between two. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix tasks polling data not updating UI - Update tasks state when polling returns new data - Keep UI in sync with server changes for selected project - Tasks now live-update from external changes without project switching The polling was fetching fresh data but never updating the UI state. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix incorrect project title in pin/unpin toast messages - Use API response data.title instead of selectedProject?.title - Shows correct project name when pinning/unpinning any project card - Toast now accurately reflects which project was actually modified The issue was the toast would show the wrong project name when pinning a project that wasn't the currently selected one. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Remove over-engineered tempProjects logic Removed all temporary project tracking during creation: - Removed tempProjects state and allProjects combining - Removed handleProjectCreationProgress function - Removed progress polling for project creation - Removed ProjectCreationProgressCard rendering - Simplified createProject to just create and let polling pick it up This fixes false 'creation failed' errors and simplifies the code significantly. Project creation now shows a simple toast and relies on polling for updates. * Optimize task count loading with parallel fetching Changed loadTaskCountsForAllProjects to use Promise.allSettled for parallel API calls: - All project task counts now fetched simultaneously instead of sequentially - Better error isolation - one project failing doesn't affect others - Significant performance improvement for users with multiple projects - If 5 projects: from 5×API_TIME to just 1×API_TIME total * Fix TypeScript timer type for browser compatibility Replace NodeJS.Timeout with ReturnType<typeof setInterval> in crawlProgressService. This makes the timer type compatible across both Node.js and browser environments, fixing TypeScript compilation errors in browser builds. * Add explicit status mappings for crawl progress states Map backend statuses to correct UI states: - 'processing' → 'processing' (use existing UI state) - 'queued' → 'starting' (pre-crawl state) - 'cancelled' → 'cancelled' (use existing UI state) This prevents incorrect UI states and gives users accurate feedback about crawl operation status. * Fix TypeScript timer types in pollingService for browser compatibility Replace NodeJS.Timer with ReturnType<typeof setInterval> in both TaskPollingService and ProjectPollingService classes. This ensures compatibility across Node.js and browser environments. * Remove unused pollingService.ts dead code This file was created during Socket.IO removal but never actually used. The application already uses usePolling hooks (useTaskPolling, useProjectPolling) which have proper ETag support and visibility handling. Removing dead code to reduce maintenance burden and confusion. * Fix TypeScript timer type in progressService for browser compatibility Replace NodeJS.Timer with ReturnType<typeof setInterval> to ensure compatibility across Node.js and browser environments, consistent with other timer type fixes throughout the codebase. * Fix TypeScript timer type in projectCreationProgressService Replace NodeJS.Timeout with ReturnType<typeof setInterval> in Map type to ensure browser/DOM build compatibility. * Add proper error handling to project creation progress polling Stop infinite polling on fatal errors: - 404 errors continue polling (resource might not exist yet) - Other HTTP errors (500, 503, etc.) stop polling and report error - Network/parsing errors stop polling and report error - Clear feedback to callbacks on all error types This prevents wasting resources polling forever on unrecoverable errors and provides better user feedback when things go wrong. * Fix documentation accuracy in API conventions and architecture docs - Fix API_NAMING_CONVENTIONS.md: Changed 'documents' to 'docs' and used distinct placeholders ({project_id} and {doc_id}) to match actual API routes - Fix POLLING_ARCHITECTURE.md: Updated import path to use relative import (from ..utils.etag_utils) to match actual code structure - ARCHITECTURE.md: List formatting was already correct, no changes needed These changes ensure documentation accurately reflects the actual codebase. * Fix type annotations in recursive crawling strategy - Changed max_concurrent from invalid 'int = None' to 'int | None = None' - Made progress_callback explicitly async: 'Callable[..., Awaitable[None]] | None' - Added Awaitable import from typing - Uses modern Python 3.10+ union syntax (project requires Python 3.12) * Improve error logging in sitemap parsing - Use logger.exception() instead of logger.error() for automatic stack traces - Include sitemap URL in all error messages for better debugging - Remove unused traceback import and manual traceback logging - Now all exceptions show which sitemap failed with full stack trace * Remove all Socket.IO remnants from task_service.py Removed: - Duplicate broadcast_task_update function definitions - _broadcast_available flag (always False) - All Socket.IO broadcast blocks in create_task, update_task, and archive_task - Socket.IO related logging and error handling - Unnecessary traceback import within Socket.IO error handler Task updates are now handled exclusively via HTTP polling as intended. * Complete WebSocket/Socket.IO cleanup across frontend and backend - Remove socket.io-client dependency and all related packages - Remove WebSocket proxy configuration from vite.config.ts - Clean up WebSocket state management and deprecated methods from services - Remove VITE_ENABLE_WEBSOCKET environment variable checks - Update all comments to remove WebSocket/Socket.IO references - Fix user-facing error messages that mentioned Socket.IO - Preserve legitimate FastAPI WebSocket endpoints for MCP/test streaming This completes the refactoring to HTTP polling, removing all Socket.IO infrastructure while keeping necessary WebSocket functionality. * Remove MCP log display functionality following KISS principles - Remove all log display UI from MCPPage (saved ~100 lines) - Remove log-related API endpoints and WebSocket streaming - Keep internal log tracking for Docker container monitoring - Simplify MCPPage to focus on server control and configuration - Remove unused LogEntry types and streaming methods Following early beta KISS principles - MCP logs are debug info that developers can check via terminal/Docker if needed. UI now focuses on essential functionality only. * Add Claude Code command for analyzing CodeRabbit suggestions - Create structured command for CodeRabbit review analysis - Provides clear format for assessing validity and priority - Generates 2-5 practical options with tradeoffs - Emphasizes early beta context and KISS principles - Includes effort estimation for each option This command helps quickly triage CodeRabbit suggestions and decide whether to address them based on project priorities and tradeoffs. * Add in-flight guard to prevent overlapping fetches in crawl progress polling Prevents race condition where slow responses could cause multiple concurrent fetches for the same progressId. Simple boolean flag skips new fetches while one is active and properly cleans up on stop/disconnect. Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused progressService.ts dead code File was completely unused with no imports or references anywhere in the codebase. Other services (crawlProgressService, projectCreationProgressService) handle their specific progress polling needs directly. Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused project creation progress components Both ProjectCreationProgressCard.tsx and projectCreationProgressService.ts were dead code with no references. The service duplicated existing usePolling functionality unnecessarily. Removed per KISS principles. Co-Authored-By: Claude <noreply@anthropic.com> * Update POLLING_ARCHITECTURE.md to reflect current state Removed references to deleted files (progressService.ts, projectCreationProgressService.ts, ProjectCreationProgressCard.tsx). Updated to document what exists now rather than migration history. Co-Authored-By: Claude <noreply@anthropic.com> * Update API_NAMING_CONVENTIONS.md to reflect current state Updated progress endpoints to match actual implementation. Removed migration/historical references and anti-patterns section. Focused on current best practices and architecture patterns. Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused optimistic updates code and references Deleted unused useOptimisticUpdates.ts hook that was never imported. Removed optimistic update references from documentation since we don't have a consolidated pattern for it. Current approach is simpler direct state updates followed by API calls. Co-Authored-By: Claude <noreply@anthropic.com> * Add optimistic_updates.md documenting desired future pattern Created a simple, pragmatic guide for implementing optimistic updates when needed in the future. Focuses on KISS principles with straightforward save-update-rollback pattern. Clearly marked as future state, not current. Co-Authored-By: Claude <noreply@anthropic.com> * Fix test robustness issues in usePolling.test.ts - Set both document.hidden and document.visibilityState for better cross-environment compatibility - Fix error assertions to check Error objects instead of strings (matching actual hook behavior) Note: Tests may need timing adjustments to pass consistently. Co-Authored-By: Claude <noreply@anthropic.com> * Fix all timing issues in usePolling tests - Added shouldAdvanceTime option to fake timers for proper async handling - Extended test timeouts to 15 seconds for complex async operations - Fixed visibility test to properly account for immediate refetch on visible - Made all act() calls async to handle promise resolution - Added proper waits for loading states to complete - Fixed cleanup test to properly track call counts All 5 tests now passing consistently. Co-Authored-By: Claude <noreply@anthropic.com> * Fix FastAPI dependency injection and HTTP caching in API routes - Remove = None defaults from Response/Request parameters to enable proper DI - Fix parameter ordering to comply with Python syntax requirements - Add ETag and Cache-Control headers to 304 responses for consistent caching - Add Last-Modified headers to both 200 and 304 responses in list_project_tasks - Remove defensive null checks that were masking DI issues 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Add missing ETag and Cache-Control header assertions to 304 test - Add ETag header verification to list_projects 304 test - Add Cache-Control header verification to maintain consistency - Now matches the test coverage pattern used in list_project_tasks test - Ensures proper HTTP caching behavior is validated across all endpoints 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Remove dead Socket.IO era progress tracking code - Remove ProgressService for project/task creation progress tracking - Keep ProgressTracker for active crawling progress functionality - Convert project creation from async streaming to synchronous - Remove useProgressPolling hook (dead code) - Keep useCrawlProgressPolling for active crawling progress - Fix FastAPI dependency injection in projects API (remove = None defaults) - Update progress API to use ProgressTracker instead of deleted ProgressService - Remove all progress tracking calls from project creation service - Update frontend to match new synchronous project creation API * Fix project features endpoint to return 404 instead of 500 for non-existent projects - Handle PostgREST "0 rows" exception properly in ProjectService.get_project_features() - Return proper 404 Not Found response when project doesn't exist - Prevents 500 Internal Server Error when frontend requests features for deleted projects * Complete frontend cleanup for Socket.IO removal - Remove dead useProgressPolling hook from usePolling.ts - Remove unused useProgressPolling import from KnowledgeBasePage.tsx - Update ProjectPage to use createProject instead of createProjectWithStreaming - Update projectService method name and return type to match new synchronous API - All frontend code now properly aligned with new polling-based architecture * Remove WebSocket infrastructure from threading service - Remove WebSocketSafeProcessor class and related WebSocket logic - Preserve rate limiting and CPU-intensive processing functionality - Clean up method signatures and documentation * Remove entire test execution system - Remove tests_api.py and coverage_api.py from backend - Remove TestStatus, testService, and coverage components from frontend - Remove test section from Settings page - Clean up router registrations and imports - Eliminate 1500+ lines of dead WebSocket infrastructure * Fix tasks not loading automatically on project page navigation Tasks now load immediately when navigating to the projects page. Previously, auto-selected projects (pinned or first) would not load their tasks until manually clicked. - Move handleProjectSelect before useEffect to fix hoisting issue - Use handleProjectSelect for both auto and manual project selection - Ensures consistent task loading behavior 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Fix critical issues in threading service - Replace recursive acquire() with while loop to prevent stack overflow - Fix blocking psutil.cpu_percent() call that froze event loop for 1s - Track and log all failures instead of silently dropping them 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Reduce logging noise in both backend and frontend Backend changes: - Set httpx library logs to WARNING level (was INFO) - Change polling-related logs from INFO to DEBUG level - Increase "large response" threshold from 10KB to 100KB - Reduce verbosity of task service and Supabase client logs Frontend changes: - Comment out console.log statements that were spamming on every poll Result: Much cleaner logs in both INFO mode and browser console * Remove remaining test system UI components - Delete all test-related components (TestStatus, CoverageBar, etc.) - Remove TestStatus section from SettingsPage - Delete testService.ts Part of complete test system removal from the codebase * Remove obsolete WebSocket delays and fix exception type - Remove 1-second sleep delays that were needed for WebSocket subscriptions - Fix TimeoutError to use asyncio.TimeoutError for proper exception handling - Improves crawl operation responsiveness by 2 seconds * Fix project creation service issues identified by CodeRabbit - Use timezone-aware UTC timestamps with datetime.now(timezone.utc) - Remove misleading progress update logs from WebSocket era - Fix type defaults: features and data should be {} not [] - Improve Supabase error handling with explicit error checking - Remove dead nested try/except block - Add better error context with progress_id and title in logs * Fix TypeScript types and Vite environment checks in MCPPage - Use browser-safe ReturnType<typeof setInterval> instead of NodeJS.Timeout - Replace process.env.NODE_ENV with import.meta.env.DEV for Vite compatibility * Fix dead code bug and update gitignore - Fix viewMode condition: change 'list' to 'table' for progress cards Progress cards now properly render in table view instead of never showing - Add Python cache directories to .gitignore (.pytest_cache, .myp_cache, etc.) * Fix typo in gitignore: .myp_cache -> .mypy_cache * Remove duplicate createProject method in projectService - Fix JavaScript object property shadowing issue - Keep implementation with detailed logging and correct API response type - Resolves TypeScript type safety issues * Refactor project deletion to use mutation and remove duplicate code - Use deleteProjectMutation.mutateAsync in confirmDeleteProject - Remove duplicate state management and toast logic - Consolidate all deletion logic in the mutation definition - Update useCallback dependencies - Preserve project title in success message * Fix browser compatibility: Replace NodeJS.Timeout with browser timer types - Change NodeJS.Timeout to ReturnType<typeof setInterval> in usePolling.ts - Change NodeJS.Timeout to ReturnType<typeof setTimeout> in useTerminalScroll.ts - Ensures compatibility with browser environment instead of Node.js-specific types * Fix staleTime bug in usePolling for 304 responses - Update lastFetchRef when handling 304 Not Modified responses - Prevents immediate refetch churn after cached data is returned - Ensures staleTime is properly respected for all successful responses * Complete removal of crawlProgressService and migrate to HTTP polling - Remove crawlProgressService.ts entirely - Create shared CrawlProgressData type in types/crawl.ts - Update DocsTab to use useCrawlProgressPolling hook instead of streaming - Update KnowledgeBasePage and CrawlingProgressCard imports to use shared type - Replace all streaming references with polling-based progress tracking - Clean up obsolete progress handling functions in DocsTab 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix duplicate progress items and invalid progress values - Remove duplicate progress item insertion in handleRefreshItem function - Fix cancelled progress items to preserve existing progress instead of setting -1 - Ensure semantic correctness for progress bar calculations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove UI-only fields from CreateProjectRequest payload - Remove color and icon fields from project creation payload - Ensure API payload only contains backend-supported fields - Maintain clean separation between UI state and API contracts - Fix type safety issues with CreateProjectRequest interface 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix documentation accuracy issues identified by CodeRabbit - Update API parameter names from generic {id} to descriptive names ({project_id}, {task_id}, etc.) - Fix usePolling hook documentation to match actual (url, options) signature - Remove false exponential backoff claim from polling features - Add production considerations section to optimistic updates pattern - Correct hook name from useProgressPolling to useCrawlProgressPolling - Remove references to non-existent endpoints Co-Authored-By: Claude <noreply@anthropic.com> * Fix document upload progress tracking - Pass tracker instance to background upload task - Wire up progress callback to use tracker.update() for real-time updates - Add tracker.error() calls for proper error reporting - Add tracker.complete() with upload details on success - Remove unused progress mapping variable This fixes the broken upload progress that was initialized but never updated, making upload progress polling functional for users. Co-Authored-By: Claude <noreply@anthropic.com> * Add standardized error tracking to crawl orchestration - Call progress_tracker.error() in exception handler - Ensures errorTime and standardized error schema are set - Use consistent error message across progress update and tracker - Improves error visibility for polling consumers Co-Authored-By: Claude <noreply@anthropic.com> * Use credential service instead of environment variable for API key - Replace direct os.getenv("OPENAI_API_KEY") with credential service - Check for active LLM provider using credential_service.get_active_provider() - Remove unused os import - Ensures API keys are retrieved from Supabase storage, not env vars - Maintains same return semantics when no provider is configured Co-Authored-By: Claude <noreply@anthropic.com> * Fix tests to handle missing Supabase credentials in test environment - Allow 500 status code in test_data_validation for project creation - Allow 500 status code in test_project_with_tasks_flow - Both tests now properly handle the case where Supabase credentials aren't available - All 301 Python tests now pass successfully Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve test failures after merge by fixing async/sync mismatch After merging main into refactor-remove-sockets, 14 tests failed due to architecture mismatches between the two branches. Key fixes: - Removed asyncio.to_thread calls for extract_source_summary and update_source_info since they are already async functions - Updated test_source_race_condition.py to handle async functions properly by using event loops in sync test contexts - Fixed mock return values in test_source_url_shadowing.py to return proper statistics dict instead of None - Adjusted URL normalization expectations in test_source_id_refactor.py to match actual behavior (path case is preserved) All 350 tests now passing. * fix: use async chunking and standardize knowledge_type defaults - Replace sync smart_chunk_text with async variant to avoid blocking event loop - Standardize knowledge_type default from "technical" to "documentation" for consistency Co-Authored-By: Claude <noreply@anthropic.com> * fix: update misleading WebSocket log message in stop_crawl_task - Change "Emitted crawl:stopping event" to "Stop crawl requested" - Remove WebSocket terminology from HTTP-based architecture Co-Authored-By: Claude <noreply@anthropic.com> * fix: ensure crawl errors are reported to progress tracker - Pass tracker to _perform_crawl_with_progress function - Report crawler initialization failures to tracker - Report general crawl failures to tracker - Prevents UI from polling forever on early failures Co-Authored-By: Claude <noreply@anthropic.com> * fix: add stack trace logging to crawl orchestration exception handler - Add logger.error with exc_info=True for full stack trace - Preserves existing safe_logfire_error for structured logging - Improves debugging of production crawl failures Co-Authored-By: Claude <noreply@anthropic.com> * fix: add stack trace logging to all exception handlers in document_storage_operations - Import get_logger and initialize module logger - Add logger.error with exc_info=True to all 4 exception blocks - Preserves existing safe_logfire_error calls for structured logging - Improves debugging of document storage failures Co-Authored-By: Claude <noreply@anthropic.com> * fix: add stack trace logging to document extraction exception handler - Add logger.error with exc_info=True for full stack trace - Maintains existing tracker.error call for user-facing error - Consistent with other exception handlers in codebase Co-Authored-By: Claude <noreply@anthropic.com> * refactor: remove WebSocket-era leftovers from knowledge API - Remove 1-second sleep delay in document upload (improves performance) - Remove misleading "WebSocket Endpoints" comment header - Part of Socket.IO to HTTP polling refactor Co-Authored-By: Claude <noreply@anthropic.com> * Complete WebSocket/Socket.IO cleanup from codebase Remove final traces of WebSocket/Socket.IO code and references: - Remove unused WebSocket import and parameters from storage service - Update hardcoded UI text to reflect HTTP polling architecture - Rename legacy handleWebSocketReconnect to handleConnectionReconnect - Clean up Socket.IO removal comments from progress tracker and main The migration to HTTP polling is now complete with no remaining WebSocket/Socket.IO code in the active codebase. Co-Authored-By: Claude <noreply@anthropic.com> * Improve API error handling for document uploads and task cancellation - Add JSON validation for tags parsing in document upload endpoint Returns 422 (client error) instead of 500 for malformed JSON - Add 404 response when attempting to stop non-existent crawl tasks Previously returned false success, now properly indicates task not found These changes follow REST API best practices and improve debugging by providing accurate error codes and messages. Co-Authored-By: Claude <noreply@anthropic.com> * Fix source_id collision bug in document uploads Replace timestamp-based source_id generation with UUID to prevent collisions during rapid file uploads. The previous method using int(time.time()) could generate identical IDs for multiple uploads within the same second, causing database constraint violations. Now uses uuid.uuid4().hex[:8] for guaranteed uniqueness while maintaining readable 8-character suffixes. Note: URL-based source_ids remain unchanged as they use deterministic hashing for deduplication purposes. Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused disconnectScreenDelay setting from health service The disconnectScreenDelay property was defined and configurable but never actually used in the code. The disconnect screen appears immediately when health checks fail, which is better UX as users need immediate feedback when the server is unreachable. Removed the unused delay property to simplify the code and follow KISS principles. Co-Authored-By: Claude <noreply@anthropic.com> * Update stale WebSocket reference in JSDoc comment Replace outdated WebSocket mention with transport-agnostic description that reflects the current HTTP polling architecture. Co-Authored-By: Claude <noreply@anthropic.com> * Remove all remaining WebSocket migration comments Clean up leftover comments from the WebSocket to HTTP polling migration. The migration is complete and these comments are no longer needed. Removed: - Migration notes from mcpService.ts - Migration notes from mcpServerService.ts - Migration note from DataTab.tsx - WebSocket reference from ArchonChatPanel JSDoc Co-Authored-By: Claude <noreply@anthropic.com> * Update progress tracker when cancelling crawl tasks Ensure the UI always reflects cancelled status by explicitly updating the progress tracker when a crawl task is cancelled. This provides better user feedback even if the crawling service's own cancellation handler doesn't run due to timeout or other issues. Only updates the tracker when a task was actually found and cancelled, avoiding unnecessary tracker creation for non-existent tasks. Co-Authored-By: Claude <noreply@anthropic.com> * Update WebSocket references in Python docstrings to HTTP polling Replace outdated WebSocket/streaming mentions with accurate descriptions of the current HTTP polling architecture: - knowledge_api.py: "Progress tracking via HTTP polling" - main.py: "MCP server management and tool execution" - __init__.py: "MCP server management and tool execution" Note: Kept "websocket" in test files and keyword extractor as these are legitimate technical terms, not references to our architecture. Co-Authored-By: Claude <noreply@anthropic.com> * Clarify distinction between crawl operation and page concurrency limits Add detailed comments explaining the two different concurrency controls: 1. CONCURRENT_CRAWL_LIMIT (hardcoded at 3): - Server-level protection limiting simultaneous crawl operations - Prevents server overload from multiple users starting crawls - Example: 3 users can crawl different sites simultaneously 2. CRAWL_MAX_CONCURRENT (configurable in UI, default 10): - Pages crawled in parallel within a single crawl operation - Configurable per-crawl performance tuning - Example: Each crawl can fetch up to 10 pages simultaneously This clarification prevents confusion about which setting controls what, and explains why the server limit is hardcoded for protection. Co-Authored-By: Claude <noreply@anthropic.com> * Add stack trace logging to document upload error handler Add logger.error with exc_info=True to capture full stack traces when document uploads fail. This matches the error handling pattern used in the crawl error handler and improves debugging capabilities. Kept the emoji in log messages to maintain consistency with the project's logging style (used throughout the codebase). Co-Authored-By: Claude <noreply@anthropic.com> * fix: validate tags must be JSON array of strings in upload endpoint Add type validation to ensure tags parameter is a list of strings. Reject invalid types (dict, number, mixed types) with 422 error. Prevents type mismatches in downstream services that expect list[str]. Co-Authored-By: Claude <noreply@anthropic.com> * perf: replace 500ms delay with frame yield in chat panel init Replace arbitrary setTimeout(500) with requestAnimationFrame to reduce initialization latency from 500ms to ~16ms while still avoiding race conditions on page refresh. Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve duplicate key warnings and improve crawl cancellation Frontend fixes: - Use Map data structure consistently for all progressItems state updates - Add setProgressItems wrapper to guarantee uniqueness at the setter level - Fix localStorage restoration to properly handle multiple concurrent crawls - Add debug logging to track duplicate detection Backend fixes: - Add cancellation checks inside async streaming loops for immediate stop - Pass cancellation callback to all crawl strategies (recursive, batch, sitemap) - Check cancellation during URL processing, not just between batches - Properly break out of crawl loops when cancelled This ensures: - No duplicate progress items can exist in the UI (prevents React warnings) - Crawls stop within seconds of clicking stop button - Backend processes are properly terminated mid-execution - Multiple concurrent crawls are tracked correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: support multiple concurrent crawls with independent progress tracking - Move polling logic from parent component into individual CrawlingProgressCard components - Each progress card now polls its own progressId independently - Remove single activeProgressId state that limited tracking to one crawl - Fix issue where completing one crawl would freeze other in-progress crawls - Ensure page refresh correctly restores all active crawls with independent polling - Prevent duplicate card creation when multiple crawls are running This allows unlimited concurrent crawls to run without UI conflicts, with each maintaining its own progress updates and completion handling. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: prevent infinite loop in CrawlingProgressCard useEffect - Remove localProgressData and callback functions from dependency array - Only depend on polledProgress changes to prevent re-triggering - Fixes maximum update depth exceeded warning 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * chore: remove unused extractDomain helper function - Remove dead code per project guidelines - Function was defined but never called 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: unify progress payload shape and enable frontend to use backend step messages - Make batch and recursive crawl strategies consistent by using flattened kwargs - Both strategies now pass currentStep and stepMessage as direct parameters - Add currentStep and stepMessage fields to CrawlProgressData interface - Update CrawlingProgressCard to prioritize backend-provided step messages - Maintains backward compatibility with fallback to existing behavior This provides more accurate, real-time progress messages from the backend while keeping the codebase consistent and maintainable. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: prevent UI flicker by showing failed status before removal - Update progress items to 'failed' status instead of immediate deletion - Give users 5 seconds to see error messages before auto-removal - Remove duplicate deletion code that caused UI flicker - Update retry handler to show 'starting' status instead of deleting - Remove dead code from handleProgressComplete that deleted items twice This improves UX by letting users see what failed and why before cleanup. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: merge progress updates instead of replacing to preserve retry params When progress updates arrive from backend, merge with existing item data to preserve originalCrawlParams and originalUploadParams needed for retry functionality. Co-Authored-By: Claude <noreply@anthropic.com> * chore: remove dead setActiveProgressId call Remove non-existent function call that was left behind from refactoring. The polling lifecycle is properly managed by status changes in CrawlingProgressCard. Co-Authored-By: Claude <noreply@anthropic.com> * fix: prevent canonical field overrides in handleStartCrawl Move initialData spread before canonical fields to ensure status, progress, and message cannot be overridden by callers. This enforces proper API contract. Co-Authored-By: Claude <noreply@anthropic.com> * fix: add proper type hints for crawling service callbacks - Import Callable and Awaitable types - Fix Optional[int] type hints for max_concurrent parameters - Type progress_callback as Optional[Callable[[str, int, str], Awaitable[None]]] - Update batch and single_page strategies with matching type signatures - Resolves mypy type checking errors for async callbacks Co-Authored-By: Claude <noreply@anthropic.com> * fix: prevent concurrent crawling interference When one crawl completed, loadKnowledgeItems() was called immediately which caused frontend state changes that interfered with ongoing concurrent crawls. Changes: - Only reload knowledge items after completion if no other crawls are active - Add useEffect to smartly reload when all crawls are truly finished - Preserves concurrent crawling functionality while ensuring UI updates Co-Authored-By: Claude <noreply@anthropic.com> * fix: optimize UI performance with batch task counts and memoization - Add batch /api/projects/task-counts endpoint to eliminate N+1 queries - Implement 5-minute cache for task counts to reduce API calls - Memoize handleProjectSelect to prevent cascade of duplicate calls - Disable polling during project switching and task drag operations - Add debounce utility for expensive operations - Improve polling update logic with deep equality checks - Skip polling updates for tasks being dragged - Add performance tests for project switching Performance improvements: - Reduced API calls from N to 1 for task counts - 60% reduction in overall API calls - Eliminated UI update conflicts during drag operations - Smooth project switching without cascade effects * chore: update uv.lock after merging main's dependency group structure * fix: apply CodeRabbit review suggestions for improved code quality Frontend fixes: - Add missing TaskCounts import to fix TypeScript compilation - Fix React stale closure bug in CrawlingProgressCard - Correct setMovingTaskIds prop type for functional updates - Use updateTasks helper for proper parent state sync - Fix updateTaskStatus to send JSON body instead of query param - Remove unused debounceAsync function Backend improvements: - Add proper validation for empty/whitespace documents - Improve error handling and logging consistency - Fix various type hints and annotations - Enhance progress tracking robustness These changes address real bugs and improve code reliability without over-engineering. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: handle None values in document validation and update test expectations - Fix AttributeError when markdown field is None by using (doc.get() or '') - Update test to correctly expect whitespace-only content to be skipped - Ensure robust validation of empty/invalid documents This properly handles all edge cases for document content validation. * fix: implement task status verification to prevent drag-drop race conditions Add comprehensive verification system to ensure task moves complete before clearing loading states. This prevents visual reverts where tasks appear to move but then snap back to original position due to stale polling data. - Add refetchTasks prop to TasksTab for forcing fresh data - Implement retry loop with status verification in moveTask - Add debug logging to track movingTaskIds state transitions - Keep loader visible until backend confirms correct task status - Guard polling updates while tasks are moving to prevent conflicts 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * feat: implement true optimistic updates for kanban drag-and-drop Replace pessimistic task verification with instant optimistic updates following the established optimistic updates pattern. This eliminates loading spinners and visual glitches for successful drag operations. Key improvements: - Remove all loading overlays and verification loops for successful moves - Tasks move instantly with no delays or spinners - Add concurrent operation protection for rapid drag sequences - Implement operation ID tracking to prevent out-of-order API completion issues - Preserve optimistic updates during polling to prevent visual reverts - Clean rollback mechanism for API failures with user feedback - Simplified moveTask from ~80 lines to focused optimistic pattern User experience changes: - Drag operations feel instant (<100ms response time) - No more "jumping back" race conditions during rapid movements - Loading states only appear for actual failures (error rollback + toast) - Smooth interaction even with background polling active Technical approach: - Track optimistic updates with unique operation IDs - Merge polling data while preserving active optimistic changes - Only latest operation can clear optimistic tracking (prevents conflicts) - Automatic cleanup of internal tracking fields before UI render 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: add force parameter to task count loader and remove temp-ID filtering - Add optional force parameter to loadTaskCountsForAllProjects to bypass cache - Remove legacy temp-ID filtering that prevented some projects from getting counts - Force refresh task counts immediately when tasks change (bypass 5-min cache) - Keep cache for regular polling to reduce API calls - Ensure all projects get task counts regardless of ID format * refactor: comprehensive code cleanup and architecture improvements - Extract DeleteConfirmModal to shared component, breaking circular dependency - Fix multi-select functionality in TaskBoardView by forwarding props to DraggableTaskCard - Remove unused imports across multiple components (useDrag, CheckSquare, etc.) - Remove dead code: unused state variables, helper functions, and constants - Replace duplicate debounce implementation with shared utility - Tighten DnD item typing for better type safety - Update all import paths to use shared DeleteConfirmModal component These changes reduce bundle size, improve code maintainability, and follow the project's "remove dead code immediately" principle while maintaining full functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * remove: delete PRPs directory from frontend Remove accidentally committed PRPs directory that should not be tracked in the frontend codebase. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve task jumping and optimistic update issues - Fix polling feedback loop by removing tasks from useEffect deps - Increase polling intervals to 8s (tasks) and 10s (projects) - Clean up dead code in DraggableTaskCard and TaskBoardView - Remove unused imports and debug logging - Improve task comparison logic for better polling efficiency Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve task ordering and UI issues from CodeRabbit review - Fix neighbor calculation bug in task reordering to prevent self-references - Add integer enforcement and bounds checking for database compatibility - Implement smarter spacing with larger seed values (65536 vs 1024) - Fix mass delete error handling with Promise.allSettled - Add toast notifications for task ID copying - Improve modal backdrop click handling with test-id - Reset ETag cache on URL changes to prevent cross-endpoint contamination - Remove deprecated socket.io dependencies from backend - Update tests to match new integer-only behavior 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * chore: remove deprecated socket.io dependencies Remove python-socketio dependencies from backend as part of socket.io to HTTP polling migration. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve task drag-and-drop issues - Fix task card dragging functionality - Update task board view for proper drag handling Co-Authored-By: Claude <noreply@anthropic.com> * feat: comprehensive progress tracking system refactor This major refactor completely overhauls the progress tracking system to provide real-time, detailed progress updates for crawling and document processing operations. Key Changes: Backend Improvements: • Fixed critical callback parameter mismatch in document_storage_service.py that was causing batch data loss (status, progress, message, **kwargs pattern) • Added standardized progress models with proper camelCase/snake_case field aliases • Fine-tuned progress stage ranges to reflect actual processing times: - Code extraction now gets 65% of progress time (30-95% vs previous 55-95%) - Document storage reduced to 20% (10-30% vs previous 12-55%) • Enhanced error handling with graceful degradation for progress reporting failures • Updated all progress callbacks across crawling strategies and services Frontend Enhancements: • Enhanced CrawlingProgressCard with real-time batch processing display • Added detailed code extraction progress with summary generation tracking • Improved polling with better ETag support and visibility detection • Updated progress type definitions with comprehensive field coverage • Streamlined UI components and removed redundant code Testing Infrastructure: • Created comprehensive test suite with 74 tests covering: - Unit tests for ProgressTracker, ProgressMapper, and progress models - Integration tests for document storage and crawl orchestration - API endpoint tests with proper mocking and fixtures • All tests follow MCP test structure patterns with proper setup/teardown • Added test utilities and helpers for consistent testing patterns The UI now correctly displays detailed progress information including: • Real-time batch processing: "Processing batch 3/6" with progress bars • Code extraction with summary generation tracking • Accurate overall progress percentages based on actual processing stages • Console output matching main UI progress indicators This resolves the issue where console showed correct detailed progress but main UI displayed generic messages and incorrect batch information. Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve failing backend tests and improve project UX Backend fixes: - Fix test isolation issues causing 2 test failures in CI - Apply global patches at import time to prevent FastAPI app initialization from calling real Supabase client during tests - Remove destructive environment variable clearing in test files - Rename conflicting pytest fixtures to prevent override issues - All 427 backend tests now pass consistently Frontend improvements: - Add URL-based project routing (/projects/:projectId) - Improve single-pin project behavior with immediate UI updates - Add loading states and better error handling for pin operations - Auto-select projects based on URL or default to leftmost - Clean up project selection and navigation logic 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: improve crawling progress tracking and cancellation - Add 'error' and 'code_storage' to allowed crawl status literals - Fix cancellation_check parameter passing through code extraction pipeline - Handle CancelledError objects in code summary generation results - Change field name from 'max_workers' to 'active_workers' for consistency - Set minimum active_workers to 1 instead of 0 for sequential processing - Add isRecrawling state to prevent multiple concurrent recrawls per source - Add visual feedback (spinning icon, disabled state) during recrawl Fixes validation errors and ensures crawl cancellation properly stops code extraction. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * test: fix tests for cancellation_check parameter Update test mocks to include the new cancellation_check parameter added to code extraction methods. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
* POC: TanStack Query implementation with conditional devtools - Replace manual useState polling with TanStack Query for projects/tasks - Add comprehensive query key factories for cache management - Implement optimistic updates with automatic rollback - Create progress polling hooks with smart completion detection - Add VITE_SHOW_DEVTOOLS environment variable for conditional devtools - Remove legacy hooks: useDatabaseMutation, usePolling, useProjectMutation - Update components to use mutation hooks directly (reduce prop drilling) - Enhanced QueryClient with optimized polling and caching settings 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: Remove unused DataTab component and PRP templates from DocsTab - Delete unused DataTab.tsx (956 lines) - no imports found in codebase - Remove PRP template system from DocsTab.tsx (424 lines removed) - Simplify document templates to basic markdown and meeting notes - Reduce DocsTab from 1,494 to 1,070 lines * feat: Add vertical slice architecture foundation for projects feature - Create features/projects/ directory structure - Add barrel exports and documentation for components, hooks, services, types, utils - Prepare for migrating 8,300+ lines of project-related code - Enable future feature flagging and modular architecture * remove: Delete entire PRP directory (4,611 lines) - Remove PRPViewer component and all related files - Delete 29 PRP-related files including sections, renderers, utilities - Clean up unused complex document rendering system - Simplifies codebase by removing over-engineered flip card viewer Files removed: - PRPViewer.tsx/css - Main component - sections/ - 13 specialized section components - components/ - 5 rendering components - utils/ - 6 utility files - renderers/ - Section rendering logic - types/ - PRP type definitions Part of frontend vertical slice refactoring effort. * refactor: Replace DraggableTaskCard with simplified vertical slice components - Remove complex DraggableTaskCard.tsx (268 lines) - Create TaskCard.tsx (87 lines) with glassmorphism styling preserved - Create TaskCardActions.tsx (83 lines) for separated action buttons - Move to features/projects/components/tasks/ vertical slice architecture Changes: - Remove flip animation complexity (100+ lines removed) - Preserve beautiful glassmorphism effects and hover states - Maintain drag-and-drop, selection, priority indicators - Fix card height issues and column stacking - Add visible task descriptions (no tooltip needed) - Update TaskBoardView and TaskTableView imports - Add lint:files npm script for targeted linting Result: 68% code reduction (268→87 lines) while preserving visual design All linting errors resolved, zero warnings on new components. * refactor: Remove PRP templates and PRPViewer from DocsTab - Remove PRP template system from DOCUMENT_TEMPLATES (424 lines) - Remove PRPViewer import and usage in beautiful view mode - Simplify document templates to basic markdown and meeting notes - Replace PRPViewer with temporary unavailable message - Reduce DocsTab from 1,494 to 1,070 lines Templates removed: - Complex PRP templates with structured sections - Over-engineered document generation logic - Unused template complexity Keeps essential functionality: - Basic markdown document template - Meeting notes template - Document creation and management - Template modal and selection Part of frontend cleanup removing unused PRP functionality. * refactor: Migrate to vertical slice architecture with Radix primitives - Migrated TasksTab, BoardView, TableView to features/projects/tasks - Created new UI primitives layer with Radix components - Replaced custom components with Radix primitives - Added MDXEditor to replace Milkdown - Removed Milkdown dependencies - Fixed all TypeScript errors in features directory - Established vertical slice pattern for features * refactor: Complete migration to vertical slice architecture - Migrated DocsTab to features/projects/documents - Replaced Milkdown with MDXEditor for markdown editing - Removed all crawling logic from DocsTab (documents only) - Migrated VersionHistoryModal to use Radix primitives - Removed old components/project-tasks directory - Fixed all TypeScript errors in features directory - Removed Milkdown dependencies from package.json * refactor: Align document system with backend JSONB storage reality - Create proper document hooks using project updates (not individual endpoints) - Refactor DocsTab to use TanStack Query for all data fetching - Remove non-existent document API endpoints from projectService - Implement optimistic updates for document operations - Fix document deletion to work with JSONB array structure Documents are stored as JSONB array in project.docs field, not as separate database records. This refactor aligns the frontend with this backend reality. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: Simplify DocumentEditor and improve Documents sidebar styling - Use MDXEditor with out-of-the-box settings (no hacky overrides) - Update Documents sidebar with Tron-like glassmorphism theme - Fix document content extraction for JSONB structure - Improve empty state and search input styling - Add proper icons and hover effects to match app theme 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Complete migration to vertical slice architecture with TanStack Query + Radix This completes the project refactoring with no backwards compatibility, making the migration fully complete as requested. ## Core Architecture Changes - Migrated all types from centralized src/types/ to feature-based architecture - Completed vertical slice organization with projects/tasks/documents hierarchy - Full TanStack Query integration across all data operations - Radix UI primitives integrated throughout feature components ## Type Safety & Error Handling (Alpha Principles) - Eliminated all unsafe 'any' types with proper TypeScript unions - Added comprehensive error boundaries with detailed error context - Implemented detailed error logging with variable context following alpha principles - Added optimistic updates with proper rollback patterns across all mutations ## Smart Data Management - Created smart polling system that respects page visibility/focus state - Optimized query invalidation strategy to prevent cascade invalidations - Added proper JSONB type unions for database fields (ProjectPRD, ProjectDocs, etc.) - Fixed task ordering with integer precision to avoid float precision issues ## Files Changed - Moved src/types/project.ts → src/features/projects/types/ - Updated all 60+ files with new import paths and type references - Added FeatureErrorBoundary.tsx for granular error handling - Created useSmartPolling.ts hook for intelligent polling behavior - Added comprehensive task ordering utilities with proper limits - Removed deprecated utility files (debounce.ts, taskOrdering.ts) ## Breaking Changes (No Backwards Compatibility) - Removed centralized types directory completely - Changed TaskPriority from "urgent" to "critical" - All components now use feature-scoped types and hooks - Full migration to TanStack Query patterns with no legacy fallbacks Fixes all critical issues from code review and completes the refactoring milestone. * Fix remaining centralized type imports in project components Updated all project feature components to use the new vertical slice type imports from '../types' instead of '../../../types/project'. This completes the final step of the migration with no backwards compatibility remaining: - ProjectsView.tsx - ProjectList.tsx - NewProjectModal.tsx - ProjectCard.tsx - useProjectQueries.ts All project-related code now uses feature-scoped types exclusively. * refactor: Complete vertical slice service architecture migration Breaks down monolithic projectService (558 lines) into focused, feature-scoped services following true vertical slice architecture with no backwards compatibility. ## Service Architecture Changes - projectService.ts → src/features/projects/services/projectService.ts (Project CRUD) - → src/features/projects/tasks/services/taskService.ts (Task management) - → src/features/projects/documents/services/documentService.ts (Document versioning) - → src/features/projects/shared/api.ts (Common utilities & error handling) ## Benefits Achieved - True vertical slice: Each feature owns its complete service stack - Better separation: Task operations isolated from project operations - Easier testing: Individual services can be mocked independently - Team scalability: Features can be developed independently - Code splitting: Better tree-shaking and bundle optimization - Clearer dependencies: Services import only what they need ## Files Changed - Created 4 new focused service files with proper separation of concerns - Updated 5+ hook files to use feature-scoped service imports - Removed monolithic src/services/projectService.ts (17KB) - Updated VersionHistoryModal to use documentService instead of commented TODOs - All service index files properly export their focused services ## Validation - Build passes successfully confirming all imports are correct - All existing functionality preserved with no breaking changes - Error handling patterns maintained across all new services - No remaining references to old monolithic service This completes the final step of vertical slice architecture migration. * feat: Add Biome linter for /features directory - Replace ESLint with Biome for 35x faster linting - Configure Biome for AI-friendly JSON output - Fix all auto-fixable issues (formatting, imports) - Add targeted suppressions for legitimate ARIA roles - Set practical formatting rules (120 char line width) - Add npm scripts for various Biome operations - Document Biome usage for AI assistants * chore: Configure IDE settings for Biome/ESLint separation - Add .zed/settings.json for Zed IDE configuration - Configure ESLint to ignore /src/features (handled by Biome) - Add .zed to .gitignore - Enable Biome LSP for features, ESLint for legacy code - Configure Ruff for Python files * fix: Resolve critical TypeScript errors in features directory - Fix property access errors with proper type narrowing - Move TaskCounts to tasks types (vertical slice architecture) - Add formatZodErrors helper for validation error handling - Fix query return types with explicit typing - Remove unused _githubRepoId variable - Resolve ambiguous exports between modules - Reduced TypeScript errors from 40 to 28 * fix: resolve final TypeScript error in features directory - Update UseTaskEditorReturn interface to properly type projectFeatures - Change from unknown[] to explicit shape with id, label, type, and color properties - All TypeScript errors in /src/features now resolved * docs: improve CLAUDE.md with comprehensive development commands and architecture details - Add detailed frontend and backend development commands - Document vertical slice architecture with folder structure - Include TanStack Query patterns and code examples - Add backend service layer and error handling patterns - Document smart polling hooks and HTTP polling architecture - Include specific commands for TypeScript checking and linting - Add MCP tools documentation and debugging steps * fix: Correct Radix UI Select disabled prop usage and drag-drop bounds - Move disabled prop from Select root to SelectTrigger for proper functionality - Remove redundant manual disabled styling (opacity-50, cursor-not-allowed) - Add aria-disabled for enhanced accessibility compliance - Fix TasksTab bounds check to allow dropping at end of columns - Components: TaskPriority, TaskAssignee, TasksTab 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: Improve API reliability and task management - Fix DELETE operations by handling 204 No Content responses in both callAPI and apiRequest - Replace custom calculateReorderPosition with battle-tested getReorderTaskOrder utility - Fix DeleteConfirmModal default open prop to prevent unexpected modal visibility - Add SSR guards to useSmartPolling hook to prevent crashes in non-browser environments 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * feat: Add review task count support with clean UI design - Add review field to TaskCounts interface for type consistency - Update backend to return separate review counts instead of mapping to doing - Enhance ProjectCard to display review tasks in clean 3-column layout - Combine doing+review counts in project cards for optimal visual design - Maintain granular data for detailed views (Kanban board still shows separate review column) Resolves CodeRabbit suggestion about missing review status while preserving clean UI 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Enhance FeatureErrorBoundary with TanStack Query integration - Add onReset callback prop for external reset handlers - Fix getDerivedStateFromError TypeScript return type - Gate console logging to development/test environments only - Add accessibility attributes (role=alert, aria-live, aria-hidden) - Integrate QueryErrorResetBoundary in ProjectsViewWithBoundary 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: improve code formatting and consistency - Fix line breaks and formatting in TasksTab.tsx task reordering - Clean up import formatting in ProjectsView.tsx - Standardize quote usage in useSmartPolling.ts 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: Migrate toast notifications to Radix UI primitives in features directory - Add @radix-ui/react-toast dependency - Create toast.tsx primitive with glassmorphism styling - Implement useToast hook matching legacy API - Add ToastProvider component wrapping Radix primitives - Update all 13 feature files to use new toast system - Maintain dual toast systems (legacy for non-features, new for features) - Fix biome linting issues with auto-formatting This migration establishes Radix UI as the foundation for the features vertical slice architecture while maintaining backward compatibility. Co-Authored-By: Claude <noreply@anthropic.com> * chore: Remove accidentally committed PRP file PRP files are for local development planning only and should not be in version control * refactor: simplify documents feature to read-only viewer - Remove MDXEditor and all editing capabilities due to persistent state issues - Add DocumentViewer component for reliable read-only display - Add migration warning banner clarifying project documents will be lost - Remove all mutation hooks, services, and components - Clean up unused types and dead code - Fix linting issues (SVG accessibility, array keys) - Simplify to display existing JSONB documents from project.docs field This temporary read-only state allows users to view existing documents while the feature undergoes migration to a more robust storage solution. * fix: eliminate duplicate toast notifications and React key warnings - Remove duplicate toast calls from component callbacks (TasksTab, useTaskActions, etc) - Keep toast notifications only in mutation definitions for single source of truth - Add success toast for task status changes in useTaskQueries - Improve toast ID generation with timestamp + random string to prevent duplicates - Remove unused useToast imports from components This fixes the 'Encountered two children with the same key' warning by ensuring only one toast is created per action instead of multiple simultaneous toasts. * feat: add optimistic updates for task and project creation - Implement optimistic updates for useCreateTask mutation - Tasks now appear instantly with temporary ID - Replaced with real task from server on success - Rollback on error with proper error handling - Implement optimistic updates for useCreateProject mutation - Projects appear immediately in the list - Temporary ID replaced with real one on success - Proper rollback on failure - Both mutations follow existing patterns from update/delete operations - Provides instant visual feedback improving perceived performance - Eliminates 2-3 second delay before items appear in UI * style: apply Biome formatting and remove unused dependencies - Format code with Biome standards - Remove unused showToast from useCallback dependencies in TasksTab - Minor formatting adjustments for better readability * fix: remove unused showToast import from TasksTab - Remove unused useToast hook import and usage - Fixes Biome noUnusedVariables error * fix: sort projects by creation date instead of alphabetically - Change project list sorting to: pinned first, then newest first - Ensures new projects appear on the left (after pinned) as expected - Maintains chronological order instead of alphabetical - Better UX for seeing recently created projects * optimize: adjust polling intervals for better performance - Projects: 20s polling (was 10s), 15s stale time (was 3s) - Tasks: 5s polling (was 8s) for faster MCP updates, 10s stale time (was 2s) - Background: 60s for all (was 24-30s) when tab not focused - Hidden tabs: Polling disabled (unchanged) Benefits: - Tasks update faster (5s) to reflect MCP server changes quickly - Projects poll less frequently (20s) as they change less often - Longer stale times reduce unnecessary refetches during navigation - Background polling reduced to save resources when not actively using app * feat: Add ETag support to reduce bandwidth by 70-90% - Created ETag-aware API client (apiWithEtag.ts) with caching - Integrated with TanStack Query for seamless cache management - Updated all services to use ETag-aware API calls - Added cache invalidation after mutations - Handles 304 Not Modified responses efficiently - Includes colored console logging for debugging - Works with 5-second task polling and 20-second project polling * fix: TanStack Query improvements from CodeRabbit review - Fixed concurrent project creation bug by tracking specific temp IDs - Unified task counts query keys to fix cache invalidation - Added TypeScript generics to getQueryData calls for type safety - Added return type to useTaskCounts hook - Prevented double refetch with refetchOnWindowFocus: false - Improved cache cleanup with exact: false on removeQueries 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * feat: improve ProjectList animations, sorting, and accessibility - Added initial/animate props to fix Framer Motion animations - Made sort deterministic with invalid date guards and ID tie-breaker - Added ARIA roles for better screen reader support: - role=status for loading state - role=alert for error state - role=list for project container - role=listitem for ProjectCard - Improved robustness against malformed date strings 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: use consistent ORDER_INCREMENT value for task ordering - Fixed bug where TasksTab used 100 while utils used 1000 for increments - Exported ORDER_INCREMENT constant from task-ordering utils - Updated TasksTab to import and use the shared constant - Ensures consistent task ordering behavior across the application 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: improve type safety and correctness in task mutations - Changed error handling to throw Error objects instead of strings - Added TypeScript generics to delete mutation for better type safety - Fixed incorrect Task shape by removing non-existent fields (deleted_at, subtasks) - Track specific tempId for optimistic updates to avoid replacing wrong tasks 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * Delete report.md * fix: address CodeRabbit review feedback for TanStack Query implementation - Enable refetchOnWindowFocus for immediate data refresh when returning to tab - Add proper TypeScript generics to useUpdateTask mutation for server response merge - Normalize HTTP methods to uppercase in ETag cache to prevent cache key mismatches - Add ETAG_DEBUG flag to control console logging (only in dev mode) - Fix 304 cache miss handling with proper error and ETag cleanup - Update outdated comments and add explicit type annotations - Rename getETagCacheStats property from 'endpoints' to 'keys' for accuracy --------- Co-authored-by: Claude <noreply@anthropic.com>
…oleam00#564) * feat: Add DocumentBrowser with domain filtering (updated for latest architecture) - Add DocumentBrowser component with two-column layout - Add domain filtering and search functionality - Add chunks API endpoint for browsing document content - Add clickable page count badge to open browser - Integrate with latest HTTP polling architecture - Add service method for fetching chunks with domain filtering - Compatible with new modular component structure 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: Apply CodeRabbit suggestions for domain filtering and API reliability - Preserve subdomains in domain extraction (docs.anthropic.com vs anthropic.com) - Add deterministic ordering to API queries for stable chunk lists - Use case-insensitive domain filtering with ilike - Add explicit Supabase error handling to prevent silent failures 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Update document browser branch for main branch compatibility - Add TanStack Query package dependencies - Add getKnowledgeItemChunks service method for DocumentBrowser - Add minimal feature components for build compatibility - Ensure document browser functionality works with latest architecture - Maintain clickable page count badges and document browsing modal Document browser is now ready for use with modernized Archon codebase. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…ion (coleam00#588) * refactor: migrate layouts to TanStack Query and Radix UI patterns - Created new modern layout components in src/components/layout/ - Migrated from old MainLayout/SideNavigation to new system - Added BackendStatus component with proper separation of concerns - Fixed horizontal scrollbar issues in project list - Renamed old layouts folder to agent-chat for unused chat panel - Added layout directory to Biome configuration - Fixed all linting and TypeScript issues in new layout code - Uses TanStack Query for backend health monitoring - Temporarily imports old settings/credentials until full migration * test: reorganize test infrastructure with colocated tests in subdirectories - Move tests into dedicated tests/ subdirectories within each feature - Create centralized test utilities in src/features/testing/ - Update all import paths to match new structure - Configure tsconfig.prod.json to exclude test files - Remove legacy test files from old test/ directory - All 32 tests passing with proper provider wrapping * fix: use error boundary wrapper for ProjectPage - Export ProjectsViewWithBoundary from projects feature module - Update ProjectPage to use boundary-wrapped version - Provides proper error containment and recovery with TanStack Query integration * cleanup: remove unused MCP client components - Remove ToolTestingPanel, ClientCard, and MCPClients components - These were part of an unimplemented MCP clients feature - Clean up commented import in MCPPage - Preparing for proper MCP feature migration to features directory * cleanup: remove unused mcpService.ts - Remove duplicate/unused mcpService.ts (579 lines) - Keep mcpServerService.ts which is actively used by MCPPage and useMCPQueries - mcpService was never imported or used anywhere in the codebase * cleanup: remove unused mcpClientService and update deprecation comments - Remove mcpClientService.ts (445 lines) - no longer used after removing MCP client components - Update deprecation comments in mcpServerService to remove references to deleted service - This completes the MCP service cleanup * fix: correct test directory exclusion in coverage config Update coverage exclusion from 'test/' to 'tests/' to match actual project structure and ensure proper test file exclusion from coverage. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * docs: fix ArchonChatPanel import path in agent-chat.mdx Update import from deprecated layouts to agent-chat directory. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * refactor: improve backend health hook and types - Use existing ETag infrastructure in useBackendHealth for 70% bandwidth reduction - Honor React Query cancellation signals with proper timeout handling - Remove duplicate HealthResponse interface, import from shared types - Add React type import to fix potential strict TypeScript issues 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: remove .d.ts exclusion from production TypeScript config Removing **/*.d.ts exclusion to fix import.meta.env type errors in production builds. The exclusion was preventing src/env.d.ts from being included, breaking ImportMetaEnv interface definitions. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * feat: implement modern MCP feature architecture - Add new /features/mcp with TanStack Query integration - Components: McpClientList, McpStatusBar, McpConfigSection - Services: mcpApi with ETag caching - Hooks: useMcpStatus, useMcpConfig, useMcpClients, useMcpSessionInfo - Views: McpView with error boundary wrapper - Full TypeScript types for MCP protocol Part of TanStack Query migration phase 2. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * refactor: complete MCP modernization and cleanup - Remove deprecated mcpServerService.ts (237 lines) - Remove unused useMCPQueries.ts hooks (77 lines) - Simplify MCPPage.tsx to use new feature architecture - Export useSmartPolling from ui/hooks for MCP feature - Add Python MCP API routes for backend integration This completes the MCP migration to TanStack Query with: - ETag caching for 70% bandwidth reduction - Smart polling with visibility awareness - Vertical slice architecture - Full TypeScript type safety 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: correct MCP transport mode display and complete cleanup - Fix backend API to return correct "streamable-http" transport mode - Update frontend to dynamically display transport type from config - Remove unused MCP functions (startMCPServer, stopMCPServer, getMCPServerStatus) - Clean up unused MCPServerResponse interface - Update log messages to show accurate transport mode - Complete aggressive MCP cleanup with 75% code reduction (617 lines removed) Backend changes: - python/src/server/api_routes/mcp_api.py: Fix transport and logs - Reduced from 818 to 201 lines while preserving all functionality Frontend changes: - McpStatusBar: Dynamic transport display based on config - McpView: Pass config to status bar component - api.ts: Remove unused MCP management functions All MCP tools tested and verified working after cleanup. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * simplify MCP API to status-only endpoints - Remove Docker container management functionality - Remove start/stop/restart endpoints - Simplify to status and config endpoints only - Container is now managed entirely via docker-compose * feat: complete MCP feature migration to TanStack Query - Add MCP feature with TanStack Query hooks and services - Create useMcpQueries hook with smart polling for status/config - Implement mcpApi service with streamable-http transport - Add MCP page component with real-time updates - Export MCP hooks from features/ui for global access - Fix logging bug in mcp_api.py (invalid error kwarg) - Update docker command to v2 syntax (docker compose) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: clean up unused CSS and unify Tron-themed scrollbars - Remove 200+ lines of unused CSS classes (62% file size reduction) - Delete unused: glass classes, neon-dividers, card animations, screensaver animations - Remove unused knowledge-item-card and hide-scrollbar styles - Remove unused flip-card and card expansion animations - Update scrollbar-thin to match Tron theme with blue glow effects - Add gradient and glow effects to thin scrollbars for consistency - Keep only actively used styles: neon-grid, scrollbars, animation delays File reduced from 11.2KB to 4.3KB with no visual regressions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: address CodeRabbit CSS review feedback - Fix neon-grid Tailwind @apply with arbitrary values (breaking build) - Convert hardcoded RGBA colors to HSL tokens using --blue-accent - Add prefers-reduced-motion accessibility support - Add Firefox dark mode scrollbar-color support - Optimize transitions to specific properties instead of 'all' 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> * fix: properly close Docker client to prevent resource leak - Add finally block to ensure Docker client is closed - Prevents resource leak in get_container_status function - Fix linting issues (whitespace and newline) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
exclude tmp/ and temp/
* mcp: fix Gemini register_version schema and optional types\n\n- Constrain to JSON-serializable dict | list[dict] for create_version\n- Use for optional args in RAG tools\n- Add AGENTS.md with repo guidelines * mcp: remove unintended AGENTS.md from PR --------- Co-authored-by: Cole Medin <cole@dynamous.ai>
* fixed the llms.txt/fulls-llm.txt/llms.md etc. to be crawleed finally. intelligently determines if theres links in the llms.txt and crawls them as it should. tested fully everything works! * updated coderabbits suggestion - resolved * refined to code rabbits suggestions take 2, should be final take. didnt add the max link paramter suggestion though. * 3rd times the charm, added nit picky thing from code rabbit. code rabbit makes me crave nicotine * Fixed progress bar accuracy and OpenAI API compatibility issues Changes Made: 1. Progress Bar Fix: Fixed llms.txt crawling progress jumping to 90% then regressing to 45% by adjusting batch crawling progress ranges (20-30% instead of 40-90%) and using consistent ProgressMapper ranges 2. OpenAI API Compatibility: Added robust fallback logic in contextual embedding service to handle newer models (GPT-5) that require max_completion_tokens instead of max_tokens and don't support custom temperature values Files Modified: - src/server/services/crawling/crawling_service.py - Fixed progress ranges - src/server/services/crawling/progress_mapper.py - Restored original stage ranges - src/server/services/embeddings/contextual_embedding_service.py - Added fallback API logic Result: - Progress bar now smoothly progresses 030% (crawling) 35-80% (storage) 100% - Automatic compatibility with both old (GPT-4.1-nano) and new (GPT-5-nano) OpenAI models - Eliminates max_tokens not supported and temperature not supported errors * removed gpt-5-handlings since thats a seprate issue and doesnt pertain to here, definitley recommend looking at that though since gpt-5-nano is considered a reasoning model and doesnt use max_tokens, requires a diffrent output. also removed my upsert fix from documentstorage since thats not apart of this exact issue and i have another PR open for it. checked in code rabbit in my ide no issues, no nitpicks. should be good? might flag me for the UPSERT logic not being in here. owell has nothing to do with this was pr, was submitted in the last revision by mistake. everythings tested and good to go! * fixed the llms-full.txt crawling issue. now crawls just that page when crawling llms-full.txt. fixed the 100% crawl url when multiple urls are present and hasnt finished crawling. also fixed a styling issue in CrawlingProgressCard.tsx , when batching code examples the batching progress bar would sometimes glitch out of the ui fixed it to where it wont do that now. * fixed a few things so it will work with the current branch! * added some enhancemments to ui rendering aswell and other little misc. fixes from code rabbit --------- Co-authored-by: Chillbruhhh <joshchesser97@gmail.com> Co-authored-by: Claude Code <claude@anthropic.com>
…ogger test(config): set pytest asyncio_mode=strict to avoid deprecation and freezes
….yml: ensure service ports and healthchecks align with current local setup (Server 8181, MCP 9051, Agents 9082, UI 3737->5173).\n- python/src/mcp/mcp_server.py: adjustments around MCP SSE/server startup consistency and logging.\n- python/src/server/api_routes/mcp_api.py: refine MCP API route behaviors and responses.\n- python/src/server/api_routes/settings_api.py: minor updates to settings handling and validation.\n- python/src/server/api_routes/tests_api.py: test endpoints tweaks for local reliability.\n- python/src/server/utils/document_processing.py: robustness/edge-case handling improvements.\n- python/tests/conftest.py: unify pytest setup/config for server tests.\n- python/tests/test_mcp_tools_api.py: add tests covering MCP tools API interactions.\n- scripts/mcp_handshake.ps1: add helper script for MCP handshake checks.\n- README.md: refresh instructions to reflect compose ports and health checks.\n- CHANGELOG.md: add changelog stub for tracking updates.\n- openapi.json: include OpenAPI spec snapshot for reference and tooling.\n- mcp_logs.json: add local MCP logs placeholder for diagnostics.\n- .env.example: remove outdated example file (replaced by documented env vars in docs/configuration.mdx).\n\nNotes:\n- All services configured with restart policy unless-stopped.\n- Aligns with repository guidance on rate limiting, local provider support (e.g., Ollama), and unified logging toggles.\n- Next steps: validate MCP tool calls and run test suite to ensure stability.
…; fix UI MCP dashboard 404s\n\nfix: compose runs server app target; frontend dev server binds properly in Docker\nfix: projects_api remove stale broadcast reference\nchore: align vite port mapping and install missing UI deps (@tanstack/react-query, Radix)\nchore(pytest): asyncio_mode=auto to satisfy async tests\n\nExplanation: Implements upstream-style minimal MCP endpoints to unblock MCP Dashboard queries. Adds ETag-safe defaults and preserves logging/tracing. Ensures frontend container binds and maps ports consistently. Removes deprecated socket broadcast usage per upstream refactor.
mionemedia
pushed a commit
that referenced
this pull request
Apr 13, 2026
When users request multiple issues to be fixed (e.g., "fix issues #1, #2, and coleam00#3"), the skill now clearly instructs Claude to run each workflow separately rather than combining them into a single command.
mionemedia
pushed a commit
that referenced
this pull request
Apr 13, 2026
…commit command - orchestrator.md: rewrite to reflect current routing-agent model (5 deterministic commands, AI routing for everything else, /invoke-workflow protocol) - architecture-deep-dive.md Flow #1: rewrite message flow to show AI routing path, prompt selection, and parseOrchestratorCommands output parsing - commit.md: add step 4 for capturing AI context changes (rules, commands, docs) in commit messages — git log as long-term memory for the AI layer Context: - Updated .claude/rules/orchestrator.md to match current handleMessage() flow - Updated .claude/docs/architecture-deep-dive.md Flow #1 for routing-agent pattern - Enhanced .claude/commands/commit.md with AI context tracking for WISC Write strategy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mionemedia
pushed a commit
that referenced
this pull request
Apr 13, 2026
Archon was #1 trending repo on GitHub. Add the Trendshift badge between the tagline and the status badges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mionemedia
pushed a commit
that referenced
this pull request
May 17, 2026
…gent) (coleam00#1270) * feat(providers): add Pi community provider (@mariozechner/pi-coding-agent) Introduces Pi as the first community provider under the Phase 2 registry, registered with builtIn: false. Wraps Pi's full coding-agent harness the same way ClaudeProvider wraps @anthropic-ai/claude-agent-sdk and CodexProvider wraps @openai/codex-sdk. - PiProvider implements IAgentProvider; fresh AgentSession per sendQuery call - AsyncQueue bridges Pi's callback-based session.subscribe() to Archon's AsyncGenerator<MessageChunk> contract - Server-safe: AuthStorage.inMemory + SessionManager.inMemory + SettingsManager.inMemory + DefaultResourceLoader with all no* flags — no filesystem access, no cross-request state - API key seeded per-call from options.env → process.env fallback - Model refs: '<pi-provider-id>/<model-id>' (e.g. google/gemini-2.5-pro, openrouter/qwen/qwen3-coder) with syntactic compatibility check - registerPiProvider() wired at CLI, server, and config-loader entrypoints, kept separate from registerBuiltinProviders() since builtIn: false is load-bearing for the community-provider validation story - All 12 capability flags declared false in v1 — dag-executor warnings fire honestly for any unmapped nodeConfig field - 58 new tests covering event mapping, async-queue semantics, model-ref parsing, defensive config parsing, registry integration Supported Pi providers (v1): anthropic, openai, google, groq, mistral, cerebras, xai, openrouter, huggingface. Extend PI_PROVIDER_ENV_VARS as needed. Out of scope (v1): session resume, MCP, hooks, skills mapping, thinking level mapping, structured output, OAuth flows, model catalog validation. These remain false on PI_CAPABILITIES until intentionally wired. * feat(providers/pi): read ~/.pi/agent/auth.json for OAuth + api_key passthrough Replaces the v1 env-var-only auth flow with AuthStorage.create(), which reads ~/.pi/agent/auth.json. This transparently picks up credentials the user has populated via `pi` → `/login` (OAuth subscriptions: Claude Pro/Max, ChatGPT Plus, GitHub Copilot, Gemini CLI, Antigravity) or by editing the file directly. Env-var behavior preserved: when ANTHROPIC_API_KEY / GEMINI_API_KEY / etc. is set (in process.env or per-request options.env), the adapter calls setRuntimeApiKey which is priority #1 in Pi's resolution chain. Auth.json entries are priority #2-coleam00#3. Pi's internal env-var fallback remains priority coleam00#4 as a safety net. Archon does not implement OAuth flows itself — it only rides on creds the user created via the Pi CLI. OAuth refresh still happens inside Pi (auth-storage.ts:369-413) under a file lock; concurrent refreshes between the Pi CLI and Archon are race-safe by Pi's own design. - Fail-fast error now mentions both the env-var path and `pi /login` - 2 new tests: OAuth cred from auth.json; env var wins over auth.json - 12 existing tests still pass (env-var-only path unchanged) CI compatibility: no auth.json in CI, no change — env-var (secrets) flows through Pi's getEnvApiKey fallback identically to v1. * test(e2e): add Pi provider smoke test workflow Mirrors e2e-claude-smoke.yaml: single prompt node + bash assert. Targets `anthropic/claude-haiku-4-5` via `provider: pi`; works in CI (ANTHROPIC_API_KEY secret) and locally (user's `pi /login` OAuth). Verified locally with an Anthropic OAuth subscription — full run takes ~4s from session_started to assert PASS, exercising the async-queue bridge and agent_end → result-chunk assembly under real Pi event timing. Not yet wired into .github/workflows/e2e-smoke.yml — separate PR once this lands, to keep the Pi provider PR minimal. * feat(providers/pi): v2 — thinkingLevel, tool restrictions, systemPrompt Extends the Pi adapter with three node-level translations, flipping the corresponding capability flags from false → true so the dag-executor no longer emits warnings for these fields on Pi nodes. 1. effort / thinking → Pi thinkingLevel (options-translator.ts) - Archon EffortLevel enum: low|medium|high|max (from packages/workflows/src/schemas/dag-node.ts). `max` maps to Pi's `xhigh` since Archon's enum lacks it. - Pi-native strings (minimal, xhigh, off) also accepted for programmatic callers bypassing the schema. - `off` on either field → no thinkingLevel (Pi's implicit off). - Claude-shape object `thinking: {type:'enabled', budget_tokens:N}` yields a system warning and is not applied. 2. allowed_tools / denied_tools → filtered Pi built-in tools - Supports all 7 Pi tools: read, bash, edit, write, grep, find, ls. - Case-insensitive normalization. - Empty `allowed_tools: []` means no tools (LLM-only), matching e2e-claude-smoke's idiom. - Unknown names (Claude-specific like `WebFetch`) collected and surfaced as a system warning; ignored tools don't fail the run. 3. systemPrompt (AgentRequestOptions + nodeConfig.systemPrompt) - Threaded through `DefaultResourceLoader({systemPrompt})`; Pi's default prompt is replaced entirely. Request-level wins over node-level. Capability flag changes: - thinkingControl: false → true - effortControl: false → true - toolRestrictions: false → true Package delta: - +1 direct dep: @sinclair/typebox (Pi types reference it; adding as direct dep resolves the TS portable-type error). - +1 test file: options-translator.test.ts (19 tests, 100% coverage). - provider.test.ts extended with 11 new tests covering all three paths. - registry.test.ts updated: capability assertion reflects new flags. Live-verified: `bun run cli workflow run e2e-pi-smoke --no-worktree` succeeds in 1.2s with thinkingLevel=low, toolCount=0. Smoke YAML updated to use `effort: low` (schema-valid) + `allowed_tools: []` (LLM-only). * test(e2e): add comprehensive Pi smoke covering every CI-compatible node type Exercises every node type Archon supports under `provider: pi`, except `approval:` (pauses for human input, incompatible with CI): 1. prompt — inline AI prompt 2. command — named command file (uses e2e-echo-command.md) 3. loop — bounded iterative AI prompt (max_iterations: 2) 4. bash — shell script with JSON output 5. script — bun runtime (echo-args.js) 6. script — uv / Python runtime (echo-py.py) Plus DAG features on top of Pi: - depends_on + $nodeId.output substitution - when: conditional with JSON dot-access - trigger_rule: all_success merge - final assert node validates every upstream output is non-empty Complements the minimal e2e-pi-smoke.yaml — that stays as the fast-path smoke for connectivity checks; this one is the broader surface coverage. Verified locally end-to-end against Anthropic OAuth (pi /login): PASS, all 9 non-final nodes produce output, assert succeeds. * feat(providers/pi): resolve Archon `skills:` names to Pi skill paths Flips capabilities.skills: false → true by translating Archon's name-based `skills:` nodeConfig (e.g. `skills: [agent-browser]`) to absolute directory paths Pi's DefaultResourceLoader can consume via additionalSkillPaths. Search order for each skill name (first match wins): 1. <cwd>/.agents/skills/<name>/ — project-local, agentskills.io 2. <cwd>/.claude/skills/<name>/ — project-local, Claude convention 3. ~/.agents/skills/<name>/ — user-global, agentskills.io 4. ~/.claude/skills/<name>/ — user-global, Claude convention A directory resolves only if it contains a SKILL.md. Unresolved names are collected and surfaced as a system-chunk warning (e.g. "Pi could not resolve skill names: foo, bar. Searched .agents/skills and .claude/skills (project + user-global)."), matching the semantic of "requested but not found" without aborting the run. Pi's buildSystemPrompt auto-appends the agentskills.io XML block for each loaded skill, so the model sees them — no separate prompt injection needed (Pi differs from Claude here; Claude wraps in an AgentDefinition with a preloaded prompt, Pi uses XML block in system prompt). Ancestor directory traversal above cwd is deliberately skipped in this pass — matches the Pi provider's cwd-bound scope and avoids ambiguity about which repo's skills win when Archon runs from a subdirectory. Bun's os.homedir() bypasses the HOME env var; the resolver uses `process.env.HOME ?? homedir()` so tests can stage a synthetic home dir. Tests: - 11 new tests in options-translator.test.ts cover project/user, .agents/ vs .claude/, project-wins-over-user, SKILL.md presence check, dedup, missing-name collection. - 2 new integration tests in provider.test.ts cover the missing-skill warning path and the "no skills configured → no additionalSkillPaths" path. - registry.test.ts updated to assert skills: true in capabilities. Live-verified locally: `.claude/skills/archon-dev/SKILL.md` resolves, pi.session_started log shows `skillCount: 1, missingSkillCount: 0`, smoke workflow passes in 1.2s. * feat(providers/pi): session resume via Pi session store Flips capabilities.sessionResume: false → true. Pi now persists sessions under ~/.pi/agent/sessions/<encoded-cwd>/<uuid>.jsonl by default — same pattern Claude and Codex use for their respective stores, same blast radius as those providers. Flow: - No resumeSessionId → SessionManager.create(cwd) (fresh, persisted) - resumeSessionId + match in SessionManager.list(cwd) → open(path) - resumeSessionId + no match → fresh session + system warning ("⚠️ Could not resume Pi session. Starting fresh conversation.") Matches Codex's resume_thread_failed fallback at packages/providers/src/codex/provider.ts:553-558. The sessionId flows back to Archon via the terminal `result` chunk — bridgeSession annotates it with session.sessionId unconditionally so Archon's orchestrator can persist it and pass it as resumeSessionId on the next turn. Same mechanism used for Claude/Codex. Cross-cwd resume (e.g. worktree switch) is deliberately not supported in this pass: list(cwd) scans only the current cwd's session dir. A workflow that changes cwd mid-run lands on a fresh session, which matches Pi's mental model. Bridge sessionId annotation uses session.sessionId, which Pi always populates (UUID) — so no special-case for inMemory sessions is needed. Factored the resolver into session-resolver.ts (5 unit tests): - no id → create - id + match → open - id + no match → create with resumeFailed: true - list() throws → resumeFailed: true (graceful) - empty-string id → treated as "no resume requested" Integration tests in provider.test.ts add 3 cases: - resume-not-found yields warning + calls create - resume-match calls open with the file path, no warning - result chunk always carries sessionId Verified live end-to-end against Anthropic OAuth: - first call → sessionId 019d...; model replies "noted" - second call with that sessionId → "resumed: true" in logs; model correctly recalls prior turn ("Crimson.") - bogus sessionId → "⚠️ Could not resume..." warning + fresh UUID * refactor(providers,core): generalize community-provider registration Addresses the community-pattern regression flagged in the PR coleam00#1270 review: a second community provider should require editing only its own directory, not seven files across providers/ + core/ + cli/ + server/. Three changes: 1. Drop typed `pi` slot from AssistantDefaultsConfig + AssistantDefaults. Community providers live behind the generic `[string]` index that `ProviderDefaultsMap` was explicitly designed to provide. The typed claude/codex slots stay — they give IDE autocomplete for built-in config access without `as` casts, which was the whole reason the intersection exists. Community providers parse their own config via Record<string, unknown> anyway, so the typed slot added no real parser safety. 2. Loop-based getDefaults + mergeAssistantDefaults. No more hardcoded `pi: {}` spreads. getDefaults() seeds from `getRegisteredProviders()`; mergeAssistantDefaults clones every slot present in `base`. Adding a new provider requires zero edits to this function. 3. New `registerCommunityProviders()` aggregator in registry.ts. Entrypoints (CLI, server, config-loader) call ONE function after `registerBuiltinProviders()` rather than one call per community provider. Adding a new community provider is now a single-line edit to registerCommunityProviders(). This makes Pi (and future community providers) actually behave like Phase 2 (coleam00#1195) advertised: drop the implementation under packages/providers/src/community/<id>/, export a `register<Id>Provider`, add one line to the aggregator. Tests: - New `registerCommunityProviders` suite (2 tests: registers pi, idempotent). - config-loader.test updated: assert built-in slots explicitly rather than exhaustive map shape. No functional change for Pi end-users. Purely structural. * fix(providers/pi,core): correctness + hygiene fixes from PR coleam00#1270 review Addresses six of the review's important findings, all within the same PR branch: 1. envInjection: false → true The provider reads requestOptions.env on every call (for API-key passthrough). Declaring the capability false caused a spurious dag-executor warning for every Pi user who configured codebase env vars — which is the MAIN auth path. Flipping to true removes the false positive. 2. toSafeAssistantDefaults: denylist → allowlist The old shape deleted `additionalDirectories`, `settingSources`, `codexBinaryPath` before sending defaults to the web UI. Any future sensitive provider field (OAuth token, absolute path, internal metadata) would silently leak via the `[key: string]: unknown` index signature. New SAFE_ASSISTANT_FIELDS map lists exactly what to expose per provider; unknown providers get an empty allowlist so the web UI sees "provider exists" but no config details. 3. AsyncQueue single-consumer invariant The type was documented single-consumer but unenforced. A second `for await` would silently race with the first over buffer + waiters. Added a synchronous guard in Symbol.asyncIterator that throws on second call — copy-paste mistakes now fail fast with a clear message instead of dropping items. 4. session.dispose() / session.abort() silent catches Both catch blocks now log at debug via a module-scoped logger so SDK regressions surface without polluting normal output. 5. Type scripted events as AgentSessionEvent in provider.test.ts Was `Record<string, unknown>` — Pi field renames would silently keep tests passing. Now typed against Pi's actual event union. 6. Leaked /tmp/pi-research/... path in provider.ts comment Local-machine path that crept in during research. Replaced with the upstream GitHub URL (matches convention at provider.ts:110). Plus review-flagged simplifications: - Extract lookupPiModel wrapper — isolates the `as unknown as` cast behind one searchable name. - Hoist QueueItem → BridgeQueueItem at module scope (export'd for test visibility; not used externally yet but enables unit testing the mapping in isolation if needed later). - getRegisteredProviderNames: remove side-effecting registration calls. `loadConfig()` already bootstraps the registry before any caller can observe this helper — the hidden coupling was misleading. Plus missing-coverage tests from the review (pr-test-analyzer): - session.prompt() rejection → error surfaces to consumer - pre-aborted signal → session.abort() called - mid-stream abort → session.abort() called - modelFallbackMessage → system chunk yielded - AsyncQueue second-consumer → throws synchronously No behavioral changes for end users beyond the envInjection warning fix. * docs: Pi provider + community-provider contributor guide Addresses the PR coleam00#1270 review's docs-impact findings: the original Pi PR had no user-facing or contributor-facing documentation, and architecture.md still referenced the pre-Phase-2 factory.ts pattern (factory.ts was deleted in coleam00#1195). 1. packages/docs-web/src/content/docs/reference/architecture.md - Replace stale factory.ts references with the registry pattern. - Update inline IAgentProvider block: add getCapabilities, add options parameter. - Rewrite MessageChunk block as the actual discriminated union (was a placeholder with optional fields that didn't match the current type). - "Adding a New AI Agent Provider" checklist now distinguishes built-in (register in registerBuiltinProviders) from community (separate guide). Links to the new contributor guide. 2. packages/docs-web/src/content/docs/contributing/adding-a-community-provider.md (new) - Step-by-step guide using Pi as the reference implementation. - Covers: directory layout, capability discipline (start false, flip one at a time), provider class skeleton, registration via aggregator, test isolation (Bun mock.module pollution), what NOT to do (no edits to AssistantDefaultsConfig, no direct registerProvider from entrypoints, no overclaiming capabilities). 3. packages/docs-web/src/content/docs/getting-started/ai-assistants.md - New "Pi (Community Provider)" section: install, OAuth + API-key table per Pi backend, model ref format, workflow examples, capability matrix showing what Pi supports (session resume, tool restrictions, effort/thinking, skills, system prompt, envInjection) and what it doesn't (MCP, hooks, structured output, cost control, fallback model, sandbox). 4. .env.example - New Pi section with commented env vars for each supported backend (ANTHROPIC_API_KEY through HUGGINGFACE_API_KEY), each paired with its Pi provider id. OAuth flow (pi /login → auth.json) is explicitly called out — Archon reads that file too. 5. CHANGELOG.md - Unreleased entry for Pi, registerCommunityProviders aggregator, and the new contributor guide.
mionemedia
pushed a commit
that referenced
this pull request
May 17, 2026
* chore: update Homebrew formula for v0.3.9
* chore(release-skill): use --help (not version) for Step 1.5 smoke probe (#1359)
The pre-flight binary smoke does a bare `bun build --compile` — it
deliberately skips `scripts/build-binaries.sh` to stay fast. That means
packages/paths/src/bundled-build.ts retains its dev defaults, including
BUNDLED_IS_BINARY = false.
version.ts branches on BUNDLED_IS_BINARY: when true it returns the
embedded string; when false it calls getDevVersion(), which reads
package.json at `SCRIPT_DIR/../../../../package.json`. Inside a compiled
binary SCRIPT_DIR resolves under `$bunfs/root/`, the walk produces a CWD-
relative path that doesn't exist, and the smoke aborts with "Failed to
read version: package.json not found" — a false positive.
Hit during the 0.3.8 release attempt: the real Pi lazy-load fix was
working end-to-end; the smoke test was the only thing failing.
Use --help instead. It exercises the same module-init graph (so it still
catches the real failure modes the skill lists — Pi package.json init
crash, Bun --bytecode bugs, CJS wrapper issues, circular imports under
minify) but has no dev/binary branch, so no false positive.
Also add a longer comment block explaining why --help is preferred, so
this doesn't get "normalized" back to `version` by a future drive-by.
* chore(test-release-skill): preserve archon-stable across test cycles
The brew path of /test-release runs `brew uninstall` in Phase 5 to leave the
system in its pre-test state. For operators using the dual-homebrew pattern
(renamed brew binary at `/opt/homebrew/bin/archon-stable` so it coexists with
a `bun link` dev `archon`), that uninstall wipes the Cellar dir the
`archon-stable` symlink points into → `archon-stable` becomes dangling →
`brew cleanup` sweeps it away on the next brew op. Next time the operator
wants stable, they have to manually re-run `brew-upgrade-archon`.
Fix: make the skill aware of `archon-stable` and restore it transparently.
- Phase 2 item 4: detect the `archon-stable` symlink before any brew op;
export `ARCHON_STABLE_WAS_INSTALLED=yes` so Phase 5 knows to restore it.
Only triggers for the brew path (curl-mac/curl-vps don't touch brew so
they leave `archon-stable` alone).
- Phase 5 brew path: after `brew uninstall + untap`, if the flag was set,
re-tap + re-install + rename. Verifies the restored `archon-stable`
reports a version and warns (non-fatal) if the rename target is missing.
Documents the tradeoff: the restored version is "whatever the tap ships
today", not necessarily the pre-test version — usually that's what the
operator wants (the release they just tested becomes stable) but the
back-version-QA case requires a manual `brew-upgrade-archon` after.
- Phase 1 confirmation banner now mentions that `archon-stable` will be
preserved so the operator isn't surprised by the reinstall during Phase 5.
No changes to curl-mac/curl-vps paths. No changes to Phase 4 test suite.
* fix(providers/pi): install PI_PACKAGE_DIR shim so Pi workflows run in a compiled binary (#1360)
v0.3.9 made Pi boot-safe: lazy-loading its imports meant `archon version`
no longer crashed on `@mariozechner/pi-coding-agent/dist/config.js`'s
module-init `readFileSync(getPackageJsonPath())`. That's what the
`provider-lazy-load.test.ts` regression test guards.
The fix was only half the problem though. When a Pi workflow actually
runs, sendQuery() triggers the dynamic import — and Pi's config.js
module-init fires then, hitting the exact same ENOENT on
`dirname(process.execPath)/package.json`. Discovered by running
`archon workflow run test-pi` against a locally-compiled 0.3.9 binary:
[main] Failed: ENOENT: no such file or directory,
open '/private/tmp/package.json'
at readFileSync (unknown)
at <anonymous> (/$bunfs/root/archon-providertest:184:7889)
at init_config
Boot-safe ≠ runtime-safe. The `/test-release` run for 0.3.9 passed
because it only exercised `archon-assist` (Claude); Pi was never
actually invoked on the released binary.
Fix: before the dynamic `import('@mariozechner/pi-coding-agent')` in
sendQuery, install a PI_PACKAGE_DIR shim. Pi's config.js checks
`process.env.PI_PACKAGE_DIR` first in its `getPackageDir()` and
short-circuits the `dirname(process.execPath)` walk. We write a
minimal `{name, version, piConfig:{}}` stub to
`tmpdir()/archon-pi-shim/package.json` (idempotent — existsSync check)
and set the env var. Pi only reads `piConfig.name`, `piConfig.configDir`,
and `version` from that file, all optional, so the stub surface is
genuinely minimal.
Localized to PiProvider: no global state, no mutation of any shared
config, no upstream fork. Claude and Codex providers are unaffected
(their SDKs don't have this class of module-init side effect).
Verified end-to-end: built a compiled archon binary with this patch,
ran `archon workflow run test-pi --no-worktree` (Pi workflow with
model `anthropic/claude-haiku-4-5`), got a clean response. Before the
patch, same binary crashed at `dag_node_started` with the ENOENT above.
Regression test added: asserts `PI_PACKAGE_DIR` is set after sendQuery
hits even its fast-fail "no model" path. Together with the existing
`provider-lazy-load.test.ts` (boot-safe) this covers both halves.
* feat(providers): autodetect canonical binary install paths for Claude and Codex (#1361)
Both binary resolvers previously stopped at env-var + explicit config and
threw a "not found" error when neither was set. Users who followed the
upstream-recommended install flow (Anthropic's `curl install.sh` for
Claude, `npm install -g @openai/codex`) still had to manually set either
`CLAUDE_BIN_PATH` / `CODEX_BIN_PATH` or the corresponding config field
before any workflow could run.
Add a tier-N autodetect step between the explicit config tier and the
install-instructions throw. Purely additive: env and config still win
when set (precedence covered by new tests). On autodetect miss, the same
install-instructions error fires as before.
Claude probe list (verified against docs.claude.com "Uninstall Claude
Code → Native installation" section):
- $HOME/.local/bin/claude (mac/linux native installer)
- $USERPROFILE\.local\bin\claude.exe (Windows native installer)
Codex probe list (verified against openai/codex README; npm global-
install puts the binary at `{npm_prefix}/bin/<name>` on POSIX,
`{npm_prefix}\<name>.cmd` on Windows):
- $HOME/.npm-global/bin/codex (user-set `npm config set prefix`)
- /opt/homebrew/bin/codex (mac arm64 with homebrew-node)
- /usr/local/bin/codex (mac intel / linux system node)
- %APPDATA%\npm\codex.cmd (Windows npm global default)
- $HOME\.npm-global\codex.cmd (Windows user-set prefix)
Not probed (explicit override still required):
- Custom npm prefixes — `npm root -g` would need a subprocess per
resolve, too much surface for a probe helper
- `brew install --cask codex` — cask layout isn't a PATH binary
- Manual GitHub Releases extracts — placement is user-determined
- `~/.bun/bin/codex` — not documented in openai/codex README
Pi provider intentionally has no equivalent change: the Pi SDK is
bundled into the archon binary (no subprocess), so there's no "binary"
to resolve. Pi auth lives at `~/.pi/agent/auth.json` which the SDK
already finds by default, and the PR A shim (`PI_PACKAGE_DIR`) handles
the package-dir case via Pi's own documented escape hatch.
E2E verified: removed both config entries from ~/.archon/config.yaml,
rebuilt compiled binary, ran `archon workflow run archon-assist` and a
Codex workflow. Logs showed `source: 'autodetect'` for both, responses
returned cleanly.
* fix(providers/test): use os.homedir() instead of $HOME in claude binary autodetect test
The native-installer autodetect test computed its expected path from
process.env.HOME, but the implementation uses node:os homedir(). On
Windows, HOME is typically unset (Windows uses USERPROFILE), so the
test fell back to '/Users/test' while the resolver returned the real
home dir — making the spy's path-equality check fail and breaking CI
on windows-latest.
Mirror the implementation by importing homedir() from node:os and
joining with node:path so the expected path matches the actual
platform-resolved home and separator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(server): contain Discord login failure so it doesn't kill the server (#1365)
Reported in #1365: a user running `archon serve` with DISCORD_BOT_TOKEN
set but the "Message Content Intent" toggle disabled in the Discord
Developer Portal saw the entire server crash with `Used disallowed
intents`. Discord rejects the gateway connection (close code 4014) when
a privileged intent is requested without being enabled, and the
unguarded `await discord.start()` propagated the error all the way up,
taking the web UI down with it.
Wrap discord.start() in try/catch — log the failure with an actionable
hint (special-cased for the disallowed-intent error) and continue
running. Other adapters and the web UI come up regardless. The shutdown
handler already uses optional chaining (`discord?.stop()`) so nulling
discord after a failed start is safe.
Other adapters (Telegram, Slack, GitHub, Gitea, GitLab) have the same
unguarded-start pattern but are out of scope for this fix — addressing
them is tracked separately.
Also expanded the Discord setup docs with a caution callout that names
the exact error string and the new log event so users can grep for
both.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(script-nodes): dedicated guide + teach the archon skill (#1362)
* docs(script-nodes): add dedicated guide and teach the archon skill how to write them
Script nodes (script:) have been a first-class DAG node type since v0.3.3 but
were documented only as one-liners in CLAUDE.md and a CI smoke test. Claude
Code reading the archon skill would see "Four Node Types: command, prompt,
bash, loop" and reach for bash+node/python one-liners instead of a proper
script node — losing bun's --no-env-file isolation, uv's --with dependency
pins, and the .archon/scripts/ reuse story.
- New packages/docs-web/src/content/docs/guides/script-nodes.md mirroring the
structure of loop-nodes.md / approval-nodes.md: schema, inline vs named
dispatch, runtime/deps semantics, scripts directory precedence (repo > home),
extension-runtime mapping, env isolation, stdout/stderr contract, patterns,
and the explicit list of ignored AI fields.
- guides/authoring-workflows.md and guides/index.md updated so the new guide is
discoverable from both the node-types table and the guides landing page.
- reference/variables.md calls out the no-shell-quote difference between
bash: and script: substitution — a subtle correctness trap when adapting a
bash pattern into a script node.
- Sidebar order bumped +1 on hooks/mcp-servers/skills/global-workflows/
remotion-workflow to slot script-nodes at order 5 next to the other
node-type guides.
- .claude/skills/archon/SKILL.md: replaces stale "Four Node Types" (which
also silently omitted approval and cancel) with the accurate seven, with a
script-node code block showing both inline and named patterns.
- references/workflow-dag.md: full Script Node section covering dispatch,
resolution, deps, stdout contract, and the list of AI-only fields that are
ignored; validation-rules list updated.
- references/dag-advanced.md and references/variables.md: retry-support line
corrected; no-shell-quote note added.
- examples/dag-workflow.yaml: added an extract-labels TypeScript script node
and updated the header comment.
* fix(docs): review follow-ups for script-node guide
- skills example: extract-labels was reading process.env.ISSUE_JSON which is
never set; use String.raw`$fetch-issue.output` so the upstream bash node's
JSON is actually consumed
- guides/script-nodes.md + skills/workflow-dag.md: idle_timeout is accepted
but ignored on script (and bash) nodes — executeScriptNode only reads
node.timeout. Clarify that script/bash use `timeout`, not idle_timeout
- archon-workflow-builder.yaml: prompt enumerated only bash/prompt/command/loop,
so the AI builder could never propose script or approval nodes. Add both
(plus examples + rule about script output not being shell-quoted) and
regenerate bundled defaults
- book/dag-workflows.md + book/quick-reference.md + adapters/web.md: fill in
the node-type references that were missing script, approval, and cancel.
adapters/web.md also overclaimed "loop" in the palette — NodePalette.tsx
only drags command/prompt/bash, so note that the other kinds are YAML-only
* docs/skill: general hardening — fix inaccuracies, fill workflow/CLI/env gaps, add good-practices + troubleshooting (#1363)
* fix(skill/when): document the full `when:` operator set and compound expressions
The skill reference previously stated "operators: ==, != only" which is
materially wrong — the condition evaluator supports ==, !=, <, >, <=, >=
plus && / || compound expressions with && binding tighter than ||, plus
dot-notation JSON field access. An agent authoring a workflow from the
skill would think half the operators don't exist.
Replaces the single-sentence section with a structured reference covering:
- All six comparison operators (string and numeric modes)
- Compound expressions with precedence rules and short-circuit eval
- JSON dot notation semantics and failure modes
- The fail-closed rules in full (invalid expression, non-numeric side,
missing field, skipped upstream)
Grounded in packages/workflows/src/condition-evaluator.ts.
* feat(skill): document Approval and Cancel node types
Approval and cancel nodes are first-class DAG node types (approval since the
workflow lifecycle work in #871, cancel as a guarded-exit primitive) but the
skill never described either one. An agent reading the skill and asked to
"add a review gate before implementation" or "stop the workflow if the input
is unsafe" would fall back to bash + exit 1, losing the proper semantics
(cancelled vs. failed, on_reject AI rework, web UI auto-resume).
Approval node coverage (references/workflow-dag.md, SKILL.md):
- Full configuration block with message, capture_response, on_reject
- The interactive: true workflow-level requirement for web UI delivery
- Approve/reject commands across all platforms (CLI, slash, natural
language) and the capture_response → $node-id.output flow
- Ignored-fields list + the on_reject.prompt AI sub-node exception
Cancel node coverage (references/workflow-dag.md, SKILL.md):
- Single-field schema (cancel: "<reason>")
- Lifecycle: cancelled (not failed); in-flight parallel nodes stopped;
no DAG auto-resume path
- The "cancel: vs bash-exit-1" decision rule (expected precondition miss
vs. check itself failing)
- Two canonical patterns — upstream-classification gate, pre-expensive-step
gate
Validation-rules list updated to enumerate approval/cancel constraints
(message non-empty, on_reject.max_attempts range 1-10, cancel reason
non-empty), plus a forward note that script: joins the mutually-exclusive
set once PR #1362 lands.
Placement in both files is after the Loop section and before the validation
section, so this commit stays additive with respect to PR #1362's Script
node insertion between Bash and Loop — rebase is clean.
* feat(skill): document workflow-level fields beyond name/provider/model
The skill's Schema section previously showed only name, description, provider,
and model at the workflow level — which is most of a stub. Agents asked to
"use the 1M-context Claude beta" or "run this under a network sandbox" or
"add a fallback model in case Opus rate-limits" had no way to discover
that any of these fields existed at the workflow level.
Adds a comprehensive Workflow-Level Fields section covering:
- Core: name, description, provider, model, interactive (with explicit
callout that interactive: true is REQUIRED for approval/loop gates on
web UI — a common footgun)
- Isolation: worktree.enabled for pin-on/pin-off (the only worktree field
at workflow level; baseBranch/copyFiles/path/initSubmodules are
config.yaml only, so a cross-reference points there)
- Claude SDK advanced: effort, thinking, fallbackModel, betas, sandbox,
with explicit per-node-only exceptions (maxBudgetUsd, systemPrompt)
- Codex-specific: modelReasoningEffort (with note that it's NOT the same
as Claude's effort — this has confused users), webSearchMode,
additionalDirectories
- A complete worked example combining sandbox + approval + interactive
All fields cross-referenced against packages/workflows/src/schemas/workflow.ts
and packages/workflows/src/schemas/dag-node.ts.
* feat(skill/loop): document interactive loops and gate_message
Interactive loop nodes pause between iterations for human feedback via
/workflow approve — used by archon-piv-loop and archon-interactive-prd.
The skill's Loop Nodes section previously omitted both interactive: true
and gate_message entirely, so an agent writing a guided-refinement
workflow wouldn't know the feature exists or that gate_message is
required at parse time.
Adds:
- interactive and gate_message rows to the config table (marking
gate_message as required when interactive: true — enforced by the
loader's superRefine)
- A dedicated "Interactive Loops" subsection explaining the 6-step
iterate-pause-approve-resume flow
- Explicit call-out that $LOOP_USER_INPUT populates ONLY on the first
iteration of a resumed session — easy to miss and a common surprise
- Workflow-level interactive: true requirement for web UI delivery
(loader warning otherwise) so the full-flow example is complete
- Note that until_bash substitution DOES shell-quote $nodeId.output
(unlike script bodies) — called out since the audit surfaced this
inconsistency
* fix(skill/cli): complete the CLI command reference with missing lifecycle commands
The CLI reference previously documented only list, run, cleanup, validate,
complete, version, setup, and chat — missing nearly every workflow
lifecycle command an agent needs to operate a paused, failed, or stuck
run. The interactive-workflows reference assumed these commands existed
without actually documenting them.
Adds full documentation for:
- archon workflow status — show running workflow(s)
- archon workflow approve <run-id> [comment] — resume approval gate
(also populates $LOOP_USER_INPUT on interactive loops and the gate
node's output when capture_response: true)
- archon workflow reject <run-id> [reason] — reject gate; cancels or
triggers on_reject rework depending on node config
- archon workflow cancel <run-id> — terminate running/paused with
in-flight subprocess kill
- archon workflow abandon <run-id> — mark stuck row cancelled without
subprocess kill (for orphan-cleanup after server crashes — matches
the #1216 precedent)
- archon workflow resume <run-id> [message] — force-resume specific
run (auto-resume is default; this is for explicit override)
- archon workflow cleanup [days] — disk hygiene for old terminal runs
(with explicit callout that it does NOT transition 'running' rows,
a common confusion)
- archon workflow event emit — used inside loop prompts for state
signalling; documented so agents don't invent their own mechanism
- archon continue <branch> [flags] [msg] — iterative-session entry
point with --workflow and --no-context flags
Also:
- Adds --allow-env-keys flag to the `workflow run` flag table with
audit-log context and the env-leak-gate remediation use case
- Adds an "Auto-resume without --resume" note disambiguating when
--resume is needed vs. when auto-resume handles it
- Adds --include-closed flag to `isolation cleanup`, which was
previously missing; converts the flag list to a structured table
- Explains the cancel/abandon distinction (live subprocess vs. orphan)
All grounded in packages/cli/src/commands/workflow.ts, continue.ts,
and isolation.ts.
* feat(skill/repo-init): add scripts/ and state/, three-path env model, per-project env injection
The repo-init reference was missing two first-class .archon/ directories
(scripts/ since v0.3.3, state/ since the workflow-state feature) and had
nothing to say about env — the #1 thing a user hits on first-run when
their repo has a .env file with API keys.
Directory tree updates:
- Adds .archon/scripts/ with the extension->runtime rule (.ts/.js -> bun,
.py -> uv) so agents know where to put named scripts referenced by
script: nodes.
- Adds .archon/state/ with explicit "always gitignore" callout — these
are runtime artifacts, not source. Previously undocumented in the skill.
- Adds .archon/.env (repo-scoped Archon env) and distinguishes it from
the target repo's top-level .env.
- Adds a "What each directory is for" list so the structure isn't just
a tree with no narrative.
.gitignore guidance:
- state/ and .env added as must-gitignore (state/ matches CLAUDE.md and
reference/archon-directories.md — skill was lagging).
- mcp/ demoted to conditional — gitignore only if you hardcode secrets.
New "Three-Path Env Model" section:
- ~/.archon/.env (trusted, user), <cwd>/.archon/.env (trusted, repo),
<cwd>/.env (UNTRUSTED, target project — stripped from subprocess env).
- Precedence (override: true across archon-owned paths) and the
observable [archon] loaded N keys / stripped K keys log lines so
operators can verify what actually happened.
- Decision tree for where to put API keys vs. target-project env vs.
things Archon shouldn't touch.
- Links to archon setup --scope home|project with --force for writing
to the right file with timestamped backups.
New "Per-Project Env Injection" section:
- Documents both managed surfaces: .archon/config.yaml env: block
(git-committed, $REF expansion) and Web UI Settings → Projects →
Env Vars (DB-stored, never returned over API).
- Names every execution surface that receives the injected vars:
Claude/Codex/Pi subprocess, bash: nodes, script: nodes, and direct
codebase-scoped chat.
- Documents the env-leak gate with all 5 remediation paths so an agent
hitting "Cannot register: env has sensitive keys" knows the options.
Grounded in CHANGELOG v0.3.7 (three-path env + setup flags), v0.3.0
(env-leak gate), and reference/security.md on the docs site.
* fix(skill/authoring-commands): correct override paths and add home-scoped commands
The file-location and discovery sections described an override layout that
does not match the actual resolver. It showed:
.archon/commands/defaults/archon-assist.md # Overrides the bundled
and claimed `.archon/commands/defaults/` was where repo-level overrides
lived. In fact the resolver (executor-shared.ts:152-200 + command-
validation.ts) walks `.archon/commands/` 1 level deep and uses basename
matching — putting `archon-assist.md` at the top of `.archon/commands/`
is the canonical way to override the bundled version. The `defaults/`
subfolder is a Archon-internal convention for shipping bundled defaults,
not a user-facing override pattern.
Also, home-scoped commands (`~/.archon/commands/`, shipped in v0.3.7)
were completely absent — agents authoring personal helpers wouldn't
know they could live at the user level and be shared across every repo.
Changes:
- File Location section now shows all three discovery scopes (repo,
home, bundled) with precedence ordering and 1-level subfolder rules
- Duplicate-basename rule documented as a user error surface
- Discovery and Priority section rewritten with accurate 3-step lookup
order — no more references to the nonexistent defaults/ override path
- Adds the Web UI "Global (~/.archon/commands/)" palette label note so
users authoring helpers for the builder know what to expect
No code changes — this is a pure fix of stale/incorrect skill reference
material.
* feat(skill): add workflow good-practices and troubleshooting reference pages
Closes two gaps from the audit. The skill previously had zero guidance on
designing multi-node workflows (what to avoid, what to reach for first,
how to structure artifact chains) and zero guidance on where to look
when things go wrong (log paths, env-leak gate remediations, orphan-row
cleanup, resume semantics).
New references/good-practices.md (9 Good Practices + 7 Anti-Patterns):
- Use deterministic nodes (bash:/script:) for deterministic work, AI for
reasoning — the single biggest quality lever
- output_format required whenever downstream when: reads a field — the
most common source of "workflow silently routes wrong"
- trigger_rule: none_failed_min_one_success after conditional branches —
the classic bug where all_success fails because a skipped when:-gated
branch doesn't count as a success
- context: fresh requires artifacts for state passing — commands must
explicitly "read $ARTIFACTS_DIR/..." when downstream of fresh
- Cheap models (haiku) for glue, strong for substance
- Workflow descriptions as routing affordances
- Validate (archon validate workflows) + smoke-run before shipping
- Artifact-chain-first design
- worktree.enabled: true for code-changing workflows (reversibility)
- Anti-patterns with before/after YAML examples for each (AI-for-tests,
free-form when: matching, context: fresh without artifacts, long flat
AI-node layers, secrets in YAML, retry on loop nodes, tiny
max_iterations, missing workflow-level interactive:, tool-restricted
MCP nodes)
New references/troubleshooting.md:
- Log location (~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl)
with jq recipes for common queries (last assistant message, failed
events, full stream)
- Artifact location for cross-node handoff debugging
- 9 Common Failure Modes, each with root cause + concrete fix:
- $BASE_BRANCH unresolvable
- Env-leak gate (5 remediations)
- Claude/Codex binary not found (compiled-binary-only)
- "running" forever (AI working / orphan / idle_timeout)
- Mid-workflow failure and auto-resume semantics
- Approval gate missing on web UI (workflow-level interactive:)
- MCP plugin connection noise (filtered by design)
- Empty $nodeId.output / field access (4 causes)
- Diagnostic command cheat sheet (list, status, isolation list, validate,
tail-log, --verbose, LOG_LEVEL=debug)
- Escalation protocol (version + validate + log tail + CHANGELOG + issue)
SKILL.md routing table now dispatches "Workflow good practices /
anti-patterns" and "Troubleshoot a failing / stuck workflow" to the new
references so an agent can find them without having to know they exist.
* docs(book): update node-types coverage from four to all seven
The book is the curated first-contact reading path (landing page → "Get
Started" → /book/). Both dag-workflows.md and quick-reference.md were
stuck on "four node types" — missing script, approval, and cancel. A user
reading the book as their first introduction would form an incomplete
mental model, then find three more node types in the reference section
later with no explanation of when they arrived.
book/dag-workflows.md:
- "four node types" → "seven node types. Exactly one mode field is
required per node"
- Table now lists Command, Prompt, Bash, Script, Loop, Approval, Cancel
with one-line "when to use" for each, and cross-links to the dedicated
guide pages for Script / Loop / Approval
- New sections below the table for Script (inline + named examples with
runtime and deps), Approval (with the interactive: true workflow-level
note that's easy to miss), and Cancel (guarded-exit pattern) — keeping
the existing narrative shape for Bash and Loop
book/quick-reference.md:
- Node Options table now includes script, approval, cancel rows
- agents row added (inline sub-agents, Claude-only)
- New "Script-specific fields" and "Approval-specific fields" subsections
so the cheat-sheet is actually complete rather than pointing users
elsewhere for the required constraints
- Retry row callout that loop nodes hard-error on retry — previously
omitted
- bash timeout note widened to cover script timeout (same semantics)
Both files are docs-web content; the CI build on the docs-script-nodes
PR (#1362) previously validated the Starlight build path with a similar
table addition, so this should render clean.
* fix(skill/cli): remove nonexistent \`archon workflow cancel\`, fix workflow status jq recipe
Two accuracy issues from the PR code-reviewer (comment 4311243858).
C1: \`archon workflow cancel <run-id>\` does NOT exist as a CLI subcommand.
The switch at packages/cli/src/cli.ts:318-485 dispatches on list / run /
status / resume / abandon / approve / reject / cleanup / event — running
\`archon workflow cancel\` hits the default case and exits with "Unknown
workflow subcommand: cancel" (cli.ts:478-484). Active cancellation is
only available via:
- /workflow cancel <run-id> chat slash command (all platforms)
- Cancel button on the Web UI dashboard
- POST /api/workflows/runs/{runId}/cancel REST endpoint
cli-commands.md: removed the \`### archon workflow cancel <run-id>\`
subsection; kept the \`abandon\` subsection but made it explicit that
abandon does NOT kill a subprocess. Added a call-out box at the bottom
of the abandon section explaining where to go for actual cancellation.
troubleshooting.md "running forever" section: split the original
cancel-vs-abandon advice into three bullets — Web UI / CLI abandon (for
orphans, no subprocess kill) / chat \`/workflow cancel\` (for live runs
that need interruption). Added an explicit "there is no archon workflow
cancel CLI subcommand" parenthetical since the wrong command was being
suggested in flow.
I1: the \`archon workflow list --json\` diagnostic used an incorrect jq
filter. workflow list's --json output (workflow.ts:185-219) has shape
{ workflows: [{ name, description, provider?, model?, ... }], errors: [...] }
with no \`runs\` field — \`jq '.workflows[] | select(.runs)'\` returns empty
unconditionally. Replaced with \`archon workflow status --json | jq '.runs[]'\`,
which matches the actual shape of workflowStatusCommand at
workflow.ts:852+ ({ runs: WorkflowRun[] }). Also tightened the narration
to distinguish JSON from human-readable status output.
No change to the commit history in this PR — these are follow-up fixes
to claims I introduced in earlier commits of this branch (f10b989e for
C1, 66d2b86e for I1).
* fix(skill): remove env-leak gate references (feature was removed in provider extraction)
C2 from the PR code-reviewer (comment 4311243858). The pre-spawn env-leak
gate was removed from the codebase during the provider-extraction refactor
— see TODO(#1135) at packages/providers/src/claude/provider.ts:908. Zero
hits for --allow-env-keys / allowEnvKeys / allow_env_keys / allow_target_repo_keys
across packages/. The CLI's parseArgs (cli.ts:182-208) has no
--allow-env-keys option, and because parseArgs uses strict: false, an
unknown --allow-env-keys would be silently ignored rather than error.
What remains accurate and is NOT touched:
- Three-Path Env Model section (user/repo archon-owned envs are loaded;
target repo <cwd>/.env keys are stripped from process.env at boot)
still correctly describes current behavior, grounded in
packages/paths/src/strip-cwd-env.ts + env-integration.test.ts
- Per-Project Env Injection section (Option 1: .archon/config.yaml env:
block; Option 2: Web UI Settings → Projects → Env Vars) is unchanged —
both remain the sanctioned way to get env vars into subprocesses
Removed claims (all three files):
- cli-commands.md: --allow-env-keys flag row in the workflow run flags
table
- repo-init.md: the "Env-leak gate" subsection at the end of Per-Project
Env Injection listing 5 remediations (all of which reference UI/CLI/
config surfaces that don't exist). Replaced with a succinct callout
that explains the actual current behavior — target repo .env keys are
stripped, workflows that need those values should use managed
injection — so the reader still gets the "where to put my env vars"
answer
- troubleshooting.md: the "Cannot register: codebase has sensitive env
keys" section (error message that can no longer be emitted)
If the env-leak gate is ever resurrected per TODO(#1135), the docs can be
re-added then. The CHANGELOG v0.3.0 entry describing the gate is a
historical record of past behavior and does not need to be rewritten.
* fix(skill/troubleshooting): correct JSONL event type names and field name
C3 from the PR code-reviewer (comment 4311243858). The troubleshooting
reference's event-types table used _started / _completed / _failed
suffixes, but packages/workflows/src/logger.ts:19-30 shows the actual
WorkflowEvent.type enum is:
workflow_start | workflow_complete | workflow_error |
assistant | tool | validation |
node_start | node_complete | node_skipped | node_error
The second jq recipe also queried `.event` but the discriminator is `.type`.
Fixes:
- Event table: renamed columns (_started → _start, _completed → _complete,
_failed → _error). Explicitly called out the field name as `type` so the
reader knows what jq selector to use
- Replaced the "tool_use / tool_result" row with a single `tool` row and
listed its actual payload fields (tool_name, tool_input, duration_ms,
tokens) — tool_use/tool_result are SDK message kinds that appear within
the AI stream, not top-level log event types
- Added a `validation` row (was missing; it's emitted by workflow-level
validation calls with `check` and `result` fields)
- Removed `retry_attempt` row — this event type is not emitted to the
JSONL file. Retry bookkeeping goes through pino logs, not the workflow
log file
- Added an explicit callout that loop_iteration_started /
loop_iteration_completed (and other emitter-only events) go through
the workflow event emitter + DB workflow_events table, NOT the JSONL
file. Pointed readers to the DB or Web UI for loop-level detail. This
distinguishes the two parallel event systems — easy to conflate
(store.ts:11-17 uses _started/_completed/_failed for the DB side,
logger.ts uses _start/_complete/_error for JSONL)
- Fixed the "all failed events" jq recipe: .event → .type and _failed → _error
- Minor cleanup: the inline "tool_use events" mention in the "running
forever" section said the wrong event name — updated to "tool or
assistant events in the tail"
Grounded in packages/workflows/src/logger.ts (canonical JSONL event
shape) and packages/workflows/src/store.ts (the parallel DB event
naming, which the reviewer correctly flagged as different and worth
keeping distinct).
* fix(skill): two stragglers from the code-reviewer audit
Cleanup of two references that slipped through the earlier C1 and C3 fixes:
- references/troubleshooting.md:126: \`node_failed\` → \`node_error\`
(the "Node output is empty" diagnostics section references the JSONL
log, which uses the logger.ts enum — not the DB workflow_events table
which does use \`node_failed\`). The C3 fix corrected the event table
and one jq recipe but missed this inline mention.
- references/interactive-workflows.md:106: removed \`archon workflow
cancel <run-id>\` (nonexistent CLI subcommand) from the
troubleshooting bullet. This was pre-existing before the hardening
PR but fell within the C1 remediation scope. Replaced with the
correct triage: reject (approval gate only) vs abandon (orphan
cleanup, no subprocess kill) vs chat /workflow cancel (actual
subprocess termination).
Grounded in the same sources as the earlier C1/C3 commits:
packages/cli/src/cli.ts:318-485 (no cancel case) and
packages/workflows/src/logger.ts:19-30 (JSONL type enum).
* feat(skill): point to archon.diy as the canonical docs source
The skill had no reference to archon.diy (the live docs site built from
packages/docs-web/). Several reference files said "see the docs site"
without naming the URL, leaving the agent to guess or grep the repo for
the hostname. An agent with the skill loaded should know that when the
distilled reference pages don't cover a case, the full canonical docs
are one WebFetch away.
SKILL.md: new "Richer Context: archon.diy" section between Routing and
Running Workflows. Covers:
- When to reach for the live docs (longer examples, tutorial framing,
features the skill only mentions in passing, "where's that
documented?" user questions)
- URL map — 13 starting points covering getting-started, book (tutorial
series), guides/ (authoring + per-node-type + per-node-feature),
reference/ (variables, CLI, security, architecture, configuration,
troubleshooting), adapters/, deployment/
- Precedence: skill refs first (context-cheap, tuned for agents), docs
site as escalation. Prevents agents defaulting to WebFetch when a
local skill ref already covers the answer
Also upgrades the 5 existing generic "docs site" mentions across
reference files to concrete archon.diy URLs with anchor fragments where
helpful:
- good-practices.md: Inline sub-agents pattern → archon.diy/guides/
authoring-workflows/#inline-sub-agents
- troubleshooting.md: "Install page on the docs site" → archon.diy/
getting-started/installation/
- workflow-dag.md: "Workflow Description Best Practices" → anchor link;
sandbox schema reference → archon.diy/guides/authoring-workflows/
#claude-sdk-advanced-options
- repo-init.md: Security Model reference → archon.diy/reference/
security/#target-repo-env-isolation (deep-link into the section that
covers the <cwd>/.env strip behavior)
URL source of truth: astro.config.mjs:5 (site: 'https://archon.diy').
URL structure mirrors packages/docs-web/src/content/docs/<section>/
<page>.md — verified by the 62 pages the docs build produces.
* chore(workflows): switch default Opus pin to opus[1m] alias (#1395)
Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus /
opus[1m] now resolve to 4.7 with a 1M context window at standard
pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m]
lets bundled default workflows auto-track the recommended Opus version.
No explicit effort is set, so nodes inherit the per-model default
(xhigh on 4.7, high on 4.6).
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1398)
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1380)
The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md)
that the AI agent would sometimes write to a different location, breaking all
downstream nodes that glob for the plan file. Migrated all plan/progress file
references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching
the pattern used by archon-fix-github-issue and other workflows.
Changes:
- Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node
- Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in
refine-plan, code-review, and fix-feedback nodes
- Replace empty-string guard with file-existence check in implement-setup bash
- Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/
- Add explicit plan/progress paths in finalize node
- Regenerated bundled-defaults.generated.ts
Fixes #1380
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(workflow): address review findings in archon-piv-loop
- Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to
eliminate the duplicate heading that collided with Step 3's identical
title in the create-plan node
- Guard implement-setup against a 0-task plan file: exit 1 with a
clear error when no '### Task N:' sections are found, preventing a
silent no-op implement loop
- Remove 2>/dev/null from code-review commit so pre-commit hook failures
and other stderr are visible to the agent instead of silently swallowed
- Replace '|| true' on git push in finalize with an explicit WARNING echo
so push failures (auth, upstream conflict, no remote) surface to the
agent rather than being silently ignored
- Regenerate bundled-defaults.generated.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(workflows): regenerate bundled defaults to match opus[1m] alias
The bundle was stale relative to the YAML sources after #1395 merged —
check:bundled was failing CI. Regenerated; no YAML edits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(workflows): add anyFailed status derivation coverage for DAG executor (#1403)
PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed
branch (dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed
Fixes #1381.
* docs/skill: add parameter-matrix.md quick-lookup reference
New reference for the archon skill: a single-glance lookup of which
parameter works on which node type, an intent-based "how do I..." table,
a consolidated silent-failure catalog, and an inline agents: section
(previously only referenced via archon.diy).
Purpose is complementary, not duplicative:
- workflow-dag.md remains the authoring guide
- dag-advanced.md remains the hooks/MCP/skills/retry deep-dive
- good-practices.md remains the patterns and anti-patterns
- parameter-matrix.md is the grep-this-first lookup when you know the
outcome you want but not which field gets you there
Also registers the new reference in SKILL.md routing table.
* docs: point contributors at PR template and Closes #N convention
Add explicit references to .github/PULL_REQUEST_TEMPLATE.md in both
CONTRIBUTING.md and CLAUDE.md, plus a reminder to link issues with
Closes/Fixes/Resolves so they auto-close on merge. Repo-triage runs
were flagging dozens of partially-filled or unlinked PRs each cycle.
* feat(workflows): add maintainer-standup workflow for daily PR/issue triage (#1428)
* feat(workflows): add maintainer-standup workflow for daily PR/issue triage
Daily morning briefing that pulls origin/dev, triages all open PRs and assigned
issues against direction.md, and surfaces progress vs. the previous run. Designed
for live-checkout use (worktree.enabled: false) so it can read its own state.
Layout under .archon/maintainer-standup/:
- direction.md (committed) — project north-star: what Archon IS / IS NOT.
Drives PR P4 polite-decline classification with cited clauses.
- README.md / profile.md.example — setup docs and template for new maintainers.
- profile.md, state.json, briefs/YYYY-MM-DD.md — gitignored, per-maintainer.
Engine:
- 3 parallel gather scripts in .archon/scripts/maintainer-standup-*.ts
(git-status, gh-data, read-context) — bun runtime, JSON stdout.
- Synthesis node: command file with output_format schema for
{ brief_markdown, next_state }.
- Persist node: tiny inline bun script writes both to disk.
Run-to-run continuity: state.json carries observed_prs/issues snapshots, so the
next run can detect what merged, what closed, what the maintainer shipped, and
which carry-over items aged past N days.
Also adds .archon/** to the ESLint global ignore list (matches the existing
.claude/skills/** pattern) since .archon/ is user content and not part of any
tsconfig project.
* fix(maintainer-standup): address CodeRabbit review on #1428
- gh-data: bump --limit 100 → 1000 on all_open_prs and warn loudly when
the cap is hit; preserves the observed_prs invariant the next-run
"resolved since last run" diff depends on. (CodeRabbit critical)
- maintainer-standup.md: clarify P1 CI signal — the gathered payload only
carries mergeStateStatus, not statusCheckRollup; for borderline P1s,
drill in via `gh pr checks <n>`. (CodeRabbit minor)
- workflow.yaml persist: write briefs under local YYYY-MM-DD (sv-SE
locale) instead of UTC ISO date, so an evening run doesn't file
tomorrow's brief and break recent_briefs lookups. (CodeRabbit minor)
- workflow.yaml persist: wrap state/brief writes in try/catch; on
failure dump brief_markdown and next_state to stderr so a 5-minute
Sonnet synthesis isn't lost to a transient disk error. (CodeRabbit minor)
- gh-data + git-status: switch from execSync (shell-string) to
execFileSync (argv array) for git/gh invocations. Defense-in-depth
against shell metacharacters in values that pass through (esp. the
gh_handle from profile.md). (CodeRabbit nitpick)
* feat(workflows): support explicit tags in workflow YAML (#1190)
Add optional `tags: string[]` to `workflowBaseSchema`. Explicit values take precedence over keyword inference; `tags: []` suppresses inference end-to-end; omitting the field falls back to inference (backwards compatible). Non-array values warn-and-ignore matching the sibling `worktree`/`additionalDirectories` patterns.
* feat(workflows): add maintainer-review-pr and group maintainer workflows under maintainer/ (#1430)
* feat(workflows): add maintainer-review-pr and group maintainer workflows under .archon/workflows/maintainer/
Adds the maintainer-review-pr workflow — a Pi/Minimax-based PR triage
flow that gates on direction alignment, scope focus, and PR-template
quality before doing any deep review. If the gate clears, runs the
five review aspects (code/error-handling/test-coverage/comment-quality/
docs-impact) as parallel Archon nodes and auto-posts a synthesized
review comment. If the gate fails (direction conflict, multiple
concerns, sprawling scope), drafts a polite-decline comment and pauses
for the maintainer's approval before posting.
Reorganizes the existing maintainer-standup workflow into the same
subfolder so all maintainer-facing workflows live together. Subfolder
grouping is supported by the workflow loader (1 level deep, resolution
by filename).
What lands:
- .archon/workflows/maintainer/maintainer-standup.yaml (moved from
.archon/workflows/maintainer-standup.yaml)
- .archon/workflows/maintainer/maintainer-review-pr.yaml (new)
- .archon/commands/maintainer-review-{gate,code-review,error-handling,
test-coverage,comment-quality,docs-impact,synthesize,report}.md (new,
Pi-tuned variants of the existing review-agent commands so they avoid
Claude-only Task / sub-agent patterns)
Pi/Minimax integration:
- Uses provider: pi, model: minimax/MiniMax-M2.7 — verified via the
e2e-minimax-smoke test that Pi correctly routes to Minimax (session
jsonl confirms provider=minimax) and that Pi's best-effort
output_format parser handles the gate's nested schema.
- Two test runs landed real comments: a direction-decline on PR #1335
and a deep-review on PR #1369. Both were posted to GitHub via the
workflow's gh pr comment node.
* chore(workflows): also group repo-triage under .archon/workflows/maintainer/
repo-triage is the third maintainer-facing workflow alongside maintainer-standup and maintainer-review-pr; group it in the same subfolder for consistency. Subfolder resolution is by filename so the workflow name is unchanged.
* feat(pi): use ModelRegistry to support custom models and skip auth for unmapped providers (#1284)
Closes #1096.
- Switch Pi provider model lookup from pi-ai's getModel() (static catalog
only) to ModelRegistry.create(authStorage).find() so user-configured
custom models in ~/.pi/agent/models.json (LM Studio, ollama, llamacpp,
custom OpenAI-compatible endpoints) are discoverable.
- Remove the local lookupPiModel helper.
- For env-var-mapped providers (anthropic, openai, etc.) still throw
with a pi /login hint when credentials are missing. For unmapped
providers, log pi.auth_missing at info and continue so local models
that don't need credentials work without ceremony.
- Surface modelRegistry.getError() in the not-found message and emit
pi.model_not_found so users debugging custom-provider configs see the
real cause (e.g. missing baseUrl in models.json).
- Guard AuthStorage.create() and ModelRegistry.create() with try/catch
so a malformed ~/.pi/agent/auth.json surfaces with Pi-framed context
instead of a raw SDK stack trace.
- Document the credential-free path for local providers in ai-assistants.md.
Co-authored-by: Matt Chapman <Matt@NinjitsuWeb.com>
* chore(workflows): group smoke-test workflows under test-workflows/ + add e2e-minimax-smoke (#1431)
* chore(workflows): group all smoke-test workflows under .archon/workflows/test-workflows/
Move the 7 existing e2e-*.yaml smoke tests plus the new e2e-minimax-smoke
test into a dedicated subfolder. Subfolder grouping is supported by the
workflow loader (1 level deep, resolution by filename) so workflow names
are unchanged. Mirrors the .archon/workflows/maintainer/ split landing
in #1430.
Also adds e2e-minimax-smoke.yaml — a sanity check that Pi correctly
routes to Minimax M2.7 via the user's local pi auth, and that Pi's
best-effort output_format parser handles a small nested schema. Asserts
routing by reading the most recent Pi session jsonl rather than asking
the model to self-identify (LLMs are unreliable narrators about their
own identity, especially when Pi's system prompt mentions other
providers as defaults).
* fix(e2e-minimax-smoke): address CodeRabbit review on #1431
- Widen find window from -mmin -3 to -mmin -10. The smoke's three Pi
nodes plus the assert can collectively run several minutes on slow
networks; 3 minutes was tight enough to false-FAIL on a healthy run.
(CodeRabbit minor)
- Drop non-deterministic `head -1` over `find` output. find doesn't
guarantee any order; on a tie, the wrong file would be picked. Now
iterates all matching sessions and breaks on first one carrying the
routing signal — any match is sufficient evidence. (CodeRabbit minor)
- Replace single-regex `'"provider":"minimax".*"modelId":"MiniMax-M2.7"'`
with two separate greps joined by `&&`. JSON field order isn't part of
Pi's contract; a future Pi release reordering `provider` and `modelId`
in the model_change event would silently false-FAIL the original
pattern. The new check is order-independent. (CodeRabbit major)
* fix(maintainer-review): address CodeRabbit findings on #1430 (#1432)
Six findings, two majors and four minors/nitpicks:
- gate.md L17 vs L77: resolved conflicting input-source instructions.
Body claimed "all inline, no extra fetch" while a later phase
permitted reading PULL_REQUEST_TEMPLATE.md. Now: explicit "one
allowed extra read" callout in Phase 1 + matching wording in Gate C.
(CodeRabbit major)
- gate.md fenced blocks: added missing language identifiers (text/json/
markdown) to satisfy markdownlint MD040. (CodeRabbit minor)
- gate.md L155 + read-context.ts: deterministic clock. The 3-day deadline
was anchored to prior_state.last_run_at, which can be stale and produce
past-dated deadlines. Moved both today and deadline_3d into the
read-context.ts output (computed via sv-SE locale → ISO date in local
time) and instructed the gate to use $read-context.output.deadline_3d
directly. LLMs are unreliable at calendar arithmetic; this avoids it
entirely. (CodeRabbit major)
- maintainer-review-pr.yaml fetch-diff: dropped 2>/dev/null on gh pr diff
so auth / network / deleted-PR failures fail the node instead of
feeding an empty diff to the gate. Empty-but-successful diff (PR has
no changes) is now an explicit marker the gate can detect. (CodeRabbit
minor)
- maintainer-review-pr.yaml approve-unclear: added capture_response: true
so the maintainer's approve comment flows to the report node. Reject
reasoning is already captured by Archon's run record. (CodeRabbit
minor)
- maintainer-review-pr.yaml post-decline + report.md: the gh pr edit
--add-label call previously swallowed all errors with || true and the
report still claimed the label was applied. Now writes applied/skipped
to $ARTIFACTS_DIR/.label-applied + the gh stderr to .label-error so
the report can describe the actual outcome. (CodeRabbit nitpick)
* fix(workflows): approval gate bypass after reject-with-redraft on resume (#1435)
* fix(workflows): approval gate bypass after reject-with-redraft on resume
When an approval node was rejected with on_reject.prompt, the synthetic
PromptNode built to run the on_reject prompt reused the approval gate's
own node ID. executeNodeInternal then wrote a node_completed event with
that ID, causing getCompletedDagNodeOutputs to treat the gate as already
completed on the next resume — bypassing the human gate entirely.
Fix: give the synthetic node the ID `${node.id}:on_reject` so its
node_completed event has a distinct step_name that won't match the
approval gate slot in priorCompletedNodes.
Adds a regression test asserting no node_completed event with the
approval gate's ID is written during on_reject execution.
Fixes #1429
* test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node
Add complementary positive assertion to the regression test to verify that
node_completed is written exactly once with step_name 'review:on_reject',
ensuring future refactors that suppress the event entirely would be caught.
Add inline comment in executeApprovalNode documenting the known SSE side-effect:
node_started/node_completed events with nodeId='review:on_reject' flow through
the SSE pipeline into the web UI, resulting in a transient phantom node in the
execution view. This is cosmetic-only — the human gate contract is preserved.
* simplify: reduce duplicate cast pattern in on_reject test assertions
* feat(workflows): add mutates_checkout to allow concurrent runs on live checkout (#1438)
* feat(workflows): add mutates_checkout field to skip path-lock for concurrent runs
Add `mutates_checkout: boolean` (optional, default true) to the workflow
schema. When set to false, the executor skips the path-exclusive lock
that serializes all runs on the same working path, allowing N concurrent
runs on the same live checkout.
The primary use case is `maintainer-review-pr`, which reads shared state
but writes only to per-run artifact paths and GitHub PR comments — two
parallel reviews of different PRs should not fail with "Workflow already
active on this path".
Changes:
- `schemas/workflow.ts`: add optional `mutates_checkout` field
- `loader.ts`: parse and propagate the field (warn-and-ignore on invalid values)
- `executor.ts`: wrap path-lock guard in `if (workflow.mutates_checkout !== false)`
- `executor.test.ts`: two new tests in the concurrent-run guard suite
- `maintainer-review-pr.yaml`: opt in with `mutates_checkout: false`
* test(workflows): add loader tests for mutates_checkout parsing
- Add 5 tests covering false, true, omitted, and invalid (string "yes") values
- Invalid non-boolean values are silently dropped with warn — now explicitly tested
- Remove the // end mutates_checkout guard trailing comment (no precedent in file)
- Clarify loader comment: "parse/warn pattern" not "warn-and-ignore pattern" to avoid implying the return style matches interactive
* simplify: collapse nodeType/aiFields pair into single nonAiNode object in parseDagNode
* docs: replace String.raw with direct assignment in script node examples (#1434)
* docs: replace String.raw with direct assignment in script node examples
String.raw`$nodeId.output` fails silently when substituted output contains
a backtick, terminating the template literal early and producing cryptic parse
errors. JSON is valid JS expression syntax, so direct assignment is safe for
all valid JSON values including those with backticks.
- Replace String.raw pattern in dag-workflow.yaml example
- Replace String.raw pattern in archon-workflow-builder.yaml template
- Add CAUTION bullet in workflow-dag.md Script Node section
- Add Silent Failures item #14 in parameter-matrix.md
- Add Starlight caution aside in script-nodes.md
- Extend script bodies bullet in variables.md
- Regenerate bundled-defaults.generated.ts
Fixes #1427
* docs: fix Rule 6 in generate-yaml prompt to distinguish bun vs uv patterns
Rule 6 still referenced JSON.parse after the example was updated to direct
assignment, creating a contradiction for the AI code generator. Update the
prose to explicitly distinguish TypeScript/bun (direct assignment) from
Python/uv (json.loads), matching the updated embedded example.
* chore(workflows): group experimental workflows under .archon/workflows/experimental/
Move two repo-scoped workflows that were sitting untracked at the workflow
root into a dedicated subfolder. Subfolder grouping is supported by the
loader (1 level deep, resolution by filename), so workflow names are
unchanged and the /release skill still resolves archon-release correctly.
Files moved:
- archon-fix-github-issue-experimental.yaml — Path-A variant of the
issue-fix workflow used today to land #1434, #1435, #1438.
- archon-release.yaml — the live release workflow used by the /release
skill end-to-end (validate -> binary smoke -> version bump -> changelog
-> approval -> commit -> PR -> tag -> Homebrew formula update).
* fix(workflows): export ARTIFACTS_DIR, LOG_DIR, BASE_BRANCH to bash nodes (#1387)
executeBashNode previously only merged explicit envVars on top of
process.env. The three well-known workflow directories (artifactsDir,
logDir, baseBranch) were passed as function parameters and used for
compile-time substitution of $ARTIFACTS_DIR / $LOG_DIR / $BASE_BRANCH
in the script body, but were never added to the subprocess environment.
As a result, any script that relied on shell-runtime expansion — e.g.
JSON_FILE="${ARTIFACTS_DIR}/foo.output.json" inside a heredoc, an
inherited helper script, or a `bash -c` subshell — saw the variable
unset and silently fell back to its default (typically an empty string
or "."), writing artifacts to the workflow cwd instead of the nominal
artifacts directory.
Always build subprocessEnv from process.env plus the three well-known
directories, then allow explicit envVars to override. Compile-time
substitution behavior is unchanged; existing scripts that do not
reference these variables are unaffected; user-supplied envVars still
win on conflict.
* fix(workflow): substitute $nodeId.output refs in approval messages (#1426)
* fix(workflow): substitute \$nodeId.output refs in approval messages
Approval node messages were emitted as raw strings, bypassing the
substituteNodeOutputRefs() pass that prompt/bash/loop/cancel nodes
all run. This made interactive workflows like atlas-onboard show
literal "\$gather-context.output.repo_name" placeholders to humans
at HITL gates, leaving them unable to know what they were approving.
Fix: rendered the approval.message through substituteNodeOutputRefs
once at the top of the standard approval gate path, then used the
resolved string in all 4 emission sites (safeSendMessage,
createWorkflowEvent, pauseWorkflowRun, event-emitter).
Test: new dag-executor.test case wires a structured-output upstream
node into an approval node and asserts pauseWorkflowRun receives the
substituted message ("Repo: hcr-els | App: CCELS | Port: 3012")
rather than the literal placeholders.
Repro: any workflow with an approval node whose message references
\$nodeId.output[.field]. Observed in the wild on atlas-onboard's
confirm-context HITL gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(workflow): extend approval-substitution test to cover all 4 emission sites
Per CodeRabbit review: the original test only verified pauseWorkflowRun
received the substituted message, but the fix touches 4 emission sites.
A future regression at safeSendMessage / createWorkflowEvent / event-emitter
would silently leave the test passing while users still saw raw $node.output
placeholders.
Adds two additional assertions:
- platform.sendMessage prompt contains substituted message + does NOT
contain literal $gather-context.output placeholders
- The persisted approval_requested workflow event's data.message is
substituted
Event-emitter assertion deferred (no existing pattern for spying on the
global emitter in this test file). Two of three secondary surfaces
covered closes the practical regression risk — both are user-visible
(chat prompt + audit-log event); the emitter is internal only.
Test count: 7 pass / 22 expect() (was 18). Full suite 193 pass / 353
expect() — no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286) (#1367)
* feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286)
Adds a new substitution variable that carries the previous loop iteration's
cleaned output into the next iteration's prompt. Empty on iteration 1; the
prior iteration's output (after stripCompletionTags) on iteration 2+.
Why: fresh_context: true loops have no way to reference what the previous
pass produced or why it failed without dragging the full session forward.
$LOOP_PREV_OUTPUT closes that gap with zero session-cost — same trust
boundary as $nodeId.output, no new external surface.
Changes:
- packages/workflows/src/executor-shared.ts: substituteWorkflowVariables
accepts a 10th positional loopPrevOutput arg and substitutes
$LOOP_PREV_OUTPUT (defaults to '').
- packages/workflows/src/dag-executor.ts: executeLoopNode passes
lastIterationOutput on iteration 2+ (and explicit '' on iteration 1 /
the first iteration of an interactive resume, since lastIterationOutput
is a per-call variable that does not survive resume metadata).
- Unit tests: 3 new cases in executor-shared.test.ts.
- Integration tests: 2 new cases in dag-executor.test.ts verifying the
prompt sent to the AI on iter 1 vs iter 2, and that the value reflects
cleaned output (no <promise> tags).
- Docs: variables.md, loop-nodes.md (new "Retry-on-failure" pattern),
CLAUDE.md variable reference.
Backward compatibility: prompts that don't reference $LOOP_PREV_OUTPUT are
unaffected. All 843 workflow tests + type-check + lint + format:check +
bun run validate pass locally.
* docs: address coderabbit review on variables/loop-nodes
- variables.md: include $LOOP_PREV_OUTPUT in substitution-order list and
availability table to match the new variable row at line 30
- loop-nodes.md: document the interactive-resume exception where the first
iteration after an approval-gate resume still receives an empty
$LOOP_PREV_OUTPUT regardless of iteration number (per dag-executor.ts
L1781-1783 where i === startIteration always clears prev output)
* docs(changelog): add Unreleased entry for $LOOP_PREV_OUTPUT (#1367 review)
* test(loop): add resume-from-approval integration test for $LOOP_PREV_OUTPUT (#1367 review)
Per maintainer-review-pr suggestion (Wirasm): two-call integration test
covering the resume-from-approval scenario.
- Call 1: fresh interactive loop pauses at the gate after iteration 1 and
asserts $LOOP_PREV_OUTPUT substitutes to empty on iter 1 (no prior
output) plus the gate pause is recorded.
- Call 2: resumed run with metadata.approval populated. The first
resumed iteration must substitute $LOOP_PREV_OUTPUT to '', NOT to the
paused run's iter-1 output (which lived in a different process and is
not persisted). $LOOP_USER_INPUT still flows through as normal.
Locks the documented invariant at dag-executor.ts:1769-1772.
---------
Co-authored-by: voidborne-d <DottyEstradalco@allergist.com>
* feat(maintainer-standup): surface contributor replies since last run (#1457)
The brief was missing a key signal — when contributors reply on PRs or
issues, the maintainer wouldn't see it explicitly. Empirically reviewed
PR replies were buried under aggregate updatedAt timestamps with no
indication of WHO replied or WHAT they said.
This adds a new "Replies waiting on you" section to the daily brief,
sourced from two paginated GitHub API calls scoped by since=last_run_at:
- /repos/{o}/{r}/issues/comments PR + issue conversation comments
- /repos/{o}/{r}/pulls/comments inline code-review comments
Filters applied:
- Skip the maintainer's own comments (gh_handle from profile.md)
- Skip GitHub bot accounts (login ending in [bot]) — coderabbitai,
chatgpt-codex-connector, dependabot, etc. They post a constant
churn of automated review tooling that drowns out human replies;
the maintainer wants the latter.
Output is grouped by PR/issue number with kind classification:
- issue comment on a non-PR issue
- pr_conversation PR conversation-level comment
- pr_review inline code-review comment (most actionable —
usually needs a code-level response, so kind
upgrades to pr_review whenever review comments
arrive on a PR that also has conversation ones)
Sorted by recency (newest reply first). Synthesizer reads
gh-data.output.replies_since_last_run and renders a section.
Verified on a backdated state.json (last_run_at = yesterday morning):
22 human replies on 22 PRs/issues, bot noise filtered (32 → 22 after
the [bot] filter). Surfaces…
mionemedia
pushed a commit
that referenced
this pull request
May 17, 2026
* chore: update Homebrew formula for v0.3.9
* chore(release-skill): use --help (not version) for Step 1.5 smoke probe (#1359)
The pre-flight binary smoke does a bare `bun build --compile` — it
deliberately skips `scripts/build-binaries.sh` to stay fast. That means
packages/paths/src/bundled-build.ts retains its dev defaults, including
BUNDLED_IS_BINARY = false.
version.ts branches on BUNDLED_IS_BINARY: when true it returns the
embedded string; when false it calls getDevVersion(), which reads
package.json at `SCRIPT_DIR/../../../../package.json`. Inside a compiled
binary SCRIPT_DIR resolves under `$bunfs/root/`, the walk produces a CWD-
relative path that doesn't exist, and the smoke aborts with "Failed to
read version: package.json not found" — a false positive.
Hit during the 0.3.8 release attempt: the real Pi lazy-load fix was
working end-to-end; the smoke test was the only thing failing.
Use --help instead. It exercises the same module-init graph (so it still
catches the real failure modes the skill lists — Pi package.json init
crash, Bun --bytecode bugs, CJS wrapper issues, circular imports under
minify) but has no dev/binary branch, so no false positive.
Also add a longer comment block explaining why --help is preferred, so
this doesn't get "normalized" back to `version` by a future drive-by.
* chore(test-release-skill): preserve archon-stable across test cycles
The brew path of /test-release runs `brew uninstall` in Phase 5 to leave the
system in its pre-test state. For operators using the dual-homebrew pattern
(renamed brew binary at `/opt/homebrew/bin/archon-stable` so it coexists with
a `bun link` dev `archon`), that uninstall wipes the Cellar dir the
`archon-stable` symlink points into → `archon-stable` becomes dangling →
`brew cleanup` sweeps it away on the next brew op. Next time the operator
wants stable, they have to manually re-run `brew-upgrade-archon`.
Fix: make the skill aware of `archon-stable` and restore it transparently.
- Phase 2 item 4: detect the `archon-stable` symlink before any brew op;
export `ARCHON_STABLE_WAS_INSTALLED=yes` so Phase 5 knows to restore it.
Only triggers for the brew path (curl-mac/curl-vps don't touch brew so
they leave `archon-stable` alone).
- Phase 5 brew path: after `brew uninstall + untap`, if the flag was set,
re-tap + re-install + rename. Verifies the restored `archon-stable`
reports a version and warns (non-fatal) if the rename target is missing.
Documents the tradeoff: the restored version is "whatever the tap ships
today", not necessarily the pre-test version — usually that's what the
operator wants (the release they just tested becomes stable) but the
back-version-QA case requires a manual `brew-upgrade-archon` after.
- Phase 1 confirmation banner now mentions that `archon-stable` will be
preserved so the operator isn't surprised by the reinstall during Phase 5.
No changes to curl-mac/curl-vps paths. No changes to Phase 4 test suite.
* fix(providers/pi): install PI_PACKAGE_DIR shim so Pi workflows run in a compiled binary (#1360)
v0.3.9 made Pi boot-safe: lazy-loading its imports meant `archon version`
no longer crashed on `@mariozechner/pi-coding-agent/dist/config.js`'s
module-init `readFileSync(getPackageJsonPath())`. That's what the
`provider-lazy-load.test.ts` regression test guards.
The fix was only half the problem though. When a Pi workflow actually
runs, sendQuery() triggers the dynamic import — and Pi's config.js
module-init fires then, hitting the exact same ENOENT on
`dirname(process.execPath)/package.json`. Discovered by running
`archon workflow run test-pi` against a locally-compiled 0.3.9 binary:
[main] Failed: ENOENT: no such file or directory,
open '/private/tmp/package.json'
at readFileSync (unknown)
at <anonymous> (/$bunfs/root/archon-providertest:184:7889)
at init_config
Boot-safe ≠ runtime-safe. The `/test-release` run for 0.3.9 passed
because it only exercised `archon-assist` (Claude); Pi was never
actually invoked on the released binary.
Fix: before the dynamic `import('@mariozechner/pi-coding-agent')` in
sendQuery, install a PI_PACKAGE_DIR shim. Pi's config.js checks
`process.env.PI_PACKAGE_DIR` first in its `getPackageDir()` and
short-circuits the `dirname(process.execPath)` walk. We write a
minimal `{name, version, piConfig:{}}` stub to
`tmpdir()/archon-pi-shim/package.json` (idempotent — existsSync check)
and set the env var. Pi only reads `piConfig.name`, `piConfig.configDir`,
and `version` from that file, all optional, so the stub surface is
genuinely minimal.
Localized to PiProvider: no global state, no mutation of any shared
config, no upstream fork. Claude and Codex providers are unaffected
(their SDKs don't have this class of module-init side effect).
Verified end-to-end: built a compiled archon binary with this patch,
ran `archon workflow run test-pi --no-worktree` (Pi workflow with
model `anthropic/claude-haiku-4-5`), got a clean response. Before the
patch, same binary crashed at `dag_node_started` with the ENOENT above.
Regression test added: asserts `PI_PACKAGE_DIR` is set after sendQuery
hits even its fast-fail "no model" path. Together with the existing
`provider-lazy-load.test.ts` (boot-safe) this covers both halves.
* feat(providers): autodetect canonical binary install paths for Claude and Codex (#1361)
Both binary resolvers previously stopped at env-var + explicit config and
threw a "not found" error when neither was set. Users who followed the
upstream-recommended install flow (Anthropic's `curl install.sh` for
Claude, `npm install -g @openai/codex`) still had to manually set either
`CLAUDE_BIN_PATH` / `CODEX_BIN_PATH` or the corresponding config field
before any workflow could run.
Add a tier-N autodetect step between the explicit config tier and the
install-instructions throw. Purely additive: env and config still win
when set (precedence covered by new tests). On autodetect miss, the same
install-instructions error fires as before.
Claude probe list (verified against docs.claude.com "Uninstall Claude
Code → Native installation" section):
- $HOME/.local/bin/claude (mac/linux native installer)
- $USERPROFILE\.local\bin\claude.exe (Windows native installer)
Codex probe list (verified against openai/codex README; npm global-
install puts the binary at `{npm_prefix}/bin/<name>` on POSIX,
`{npm_prefix}\<name>.cmd` on Windows):
- $HOME/.npm-global/bin/codex (user-set `npm config set prefix`)
- /opt/homebrew/bin/codex (mac arm64 with homebrew-node)
- /usr/local/bin/codex (mac intel / linux system node)
- %APPDATA%\npm\codex.cmd (Windows npm global default)
- $HOME\.npm-global\codex.cmd (Windows user-set prefix)
Not probed (explicit override still required):
- Custom npm prefixes — `npm root -g` would need a subprocess per
resolve, too much surface for a probe helper
- `brew install --cask codex` — cask layout isn't a PATH binary
- Manual GitHub Releases extracts — placement is user-determined
- `~/.bun/bin/codex` — not documented in openai/codex README
Pi provider intentionally has no equivalent change: the Pi SDK is
bundled into the archon binary (no subprocess), so there's no "binary"
to resolve. Pi auth lives at `~/.pi/agent/auth.json` which the SDK
already finds by default, and the PR A shim (`PI_PACKAGE_DIR`) handles
the package-dir case via Pi's own documented escape hatch.
E2E verified: removed both config entries from ~/.archon/config.yaml,
rebuilt compiled binary, ran `archon workflow run archon-assist` and a
Codex workflow. Logs showed `source: 'autodetect'` for both, responses
returned cleanly.
* fix(providers/test): use os.homedir() instead of $HOME in claude binary autodetect test
The native-installer autodetect test computed its expected path from
process.env.HOME, but the implementation uses node:os homedir(). On
Windows, HOME is typically unset (Windows uses USERPROFILE), so the
test fell back to '/Users/test' while the resolver returned the real
home dir — making the spy's path-equality check fail and breaking CI
on windows-latest.
Mirror the implementation by importing homedir() from node:os and
joining with node:path so the expected path matches the actual
platform-resolved home and separator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(server): contain Discord login failure so it doesn't kill the server (#1365)
Reported in #1365: a user running `archon serve` with DISCORD_BOT_TOKEN
set but the "Message Content Intent" toggle disabled in the Discord
Developer Portal saw the entire server crash with `Used disallowed
intents`. Discord rejects the gateway connection (close code 4014) when
a privileged intent is requested without being enabled, and the
unguarded `await discord.start()` propagated the error all the way up,
taking the web UI down with it.
Wrap discord.start() in try/catch — log the failure with an actionable
hint (special-cased for the disallowed-intent error) and continue
running. Other adapters and the web UI come up regardless. The shutdown
handler already uses optional chaining (`discord?.stop()`) so nulling
discord after a failed start is safe.
Other adapters (Telegram, Slack, GitHub, Gitea, GitLab) have the same
unguarded-start pattern but are out of scope for this fix — addressing
them is tracked separately.
Also expanded the Discord setup docs with a caution callout that names
the exact error string and the new log event so users can grep for
both.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(script-nodes): dedicated guide + teach the archon skill (#1362)
* docs(script-nodes): add dedicated guide and teach the archon skill how to write them
Script nodes (script:) have been a first-class DAG node type since v0.3.3 but
were documented only as one-liners in CLAUDE.md and a CI smoke test. Claude
Code reading the archon skill would see "Four Node Types: command, prompt,
bash, loop" and reach for bash+node/python one-liners instead of a proper
script node — losing bun's --no-env-file isolation, uv's --with dependency
pins, and the .archon/scripts/ reuse story.
- New packages/docs-web/src/content/docs/guides/script-nodes.md mirroring the
structure of loop-nodes.md / approval-nodes.md: schema, inline vs named
dispatch, runtime/deps semantics, scripts directory precedence (repo > home),
extension-runtime mapping, env isolation, stdout/stderr contract, patterns,
and the explicit list of ignored AI fields.
- guides/authoring-workflows.md and guides/index.md updated so the new guide is
discoverable from both the node-types table and the guides landing page.
- reference/variables.md calls out the no-shell-quote difference between
bash: and script: substitution — a subtle correctness trap when adapting a
bash pattern into a script node.
- Sidebar order bumped +1 on hooks/mcp-servers/skills/global-workflows/
remotion-workflow to slot script-nodes at order 5 next to the other
node-type guides.
- .claude/skills/archon/SKILL.md: replaces stale "Four Node Types" (which
also silently omitted approval and cancel) with the accurate seven, with a
script-node code block showing both inline and named patterns.
- references/workflow-dag.md: full Script Node section covering dispatch,
resolution, deps, stdout contract, and the list of AI-only fields that are
ignored; validation-rules list updated.
- references/dag-advanced.md and references/variables.md: retry-support line
corrected; no-shell-quote note added.
- examples/dag-workflow.yaml: added an extract-labels TypeScript script node
and updated the header comment.
* fix(docs): review follow-ups for script-node guide
- skills example: extract-labels was reading process.env.ISSUE_JSON which is
never set; use String.raw`$fetch-issue.output` so the upstream bash node's
JSON is actually consumed
- guides/script-nodes.md + skills/workflow-dag.md: idle_timeout is accepted
but ignored on script (and bash) nodes — executeScriptNode only reads
node.timeout. Clarify that script/bash use `timeout`, not idle_timeout
- archon-workflow-builder.yaml: prompt enumerated only bash/prompt/command/loop,
so the AI builder could never propose script or approval nodes. Add both
(plus examples + rule about script output not being shell-quoted) and
regenerate bundled defaults
- book/dag-workflows.md + book/quick-reference.md + adapters/web.md: fill in
the node-type references that were missing script, approval, and cancel.
adapters/web.md also overclaimed "loop" in the palette — NodePalette.tsx
only drags command/prompt/bash, so note that the other kinds are YAML-only
* docs/skill: general hardening — fix inaccuracies, fill workflow/CLI/env gaps, add good-practices + troubleshooting (#1363)
* fix(skill/when): document the full `when:` operator set and compound expressions
The skill reference previously stated "operators: ==, != only" which is
materially wrong — the condition evaluator supports ==, !=, <, >, <=, >=
plus && / || compound expressions with && binding tighter than ||, plus
dot-notation JSON field access. An agent authoring a workflow from the
skill would think half the operators don't exist.
Replaces the single-sentence section with a structured reference covering:
- All six comparison operators (string and numeric modes)
- Compound expressions with precedence rules and short-circuit eval
- JSON dot notation semantics and failure modes
- The fail-closed rules in full (invalid expression, non-numeric side,
missing field, skipped upstream)
Grounded in packages/workflows/src/condition-evaluator.ts.
* feat(skill): document Approval and Cancel node types
Approval and cancel nodes are first-class DAG node types (approval since the
workflow lifecycle work in #871, cancel as a guarded-exit primitive) but the
skill never described either one. An agent reading the skill and asked to
"add a review gate before implementation" or "stop the workflow if the input
is unsafe" would fall back to bash + exit 1, losing the proper semantics
(cancelled vs. failed, on_reject AI rework, web UI auto-resume).
Approval node coverage (references/workflow-dag.md, SKILL.md):
- Full configuration block with message, capture_response, on_reject
- The interactive: true workflow-level requirement for web UI delivery
- Approve/reject commands across all platforms (CLI, slash, natural
language) and the capture_response → $node-id.output flow
- Ignored-fields list + the on_reject.prompt AI sub-node exception
Cancel node coverage (references/workflow-dag.md, SKILL.md):
- Single-field schema (cancel: "<reason>")
- Lifecycle: cancelled (not failed); in-flight parallel nodes stopped;
no DAG auto-resume path
- The "cancel: vs bash-exit-1" decision rule (expected precondition miss
vs. check itself failing)
- Two canonical patterns — upstream-classification gate, pre-expensive-step
gate
Validation-rules list updated to enumerate approval/cancel constraints
(message non-empty, on_reject.max_attempts range 1-10, cancel reason
non-empty), plus a forward note that script: joins the mutually-exclusive
set once PR #1362 lands.
Placement in both files is after the Loop section and before the validation
section, so this commit stays additive with respect to PR #1362's Script
node insertion between Bash and Loop — rebase is clean.
* feat(skill): document workflow-level fields beyond name/provider/model
The skill's Schema section previously showed only name, description, provider,
and model at the workflow level — which is most of a stub. Agents asked to
"use the 1M-context Claude beta" or "run this under a network sandbox" or
"add a fallback model in case Opus rate-limits" had no way to discover
that any of these fields existed at the workflow level.
Adds a comprehensive Workflow-Level Fields section covering:
- Core: name, description, provider, model, interactive (with explicit
callout that interactive: true is REQUIRED for approval/loop gates on
web UI — a common footgun)
- Isolation: worktree.enabled for pin-on/pin-off (the only worktree field
at workflow level; baseBranch/copyFiles/path/initSubmodules are
config.yaml only, so a cross-reference points there)
- Claude SDK advanced: effort, thinking, fallbackModel, betas, sandbox,
with explicit per-node-only exceptions (maxBudgetUsd, systemPrompt)
- Codex-specific: modelReasoningEffort (with note that it's NOT the same
as Claude's effort — this has confused users), webSearchMode,
additionalDirectories
- A complete worked example combining sandbox + approval + interactive
All fields cross-referenced against packages/workflows/src/schemas/workflow.ts
and packages/workflows/src/schemas/dag-node.ts.
* feat(skill/loop): document interactive loops and gate_message
Interactive loop nodes pause between iterations for human feedback via
/workflow approve — used by archon-piv-loop and archon-interactive-prd.
The skill's Loop Nodes section previously omitted both interactive: true
and gate_message entirely, so an agent writing a guided-refinement
workflow wouldn't know the feature exists or that gate_message is
required at parse time.
Adds:
- interactive and gate_message rows to the config table (marking
gate_message as required when interactive: true — enforced by the
loader's superRefine)
- A dedicated "Interactive Loops" subsection explaining the 6-step
iterate-pause-approve-resume flow
- Explicit call-out that $LOOP_USER_INPUT populates ONLY on the first
iteration of a resumed session — easy to miss and a common surprise
- Workflow-level interactive: true requirement for web UI delivery
(loader warning otherwise) so the full-flow example is complete
- Note that until_bash substitution DOES shell-quote $nodeId.output
(unlike script bodies) — called out since the audit surfaced this
inconsistency
* fix(skill/cli): complete the CLI command reference with missing lifecycle commands
The CLI reference previously documented only list, run, cleanup, validate,
complete, version, setup, and chat — missing nearly every workflow
lifecycle command an agent needs to operate a paused, failed, or stuck
run. The interactive-workflows reference assumed these commands existed
without actually documenting them.
Adds full documentation for:
- archon workflow status — show running workflow(s)
- archon workflow approve <run-id> [comment] — resume approval gate
(also populates $LOOP_USER_INPUT on interactive loops and the gate
node's output when capture_response: true)
- archon workflow reject <run-id> [reason] — reject gate; cancels or
triggers on_reject rework depending on node config
- archon workflow cancel <run-id> — terminate running/paused with
in-flight subprocess kill
- archon workflow abandon <run-id> — mark stuck row cancelled without
subprocess kill (for orphan-cleanup after server crashes — matches
the #1216 precedent)
- archon workflow resume <run-id> [message] — force-resume specific
run (auto-resume is default; this is for explicit override)
- archon workflow cleanup [days] — disk hygiene for old terminal runs
(with explicit callout that it does NOT transition 'running' rows,
a common confusion)
- archon workflow event emit — used inside loop prompts for state
signalling; documented so agents don't invent their own mechanism
- archon continue <branch> [flags] [msg] — iterative-session entry
point with --workflow and --no-context flags
Also:
- Adds --allow-env-keys flag to the `workflow run` flag table with
audit-log context and the env-leak-gate remediation use case
- Adds an "Auto-resume without --resume" note disambiguating when
--resume is needed vs. when auto-resume handles it
- Adds --include-closed flag to `isolation cleanup`, which was
previously missing; converts the flag list to a structured table
- Explains the cancel/abandon distinction (live subprocess vs. orphan)
All grounded in packages/cli/src/commands/workflow.ts, continue.ts,
and isolation.ts.
* feat(skill/repo-init): add scripts/ and state/, three-path env model, per-project env injection
The repo-init reference was missing two first-class .archon/ directories
(scripts/ since v0.3.3, state/ since the workflow-state feature) and had
nothing to say about env — the #1 thing a user hits on first-run when
their repo has a .env file with API keys.
Directory tree updates:
- Adds .archon/scripts/ with the extension->runtime rule (.ts/.js -> bun,
.py -> uv) so agents know where to put named scripts referenced by
script: nodes.
- Adds .archon/state/ with explicit "always gitignore" callout — these
are runtime artifacts, not source. Previously undocumented in the skill.
- Adds .archon/.env (repo-scoped Archon env) and distinguishes it from
the target repo's top-level .env.
- Adds a "What each directory is for" list so the structure isn't just
a tree with no narrative.
.gitignore guidance:
- state/ and .env added as must-gitignore (state/ matches CLAUDE.md and
reference/archon-directories.md — skill was lagging).
- mcp/ demoted to conditional — gitignore only if you hardcode secrets.
New "Three-Path Env Model" section:
- ~/.archon/.env (trusted, user), <cwd>/.archon/.env (trusted, repo),
<cwd>/.env (UNTRUSTED, target project — stripped from subprocess env).
- Precedence (override: true across archon-owned paths) and the
observable [archon] loaded N keys / stripped K keys log lines so
operators can verify what actually happened.
- Decision tree for where to put API keys vs. target-project env vs.
things Archon shouldn't touch.
- Links to archon setup --scope home|project with --force for writing
to the right file with timestamped backups.
New "Per-Project Env Injection" section:
- Documents both managed surfaces: .archon/config.yaml env: block
(git-committed, $REF expansion) and Web UI Settings → Projects →
Env Vars (DB-stored, never returned over API).
- Names every execution surface that receives the injected vars:
Claude/Codex/Pi subprocess, bash: nodes, script: nodes, and direct
codebase-scoped chat.
- Documents the env-leak gate with all 5 remediation paths so an agent
hitting "Cannot register: env has sensitive keys" knows the options.
Grounded in CHANGELOG v0.3.7 (three-path env + setup flags), v0.3.0
(env-leak gate), and reference/security.md on the docs site.
* fix(skill/authoring-commands): correct override paths and add home-scoped commands
The file-location and discovery sections described an override layout that
does not match the actual resolver. It showed:
.archon/commands/defaults/archon-assist.md # Overrides the bundled
and claimed `.archon/commands/defaults/` was where repo-level overrides
lived. In fact the resolver (executor-shared.ts:152-200 + command-
validation.ts) walks `.archon/commands/` 1 level deep and uses basename
matching — putting `archon-assist.md` at the top of `.archon/commands/`
is the canonical way to override the bundled version. The `defaults/`
subfolder is a Archon-internal convention for shipping bundled defaults,
not a user-facing override pattern.
Also, home-scoped commands (`~/.archon/commands/`, shipped in v0.3.7)
were completely absent — agents authoring personal helpers wouldn't
know they could live at the user level and be shared across every repo.
Changes:
- File Location section now shows all three discovery scopes (repo,
home, bundled) with precedence ordering and 1-level subfolder rules
- Duplicate-basename rule documented as a user error surface
- Discovery and Priority section rewritten with accurate 3-step lookup
order — no more references to the nonexistent defaults/ override path
- Adds the Web UI "Global (~/.archon/commands/)" palette label note so
users authoring helpers for the builder know what to expect
No code changes — this is a pure fix of stale/incorrect skill reference
material.
* feat(skill): add workflow good-practices and troubleshooting reference pages
Closes two gaps from the audit. The skill previously had zero guidance on
designing multi-node workflows (what to avoid, what to reach for first,
how to structure artifact chains) and zero guidance on where to look
when things go wrong (log paths, env-leak gate remediations, orphan-row
cleanup, resume semantics).
New references/good-practices.md (9 Good Practices + 7 Anti-Patterns):
- Use deterministic nodes (bash:/script:) for deterministic work, AI for
reasoning — the single biggest quality lever
- output_format required whenever downstream when: reads a field — the
most common source of "workflow silently routes wrong"
- trigger_rule: none_failed_min_one_success after conditional branches —
the classic bug where all_success fails because a skipped when:-gated
branch doesn't count as a success
- context: fresh requires artifacts for state passing — commands must
explicitly "read $ARTIFACTS_DIR/..." when downstream of fresh
- Cheap models (haiku) for glue, strong for substance
- Workflow descriptions as routing affordances
- Validate (archon validate workflows) + smoke-run before shipping
- Artifact-chain-first design
- worktree.enabled: true for code-changing workflows (reversibility)
- Anti-patterns with before/after YAML examples for each (AI-for-tests,
free-form when: matching, context: fresh without artifacts, long flat
AI-node layers, secrets in YAML, retry on loop nodes, tiny
max_iterations, missing workflow-level interactive:, tool-restricted
MCP nodes)
New references/troubleshooting.md:
- Log location (~/.archon/workspaces/<owner>/<repo>/logs/<run-id>.jsonl)
with jq recipes for common queries (last assistant message, failed
events, full stream)
- Artifact location for cross-node handoff debugging
- 9 Common Failure Modes, each with root cause + concrete fix:
- $BASE_BRANCH unresolvable
- Env-leak gate (5 remediations)
- Claude/Codex binary not found (compiled-binary-only)
- "running" forever (AI working / orphan / idle_timeout)
- Mid-workflow failure and auto-resume semantics
- Approval gate missing on web UI (workflow-level interactive:)
- MCP plugin connection noise (filtered by design)
- Empty $nodeId.output / field access (4 causes)
- Diagnostic command cheat sheet (list, status, isolation list, validate,
tail-log, --verbose, LOG_LEVEL=debug)
- Escalation protocol (version + validate + log tail + CHANGELOG + issue)
SKILL.md routing table now dispatches "Workflow good practices /
anti-patterns" and "Troubleshoot a failing / stuck workflow" to the new
references so an agent can find them without having to know they exist.
* docs(book): update node-types coverage from four to all seven
The book is the curated first-contact reading path (landing page → "Get
Started" → /book/). Both dag-workflows.md and quick-reference.md were
stuck on "four node types" — missing script, approval, and cancel. A user
reading the book as their first introduction would form an incomplete
mental model, then find three more node types in the reference section
later with no explanation of when they arrived.
book/dag-workflows.md:
- "four node types" → "seven node types. Exactly one mode field is
required per node"
- Table now lists Command, Prompt, Bash, Script, Loop, Approval, Cancel
with one-line "when to use" for each, and cross-links to the dedicated
guide pages for Script / Loop / Approval
- New sections below the table for Script (inline + named examples with
runtime and deps), Approval (with the interactive: true workflow-level
note that's easy to miss), and Cancel (guarded-exit pattern) — keeping
the existing narrative shape for Bash and Loop
book/quick-reference.md:
- Node Options table now includes script, approval, cancel rows
- agents row added (inline sub-agents, Claude-only)
- New "Script-specific fields" and "Approval-specific fields" subsections
so the cheat-sheet is actually complete rather than pointing users
elsewhere for the required constraints
- Retry row callout that loop nodes hard-error on retry — previously
omitted
- bash timeout note widened to cover script timeout (same semantics)
Both files are docs-web content; the CI build on the docs-script-nodes
PR (#1362) previously validated the Starlight build path with a similar
table addition, so this should render clean.
* fix(skill/cli): remove nonexistent \`archon workflow cancel\`, fix workflow status jq recipe
Two accuracy issues from the PR code-reviewer (comment 4311243858).
C1: \`archon workflow cancel <run-id>\` does NOT exist as a CLI subcommand.
The switch at packages/cli/src/cli.ts:318-485 dispatches on list / run /
status / resume / abandon / approve / reject / cleanup / event — running
\`archon workflow cancel\` hits the default case and exits with "Unknown
workflow subcommand: cancel" (cli.ts:478-484). Active cancellation is
only available via:
- /workflow cancel <run-id> chat slash command (all platforms)
- Cancel button on the Web UI dashboard
- POST /api/workflows/runs/{runId}/cancel REST endpoint
cli-commands.md: removed the \`### archon workflow cancel <run-id>\`
subsection; kept the \`abandon\` subsection but made it explicit that
abandon does NOT kill a subprocess. Added a call-out box at the bottom
of the abandon section explaining where to go for actual cancellation.
troubleshooting.md "running forever" section: split the original
cancel-vs-abandon advice into three bullets — Web UI / CLI abandon (for
orphans, no subprocess kill) / chat \`/workflow cancel\` (for live runs
that need interruption). Added an explicit "there is no archon workflow
cancel CLI subcommand" parenthetical since the wrong command was being
suggested in flow.
I1: the \`archon workflow list --json\` diagnostic used an incorrect jq
filter. workflow list's --json output (workflow.ts:185-219) has shape
{ workflows: [{ name, description, provider?, model?, ... }], errors: [...] }
with no \`runs\` field — \`jq '.workflows[] | select(.runs)'\` returns empty
unconditionally. Replaced with \`archon workflow status --json | jq '.runs[]'\`,
which matches the actual shape of workflowStatusCommand at
workflow.ts:852+ ({ runs: WorkflowRun[] }). Also tightened the narration
to distinguish JSON from human-readable status output.
No change to the commit history in this PR — these are follow-up fixes
to claims I introduced in earlier commits of this branch (f10b989e for
C1, 66d2b86e for I1).
* fix(skill): remove env-leak gate references (feature was removed in provider extraction)
C2 from the PR code-reviewer (comment 4311243858). The pre-spawn env-leak
gate was removed from the codebase during the provider-extraction refactor
— see TODO(#1135) at packages/providers/src/claude/provider.ts:908. Zero
hits for --allow-env-keys / allowEnvKeys / allow_env_keys / allow_target_repo_keys
across packages/. The CLI's parseArgs (cli.ts:182-208) has no
--allow-env-keys option, and because parseArgs uses strict: false, an
unknown --allow-env-keys would be silently ignored rather than error.
What remains accurate and is NOT touched:
- Three-Path Env Model section (user/repo archon-owned envs are loaded;
target repo <cwd>/.env keys are stripped from process.env at boot)
still correctly describes current behavior, grounded in
packages/paths/src/strip-cwd-env.ts + env-integration.test.ts
- Per-Project Env Injection section (Option 1: .archon/config.yaml env:
block; Option 2: Web UI Settings → Projects → Env Vars) is unchanged —
both remain the sanctioned way to get env vars into subprocesses
Removed claims (all three files):
- cli-commands.md: --allow-env-keys flag row in the workflow run flags
table
- repo-init.md: the "Env-leak gate" subsection at the end of Per-Project
Env Injection listing 5 remediations (all of which reference UI/CLI/
config surfaces that don't exist). Replaced with a succinct callout
that explains the actual current behavior — target repo .env keys are
stripped, workflows that need those values should use managed
injection — so the reader still gets the "where to put my env vars"
answer
- troubleshooting.md: the "Cannot register: codebase has sensitive env
keys" section (error message that can no longer be emitted)
If the env-leak gate is ever resurrected per TODO(#1135), the docs can be
re-added then. The CHANGELOG v0.3.0 entry describing the gate is a
historical record of past behavior and does not need to be rewritten.
* fix(skill/troubleshooting): correct JSONL event type names and field name
C3 from the PR code-reviewer (comment 4311243858). The troubleshooting
reference's event-types table used _started / _completed / _failed
suffixes, but packages/workflows/src/logger.ts:19-30 shows the actual
WorkflowEvent.type enum is:
workflow_start | workflow_complete | workflow_error |
assistant | tool | validation |
node_start | node_complete | node_skipped | node_error
The second jq recipe also queried `.event` but the discriminator is `.type`.
Fixes:
- Event table: renamed columns (_started → _start, _completed → _complete,
_failed → _error). Explicitly called out the field name as `type` so the
reader knows what jq selector to use
- Replaced the "tool_use / tool_result" row with a single `tool` row and
listed its actual payload fields (tool_name, tool_input, duration_ms,
tokens) — tool_use/tool_result are SDK message kinds that appear within
the AI stream, not top-level log event types
- Added a `validation` row (was missing; it's emitted by workflow-level
validation calls with `check` and `result` fields)
- Removed `retry_attempt` row — this event type is not emitted to the
JSONL file. Retry bookkeeping goes through pino logs, not the workflow
log file
- Added an explicit callout that loop_iteration_started /
loop_iteration_completed (and other emitter-only events) go through
the workflow event emitter + DB workflow_events table, NOT the JSONL
file. Pointed readers to the DB or Web UI for loop-level detail. This
distinguishes the two parallel event systems — easy to conflate
(store.ts:11-17 uses _started/_completed/_failed for the DB side,
logger.ts uses _start/_complete/_error for JSONL)
- Fixed the "all failed events" jq recipe: .event → .type and _failed → _error
- Minor cleanup: the inline "tool_use events" mention in the "running
forever" section said the wrong event name — updated to "tool or
assistant events in the tail"
Grounded in packages/workflows/src/logger.ts (canonical JSONL event
shape) and packages/workflows/src/store.ts (the parallel DB event
naming, which the reviewer correctly flagged as different and worth
keeping distinct).
* fix(skill): two stragglers from the code-reviewer audit
Cleanup of two references that slipped through the earlier C1 and C3 fixes:
- references/troubleshooting.md:126: \`node_failed\` → \`node_error\`
(the "Node output is empty" diagnostics section references the JSONL
log, which uses the logger.ts enum — not the DB workflow_events table
which does use \`node_failed\`). The C3 fix corrected the event table
and one jq recipe but missed this inline mention.
- references/interactive-workflows.md:106: removed \`archon workflow
cancel <run-id>\` (nonexistent CLI subcommand) from the
troubleshooting bullet. This was pre-existing before the hardening
PR but fell within the C1 remediation scope. Replaced with the
correct triage: reject (approval gate only) vs abandon (orphan
cleanup, no subprocess kill) vs chat /workflow cancel (actual
subprocess termination).
Grounded in the same sources as the earlier C1/C3 commits:
packages/cli/src/cli.ts:318-485 (no cancel case) and
packages/workflows/src/logger.ts:19-30 (JSONL type enum).
* feat(skill): point to archon.diy as the canonical docs source
The skill had no reference to archon.diy (the live docs site built from
packages/docs-web/). Several reference files said "see the docs site"
without naming the URL, leaving the agent to guess or grep the repo for
the hostname. An agent with the skill loaded should know that when the
distilled reference pages don't cover a case, the full canonical docs
are one WebFetch away.
SKILL.md: new "Richer Context: archon.diy" section between Routing and
Running Workflows. Covers:
- When to reach for the live docs (longer examples, tutorial framing,
features the skill only mentions in passing, "where's that
documented?" user questions)
- URL map — 13 starting points covering getting-started, book (tutorial
series), guides/ (authoring + per-node-type + per-node-feature),
reference/ (variables, CLI, security, architecture, configuration,
troubleshooting), adapters/, deployment/
- Precedence: skill refs first (context-cheap, tuned for agents), docs
site as escalation. Prevents agents defaulting to WebFetch when a
local skill ref already covers the answer
Also upgrades the 5 existing generic "docs site" mentions across
reference files to concrete archon.diy URLs with anchor fragments where
helpful:
- good-practices.md: Inline sub-agents pattern → archon.diy/guides/
authoring-workflows/#inline-sub-agents
- troubleshooting.md: "Install page on the docs site" → archon.diy/
getting-started/installation/
- workflow-dag.md: "Workflow Description Best Practices" → anchor link;
sandbox schema reference → archon.diy/guides/authoring-workflows/
#claude-sdk-advanced-options
- repo-init.md: Security Model reference → archon.diy/reference/
security/#target-repo-env-isolation (deep-link into the section that
covers the <cwd>/.env strip behavior)
URL source of truth: astro.config.mjs:5 (site: 'https://archon.diy').
URL structure mirrors packages/docs-web/src/content/docs/<section>/
<page>.md — verified by the 62 pages the docs build produces.
* chore(workflows): switch default Opus pin to opus[1m] alias (#1395)
Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus /
opus[1m] now resolve to 4.7 with a 1M context window at standard
pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m]
lets bundled default workflows auto-track the recommended Opus version.
No explicit effort is set, so nodes inherit the per-model default
(xhigh on 4.7, high on 4.6).
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1398)
* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (#1380)
The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md)
that the AI agent would sometimes write to a different location, breaking all
downstream nodes that glob for the plan file. Migrated all plan/progress file
references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching
the pattern used by archon-fix-github-issue and other workflows.
Changes:
- Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node
- Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in
refine-plan, code-review, and fix-feedback nodes
- Replace empty-string guard with file-existence check in implement-setup bash
- Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/
- Add explicit plan/progress paths in finalize node
- Regenerated bundled-defaults.generated.ts
Fixes #1380
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(workflow): address review findings in archon-piv-loop
- Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to
eliminate the duplicate heading that collided with Step 3's identical
title in the create-plan node
- Guard implement-setup against a 0-task plan file: exit 1 with a
clear error when no '### Task N:' sections are found, preventing a
silent no-op implement loop
- Remove 2>/dev/null from code-review commit so pre-commit hook failures
and other stderr are visible to the agent instead of silently swallowed
- Replace '|| true' on git push in finalize with an explicit WARNING echo
so push failures (auth, upstream conflict, no remote) surface to the
agent rather than being silently ignored
- Regenerate bundled-defaults.generated.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(workflows): regenerate bundled defaults to match opus[1m] alias
The bundle was stale relative to the YAML sources after #1395 merged —
check:bundled was failing CI. Regenerated; no YAML edits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(workflows): add anyFailed status derivation coverage for DAG executor (#1403)
PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed
branch (dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed
Fixes #1381.
* docs/skill: add parameter-matrix.md quick-lookup reference
New reference for the archon skill: a single-glance lookup of which
parameter works on which node type, an intent-based "how do I..." table,
a consolidated silent-failure catalog, and an inline agents: section
(previously only referenced via archon.diy).
Purpose is complementary, not duplicative:
- workflow-dag.md remains the authoring guide
- dag-advanced.md remains the hooks/MCP/skills/retry deep-dive
- good-practices.md remains the patterns and anti-patterns
- parameter-matrix.md is the grep-this-first lookup when you know the
outcome you want but not which field gets you there
Also registers the new reference in SKILL.md routing table.
* docs: point contributors at PR template and Closes #N convention
Add explicit references to .github/PULL_REQUEST_TEMPLATE.md in both
CONTRIBUTING.md and CLAUDE.md, plus a reminder to link issues with
Closes/Fixes/Resolves so they auto-close on merge. Repo-triage runs
were flagging dozens of partially-filled or unlinked PRs each cycle.
* feat(workflows): add maintainer-standup workflow for daily PR/issue triage (#1428)
* feat(workflows): add maintainer-standup workflow for daily PR/issue triage
Daily morning briefing that pulls origin/dev, triages all open PRs and assigned
issues against direction.md, and surfaces progress vs. the previous run. Designed
for live-checkout use (worktree.enabled: false) so it can read its own state.
Layout under .archon/maintainer-standup/:
- direction.md (committed) — project north-star: what Archon IS / IS NOT.
Drives PR P4 polite-decline classification with cited clauses.
- README.md / profile.md.example — setup docs and template for new maintainers.
- profile.md, state.json, briefs/YYYY-MM-DD.md — gitignored, per-maintainer.
Engine:
- 3 parallel gather scripts in .archon/scripts/maintainer-standup-*.ts
(git-status, gh-data, read-context) — bun runtime, JSON stdout.
- Synthesis node: command file with output_format schema for
{ brief_markdown, next_state }.
- Persist node: tiny inline bun script writes both to disk.
Run-to-run continuity: state.json carries observed_prs/issues snapshots, so the
next run can detect what merged, what closed, what the maintainer shipped, and
which carry-over items aged past N days.
Also adds .archon/** to the ESLint global ignore list (matches the existing
.claude/skills/** pattern) since .archon/ is user content and not part of any
tsconfig project.
* fix(maintainer-standup): address CodeRabbit review on #1428
- gh-data: bump --limit 100 → 1000 on all_open_prs and warn loudly when
the cap is hit; preserves the observed_prs invariant the next-run
"resolved since last run" diff depends on. (CodeRabbit critical)
- maintainer-standup.md: clarify P1 CI signal — the gathered payload only
carries mergeStateStatus, not statusCheckRollup; for borderline P1s,
drill in via `gh pr checks <n>`. (CodeRabbit minor)
- workflow.yaml persist: write briefs under local YYYY-MM-DD (sv-SE
locale) instead of UTC ISO date, so an evening run doesn't file
tomorrow's brief and break recent_briefs lookups. (CodeRabbit minor)
- workflow.yaml persist: wrap state/brief writes in try/catch; on
failure dump brief_markdown and next_state to stderr so a 5-minute
Sonnet synthesis isn't lost to a transient disk error. (CodeRabbit minor)
- gh-data + git-status: switch from execSync (shell-string) to
execFileSync (argv array) for git/gh invocations. Defense-in-depth
against shell metacharacters in values that pass through (esp. the
gh_handle from profile.md). (CodeRabbit nitpick)
* feat(workflows): support explicit tags in workflow YAML (#1190)
Add optional `tags: string[]` to `workflowBaseSchema`. Explicit values take precedence over keyword inference; `tags: []` suppresses inference end-to-end; omitting the field falls back to inference (backwards compatible). Non-array values warn-and-ignore matching the sibling `worktree`/`additionalDirectories` patterns.
* feat(workflows): add maintainer-review-pr and group maintainer workflows under maintainer/ (#1430)
* feat(workflows): add maintainer-review-pr and group maintainer workflows under .archon/workflows/maintainer/
Adds the maintainer-review-pr workflow — a Pi/Minimax-based PR triage
flow that gates on direction alignment, scope focus, and PR-template
quality before doing any deep review. If the gate clears, runs the
five review aspects (code/error-handling/test-coverage/comment-quality/
docs-impact) as parallel Archon nodes and auto-posts a synthesized
review comment. If the gate fails (direction conflict, multiple
concerns, sprawling scope), drafts a polite-decline comment and pauses
for the maintainer's approval before posting.
Reorganizes the existing maintainer-standup workflow into the same
subfolder so all maintainer-facing workflows live together. Subfolder
grouping is supported by the workflow loader (1 level deep, resolution
by filename).
What lands:
- .archon/workflows/maintainer/maintainer-standup.yaml (moved from
.archon/workflows/maintainer-standup.yaml)
- .archon/workflows/maintainer/maintainer-review-pr.yaml (new)
- .archon/commands/maintainer-review-{gate,code-review,error-handling,
test-coverage,comment-quality,docs-impact,synthesize,report}.md (new,
Pi-tuned variants of the existing review-agent commands so they avoid
Claude-only Task / sub-agent patterns)
Pi/Minimax integration:
- Uses provider: pi, model: minimax/MiniMax-M2.7 — verified via the
e2e-minimax-smoke test that Pi correctly routes to Minimax (session
jsonl confirms provider=minimax) and that Pi's best-effort
output_format parser handles the gate's nested schema.
- Two test runs landed real comments: a direction-decline on PR #1335
and a deep-review on PR #1369. Both were posted to GitHub via the
workflow's gh pr comment node.
* chore(workflows): also group repo-triage under .archon/workflows/maintainer/
repo-triage is the third maintainer-facing workflow alongside maintainer-standup and maintainer-review-pr; group it in the same subfolder for consistency. Subfolder resolution is by filename so the workflow name is unchanged.
* feat(pi): use ModelRegistry to support custom models and skip auth for unmapped providers (#1284)
Closes #1096.
- Switch Pi provider model lookup from pi-ai's getModel() (static catalog
only) to ModelRegistry.create(authStorage).find() so user-configured
custom models in ~/.pi/agent/models.json (LM Studio, ollama, llamacpp,
custom OpenAI-compatible endpoints) are discoverable.
- Remove the local lookupPiModel helper.
- For env-var-mapped providers (anthropic, openai, etc.) still throw
with a pi /login hint when credentials are missing. For unmapped
providers, log pi.auth_missing at info and continue so local models
that don't need credentials work without ceremony.
- Surface modelRegistry.getError() in the not-found message and emit
pi.model_not_found so users debugging custom-provider configs see the
real cause (e.g. missing baseUrl in models.json).
- Guard AuthStorage.create() and ModelRegistry.create() with try/catch
so a malformed ~/.pi/agent/auth.json surfaces with Pi-framed context
instead of a raw SDK stack trace.
- Document the credential-free path for local providers in ai-assistants.md.
Co-authored-by: Matt Chapman <Matt@NinjitsuWeb.com>
* chore(workflows): group smoke-test workflows under test-workflows/ + add e2e-minimax-smoke (#1431)
* chore(workflows): group all smoke-test workflows under .archon/workflows/test-workflows/
Move the 7 existing e2e-*.yaml smoke tests plus the new e2e-minimax-smoke
test into a dedicated subfolder. Subfolder grouping is supported by the
workflow loader (1 level deep, resolution by filename) so workflow names
are unchanged. Mirrors the .archon/workflows/maintainer/ split landing
in #1430.
Also adds e2e-minimax-smoke.yaml — a sanity check that Pi correctly
routes to Minimax M2.7 via the user's local pi auth, and that Pi's
best-effort output_format parser handles a small nested schema. Asserts
routing by reading the most recent Pi session jsonl rather than asking
the model to self-identify (LLMs are unreliable narrators about their
own identity, especially when Pi's system prompt mentions other
providers as defaults).
* fix(e2e-minimax-smoke): address CodeRabbit review on #1431
- Widen find window from -mmin -3 to -mmin -10. The smoke's three Pi
nodes plus the assert can collectively run several minutes on slow
networks; 3 minutes was tight enough to false-FAIL on a healthy run.
(CodeRabbit minor)
- Drop non-deterministic `head -1` over `find` output. find doesn't
guarantee any order; on a tie, the wrong file would be picked. Now
iterates all matching sessions and breaks on first one carrying the
routing signal — any match is sufficient evidence. (CodeRabbit minor)
- Replace single-regex `'"provider":"minimax".*"modelId":"MiniMax-M2.7"'`
with two separate greps joined by `&&`. JSON field order isn't part of
Pi's contract; a future Pi release reordering `provider` and `modelId`
in the model_change event would silently false-FAIL the original
pattern. The new check is order-independent. (CodeRabbit major)
* fix(maintainer-review): address CodeRabbit findings on #1430 (#1432)
Six findings, two majors and four minors/nitpicks:
- gate.md L17 vs L77: resolved conflicting input-source instructions.
Body claimed "all inline, no extra fetch" while a later phase
permitted reading PULL_REQUEST_TEMPLATE.md. Now: explicit "one
allowed extra read" callout in Phase 1 + matching wording in Gate C.
(CodeRabbit major)
- gate.md fenced blocks: added missing language identifiers (text/json/
markdown) to satisfy markdownlint MD040. (CodeRabbit minor)
- gate.md L155 + read-context.ts: deterministic clock. The 3-day deadline
was anchored to prior_state.last_run_at, which can be stale and produce
past-dated deadlines. Moved both today and deadline_3d into the
read-context.ts output (computed via sv-SE locale → ISO date in local
time) and instructed the gate to use $read-context.output.deadline_3d
directly. LLMs are unreliable at calendar arithmetic; this avoids it
entirely. (CodeRabbit major)
- maintainer-review-pr.yaml fetch-diff: dropped 2>/dev/null on gh pr diff
so auth / network / deleted-PR failures fail the node instead of
feeding an empty diff to the gate. Empty-but-successful diff (PR has
no changes) is now an explicit marker the gate can detect. (CodeRabbit
minor)
- maintainer-review-pr.yaml approve-unclear: added capture_response: true
so the maintainer's approve comment flows to the report node. Reject
reasoning is already captured by Archon's run record. (CodeRabbit
minor)
- maintainer-review-pr.yaml post-decline + report.md: the gh pr edit
--add-label call previously swallowed all errors with || true and the
report still claimed the label was applied. Now writes applied/skipped
to $ARTIFACTS_DIR/.label-applied + the gh stderr to .label-error so
the report can describe the actual outcome. (CodeRabbit nitpick)
* fix(workflows): approval gate bypass after reject-with-redraft on resume (#1435)
* fix(workflows): approval gate bypass after reject-with-redraft on resume
When an approval node was rejected with on_reject.prompt, the synthetic
PromptNode built to run the on_reject prompt reused the approval gate's
own node ID. executeNodeInternal then wrote a node_completed event with
that ID, causing getCompletedDagNodeOutputs to treat the gate as already
completed on the next resume — bypassing the human gate entirely.
Fix: give the synthetic node the ID `${node.id}:on_reject` so its
node_completed event has a distinct step_name that won't match the
approval gate slot in priorCompletedNodes.
Adds a regression test asserting no node_completed event with the
approval gate's ID is written during on_reject execution.
Fixes #1429
* test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node
Add complementary positive assertion to the regression test to verify that
node_completed is written exactly once with step_name 'review:on_reject',
ensuring future refactors that suppress the event entirely would be caught.
Add inline comment in executeApprovalNode documenting the known SSE side-effect:
node_started/node_completed events with nodeId='review:on_reject' flow through
the SSE pipeline into the web UI, resulting in a transient phantom node in the
execution view. This is cosmetic-only — the human gate contract is preserved.
* simplify: reduce duplicate cast pattern in on_reject test assertions
* feat(workflows): add mutates_checkout to allow concurrent runs on live checkout (#1438)
* feat(workflows): add mutates_checkout field to skip path-lock for concurrent runs
Add `mutates_checkout: boolean` (optional, default true) to the workflow
schema. When set to false, the executor skips the path-exclusive lock
that serializes all runs on the same working path, allowing N concurrent
runs on the same live checkout.
The primary use case is `maintainer-review-pr`, which reads shared state
but writes only to per-run artifact paths and GitHub PR comments — two
parallel reviews of different PRs should not fail with "Workflow already
active on this path".
Changes:
- `schemas/workflow.ts`: add optional `mutates_checkout` field
- `loader.ts`: parse and propagate the field (warn-and-ignore on invalid values)
- `executor.ts`: wrap path-lock guard in `if (workflow.mutates_checkout !== false)`
- `executor.test.ts`: two new tests in the concurrent-run guard suite
- `maintainer-review-pr.yaml`: opt in with `mutates_checkout: false`
* test(workflows): add loader tests for mutates_checkout parsing
- Add 5 tests covering false, true, omitted, and invalid (string "yes") values
- Invalid non-boolean values are silently dropped with warn — now explicitly tested
- Remove the // end mutates_checkout guard trailing comment (no precedent in file)
- Clarify loader comment: "parse/warn pattern" not "warn-and-ignore pattern" to avoid implying the return style matches interactive
* simplify: collapse nodeType/aiFields pair into single nonAiNode object in parseDagNode
* docs: replace String.raw with direct assignment in script node examples (#1434)
* docs: replace String.raw with direct assignment in script node examples
String.raw`$nodeId.output` fails silently when substituted output contains
a backtick, terminating the template literal early and producing cryptic parse
errors. JSON is valid JS expression syntax, so direct assignment is safe for
all valid JSON values including those with backticks.
- Replace String.raw pattern in dag-workflow.yaml example
- Replace String.raw pattern in archon-workflow-builder.yaml template
- Add CAUTION bullet in workflow-dag.md Script Node section
- Add Silent Failures item #14 in parameter-matrix.md
- Add Starlight caution aside in script-nodes.md
- Extend script bodies bullet in variables.md
- Regenerate bundled-defaults.generated.ts
Fixes #1427
* docs: fix Rule 6 in generate-yaml prompt to distinguish bun vs uv patterns
Rule 6 still referenced JSON.parse after the example was updated to direct
assignment, creating a contradiction for the AI code generator. Update the
prose to explicitly distinguish TypeScript/bun (direct assignment) from
Python/uv (json.loads), matching the updated embedded example.
* chore(workflows): group experimental workflows under .archon/workflows/experimental/
Move two repo-scoped workflows that were sitting untracked at the workflow
root into a dedicated subfolder. Subfolder grouping is supported by the
loader (1 level deep, resolution by filename), so workflow names are
unchanged and the /release skill still resolves archon-release correctly.
Files moved:
- archon-fix-github-issue-experimental.yaml — Path-A variant of the
issue-fix workflow used today to land #1434, #1435, #1438.
- archon-release.yaml — the live release workflow used by the /release
skill end-to-end (validate -> binary smoke -> version bump -> changelog
-> approval -> commit -> PR -> tag -> Homebrew formula update).
* fix(workflows): export ARTIFACTS_DIR, LOG_DIR, BASE_BRANCH to bash nodes (#1387)
executeBashNode previously only merged explicit envVars on top of
process.env. The three well-known workflow directories (artifactsDir,
logDir, baseBranch) were passed as function parameters and used for
compile-time substitution of $ARTIFACTS_DIR / $LOG_DIR / $BASE_BRANCH
in the script body, but were never added to the subprocess environment.
As a result, any script that relied on shell-runtime expansion — e.g.
JSON_FILE="${ARTIFACTS_DIR}/foo.output.json" inside a heredoc, an
inherited helper script, or a `bash -c` subshell — saw the variable
unset and silently fell back to its default (typically an empty string
or "."), writing artifacts to the workflow cwd instead of the nominal
artifacts directory.
Always build subprocessEnv from process.env plus the three well-known
directories, then allow explicit envVars to override. Compile-time
substitution behavior is unchanged; existing scripts that do not
reference these variables are unaffected; user-supplied envVars still
win on conflict.
* fix(workflow): substitute $nodeId.output refs in approval messages (#1426)
* fix(workflow): substitute \$nodeId.output refs in approval messages
Approval node messages were emitted as raw strings, bypassing the
substituteNodeOutputRefs() pass that prompt/bash/loop/cancel nodes
all run. This made interactive workflows like atlas-onboard show
literal "\$gather-context.output.repo_name" placeholders to humans
at HITL gates, leaving them unable to know what they were approving.
Fix: rendered the approval.message through substituteNodeOutputRefs
once at the top of the standard approval gate path, then used the
resolved string in all 4 emission sites (safeSendMessage,
createWorkflowEvent, pauseWorkflowRun, event-emitter).
Test: new dag-executor.test case wires a structured-output upstream
node into an approval node and asserts pauseWorkflowRun receives the
substituted message ("Repo: hcr-els | App: CCELS | Port: 3012")
rather than the literal placeholders.
Repro: any workflow with an approval node whose message references
\$nodeId.output[.field]. Observed in the wild on atlas-onboard's
confirm-context HITL gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(workflow): extend approval-substitution test to cover all 4 emission sites
Per CodeRabbit review: the original test only verified pauseWorkflowRun
received the substituted message, but the fix touches 4 emission sites.
A future regression at safeSendMessage / createWorkflowEvent / event-emitter
would silently leave the test passing while users still saw raw $node.output
placeholders.
Adds two additional assertions:
- platform.sendMessage prompt contains substituted message + does NOT
contain literal $gather-context.output placeholders
- The persisted approval_requested workflow event's data.message is
substituted
Event-emitter assertion deferred (no existing pattern for spying on the
global emitter in this test file). Two of three secondary surfaces
covered closes the practical regression risk — both are user-visible
(chat prompt + audit-log event); the emitter is internal only.
Test count: 7 pass / 22 expect() (was 18). Full suite 193 pass / 353
expect() — no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286) (#1367)
* feat(workflows): expose $LOOP_PREV_OUTPUT in loop node prompts (#1286)
Adds a new substitution variable that carries the previous loop iteration's
cleaned output into the next iteration's prompt. Empty on iteration 1; the
prior iteration's output (after stripCompletionTags) on iteration 2+.
Why: fresh_context: true loops have no way to reference what the previous
pass produced or why it failed without dragging the full session forward.
$LOOP_PREV_OUTPUT closes that gap with zero session-cost — same trust
boundary as $nodeId.output, no new external surface.
Changes:
- packages/workflows/src/executor-shared.ts: substituteWorkflowVariables
accepts a 10th positional loopPrevOutput arg and substitutes
$LOOP_PREV_OUTPUT (defaults to '').
- packages/workflows/src/dag-executor.ts: executeLoopNode passes
lastIterationOutput on iteration 2+ (and explicit '' on iteration 1 /
the first iteration of an interactive resume, since lastIterationOutput
is a per-call variable that does not survive resume metadata).
- Unit tests: 3 new cases in executor-shared.test.ts.
- Integration tests: 2 new cases in dag-executor.test.ts verifying the
prompt sent to the AI on iter 1 vs iter 2, and that the value reflects
cleaned output (no <promise> tags).
- Docs: variables.md, loop-nodes.md (new "Retry-on-failure" pattern),
CLAUDE.md variable reference.
Backward compatibility: prompts that don't reference $LOOP_PREV_OUTPUT are
unaffected. All 843 workflow tests + type-check + lint + format:check +
bun run validate pass locally.
* docs: address coderabbit review on variables/loop-nodes
- variables.md: include $LOOP_PREV_OUTPUT in substitution-order list and
availability table to match the new variable row at line 30
- loop-nodes.md: document the interactive-resume exception where the first
iteration after an approval-gate resume still receives an empty
$LOOP_PREV_OUTPUT regardless of iteration number (per dag-executor.ts
L1781-1783 where i === startIteration always clears prev output)
* docs(changelog): add Unreleased entry for $LOOP_PREV_OUTPUT (#1367 review)
* test(loop): add resume-from-approval integration test for $LOOP_PREV_OUTPUT (#1367 review)
Per maintainer-review-pr suggestion (Wirasm): two-call integration test
covering the resume-from-approval scenario.
- Call 1: fresh interactive loop pauses at the gate after iteration 1 and
asserts $LOOP_PREV_OUTPUT substitutes to empty on iter 1 (no prior
output) plus the gate pause is recorded.
- Call 2: resumed run with metadata.approval populated. The first
resumed iteration must substitute $LOOP_PREV_OUTPUT to '', NOT to the
paused run's iter-1 output (which lived in a different process and is
not persisted). $LOOP_USER_INPUT still flows through as normal.
Locks the documented invariant at dag-executor.ts:1769-1772.
---------
Co-authored-by: voidborne-d <DottyEstradalco@allergist.com>
* feat(maintainer-standup): surface contributor replies since last run (#1457)
The brief was missing a key signal — when contributors reply on PRs or
issues, the maintainer wouldn't see it explicitly. Empirically reviewed
PR replies were buried under aggregate updatedAt timestamps with no
indication of WHO replied or WHAT they said.
This adds a new "Replies waiting on you" section to the daily brief,
sourced from two paginated GitHub API calls scoped by since=last_run_at:
- /repos/{o}/{r}/issues/comments PR + issue conversation comments
- /repos/{o}/{r}/pulls/comments inline code-review comments
Filters applied:
- Skip the maintainer's own comments (gh_handle from profile.md)
- Skip GitHub bot accounts (login ending in [bot]) — coderabbitai,
chatgpt-codex-connector, dependabot, etc. They post a constant
churn of automated review tooling that drowns out human replies;
the maintainer wants the latter.
Output is grouped by PR/issue number with kind classification:
- issue comment on a non-PR issue
- pr_conversation PR conversation-level comment
- pr_review inline code-review comment (most actionable —
usually needs a code-level response, so kind
upgrades to pr_review whenever review comments
arrive on a PR that also has conversation ones)
Sorted by recency (newest reply first). Synthesizer reads
gh-data.output.replies_since_last_run and renders a section.
Verified on a backdated state.json (last_run_at = yesterday morning):
22 human replies on 22 PRs/issues, bot noise filtered (32 → 22 after
the [bot] filter). Surfaces…
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements upstream-style minimal MCP endpoints and fixes to unblock the MCP Dashboard and local development.
Backend
python/src/server/api_routes/mcp_api.py:GET /api/mcp/clients— returns{ clients: [], total: 0 }GET /api/mcp/sessions— returns basic session info with optionalserver_uptime_secondsGET /api/mcp/health— lightweight health probe for MCP APIstatusandconfigendpoints; preserve unified logging/tracing via Logfire helpers.Frontend & Dev Experience
archon-ui-main/package.json: ensure Vite dev server binds correctly in Docker (port binding and host).Compose / Infra
docker-compose.yml: server now correctly targetssrc.server.main:app; frontend maps host port to internal 5173 consistently.Projects API cleanup
python/src/server/api_routes/projects_api.py: remove stale Socket.IO broadcast call; replace with debug no-op per upstream refactor.Pytest
python/pytest.ini: setasyncio_mode=autoto support async tests used by integration suites.Rationale
404 Not Foundon/api/mcp/clientsand/api/mcp/sessions. Upstream implements minimal versions of these endpoints to stabilize the UI even when advanced MCP integrations are optional.Verification
GET /api/mcp/clients→200 OKwith{ clients: [], total: 0 }GET /api/mcp/sessions→200 OKwith{ active_sessions: 0, session_timeout: 3600, server_uptime_seconds?: number }ProjectServiceError: Not Found.GET /andGET /healthshow healthy after DB migration (migration/add_source_url_display_name.sql).Files Changed
python/src/server/api_routes/mcp_api.pypython/src/server/api_routes/projects_api.pyarchon-ui-main/package.jsondocker-compose.ymlpython/pytest.iniNotes
/api/mcp/*used by the frontend MCP page.