feat(resources): add resource watch scheduling and status tracking #709
implement resource watch functionality that allows automatic monitoring and re-processing of resources at specified intervals. key features include:
- add watch_interval parameter to resource APIs
- create watch scheduler service for task execution
- handle conflict detection for active watch tasks
- provide watch status query capability
- include comprehensive tests and examples

the watch feature enables periodic automatic updates of resources without manual intervention, improving data freshness for frequently changing content
…ssing
- Implement get_watch_status API for tracking resource watch status
- Add immediate persistence for first-time resource additions
- Improve file change detection with size comparison
- Refactor watch scheduler with better concurrency control
- Add test coverage for watch status and resource processing
- Remove unused watch manager references and clean up code
- Simplify logging by removing redundant data copying
- Fix syntax errors in docstrings and string literals
- Add new fields to EmbeddingMsg class
- Improve line wrapping and formatting
- Update watch task storage URIs to use hidden files
…g unused fields
Remove media_uri, media_mime_type and id parameters as they are not used in the implementation
Add test case for recovering tasks from backup storage when primary is missing
Remove require_owner parameter from _check_permission as it's redundant with the existing role-based checks
Add watch_interval parameter to enable periodic resource updates. When target is specified, watch_interval > 0 creates/updates a watch task, while <= 0 disables it. Also simplify resource moving logic in ResourceProcessor by using direct mv operation.
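The rule in this commit message can be sketched as a small decision function. This is only an illustration: `WatchAction` and `resolve_watch_action` are hypothetical names, not taken from the PR, and the behavior when no target is given is an assumption here (the commit message does not state it).

```python
from enum import Enum
from typing import Optional


class WatchAction(Enum):
    """Hypothetical outcomes of a resource call with respect to watching."""
    NONE = "none"        # assumption: no target means no watch change
    UPSERT = "upsert"    # create or update a watch task
    DISABLE = "disable"  # disable an existing watch task


def resolve_watch_action(target: Optional[str], watch_interval: float) -> WatchAction:
    """Map (target, watch_interval) to an action, following the commit message."""
    if not target:
        # Not specified in the commit message; treated as a no-op in this sketch.
        return WatchAction.NONE
    if watch_interval > 0:
        return WatchAction.UPSERT
    # watch_interval <= 0 disables watching for the target.
    return WatchAction.DISABLE
```

Keeping the decision separate from the side effects makes the three cases easy to test without a running scheduler.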
qin-ctx
left a comment
Review Summary
The resource watch feature is well-structured with comprehensive test coverage (~2300 lines of tests including unit, integration, E2E, and recovery tests). The WatchManager's persistence design with atomic write rotation (tmp→bak→main) is solid.
However, there are blocking issues around silent failure on cross-user URI conflicts, use of private VikingFS APIs in Phase 3.5, and privilege escalation in the scheduler.
Additionally:
- PR description has all "Type of Change" checkboxes unchecked (this is clearly a New feature)
- No REST API endpoint for `get_watch_status`; it is only available through the Python SDK, not via the HTTP API or CLI
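For readers unfamiliar with the tmp→bak→main rotation praised in the review summary, here is a minimal sketch of the general technique. It assumes a local filesystem and JSON-serializable tasks; the PR's WatchManager persists through VikingFS, so `save_tasks` and `load_tasks` are hypothetical names and not the PR's code.

```python
import json
import os
from pathlib import Path


def save_tasks(path: Path, tasks: dict) -> None:
    """Persist tasks using a tmp -> bak -> main rotation."""
    tmp = Path(str(path) + ".tmp")
    bak = Path(str(path) + ".bak")
    # 1. Write the new state to a temporary file and flush it to disk.
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(tasks, f)
        f.flush()
        os.fsync(f.fileno())
    # 2. Keep the previous main file as the backup.
    if path.exists():
        os.replace(path, bak)
    # 3. Promote the temporary file to main.
    os.replace(tmp, path)


def load_tasks(path: Path) -> dict:
    """Load from main, falling back to the backup when main is missing."""
    for candidate in (path, Path(str(path) + ".bak")):
        if candidate.exists():
            with open(candidate, encoding="utf-8") as f:
                return json.load(f)
    return {}
```

If the process dies between steps 2 and 3, the main file is absent but the backup still holds the last complete state, which is the recovery case the backup-storage test in this PR targets.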
- remove get_watch_status method and related tests, update examples to use direct task access
- update watch manager to use ConflictError for URI conflicts and include original_role in tasks
- add validation for watch_interval requiring target URI
qin-ctx
left a comment
Review Summary
All 5 blocking issues from the previous review have been addressed:
- URI conflict now properly raises `ConflictError` (not silently caught)
- Phase 3.5 uses public VikingFS APIs (`mv`/`exists`/`mkdir`)
- Scheduler preserves original user roles via `task.original_role`
- `watch_interval > 0` without `to` now raises `InvalidArgumentError`
- Example uses public API only
One previously flagged non-blocking issue remains unaddressed: naive datetime.now() without timezone.
There is one new blocking issue: watch_interval validation runs after resource processing, leading to resource ingestion before the error is raised.
Additionally, the embedding_tracker.py changes (removal of get_status, remove, get_all_tracked methods) are unrelated to the watch feature. Consider moving these to a separate PR to keep change scope focused and simplify rollback if needed.
Minor: The PR's "Type of Change" checkboxes are still all unchecked — this should be marked as "New feature".
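On the naive `datetime.now()` point, one common fix is to use timezone-aware UTC timestamps throughout. The sketch below is an assumption about how that could look, not the PR's code: the helper names are hypothetical, and the interval is taken to be in minutes as stated in the PR description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_now() -> datetime:
    """Timezone-aware replacement for a bare datetime.now()."""
    return datetime.now(timezone.utc)


def next_execution_time(last_execution_time: datetime, watch_interval: float) -> datetime:
    """Compute the next run time; watch_interval is in minutes."""
    if last_execution_time.tzinfo is None:
        # Reject naive values early instead of mixing them with aware ones.
        raise ValueError("last_execution_time must be timezone-aware")
    return last_execution_time + timedelta(minutes=watch_interval)


def is_due(next_time: datetime, now: Optional[datetime] = None) -> bool:
    """A task is due once the current time reaches its next execution time."""
    return (now or utc_now()) >= next_time
```

Python raises `TypeError` when comparing naive and aware datetimes, so standardizing on aware values also turns silent timezone drift into an immediate failure.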
…urce Move watch interval validation earlier in the flow to fail fast when 'to' parameter is missing
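The fail-fast ordering this commit describes can be illustrated as follows. This is a sketch under assumptions: `InvalidArgumentError` here is a local stand-in for the project's error type, and `add_resource` with its `process` callback is a hypothetical simplification of the real resource API.

```python
from typing import Callable, Optional


class InvalidArgumentError(ValueError):
    """Local stand-in for the project's error type (assumption)."""


def add_resource(
    path: str,
    to: Optional[str] = None,
    watch_interval: float = 0,
    process: Callable[[str, Optional[str]], None] = lambda path, to: None,
) -> None:
    """Validate watch arguments before any processing happens."""
    # Validation runs first, so a bad request never ingests the resource.
    if watch_interval > 0 and not to:
        raise InvalidArgumentError("watch_interval > 0 requires 'to'")
    # Only reached when the arguments are consistent.
    process(path, to)
```

With the check placed after `process`, the same invalid call would leave an ingested resource behind before reporting the error, which is the blocking issue raised in the second review.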
Description
This PR adds a “resource watch” capability that automatically re-processes resources at a configurable interval. It introduces persisted watch tasks, a background scheduler that runs due tasks safely, and end-to-end support (API/SDK/CLI) for enabling, updating, and canceling watches.
Related Issue
Type of Change
Changes Made
- `last_execution_time` and `next_execution_time`
- `watch_interval` (minutes) supported end-to-end (server API + Python client + CLI)
Testing
Checklist
Screenshots (if applicable)
Additional Notes