feat: add DuckDB service (tinycloud.duckdb/*) #27
Merged
…ation

Add embedded analytical database service with columnar storage, per-space isolation, and UCAN capability model. Mirrors the SQL service architecture with DuckDB-specific features:

Core module (tinycloud-core/src/duckdb/):
- Actor-based connection pool with idle timeout and memory threshold promotion
- SQL parser validation (GenericDialect) as primary security layer
- DuckDB settings lockdown (external access disabled, unsigned extensions blocked)
- Rich value types including List and Struct with recursive serde
- Describe, Ingest, ExportToKv, Export, Import request variants
- UCAN caveats for table/column/statement allowlists and read-only mode

Server integration:
- DuckDbStorageConfig with configurable path, memory threshold, idle timeout
- Route handling with capability extraction and error status mapping
- Binary response support for database export and Arrow IPC streams
- "duckdb" added to /version features
Security:
- Replace statement blocklist with 3-tier allowlist (default/admin/delegation bypass)
- Block security-critical SET vars (enable_external_access, etc.) unconditionally
- Expand function blocklist (parquet_scan, csv_scan, glob, iceberg_scan, etc.)
- Validate max_memory against SQL injection
- Validate db_name against path traversal (.., /, \, null)
- Validate imported databases (temp file + DuckDB open + test query)
- Block export when caveats active
- Apply caveats to describe (filter tables/columns)
- Handle SELECT * with column caveats

Types:
- Fix ColumnInfo wire format (type/nullable instead of dataType/isNullable)
- Remove unnecessary Deserialize from DuckDbResponse
- Fix UBigInt truncation (values > i64::MAX as string)
- Fix Map key formatting (Display instead of Debug)

Robustness:
- Clean up stale actor entries from DashMap on exit
- Fix promote_to_file (temporarily enable external access for EXPORT DATABASE)
- Use async I/O (tokio::fs) in async functions
- Replace expect() with error propagation in actor open
- Replace filter_map(|r| r.ok()) with proper error propagation
- Add statement_timeout = 30s

Arrow IPC:
- Add execute_query_arrow() using stmt.query_arrow() + StreamWriter
- Route Arrow format via Accept header through to actor
- Add Arrow variant to DuckDbResponse

Quality:
- Extract verify_auth() and read_json_body() helpers from route handlers
- Add 32 unit tests across parser, caveats, storage, and types
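As an illustration of the db_name path-traversal check listed under Security, here is a minimal sketch. The function name `validate_db_name`, the error type, and the empty-name rule are assumptions made for this example; only the rejected patterns (`..`, `/`, `\`, null) come from the commit message.

```rust
/// Hypothetical validator (not the PR's actual code): rejects database names
/// that could escape the per-space storage directory.
fn validate_db_name(name: &str) -> Result<(), String> {
    // Assumption: an empty name is also rejected; the commit does not say so.
    if name.is_empty() {
        return Err("database name must not be empty".to_string());
    }
    // Parent-directory references.
    if name.contains("..") {
        return Err("database name must not contain '..'".to_string());
    }
    // Path separators on Unix and Windows.
    if name.contains('/') || name.contains('\\') {
        return Err("database name must not contain path separators".to_string());
    }
    // Embedded null bytes can truncate paths in C APIs.
    if name.contains('\0') {
        return Err("database name must not contain null bytes".to_string());
    }
    Ok(())
}
```

A stricter variant would allowlist characters (for example ASCII alphanumerics plus `_` and `-`) instead of blocklisting patterns; whether the PR does this is not stated.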
Local resources (SQLite parent dir, block storage dir, SQL/DuckDB dirs) are now created automatically on first run. Remote backends (Postgres, S3) are left untouched — their connection errors surface naturally. Replaces the raw .unwrap() panic in main with a readable error chain so misconfigured remote backends get clear diagnostics.
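The first-run directory creation described above could look roughly like the following sketch. The function name `ensure_local_dirs` and the `String` error type are assumptions for illustration; the commit only states that local directories are created automatically and that errors are reported readably instead of panicking.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: creates each local directory (and its parents) if it
/// is missing, and attaches the offending path to any error instead of
/// panicking via unwrap().
fn ensure_local_dirs(dirs: &[&Path]) -> Result<(), String> {
    for dir in dirs {
        // create_dir_all succeeds if the directory already exists.
        fs::create_dir_all(dir)
            .map_err(|e| format!("failed to create directory {}: {}", dir.display(), e))?;
    }
    Ok(())
}
```

Remote backends would simply not be passed to such a helper, so their connection errors surface from the client library as before.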
- Remove SET statement_timeout (unsupported in duckdb crate v1.4.4)
- Move column_names() call after query() execution to avoid panic in RawStatement::schema when schema isn't populated yet
- Remove DenchClaw references from spec
Export previously read directly from disk, returning 404 for in-memory databases. Now routes through the database actor which can serialize both in-memory and file-backed databases.

- Add Export message variant to DuckDB and SQL actors
- Use Arrow record batches (appender-arrow) for fast bulk copy
- Fix promote_to_file to use copy_tables instead of broken enable_external_access toggle
- SQL export uses SQLite backup API for in-memory serialization
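The idea of routing export through the actor can be sketched with a reply channel carried inside the message. This is an illustrative stand-in only: it uses `std::sync::mpsc` and a thread, the names `DbMessage`, `spawn_actor`, and `export` are chosen for the example, and the "database" is just a byte vector rather than a DuckDB or SQLite connection.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical message type: Export carries a channel for the reply.
enum DbMessage {
    Export { reply: Sender<Result<Vec<u8>, String>> },
    Shutdown,
}

/// Spawns an actor that owns its state; `state` stands in for an open
/// in-memory or file-backed database.
fn spawn_actor(state: Vec<u8>) -> Sender<DbMessage> {
    let (tx, rx) = mpsc::channel::<DbMessage>();
    thread::spawn(move || {
        for msg in rx {
            match msg {
                DbMessage::Export { reply } => {
                    // A real actor would serialize the live connection here,
                    // which works whether or not a file exists on disk.
                    let _ = reply.send(Ok(state.clone()));
                }
                DbMessage::Shutdown => break,
            }
        }
    });
    tx
}

/// Asks the actor for an export and waits for the answer.
fn export(actor: &Sender<DbMessage>) -> Result<Vec<u8>, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    actor
        .send(DbMessage::Export { reply: reply_tx })
        .map_err(|_| "actor is gone".to_string())?;
    reply_rx
        .recv()
        .map_err(|_| "actor dropped the reply channel".to_string())?
}
```

Because only the actor touches the connection, the caller never needs to know where the database lives, which is what removes the 404 for in-memory databases.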
SQLite's DEFERRED transactions deadlock when concurrent verify_auth() calls both try to upgrade from shared read to exclusive write locks. The SQLITE_BUSY error was incorrectly mapped to SpaceNotFound (404).

- Set max_connections(1) for SQLite to serialize writes
- Enable WAL mode for concurrent reads
- Set busy_timeout(5s) as safety net
- Add tracing::warn with actual error details on epoch insert failure
- Keep max_connections(100) for PostgreSQL/MySQL
Replace .filter_map(|r| r.ok()) with .collect::<Result<Vec<_>, _>>() to surface row deserialization errors during table copy. Log view copy failures instead of silently swallowing them with let _ =.
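The difference between the two patterns is easiest to see on a small stand-in. The sketch below parses strings instead of database rows; the function names are illustrative and not taken from the PR.

```rust
use std::num::ParseIntError;

/// Old pattern: rows that fail to convert are silently dropped.
fn parse_lossy(rows: &[&str]) -> Vec<i64> {
    rows.iter().filter_map(|r| r.parse::<i64>().ok()).collect()
}

/// New pattern: the first conversion error aborts the collection and is
/// returned to the caller.
fn parse_strict(rows: &[&str]) -> Result<Vec<i64>, ParseIntError> {
    rows.iter()
        .map(|r| r.parse::<i64>())
        .collect::<Result<Vec<_>, _>>()
}
```

With the input `["1", "oops", "3"]`, `parse_lossy` returns two values and hides the bad row, while `parse_strict` returns an error, which is the behavior wanted during a table copy.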
- Box DuckDbRequest in DbMessage::Execute to reduce enum size disparity (209 bytes vs 8 bytes)
- Apply rustfmt formatting to storage.rs
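The effect of boxing the large variant can be checked with `std::mem::size_of`. The types below are hypothetical stand-ins (a 200-byte payload rather than the real DuckDbRequest), so the exact numbers differ from the 209 vs 8 bytes quoted in the commit.

```rust
use std::mem::size_of;

/// Stand-in for a large request type.
#[allow(dead_code)]
struct LargeRequest {
    payload: [u8; 200],
}

/// Every value of this enum is as large as its biggest variant.
#[allow(dead_code)]
enum UnboxedMessage {
    Execute(LargeRequest),
    Shutdown,
}

/// Boxing moves the payload to the heap, so the enum only stores a pointer.
#[allow(dead_code)]
enum BoxedMessage {
    Execute(Box<LargeRequest>),
    Shutdown,
}

/// Returns (unboxed size, boxed size) in bytes.
fn message_sizes() -> (usize, usize) {
    (size_of::<UnboxedMessage>(), size_of::<BoxedMessage>())
}
```

The trade-off is one heap allocation per Execute message in exchange for cheaper moves of every message through the channel; this is also the fix Clippy suggests for its `large_enum_variant` lint.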
Summary
- Core module (tinycloud-core/src/duckdb/) with actor-based connection pool, SQL parser validation as primary security layer, DuckDB settings lockdown, rich value types (List/Struct), and Describe/Ingest/Export/Import operations
- "duckdb" in /version features

New files (core module)
- types.rs: DuckDbRequest (9 variants), DuckDbValue (recursive), DuckDbResponse, DuckDbError (13 variants)
- caveats.rs
- parser.rs: GenericDialect (blocks COPY, INSTALL/LOAD, SET; DDL requires write)
- storage.rs
- database.rs: mpsc channels, parser-only security (no authorizer hook)
- describe.rs: information_schema
- service.rs: DashMap connection pool with idle timeout and memory threshold promotion

Modified files (server integration)
- src/config.rs: DuckDbStorageConfig with path, memory threshold, idle timeout, max memory
- tinycloud-core/src/db.rs: DuckDbResult, DuckDbExport, DuckDbArrow outcome variants
- src/auth_guards.rs: Responder arms for JSON, binary, Arrow responses
- src/routes/mod.rs: handle_duckdb_invoke(), duckdb_error_to_status(), capability extraction
- src/lib.rs: Construct and manage DuckDbService

Companion PR: TinyCloudLabs/js-sdk (TypeScript SDK DuckDB service)
Test plan
- cargo build compiles successfully
- cargo clippy -- -D warnings passes
- GET /version includes "duckdb" in features
- .duckdb file bytes