feat: add DuckDB service (tinycloud.duckdb/*)#27

Merged
skgbafa merged 10 commits into main from feat/duckdb-service on Mar 9, 2026

skgbafa commented Mar 7, 2026

Summary

  • Add embedded DuckDB analytical database service with columnar storage, per-space isolation, and UCAN capability model
  • Core module (tinycloud-core/src/duckdb/) with actor-based connection pool, SQL parser validation as primary security layer, DuckDB settings lockdown, rich value types (List/Struct), and Describe/Ingest/Export/Import operations
  • Server integration with config, route handling, capability extraction, binary response support (database export + Arrow IPC streams), and "duckdb" in /version features

New files (core module)

| File | Purpose |
| --- | --- |
| types.rs | DuckDbRequest (9 variants), DuckDbValue (recursive), DuckDbResponse, DuckDbError (13 variants) |
| caveats.rs | UCAN caveats for table/column/statement allowlists and read-only mode |
| parser.rs | SQL validation via GenericDialect: blocks COPY, INSTALL/LOAD, SET; DDL requires write |
| storage.rs | In-memory/file connections with security settings (external access disabled) |
| database.rs | Actor pattern with mpsc channels, parser-only security (no authorizer hook) |
| describe.rs | Schema introspection via information_schema |
| service.rs | DashMap connection pool with idle timeout and memory threshold promotion |
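
For readers unfamiliar with the actor pattern named for database.rs, here is a minimal sketch of the idea. It is not the code from this PR: it uses `std::sync::mpsc` and a plain thread as a stand-in for whatever channel and runtime the service actually uses, and `DbMessage`, `DbHandle`, and the counter that replaces a real DuckDB connection are all hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type; the real message enum in database.rs is not shown here.
enum DbMessage {
    /// Run a statement and send the outcome back on `reply`.
    Execute {
        sql: String,
        reply: mpsc::Sender<Result<usize, String>>,
    },
    /// Ask the actor to stop.
    Shutdown,
}

/// Handle held by callers; cloning it lets many callers share one actor.
#[derive(Clone)]
struct DbHandle {
    tx: mpsc::Sender<DbMessage>,
}

impl DbHandle {
    /// Spawn the actor thread, which alone owns the (stand-in) connection state.
    fn spawn() -> (DbHandle, thread::JoinHandle<()>) {
        let (tx, rx) = mpsc::channel::<DbMessage>();
        let join = thread::spawn(move || {
            // Stand-in for a DuckDB connection: a count of executed statements.
            let mut executed = 0usize;
            for msg in rx {
                match msg {
                    DbMessage::Execute { sql, reply } => {
                        let result = if sql.trim().is_empty() {
                            Err("empty statement".to_string())
                        } else {
                            executed += 1;
                            Ok(executed)
                        };
                        // The caller may have gone away; ignore a closed reply channel.
                        let _ = reply.send(result);
                    }
                    DbMessage::Shutdown => break,
                }
            }
        });
        (DbHandle { tx: tx }, join)
    }

    /// Send a statement to the actor and wait for its reply.
    fn execute(&self, sql: &str) -> Result<usize, String> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx
            .send(DbMessage::Execute {
                sql: sql.to_string(),
                reply: reply_tx,
            })
            .map_err(|_| "actor stopped".to_string())?;
        reply_rx
            .recv()
            .map_err(|_| "actor dropped reply".to_string())?
    }

    /// Ask the actor to exit its message loop.
    fn shutdown(&self) {
        let _ = self.tx.send(DbMessage::Shutdown);
    }
}
```

Because only the actor thread touches the connection state, callers never need a lock around it; they only exchange messages.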

Modified files (server integration)

  • src/config.rs: DuckDbStorageConfig with path, memory threshold, idle timeout, max memory
  • tinycloud-core/src/db.rs: DuckDbResult, DuckDbExport, DuckDbArrow outcome variants
  • src/auth_guards.rs: Responder arms for JSON, binary, Arrow responses
  • src/routes/mod.rs: handle_duckdb_invoke(), duckdb_error_to_status(), capability extraction
  • src/lib.rs: Construct and manage DuckDbService

Companion PR: TinyCloudLabs/js-sdk (TypeScript SDK DuckDB service)

Test plan

  • cargo build compiles successfully
  • cargo clippy -- -D warnings passes
  • GET /version includes "duckdb" in features
  • Create table via DuckDB execute, insert data, query it back
  • Verify parser blocks COPY TO/FROM, INSTALL, LOAD, SET
  • Verify caveats enforcement (table/column allowlists, read-only)
  • Test Describe action returns schema info
  • Test Export returns raw .duckdb file bytes
  • Test Import replaces database file and reopens
  • Test error status codes match spec Section 10

…ation

Add embedded analytical database service with columnar storage, per-space
isolation, and UCAN capability model. Mirrors the SQL service architecture
with DuckDB-specific features:

Core module (tinycloud-core/src/duckdb/):
- Actor-based connection pool with idle timeout and memory threshold promotion
- SQL parser validation (GenericDialect) as primary security layer
- DuckDB settings lockdown (external access disabled, unsigned extensions blocked)
- Rich value types including List and Struct with recursive serde
- Describe, Ingest, ExportToKv, Export, Import request variants
- UCAN caveats for table/column/statement allowlists and read-only mode
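
As an illustration of the caveat model listed above, the sketch below checks a table allowlist and a read-only flag. The `Caveats` struct, its field names, and the case-insensitive comparison are assumptions made for this example; the actual shape in caveats.rs is not shown in this PR text. It also assumes the parser has already extracted the referenced tables and whether the statement writes.

```rust
use std::collections::HashSet;

/// Hypothetical caveat shape; field names are illustrative only.
#[derive(Default)]
struct Caveats {
    /// When `Some`, only these (lowercase) table names may be referenced.
    tables: Option<HashSet<String>>,
    /// When true, statements that write are rejected.
    read_only: bool,
}

#[derive(Debug, PartialEq)]
enum CaveatError {
    TableNotAllowed(String),
    ReadOnly,
}

impl Caveats {
    /// Check a statement given the tables it touches and whether it writes.
    fn check(&self, tables_used: &[&str], is_write: bool) -> Result<(), CaveatError> {
        if self.read_only && is_write {
            return Err(CaveatError::ReadOnly);
        }
        if let Some(ref allowed) = self.tables {
            for t in tables_used {
                // Case-insensitive comparison is an assumption of this sketch.
                if !allowed.contains(&t.to_ascii_lowercase()) {
                    return Err(CaveatError::TableNotAllowed(t.to_string()));
                }
            }
        }
        Ok(())
    }
}
```

With no caveats set (`Caveats::default()`), every statement passes, which matches the idea that caveats only ever narrow a capability.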

Server integration:
- DuckDbStorageConfig with configurable path, memory threshold, idle timeout
- Route handling with capability extraction and error status mapping
- Binary response support for database export and Arrow IPC streams
- "duckdb" added to /version features

skgbafa added 9 commits March 7, 2026 23:16

Security:
- Replace statement blocklist with 3-tier allowlist (default/admin/delegation bypass)
- Block security-critical SET vars (enable_external_access, etc.) unconditionally
- Expand function blocklist (parquet_scan, csv_scan, glob, iceberg_scan, etc.)
- Validate max_memory against SQL injection
- Validate db_name against path traversal (.., /, \, null)
- Validate imported databases (temp file + DuckDB open + test query)
- Block export when caveats active
- Apply caveats to describe (filter tables/columns)
- Handle SELECT * with column caveats
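
The two input-validation items above (db_name and max_memory) can be sketched roughly as follows. The function names and the accepted unit list are assumptions for illustration; only the rejected characters for db_name ("..", "/", "\", null) come from the commit message.

```rust
/// Reject database names that could escape the per-space directory.
/// Covers the cases named in the commit message: "..", '/', '\\', and NUL.
fn validate_db_name(name: &str) -> Result<(), String> {
    if name.contains("..") || name.contains('/') || name.contains('\\') || name.contains('\0') {
        return Err(format!("invalid db_name: {:?}", name));
    }
    Ok(())
}

/// Accept only `<digits><unit>` for max_memory (for example "512MB"), so the
/// value cannot smuggle extra SQL when it is interpolated into a setting.
/// The unit list here is an assumption of this sketch.
fn validate_max_memory(value: &str) -> Result<(), String> {
    let digits_end = value
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(value.len());
    let (digits, unit) = value.split_at(digits_end);
    let unit_ok = ["KB", "MB", "GB", "TB"].contains(&unit);
    if digits.is_empty() || !unit_ok {
        return Err(format!("invalid max_memory: {:?}", value));
    }
    Ok(())
}
```

Validating against a strict allowlist shape, rather than trying to escape quotes, keeps the check easy to reason about.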

Types:
- Fix ColumnInfo wire format (type/nullable instead of dataType/isNullable)
- Remove unnecessary Deserialize from DuckDbResponse
- Fix UBigInt truncation (values > i64::MAX are now returned as strings)
- Fix Map key formatting (Display instead of Debug)
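
A minimal sketch of the UBigInt fix, assuming a simplified two-variant value type: `WireValue` and `ubigint_to_wire` are hypothetical names, and the real DuckDbValue has more variants than shown.

```rust
/// Hypothetical JSON-facing value; stands in for the relevant DuckDbValue variants.
#[derive(Debug, PartialEq)]
enum WireValue {
    Int(i64),
    /// Used when the number does not fit in i64, so no bits are lost.
    Text(String),
}

/// Convert an unsigned 64-bit value without truncation.
fn ubigint_to_wire(v: u64) -> WireValue {
    if v <= i64::MAX as u64 {
        WireValue::Int(v as i64)
    } else {
        // A plain `v as i64` would wrap to a negative number here.
        WireValue::Text(v.to_string())
    }
}
```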

Robustness:
- Clean up stale actor entries from DashMap on exit
- Fix promote_to_file (temporarily enable external access for EXPORT DATABASE)
- Use async I/O (tokio::fs) in async functions
- Replace expect() with error propagation in actor open
- Replace filter_map(|r| r.ok()) with proper error propagation
- Add statement_timeout = 30s
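
The filter_map change in the list above is easiest to see side by side. In this sketch `parse_row` is a hypothetical stand-in for row decoding; the point is only the difference between dropping errors and propagating the first one.

```rust
/// Stand-in for a row decoder that can fail.
fn parse_row(raw: &str) -> Result<i64, String> {
    raw.trim()
        .parse::<i64>()
        .map_err(|e| format!("bad row {:?}: {}", raw, e))
}

/// Old behaviour: rows that fail to decode are silently dropped.
fn copy_rows_lossy(raw: &[&str]) -> Vec<i64> {
    raw.iter().filter_map(|r| parse_row(r).ok()).collect()
}

/// New behaviour: the first decode error aborts the copy and is returned.
fn copy_rows_strict(raw: &[&str]) -> Result<Vec<i64>, String> {
    raw.iter()
        .map(|r| parse_row(r))
        .collect::<Result<Vec<_>, _>>()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first `Err`, so a partially copied table is reported as a failure instead of looking complete.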

Arrow IPC:
- Add execute_query_arrow() using stmt.query_arrow() + StreamWriter
- Route Arrow format via Accept header through to actor
- Add Arrow variant to DuckDbResponse
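
One way the Accept-header routing could look is sketched below. This is an assumption, not the route code from this PR: `ResponseFormat` and `format_from_accept` are hypothetical names, and the sketch assumes the Arrow IPC stream media type `application/vnd.apache.arrow.stream` is what the client sends. Quality values (`q=`) are ignored for simplicity.

```rust
#[derive(Debug, PartialEq)]
enum ResponseFormat {
    Json,
    ArrowIpc,
}

/// Pick the response format from an optional Accept header value.
/// Falls back to JSON when the header is absent or names no Arrow type.
fn format_from_accept(accept: Option<&str>) -> ResponseFormat {
    // Assumed media type for Arrow IPC streams.
    const ARROW_STREAM: &str = "application/vnd.apache.arrow.stream";
    let wants_arrow = accept
        .map(|h| {
            h.split(',')
                // Strip parameters such as ";q=0.9" and surrounding whitespace.
                .map(|part| part.split(';').next().unwrap_or("").trim())
                .any(|mime| mime.eq_ignore_ascii_case(ARROW_STREAM))
        })
        .unwrap_or(false);
    if wants_arrow {
        ResponseFormat::ArrowIpc
    } else {
        ResponseFormat::Json
    }
}
```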

Quality:
- Extract verify_auth() and read_json_body() helpers from route handlers
- Add 32 unit tests across parser, caveats, storage, and types

Local resources (SQLite parent dir, block storage dir, SQL/DuckDB dirs)
are now created automatically on first run. Remote backends (Postgres,
S3) are left untouched — their connection errors surface naturally.

Replaces the raw .unwrap() panic in main with a readable error chain
so misconfigured remote backends get clear diagnostics.

- Remove SET statement_timeout (unsupported in duckdb crate v1.4.4)
- Move column_names() call after query() execution to avoid panic
  in RawStatement::schema when schema isn't populated yet
- Remove DenchClaw references from spec

Export previously read directly from disk, returning 404 for in-memory
databases. Now routes through the database actor which can serialize
both in-memory and file-backed databases.

- Add Export message variant to DuckDB and SQL actors
- Use Arrow record batches (appender-arrow) for fast bulk copy
- Fix promote_to_file to use copy_tables instead of broken
  enable_external_access toggle
- SQL export uses SQLite backup API for in-memory serialization

SQLite's DEFERRED transactions deadlock when concurrent verify_auth()
calls both try to upgrade from shared read to exclusive write locks.
The SQLITE_BUSY error was incorrectly mapped to SpaceNotFound (404).

- Set max_connections(1) for SQLite to serialize writes
- Enable WAL mode for concurrent reads
- Set busy_timeout(5s) as safety net
- Add tracing::warn with actual error details on epoch insert failure
- Keep max_connections(100) for PostgreSQL/MySQL

Replace .filter_map(|r| r.ok()) with .collect::<Result<Vec<_>, _>>()
to surface row deserialization errors during table copy. Log view copy
failures instead of silently swallowing them with let _ =.

- Box DuckDbRequest in DbMessage::Execute to reduce enum size disparity
  (209 bytes vs 8 bytes)
- Apply rustfmt formatting to storage.rs
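
The effect of boxing the large variant can be checked with `std::mem::size_of`. The types below are hypothetical stand-ins (a 200-byte payload rather than the real DuckDbRequest), so the exact byte counts will differ from the 209 vs 8 quoted above.

```rust
use std::mem::size_of;

/// Stand-in for a large request payload; the real DuckDbRequest is not shown here.
#[allow(dead_code)]
struct BigRequest {
    payload: [u8; 200],
}

/// Every value of this enum is as large as its biggest variant.
#[allow(dead_code)]
enum Unboxed {
    Execute(BigRequest),
    Shutdown,
}

/// Boxing moves the payload to the heap, leaving only a pointer inline.
#[allow(dead_code)]
enum Boxed {
    Execute(Box<BigRequest>),
    Shutdown,
}

/// Returns (size of the unboxed enum, size of the boxed enum) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}
```

The trade-off is one extra heap allocation per `Execute` message in exchange for small messages everywhere else, which is what clippy's large_enum_variant lint suggests.
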
skgbafa merged commit 62f5e0c into main on Mar 9, 2026
13 checks passed