
Explore per-agent worker isolation, VFS scratch storage, and host filesystem capabilities #78096

@steipete

Summary

Long-term architecture note: explore whether OpenClaw should combine per-agent worker/process isolation, a virtual filesystem layer, and the existing real-host filesystem safety work into a more explicit filesystem capability model.

This is not a request to replace the current fs-safe/pinned-path helpers immediately. The current implementation is solving a real gap in Node's filesystem APIs. The goal here is to track adjacent OSS work and define experiments that could reduce risk, reduce Python helper dependency over time, and give each agent a clearer filesystem boundary.

Current OpenClaw Problem Space

OpenClaw has a long-lived gateway with many agents. Different agents may need different workspace roots in the same overall gateway runtime.

We currently have user-visible operations that eventually need safe real-host filesystem mutation:

  • agent patch/apply operations
  • workspace file set/create/update operations
  • memory/session export writes
  • shell/pi host write/edit operations
  • archive/media staging copies
  • sandbox file mutations such as write, mkdir, remove, rename

These are host filesystem operations, not only scratch-state operations. They need to resist path traversal, symlink/hardlink tricks, and TOCTOU races.

Node's standard fs APIs do not provide the full openat/dirfd-style capability interface we would want for this. That is why the current code has pinned path/write helpers and, on POSIX, Python helper paths for specific operations where Python exposes dir_fd-style primitives that Node does not.
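To make the gap concrete: with only standard Node fs calls, the best available containment check resolves symlinks first and then opens by path later. The TypeScript sketch below shows that pattern. The function name `resolvesInsideRoot` is hypothetical and is not an existing OpenClaw helper. The window between this check and the caller's later open is exactly the TOCTOU race that dirfd/openat-style APIs close.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical best-effort check built only from standard Node fs APIs.
// It resolves symlinks *at check time*. If an attacker swaps a path component
// for a symlink after this returns and before the caller opens the path,
// the later open follows the new link. That is the TOCTOU gap.
export function resolvesInsideRoot(root: string, relPath: string): boolean {
  const realRoot = fs.realpathSync(root);
  const candidate = path.resolve(realRoot, relPath);
  let realCandidate: string;
  try {
    realCandidate = fs.realpathSync(candidate);
  } catch {
    // Typically ENOENT: the target does not exist yet (a file about to be
    // created), so resolve its parent directory instead.
    realCandidate = path.join(
      fs.realpathSync(path.dirname(candidate)),
      path.basename(candidate),
    );
  }
  const rel = path.relative(realRoot, realCandidate);
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
  );
}
```

A caller that checks with this and then calls `fs.writeFileSync` on the same path can still be redirected if a directory component is replaced by a symlink in between.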

Why Node Permissions Alone Are Not Enough

Node has a permission model:

  • --permission
  • --allow-fs-read
  • --allow-fs-write
  • --allow-worker

This is useful as an outer process or worker sandbox, but it is not the same abstraction as a per-root filesystem capability object.

Limitations for OpenClaw's shape:

  • It is process/worker launch policy, not an object API like root.write("path").
  • It is coarse-grained for a long-lived gateway containing many agents with different roots.
  • Workers need correctly configured execArgv; trusted parent code must enforce that every time.
  • --allow-worker is explicitly risky because worker creation can weaken the model if exposed incorrectly.
  • Node documents permission-model constraints around symlinks, existing file descriptors, native modules, subprocesses, worker threads, inspector, and WASI.
  • It does not provide pinned dirfd/openat-style operations for safe real-host mutations.

So Node permissions may be useful as an additional outer guard, but should not be treated as a replacement for a capability-safe host filesystem API.

Reference: https://nodejs.org/api/permissions.html
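As a sketch of the outer-guard role only, a trusted parent could launch an agent in process mode with permission flags. This assumes a Node release that accepts `--permission` (earlier releases used `--experimental-permission`) and one `--allow-fs-*` flag per path. `AgentLaunchPolicy`, `buildPermissionArgs`, and `spawnAgent` are hypothetical names, not OpenClaw or Node APIs.

```typescript
import { spawn, ChildProcess } from "node:child_process";
import * as path from "node:path";

// Hypothetical per-agent launch policy; field names are illustrative.
export interface AgentLaunchPolicy {
  entry: string;        // script the agent process runs
  readRoots: string[];  // directories the agent may read (assumed to cover
                        // everything it loads, including its entry script)
  writeRoots: string[]; // directories the agent may write
}

// Builds Node permission-model flags, one flag per path.
export function buildPermissionArgs(policy: AgentLaunchPolicy): string[] {
  const args = ["--permission"];
  for (const dir of policy.readRoots) args.push(`--allow-fs-read=${path.resolve(dir)}`);
  for (const dir of policy.writeRoots) args.push(`--allow-fs-write=${path.resolve(dir)}`);
  // Deliberately no --allow-worker or --allow-child-process: the agent
  // process should not be able to start something outside this policy.
  return args;
}

export function spawnAgent(
  policy: AgentLaunchPolicy,
  env: Record<string, string> = {},
): ChildProcess {
  // Pass an explicit, minimal env instead of inheriting process.env.
  return spawn(process.execPath, [...buildPermissionArgs(policy), policy.entry], {
    env,
    stdio: "inherit",
  });
}
```

When the permission model is enabled, code inside the child can call `process.permission.has("fs.write", somePath)` to see what was granted. None of this gives the agent a root object with openat-style semantics; it only bounds what the process as a whole may touch.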

Relevant OSS References

Platformatic Regina

Repo: https://github.com/platformatic/regina
License: Apache-2.0

Regina is a multi-agent orchestrator built on Platformatic Watt. It discovers markdown agent definitions and spawns each agent as an isolated application thread.

Relevant ideas:

  • per-agent application/thread lifecycle
  • idle suspension/resume
  • per-agent state storage
  • cross-pod migration/state backup
  • built-in tools backed by a per-instance virtual filesystem

This is highly relevant architecturally, but it does not directly solve safe host-folder mutation. Regina's default file tools operate inside a virtual filesystem.

Platformatic Runtime / Watt

Repo: https://github.com/platformatic/platformatic
License: Apache-2.0

Platformatic/Watt runs applications in Node worker threads. The runtime creates workers with per-application execArgv, and its runtime config supports filesystem permissions that it converts into Node permission flags.

Relevant ideas:

  • worker-thread application isolation
  • per-application execArgv
  • worker lifecycle, health, restart, and management APIs
  • thread interceptor / mesh routing
  • optional process mode for heavier isolation
  • application-level permissions based on Node's permission model

This could inspire an OpenClaw experiment where agents run in separate workers/processes with an outer permission boundary.

@platformatic/vfs

Package: https://www.npmjs.com/package/@platformatic/vfs
Repo: https://github.com/platformatic/vfs
License: MIT

@platformatic/vfs is described as a virtual filesystem for Node.js, a userland shim for node:vfs. Regina uses it with a SQLite provider for per-agent VFS state.

Relevant ideas:

  • per-agent virtual scratch/state filesystem
  • SQLite-backed persistence
  • a filesystem-like API that can back tools without exposing host paths
  • possible future alignment if Node gains a native VFS layer

This is complementary to fs-safe, not a direct replacement. It can make many agent tool operations avoid the host filesystem entirely, but real workspace edits still need safe host mutation.
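The key property of a VFS-backed tool is that it only ever sees virtual paths. The class below is a deliberately tiny in-memory stand-in to illustrate that property. It is NOT the @platformatic/vfs API, `ScratchFs` is a hypothetical name, and persistence (SQLite or otherwise) is out of scope.

```typescript
import * as path from "node:path";

// Hypothetical per-agent scratch store. Not @platformatic/vfs; it only shows
// tools operating on virtual paths that never map to host paths.
export class ScratchFs {
  private readonly files = new Map<string, string>();

  // Normalize to a rooted POSIX-style virtual path. Because resolution starts
  // at "/", ".." can never climb above the virtual root, and process.cwd()
  // is never consulted.
  private key(p: string): string {
    return path.posix.resolve("/", p);
  }

  writeFile(p: string, data: string): void {
    this.files.set(this.key(p), data);
  }

  readFile(p: string): string {
    const k = this.key(p);
    const data = this.files.get(k);
    if (data === undefined) throw new Error(`ENOENT: ${k}`);
    return data;
  }

  list(): string[] {
    return [...this.files.keys()].sort();
  }
}
```

A write to `../../etc/passwd` here lands at the virtual `/etc/passwd` of that one agent's store; no host file is involved, and a second `ScratchFs` instance sees nothing.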

fs-safe

Site: https://fs-safe.io/
Repo/package local context: ../fs-safe

fs-safe is the host-filesystem capability layer extracted from OpenClaw security work. Its role is closer to Go's os.Root / Rust cap-std: pass around a root capability and perform safe relative operations beneath it.

This remains the more direct fit for OpenClaw workspace edits and other real-host file operations.
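The os.Root / cap-std shape can be approximated in TypeScript to show what "pass around a root capability" means for callers. `HostRoot` is hypothetical and is not fs-safe's API. It keeps the root inside the object, accepts only relative paths, and uses `O_NOFOLLOW` (POSIX only) on the final path component. It does not handle symlinked intermediate directories or races; that is the part that needs dirfd/openat-style primitives.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical root-capability shape (NOT fs-safe's actual API). Callers hold
// a HostRoot and pass only relative paths; the absolute root stays private.
export class HostRoot {
  private constructor(private readonly root: string) {}

  static open(dir: string): HostRoot {
    return new HostRoot(fs.realpathSync(dir));
  }

  // Lexical checks only: reject absolute paths and any ".." segment.
  private resolve(rel: string): string {
    if (path.isAbsolute(rel)) throw new Error(`absolute path not allowed: ${rel}`);
    if (rel.split(/[\\/]+/).includes("..")) throw new Error(`traversal not allowed: ${rel}`);
    return path.join(this.root, rel);
  }

  writeFile(rel: string, data: string): void {
    // O_NOFOLLOW makes open fail if the *final* component is a symlink.
    // Symlinked intermediate directories are NOT caught here.
    const flags =
      fs.constants.O_WRONLY | fs.constants.O_CREAT |
      fs.constants.O_TRUNC | fs.constants.O_NOFOLLOW;
    const fd = fs.openSync(this.resolve(rel), flags, 0o644);
    try {
      fs.writeFileSync(fd, data);
    } finally {
      fs.closeSync(fd);
    }
  }

  readFile(rel: string): string {
    const flags = fs.constants.O_RDONLY | fs.constants.O_NOFOLLOW;
    const fd = fs.openSync(this.resolve(rel), flags);
    try {
      return fs.readFileSync(fd, "utf8");
    } finally {
      fs.closeSync(fd);
    }
  }
}
```

A plugin handed `HostRoot.open(workspaceDir)` can call `root.writeFile("src/index.ts", text)` without ever seeing `workspaceDir` itself.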

Go os.Root

Reference: https://go.dev/blog/osroot

Go added os.Root to address traversal-resistant filesystem access. This is the general shape OpenClaw wants for real-host filesystem access: rooted operations, not ad hoc string prefix checks.
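The string-prefix pitfall is easy to reproduce. In this sketch (both function names are hypothetical), `/srv/agent-ab` passes a naive prefix check against `/srv/agent-a`. The `path.relative` version fixes that case but is still purely lexical: it knows nothing about symlinks, hardlinks, or races, so at best it is a pre-filter in front of a rooted API.

```typescript
import * as path from "node:path";

// The ad hoc check that rooted APIs exist to replace.
export function naiveIsInside(root: string, candidate: string): boolean {
  return path.resolve(candidate).startsWith(path.resolve(root));
}

// Lexically sounder containment via path.relative. Still lexical only.
export function lexicallyInside(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
  );
}
```

With these, `naiveIsInside("/srv/agent-a", "/srv/agent-ab/secret")` returns `true` (the bug), while `lexicallyInside` returns `false` for the same inputs.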

Rust cap-std

Repo: https://github.com/bytecodealliance/cap-std

Rust's cap-std is another object-capability filesystem design. Useful reference for API shape and threat model.

Linux openat2 option

Package: https://www.npmjs.com/package/@cocalc/openat2

@cocalc/openat2 exposes Linux openat2/dirfd-based primitives via a native addon. It may be useful as an optional Linux fast path for fs-safe, but it is not portable enough to be the whole answer.
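One way to keep an openat2 path optional is to select it once at startup and fall back to the portable implementation on any failure. The sketch below injects the addon loader, so it makes no assumption about @cocalc/openat2's actual exports; `RootedFsBackend`, `AddonLoader`, and `selectBackend` are hypothetical names.

```typescript
// Hypothetical backend interface; a real one would expose rooted file ops.
export interface RootedFsBackend {
  readonly name: string;
}

// Injected loader, e.g. a wrapper around a native openat2 addon.
export type AddonLoader = () => RootedFsBackend;

export function selectBackend(
  platform: string,
  loadLinuxAddon: AddonLoader,
  portable: RootedFsBackend,
): RootedFsBackend {
  if (platform !== "linux") return portable;
  try {
    // May fail if the addon is not installed, was built for another Node ABI,
    // or the kernel lacks openat2 (added in Linux 5.6).
    return loadLinuxAddon();
  } catch {
    return portable;
  }
}
```

For the fast path to be a pure optimization, both backends would need to pass the same symlink/hardlink/traversal/race regression suite.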

Possible Direction

Layered model:

outer runtime boundary      worker/process permissions, env sanitization, subprocess policy
virtual agent filesystem    per-agent scratch/state VFS, possibly @platformatic/vfs-style
host capability filesystem  fs-safe root objects for real workspace/media/archive mutations

The key distinction:

  • virtual FS is great for agent-local state and tool scratch files
  • real-host FS capability is still needed for workspace edits, media staging, archive extraction, and integrations that must touch real files
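The split could be made explicit at the call site. Each operation states whether it needs real host files, and anything that does requires an explicit host grant. This hypothetical TypeScript sketch (`FsOperation` and `chooseLayer` are illustrative names, not an existing OpenClaw API) shows one possible answer to whether tools should default to VFS.

```typescript
// Hypothetical routing for the layered model.
export type FsLayer = "virtual" | "host";

export interface FsOperation {
  kind: "read" | "write" | "mkdir" | "remove" | "rename";
  path: string;           // relative to the chosen layer's root
  needsRealHost: boolean; // e.g. workspace edits, media staging, archive extraction
}

export function chooseLayer(op: FsOperation, hostGranted: boolean): FsLayer {
  // Default: agent-local state and tool scratch files stay virtual.
  if (!op.needsRealHost) return "virtual";
  // Real-host work requires an explicit capability grant.
  if (!hostGranted) {
    throw new Error(`${op.kind} on ${op.path} needs a host capability that was not granted`);
  }
  return "host";
}
```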

Experiments To Try

  • Prototype one agent running in a worker with per-worker execArgv permission flags.
  • Prototype the same with process mode and compare isolation, startup cost, memory, and operational complexity.
  • Try a per-agent VFS scratch area for default tool state that does not need to touch the host workspace.
  • Evaluate whether @platformatic/vfs can back any existing OpenClaw scratch/session/tool paths without changing user-visible semantics.
  • Keep real workspace edits on fs-safe and measure how much Python helper usage remains after moving scratch-only paths to VFS.
  • Explore an optional Linux openat2 fast path inside fs-safe while keeping portable POSIX/macOS behavior.
  • Document which operations require real host paths versus virtual/scratch paths.
  • Define an API boundary so extensions/plugins receive capabilities rather than raw host paths whenever feasible.

Questions

  • Which OpenClaw file operations truly need real host paths?
  • Which operations are only agent scratch/state and could move into VFS?
  • Should an agent's default shell/write/edit tools operate in VFS by default, with explicit capability grants for host workspace edits?
  • Can worker/process permissions act as defense-in-depth without making gateway lifecycle or debugging too complex?
  • Can fs-safe become the shared host capability layer used by both OpenClaw and external consumers?
  • How do we preserve current OpenClaw UX where agents edit real repos, while giving each agent a clearer boundary?

Non-Goals For Now

  • Do not replace fs-safe with Node permissions.
  • Do not migrate all agent execution to Platformatic/Regina.
  • Do not make virtual filesystems the only storage model for workspace edits.
  • Do not remove Python helpers until there is equivalent portable safety proof.
  • Do not add a large dependency to core without measuring install/runtime cost and ownership impact.

Success Criteria

A future design would be successful if it:

  • preserves real workspace editing behavior
  • reduces direct raw-path handling in agent/plugin code
  • narrows or removes Python helper usage where safely possible
  • gives each agent an explicit filesystem boundary
  • uses Node permissions or process isolation only as defense-in-depth
  • keeps extension/plugin dependency ownership clean
  • provides measurable performance and security improvements
  • has regression tests for symlink, hardlink, traversal, rename, copy, remove, and TOCTOU-style races
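As an example of the kind of check such regression tests would pin down: refusing to overwrite a file that has more than one hard link, since another link to the same inode may sit outside the root. This is a hedged sketch of one policy choice, not a description of fs-safe's behavior, and `openForOverwrite` is a hypothetical name. It opens without `O_TRUNC` so nothing is modified before the check, and it inspects the already-open descriptor with `fstat` instead of re-resolving the path.

```typescript
import * as fs from "node:fs";

// Returns an fd only for a singly-linked regular file; POSIX only (O_NOFOLLOW).
export function openForOverwrite(target: string): number {
  // No O_TRUNC here: truncation would happen before we could inspect the file.
  const fd = fs.openSync(target, fs.constants.O_WRONLY | fs.constants.O_NOFOLLOW);
  const st = fs.fstatSync(fd);
  if (!st.isFile() || st.nlink > 1) {
    fs.closeSync(fd);
    throw new Error(`refusing to overwrite ${target}: not a singly-linked regular file`);
  }
  return fd;
}
```

A caller would then `fs.ftruncateSync(fd)` and write through the same descriptor, so the check and the mutation refer to the same inode.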

License Notes

  • Platformatic Regina: Apache-2.0
  • Platformatic Runtime/Watt: Apache-2.0
  • @platformatic/vfs: MIT
  • @cocalc/openat2: MIT
  • Rust cap-std: Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT, depending on the crate; verify the exact component before reuse

Initial Assessment

This seems worth tracking as a long-term vision item. The strongest path is probably not "replace fs-safe", but:

  1. keep fs-safe as the real-host filesystem capability layer;
  2. add a VFS/scratch layer where real host writes are unnecessary;
  3. optionally wrap agents in worker/process permission boundaries for defense-in-depth;
  4. reduce Python helpers only when an equally safe native/Node/optional-addon path exists.

Metadata

Assignees

Labels

  • agents: Agent runtime and tooling
  • enhancement: New feature or request
  • gateway: Gateway runtime
  • maintainer: Maintainer-authored PR
  • security: Security documentation

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
