Summary
Persisted sessions can keep using a stale skillsSnapshot and miss newly added skills after a gateway restart.
I hit this on April 21, 2026 in a Feishu DM session. A new local skill existed on disk and was eligible, but the session kept behaving as if the skill did not exist until the session was reset/refreshed.
Repro
- Start a long-lived session so it persists a skillsSnapshot with version = 0.
- Add a new skill to the workspace after that session already exists.
- Restart the gateway before a watcher event bumps the workspace skill snapshot version.
- Send the next message in the existing session.
Actual
The existing session can keep reusing the old snapshot and fail to surface the newly added skill.
In my case, the session did not see the newly added mock-full-reduction-config skill in the available_skills list, even though the skill was present on disk and openclaw skills check reported it as ready.
Expected
The first post-restart turn in an existing session should rebuild the skills snapshot if the process has not yet established a non-zero workspace version.
Root Cause
There are two pieces that combine badly:
- shouldRefreshSnapshotForVersion() returns false when both the cached version and the current version are 0.
- ensureSkillSnapshot() reads snapshotVersion before calling ensureSkillsWatcher().
That means an old persisted session snapshot with version = 0 can compare equal to the fresh process state, and the next turn does not rebuild the snapshot.
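To make the failure mode concrete, here is a minimal sketch of the comparison. It is a simplified model, not the project's source: the SkillsSnapshot shape, the in-memory version table, the `current > cached` rule, and the workspace path are all assumptions chosen only to reproduce the "both versions are 0" case described above.

```typescript
// Simplified, hypothetical model of the version check (not the project's actual source).
interface SkillsSnapshot {
  version: number;
  skills: string[];
}

// In-memory, per-process version table. It is empty right after a restart.
const workspaceVersions = new Map<string, number>();

// A workspace with no recorded version reads as 0.
function getSkillsSnapshotVersion(workspaceDir: string): number {
  return workspaceVersions.get(workspaceDir) ?? 0;
}

// Assumed rule: rebuild only when the process holds a newer version than the cache.
function shouldRefreshSnapshotForVersion(cached: number, current: number): boolean {
  return current > cached;
}

// Snapshot persisted before the restart; it predates the new skill.
const persisted: SkillsSnapshot = { version: 0, skills: ["existing-skill"] };

// Fresh process, no watcher event yet, so the current version is still 0.
const currentVersion = getSkillsSnapshotVersion("/path/to/workspace");
const willRefresh = shouldRefreshSnapshotForVersion(persisted.version, currentVersion);

console.log({ currentVersion, willRefresh });
```

Under these assumptions the persisted version and the fresh process version are both 0, so no rebuild is triggered and the stale skill list is reused.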
Proposed Fix
- Seed a non-zero workspace snapshot version when the first watcher for a workspace is attached.
- In ensureSkillSnapshot(), call ensureSkillsWatcher() before reading getSkillsSnapshotVersion(workspaceDir).
That makes the first post-restart turn observe the bumped version and refresh stale session snapshots.
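A rough sketch of how the two changes could fit together follows. Only the function names come from this report; the bodies, the loadSkills callback, the Date.now() seed, and the `current > cached` refresh rule are illustrative assumptions, and real file watching is omitted.

```typescript
// Hypothetical sketch of the proposed fix; bodies are assumptions, not the actual patch.
interface SkillsSnapshot {
  version: number;
  skills: string[];
}

const workspaceVersions = new Map<string, number>();
const watchedWorkspaces = new Set<string>();

function getSkillsSnapshotVersion(workspaceDir: string): number {
  return workspaceVersions.get(workspaceDir) ?? 0;
}

// Assumed rule: rebuild only when the process holds a newer version than the cache.
function shouldRefreshSnapshotForVersion(cached: number, current: number): boolean {
  return current > cached;
}

// Attach a watcher once per workspace (the watcher itself is not modeled here).
function ensureSkillsWatcher(workspaceDir: string): void {
  if (watchedWorkspaces.has(workspaceDir)) return;
  watchedWorkspaces.add(workspaceDir);
  // Fix part 1: seed a non-zero version on first attach. Date.now() is an assumed seed.
  if (getSkillsSnapshotVersion(workspaceDir) === 0) {
    workspaceVersions.set(workspaceDir, Date.now());
  }
}

function ensureSkillSnapshot(
  workspaceDir: string,
  persisted: SkillsSnapshot | undefined,
  loadSkills: () => string[], // hypothetical loader that scans the workspace
): SkillsSnapshot {
  // Fix part 2: ensure the watcher first, then read the version.
  ensureSkillsWatcher(workspaceDir);
  const current = getSkillsSnapshotVersion(workspaceDir);
  if (!persisted || shouldRefreshSnapshotForVersion(persisted.version, current)) {
    return { version: current, skills: loadSkills() };
  }
  return persisted;
}

// First post-restart turn: the persisted version-0 snapshot is rebuilt.
const stale: SkillsSnapshot = { version: 0, skills: ["existing-skill"] };
const refreshed = ensureSkillSnapshot("/path/to/workspace", stale, () => [
  "existing-skill",
  "mock-full-reduction-config",
]);

// Later turn in the same process: the version is unchanged, so the snapshot is reused.
const reused = ensureSkillSnapshot("/path/to/workspace", refreshed, () => []);

console.log(refreshed.skills, reused === refreshed);
```

A timestamp seed is used here so that a snapshot persisted by an earlier process compares as older; a constant such as 1 would also satisfy the "non-zero" requirement for version-0 snapshots, and which one the real patch uses is not stated in this report.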
Notes
I also added focused regression tests locally for both behaviors:
- watcher attachment seeds a non-zero version
- ensureSkillSnapshot() reads the version after ensuring the watcher