Feature: Implement Skill Discovery - With Clawhub Integration and Caching#332
Feature: Implement Skill Discovery - With Clawhub Integration and Caching#332lxowalle merged 18 commits intosipeed:mainfrom
Conversation
|
Possible Optimizations
|
There was a problem hiding this comment.
Pull request overview
This PR adds a skill discovery + installation subsystem backed by pluggable “skill registries” (initially Clawhub) and a query SearchCache, exposing it via new agent tools (find_skills, install_skill) and an extended CLI flow (picoclaw skills install --registry ...).
Changes:
- Introduces a registry abstraction (
SkillRegistry) and manager (RegistryManager) with concurrent multi-registry search. - Adds Clawhub registry integration (search/meta/download+extract) and an LRU+TTL search cache using trigram/Jaccard similarity.
- Adds agent tools and CLI plumbing/config to discover/install skills from registries.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/tools/skills_search.go | Implements find_skills tool and result formatting (with cache support). |
| pkg/tools/skills_search_test.go | Unit tests for find_skills tool behavior and formatting. |
| pkg/tools/skills_install.go | Implements install_skill tool with slug safety checks, moderation handling, and origin metadata writing. |
| pkg/tools/skills_install_test.go | Unit tests for install_skill tool validation and error cases. |
| pkg/skills/registry.go | Adds registry interfaces/types and RegistryManager with concurrent fan-out search. |
| pkg/skills/registry_test.go | Tests RegistryManager behavior (sorting, limits, failure handling). |
| pkg/skills/clawhub_registry.go | Implements Clawhub registry client and ZIP extraction. |
| pkg/skills/clawhub_registry_test.go | HTTP/ZIP-focused tests for Clawhub integration and extraction safety. |
| pkg/skills/search_cache.go | Adds trigram/Jaccard cache with TTL and eviction logic. |
| pkg/skills/search_cache_test.go | Tests cache hit/miss, TTL, eviction, and concurrency. |
| pkg/agent/loop.go | Registers the new skill tools in the agent tool registry. |
| pkg/config/config.go | Adds tools.skills configuration schema (registries + cache + concurrency). |
| config/config.example.json | Documents the new tools.skills config shape. |
| cmd/picoclaw/main.go | Extends CLI skills install to support --registry <name> <slug>. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Leeaandrob
left a comment
There was a problem hiding this comment.
PR Review: Feature: Implement Skill Discovery — Clawhub Integration & Caching
Verdict: REQUEST_CHANGES ❌
@harshbansal7 — This is a well-architected feature addition with clean abstractions (SkillRegistry interface, RegistryManager, SearchCache with trigram/Jaccard similarity). However, I found 1 critical bug, 3 high-severity issues, and several medium-severity items that must be addressed before merge. Tested locally with go vet, go build, and go test -race.
Findings
[CRITICAL] InstallSkillTool mutex is local to Execute — provides ZERO concurrency protection
File: pkg/tools/skills_install.go:68-73
func (t *InstallSkillTool) Execute(ctx context.Context, args map[string]interface{}) *ToolResult {
slugLock := sync.Mutex{}
slugLock.Lock()
defer slugLock.Unlock()The mutex is created inside Execute — a brand new mutex on every call. This means every concurrent invocation gets its own independent mutex, providing absolutely no mutual exclusion. The TOCTOU race between os.Stat(targetDir) (line 95) and DownloadAndInstall (line 115) remains completely unprotected.
Fix: Move the mutex to the struct:
type InstallSkillTool struct {
registryMgr *skills.RegistryManager
workspace string
mu sync.Mutex // protects concurrent installations
}
func (t *InstallSkillTool) Execute(ctx context.Context, args map[string]interface{}) *ToolResult {
t.mu.Lock()
defer t.mu.Unlock()[HIGH] Two tests are failing — error message mismatch after refactor
Verified locally: go test -race ./pkg/tools/... → FAIL
TestInstallSkillToolMissingSlugexpects"slug is required"but gets"identifier is required"TestInstallSkillToolMissingRegistryexpects"registry is required"but gets"identifier is required"
This happened because slug validation was extracted to ValidateSkillIdentifier() but the tests weren't updated. The tests must match the actual error messages.
[HIGH] doGet allows 50MB+ response body for JSON API responses
File: pkg/skills/clawhub_registry.go:293
body, err := io.ReadAll(io.LimitReader(resp.Body, int64(c.maxZipSize)+1024))maxZipSize defaults to 50MB. This limit is applied to search and metadata API responses, which should return a few KB of JSON at most. On a device with 10-20MB of RAM (PicoClaw's target hardware), a malicious or misconfigured server returning 50MB of JSON would OOM the process.
Fix: Use a separate, reasonable limit for JSON responses:
const maxJSONResponseSize = 1 * 1024 * 1024 // 1 MB
body, err := io.ReadAll(io.LimitReader(resp.Body, maxJSONResponseSize))[HIGH] CLI path skillsInstallFromRegistry does NOT validate slug
File: cmd/picoclaw/main.go:1317
targetDir := filepath.Join(workspace, "skills", slug)The slug comes from os.Args[5] (line 1281) and is used to construct targetDir without any validation. If a user passes ../../etc as slug, filepath.Join resolves it, and on install failure, os.RemoveAll(targetDir) at line 1334 could delete directories outside the skills folder.
The tool-side (skills_install.go) validates via ValidateSkillIdentifier, but the CLI path bypasses this entirely.
Fix: Add validation before using the slug:
func skillsInstallFromRegistry(cfg *config.Config, registryName, slug string) {
if err := utils.ValidateSkillIdentifier(slug); err != nil {
fmt.Printf("\u2717 Invalid slug: %v\n", err)
os.Exit(1)
}
if err := utils.ValidateSkillIdentifier(registryName); err != nil {
fmt.Printf("\u2717 Invalid registry name: %v\n", err)
os.Exit(1)
}
// ... rest of function[MEDIUM] Malware check happens AFTER ZIP is downloaded and extracted
File: pkg/tools/skills_install.go:115-125
The flow is:
DownloadAndInstall→ fetches metadata, downloads ZIP, extracts to disk- Then checks
result.IsMalwareBlocked - If blocked,
os.RemoveAll(targetDir)
By the time the malware check runs, the ZIP is already fully extracted to disk. If the ZIP itself is weaponized (e.g., a zip bomb or files with exploit payloads), the damage is done before cleanup.
Fix: Check IsMalwareBlocked inside DownloadAndInstall between steps 1 (metadata) and 3 (download):
if meta != nil && meta.IsMalwareBlocked {
return &InstallResult{IsMalwareBlocked: true}, nil
}[MEDIUM] No decompression bomb (zip bomb) protection
File: pkg/utils/zip.go
The ZIP extraction has no limit on total extracted size or individual file sizes. A malicious ZIP with a small compressed size could expand to gigabytes, exhausting disk space on PicoClaw's target hardware (which may have very limited storage).
Fix: Add a maxTotalSize parameter and track bytes written:
const maxExtractedSize = 100 * 1024 * 1024 // 100 MB total
var totalWritten int64
// Inside extractSingleFile:
written, err := io.Copy(outFile, io.LimitReader(rc, maxExtractedSize-totalWritten))
totalWritten += written
if totalWritten >= maxExtractedSize {
return fmt.Errorf("extracted size exceeds limit")
}[MEDIUM] ZIP path traversal check vulnerable to directory prefix collision
File: pkg/utils/zip.go:45
if !strings.HasPrefix(filepath.Clean(destPath), filepath.Clean(targetDir)) {If targetDir is /tmp/skill and a zip entry resolves to /tmp/skill_evil/file, the HasPrefix check passes because /tmp/skill_evil/file starts with /tmp/skill. The fix is to add a separator boundary:
cleanTarget := filepath.Clean(targetDir) + string(filepath.Separator)
cleanDest := filepath.Clean(destPath)
if cleanDest != filepath.Clean(targetDir) && !strings.HasPrefix(cleanDest, cleanTarget) {Note: In the current codebase, slug validation prevents crafting such a targetDir, so this is defense-in-depth for the utility function.
[MEDIUM] Struct type conversion ClawHubConfig(cfg.Tools.Skills.Registries.ClawHub) is fragile
Files: pkg/agent/loop.go:111, cmd/picoclaw/main.go:1307
ClawHub: skills.ClawHubConfig(cfg.Tools.Skills.Registries.ClawHub),This relies on config.ClawHubRegistryConfig and skills.ClawHubConfig having identical field types and names in the same order. If any field is added, reordered, or renamed in one struct but not the other, this will either fail to compile (best case) or silently map wrong values (worst case). This is done in two separate places.
Fix: Use explicit field mapping or create a conversion function in one place:
func ConfigToClawHubConfig(c config.ClawHubRegistryConfig) skills.ClawHubConfig {
return skills.ClawHubConfig{
Enabled: c.Enabled,
BaseURL: c.BaseURL,
// ... explicit mapping
}
}[MEDIUM] RegistryManager creation logic duplicated between agent loop and CLI
Files: pkg/agent/loop.go:109-112 and cmd/picoclaw/main.go:1305-1308
Identical NewRegistryManagerFromConfig calls with the same struct conversion. If config shape changes, both places must be updated.
Fix: Extract to a helper in the skills package.
[LOW] writeOriginMeta error is logged but then _ = err is a dead assignment
File: pkg/tools/skills_install.go:139
_ = errAfter logging the error, _ = err is a no-op. Remove this dead line.
[POSITIVE] Architecture & abstractions
The SkillRegistry interface design is clean and extensible. The RegistryManager with concurrent fan-out search using semaphores is well-implemented. The SearchCache with trigram/Jaccard similarity is a creative solution for fuzzy query matching. The DownloadToFile utility with streaming to temp file addresses the memory constraint well.
[POSITIVE] Security measures in place
- Slug validation via
ValidateSkillIdentifier - ZIP symlink rejection
- ZIP path traversal checks (with the prefix boundary issue noted above)
maxZipSizedownload limit- Malware/suspicious flags from registry metadata
Test Results (Local)
| Package | Result | Notes |
|---|---|---|
go vet ./... |
PASS | Clean |
go build ./cmd/picoclaw/... |
PASS | Compiles |
go test -race ./pkg/skills/... |
PASS (25 tests) | All pass including race detector |
go test -race ./pkg/tools/... |
FAIL (2 failures) | TestInstallSkillToolMissingSlug, TestInstallSkillToolMissingRegistry |
Summary
| Severity | Count | Key Issues |
|---|---|---|
| CRITICAL | 1 | Local mutex in Execute provides zero concurrency protection |
| HIGH | 3 | Failing tests, 50MB JSON response limit, CLI path traversal |
| MEDIUM | 5 | Malware check after extraction, no zip bomb protection, path prefix collision, fragile struct conversion, code duplication |
| LOW | 1 | Dead _ = err assignment |
| POSITIVE | 2 | Clean architecture, good security foundation |
The CRITICAL mutex bug and the 3 HIGH issues must be fixed before merge. The MEDIUM items are strongly recommended, especially the malware check ordering and zip bomb protection — these are security-relevant for a tool that downloads and executes code from the internet.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Looking forward to have this pr merged to main. 👍 |
|
Thanks @harshbansal7 , due to some conflicts arising from recent refactoring, please resolve the conflicts again, and then I will merge this PR. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // buildTrigrams generates hash of trigrams from a string. | ||
| // Example: "hello" → {"hel", "ell", "llo"} | ||
| // "hel" -> 0x0068656c -> 4 bytes; compared to 16 bytes of a string | ||
| func buildTrigrams(s string) []uint32 { | ||
| if len(s) < 3 { | ||
| return nil | ||
| } | ||
|
|
||
| trigrams := make([]uint32, 0, len(s)-2) | ||
| for i := 0; i <= len(s)-3; i++ { | ||
| trigrams = append(trigrams, uint32(s[i])<<16|uint32(s[i+1])<<8|uint32(s[i+2])) | ||
| } | ||
|
|
||
| // Sort and Deduplication | ||
| sort.Slice(trigrams, func(i, j int) bool { return trigrams[i] < trigrams[j] }) | ||
| n := 1 | ||
| for i := 1; i < len(trigrams); i++ { | ||
| if trigrams[i] != trigrams[i-1] { | ||
| trigrams[n] = trigrams[i] | ||
| n++ | ||
| } | ||
| } | ||
|
|
||
| return trigrams[:n] | ||
| } | ||
|
|
||
| // jaccardSimilarity computes |A ∩ B| / |A ∪ B|. | ||
| func jaccardSimilarity(a, b []uint32) float64 { | ||
| if len(a) == 0 && len(b) == 0 { | ||
| return 1 | ||
| } | ||
| i, j := 0, 0 | ||
| intersection := 0 | ||
|
|
||
| for i < len(a) && j < len(b) { | ||
| if a[i] == b[j] { | ||
| intersection++ | ||
| i++ | ||
| j++ | ||
| } else if a[i] < b[j] { | ||
| i++ | ||
| } else { | ||
| j++ | ||
| } | ||
| } | ||
|
|
||
| union := len(a) + len(b) - intersection | ||
| return float64(intersection) / float64(union) | ||
| } |
There was a problem hiding this comment.
For queries shorter than 3 chars, buildTrigrams returns nil, and jaccardSimilarity(nil, nil) returns 1. This means any cache entry created from a <3-char query will be treated as a perfect similarity match for any other <3-char query (on non-exact lookups), causing incorrect cache hits. Consider disabling similarity matching for short queries (require exact match), or have buildTrigrams return a non-empty token for short strings (e.g., hash of the whole string).
| outFile, err := os.Create(destPath) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to create file %q: %w", destPath, err) | ||
| } | ||
| // We don't return the close error via return, since it's not a named error return. |
There was a problem hiding this comment.
ZIP extraction currently creates files with os.Create(destPath) and never applies the original entry permissions. This can break skills that ship executable scripts/binaries (e.g., scripts/*.sh) because the executable bit is lost. Consider creating the file with os.OpenFile using f.Mode()&0777 (and possibly Chmod after write) while still rejecting symlinks/special modes.
| // Exact match first. | ||
| if entry, ok := sc.entries[normalized]; ok { | ||
| if time.Since(entry.createdAt) < sc.ttl { | ||
| sc.moveToEndLocked(normalized) | ||
| return copyResults(entry.results), true | ||
| } | ||
| } | ||
|
|
||
| // Similarity match. | ||
| queryTrigrams := buildTrigrams(normalized) | ||
| var bestEntry *cacheEntry | ||
| var bestSim float64 | ||
|
|
||
| for _, entry := range sc.entries { | ||
| if time.Since(entry.createdAt) >= sc.ttl { | ||
| continue // Skip expired. | ||
| } |
There was a problem hiding this comment.
Get leaves expired entries in entries/order on read (exact-hit expired path just falls through). If the cache is used for lookups long after the last Put, expired items will accumulate and continue to be scanned in the similarity loop. Consider evicting expired entries during Get as well (e.g., call evictExpiredLocked() after acquiring the lock, or delete the exact-match key when it’s found expired).
| func TestSearchCacheConcurrency(t *testing.T) { | ||
| cache := NewSearchCache(50, 5*time.Minute) | ||
| done := make(chan struct{}) | ||
|
|
||
| // Concurrent writes | ||
| go func() { | ||
| for i := 0; i < 100; i++ { | ||
| cache.Put("query-write-"+string(rune('a'+i%26)), []SearchResult{{Slug: "x"}}) | ||
| } | ||
| done <- struct{}{} | ||
| }() | ||
|
|
||
| // Concurrent reads | ||
| go func() { | ||
| for i := 0; i < 100; i++ { | ||
| cache.Get("query-write-a") | ||
| } | ||
| done <- struct{}{} | ||
| }() | ||
|
|
||
| <-done | ||
| } |
There was a problem hiding this comment.
TestSearchCacheConcurrency starts two goroutines that both send on done, but the test only receives once (<-done). This leaves one goroutine blocked forever on send, and the test doesn’t actually wait for both the read and write loops to complete. Use a sync.WaitGroup or make done buffered with capacity 2 and receive twice.
| // ValidateSkillIdentifier validates that the given skill identifier (slug or registry name) is non-empty | ||
| // and does not contain path separators ("/", "\\") or ".." for security. | ||
| func ValidateSkillIdentifier(identifier string) error { | ||
| trimmed := strings.TrimSpace(identifier) | ||
| if trimmed == "" { | ||
| return fmt.Errorf("identifier is required and must be a non-empty string") | ||
| } | ||
| if strings.ContainsAny(trimmed, "/\\") || strings.Contains(trimmed, "..") { | ||
| return fmt.Errorf("identifier must not contain path separators or '..' to prevent directory traversal") | ||
| } | ||
| return nil |
There was a problem hiding this comment.
ValidateSkillIdentifier only blocks path traversal, but it allows identifiers that the skills system later considers invalid (e.g., spaces/underscores). SkillsLoader validates names with ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$ (pkg/skills/loader.go), so a skill installed with an identifier outside that pattern may never load/list. Consider tightening validation to match the loader’s name pattern (or reusing the loader’s validator) so install/search/CLI stay consistent.
|
@lxowalle please merge this
|
|
Thanks! |
…hing (sipeed#332) * Add Find Skills and Install Skills * Improvements * fix file name * Update pkg/skills/clawhub_registry.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix * Comments addressed * Resolve comments * fix tests * fixes * Comments resolved * Update pkg/skills/search_cache_repro_test.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * minor fix * fix test * fixes --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


Overview
Issue / Task ID - #287
This PR introduces the following functionalities:
User / Agent Facing:
find_skillstoolinstall_skilltoolpicoclaw skills --registry <registry_name> <slug>Internal
Architecture
Registry
SkillRegistryinterface, that denote individual skill providers, like Clawhub.SearchAllfunction for searching across all providers at once for the given query. This feature uses goroutines (default max 2) to concurrently send requests to multiple providers at once.Skill Registry Construct
The
SkillRegistryis the interface that all skill registries must implement. Each registry represents a different source of skills (e.g., Clawhub).It defines:
Name()— returns the unique name of the registry.Search(ctx, query, limit)— searches the registry for skills matching the query.GetSkillMeta(ctx, slug)— retrieves metadata for a specific skill by slug.DownloadAndInstall(ctx, slug, version, targetDir)— resolves the version, downloads the skill artifact, installs it to the target directory, and returns installation metadata for moderation and user messaging.This construct ensures a clean abstraction between the core system and external providers, making it easy to plug in additional registries without changing the orchestration logic.
Clawhub Implementation
search,skillanddownloadAPI endpoints, to get Name and Description; get skill metadata and the ZIP file for installation respectively.SearchCache Service
Tools
Find Skills Tool
Install Skill Tool
.skill-origin.jsonfile in skill directory to track version, origin, etc.Testing
Agent Test Run

CLI Test Run