- Update
crawlkitto v0.10.0. - Add Cloudflare remote archive scaffolding with
[remote]config,gitcrawl init --remote, GitHub-backedremote loginwith OAuth or token-env bootstrap, remote identity/archive/status commands, cloud-mode search against Worker named queries, andgitcrawl cloud publishingestion without creating a local SQLite database for readers. - Mirror the local SQLite archive into the Worker-backed R2 object store during
gitcrawl cloud publish, alongside the D1 row ingest used for live queries. - Compress the mirrored SQLite archive as a gzip chunk bundle with a manifest so R2 bootstrap/fallback data is smaller and can grow past single-object upload limits.
- Move the
gitcrawl ghcompatibility cache to Octopool with a hard migration error that points users atoctopool loginandoctopool gh .... - Add an optional TUI focus layout, configurable with
gitcrawl tui --layout focus,tui.default_layout, orGITCRAWL_TUI_LAYOUT, thanks @RomneyDa. - Add
gitcrawl clusters-reportfor Markdown or JSON cluster triage reports, thanks @RomneyDa. - Add extra regression coverage for TUI detail-pane keyboard and wheel scrolling, thanks @RomneyDa.
- Bound portable-store Git subprocess cancellation so stale fetch helpers cannot hang read-only commands, and expose the last portable refresh failure in
gitcrawl doctor --json. - Backfill a missing or stale source embedding for
gitcrawl neighborswith a one-row capped embedding request, and show the exact cheapgitcrawl embed --number ... --limit 1recovery command when no OpenAI key is configured. - Add PR-aware thread indexing for changed paths and commit subjects, plus
gitcrawl code indexandsearch --scope code|allfor bounded local full-text search over tracked monorepo source files.
- Improve gh-shim cache hit rates for API projections, explicit
--cachereads, broad run-list fallbacks, stable dated searches,pr checks --watch=false, common local PR view fields, enrichedpr list --jsonfields, and cached pull-request API reads.
- Keep the highlighted member selected in the TUI when the cluster list auto-refreshes, is manually refreshed, or is re-sorted/filtered, instead of silently snapping the selection back to the first row.
- Make the TUI detail pane scrollable again with the keyboard, mouse wheel, and trackpad; its viewport content was only set on the render copy, so the live pane never had anything to scroll.
- Update
crawlkitto v0.8.0. - Retry transient SQLite busy/locked cache writes during sync and keep gh-shim auto-hydration quiet so live fallback reads do not surface scary local cache lock noise.
- Add cached release checks with
gitcrawl check-updateand passive terminal notices when a newer OpenClaw release is available.
- Harden portable-store publishing and reads with manifest integrity checks, temp-DB validation before runtime replacement, stale Git lock cleanup, reclone fallback, and richer
doctorDB health output. - Repair malformed portable-store SQLite caches by preserving the bad DB, resetting/pulling the store checkout, and recopying a healthy runtime mirror before local reads continue.
- Replay GitHub request bodies on rate-limit retries so GraphQL POST retries keep their JSON payload.
- Send GitHub Enterprise GraphQL requests to
/api/graphqlwhen the REST API base URL ends in/api/v3. - Reject durable cluster saves with empty member lists instead of leaving stale active memberships.
- Preserve multiple thread embeddings for the same thread when basis or model differs.
- Preserve OpenAI retry backoff defaults when callers provide partial retry overrides.
- Preserve existing comment text in search documents during metadata-only syncs.
- Keep comment and PR-detail sync writes atomic when GitHub hydration fails, avoiding partial cache rows without a run record.
- Fail PR-detail syncs when GitHub review-thread hydration fails instead of recording a successful partial refresh.
- Consume the shared
openclaw/crawlkitvector and portable-mirror helpers from gitcrawl. - Route OpenAI embedding requests through the shared
openclaw/crawlkit/embedprovider while preserving gitcrawl retry and error handling. - Fetch all paginated GitHub review-thread comments instead of keeping only the first review-thread comment page.
- Keep
gh xcache gcfrom expiring stable PR diff cache entries with the short fallback TTL while the PR head SHA is unchanged. - Fall back or fail for unsupported local
gh pr checksandgh runJSON fields instead of silently omitting them. - Refuse to use the running
gitcrawlexecutable as the realghbackend, including hard-linked shim paths, to avoid recursive gh-shim fallthrough. - Report duplicate OpenAI embedding response indexes explicitly instead of letting a later row overwrite an earlier vector.
- Keep cosine similarity stable for very large finite vectors instead of dropping them after float overflow.
- Allow cluster detail reads to target raw-run or durable-cluster IDs explicitly, avoiding collisions between the two ID namespaces.
- Keep active durable cluster representatives on visible open members instead of closed or hidden historical members.
- Avoid holding SQLite write transactions open while hydrating PR details from GitHub.
- Skip PR check-run and workflow-run hydration when GitHub returns no PR head SHA, avoiding broad workflow-run fetches.
- Ignore cluster graph edges whose endpoints are absent from the visible node set, preventing hidden nodes from merging otherwise separate clusters.
- Make direct
gitcrawl search --mode semanticuse query embeddings and--mode hybridcombine semantic and keyword hits instead of relabeling keyword-only search. - Remove the search-only
--sync-if-staleflag fromgitcrawl refreshhelp text. - Ignore cross-repository
owner/repo#numberreferences when building deterministic cluster edges for the current repository. - Reject non-finite CLI float options such as
NaNbefore commands can mutate local cluster state. - Fetch all paginated GitHub check runs and workflow runs instead of only the first 100 rows.
- Fix GitHub Enterprise pagination when API
Linkheaders include the/api/v3base path, avoiding duplicated paths on follow-up pages. - Retire durable clusters that disappear from a successful clustering run, while still preserving local close overrides across reclustering.
- Derive the default vector directory from custom database paths, including
GITCRAWL_DB_PATH, so separate stores do not share embeddings unlessvector_diris set explicitly. - Refuse to refresh a portable store checkout when its Git remote does not match the requested portable store, avoiding accidental resets of unrelated working trees.
- Ignore non-finite vector similarity scores so malformed embeddings cannot surface as neighbors.
- Docker: add a local image with
/datapersistence and CI smoke coverage. - Make the
ghshim forceGETfor GitHub Search API field calls sogh api search/* -f q=...agent invocations do not fall through asPOST.
- Add cache-backed
gh pr statusreadiness summaries with compact JSON, agent-oriented exit codes, and exact PR hydration that stores GitHub review threads instead of relying only on flattened review comments. - Make gh-shim Actions/release reads liveness-aware: broad
gh run listnow falls through to live GitHub unless it is pinned to a commit or cached PR branch, cached CI/release reads print a stderr provenance note, and--livebypasses shim/cache state. - Record short-lived liveness tombstones after mutating
gh run,gh workflow,gh release, and matchinggh apicalls so immediate status/release checks bypass stale fallthrough cache entries. - Expose shim/backend paths, live mode, liveness tombstones, and live bypass counters in
gh xcache stats.
- Move top-level CLI parsing and
gh xcacheargument parsing onto Kong while keeping the broaderghshim pass-through compatible with GitHub CLI argument shapes. - Keep
gh xcache --helpdiscoverable and makestats --since, JSON output, and snapshot reset parsing share one typed parser path. - Teach the
ghshim about the shared GitHub token rate-limit budget, serve stale successful reads more aggressively when that pooled budget is low, preserve GitHub CLI--jqhandling for cached fallthrough reads, and expose low-budget stale hits inxcache stats. - Avoid extra
gh auth tokensubprocesses during low-budget cache preflight checks.
- Fix gh-shim portable-store auto-hydration so exact issue/PR refreshes write to the runtime mirror instead of dirtying the Git checkout, clear stale portable refresh locks, and make empty open issue discovery fall through when only targeted sync history exists.
- Keep
cluster-detailaligned with the default cluster list by showing closed historical members unless--hide-closedis passed, and fail fast whenGITCRAWL_GH_PATHpoints back at thegitcrawlshim.
- Bump routine release workflow dependencies.
- Add a repo-local
gitcrawlagent skill for local archive, freshness, gh-shim, cluster, and verification workflows. - Accept full GitHub issue and pull request URLs anywhere
gitcrawlexpects a thread number, including sync filters, gh-shim views/diffs, governance commands, neighbor lookup, embedding, and TUI jumps. - Document read-only SQLite query examples in the repo-local agent skill so agents can do exact local archive counts without mutating state.
- Document the crawlkit control surface now available on
main, includingmetadata --json,status --json, anddoctor --jsonfor local launchers and CI. - Clarify that
gitcrawl tuiremains the reference terminal browser for the crawl app family while sharedcrawlkit/tuiconverges on the same panes, sorting, action menus, and status chrome. - Add command-reference coverage for the read-only metadata/status commands.
- Add broader CLI, gh-shim, TUI, and store regression coverage for the verified release surface.
- Improve
ghshim cache coordination and observability with stale-while-revalidate reads, finer Actions/API TTLs, recent-window stats, top miss keys, andxcache snapshot.
- Add Homebrew tap installation via
brew install openclaw/tap/gitcrawl. - Improve the
ghshim cache with canonicalized keys, targeted mutation invalidation, stale-on-rate-limit fallback reads, completed-run TTLs, hit-rate stats, counter reset, and issue auto-hydration. - Add dark-mode support, a theme toggle, and clearer navigation styling to the generated docs site.
- Force embedding refreshes when the embedding input rune cap changes, so stale larger-cap vectors are not reused.
- Expand the
ghshim with local list filters, PR diff caching by cached head SHA, xcache GC, hit/miss/write counters, and throttled portable-store refreshes to reduce GitHub API pressure across agent sessions. - Add explicit PR-detail hydration for files, commits, checks, and workflow runs so
gh pr view,gh pr checks, andgh run list/viewcan answer common review reads from the existing SQLite cache. - Auto-hydrate one exact pull request when local PR detail reads miss or check/run data is stale, using
gh auth tokenifGITHUB_TOKENis absent, then retry from SQLite before falling back to livegh. - Cache more ghx-style read-only fallthroughs, including release, workflow, secret, variable, project, ruleset, gist, org, and search reads; cache repeat read failures by default; and clear the fallthrough cache after the corresponding mutating
ghcommands. - Promote portable backups to the v2 format: keep compact comments, PR files, commits, checks, and workflow runs while stripping raw JSON, generated documents, vectors, clusters, and run history.
- Add crawlkit control metadata/status surfaces with command-local
metadata --json,status --json, anddoctor --json. - Include the primary SQLite database inventory in status JSON so local control surfaces can discover archive storage without opening live stores.
- Route config path handling and SQLite openers through
crawlkitso GitHub archive tooling shares the same foundation as the Slack, Discord, and Notion crawlers. - Keep shared crawl app TUI nomenclature aligned while
gitcrawl tuiremains the richer cluster-browser reference implementation. - Keep the existing
gitcrawl tuias the family reference terminal interface and add CI smoke coverage for its help surface.
- Polish the TUI cluster browser interaction model, including separate cluster/member action menus, softer row state colors, stable viewport refresh, bidirectional age sorting, and buffered trackpad scrolling.
- Add OpenAI embedding retry handling for transient failures and cap oversized embedding inputs before sending them upstream.
- Improve GitHub pagination and retry behavior by surfacing page totals and honoring retry and rate-limit response headers.
- Harden human-key hash parsing and tidy the module graph.
- Fix portable store refreshes when local Git pull configuration tries to rebase onto multiple branch merge refs.
- Honor
GITCRAWL_GITHUB_BASE_URLandGITHUB_BASE_URLduringgitcrawl sync, matching cached search and test-server workflows. - Fix cached
search issues|prsagainst portable stores by using portable-safe thread body and raw JSON columns. - Keep read-only portable-store commands responsive when the backing Git remote is unavailable by making refresh best-effort and non-interactive with bounded SSH connection attempts.
- Add
gitcrawl sync --numbersfor exact issue and pull request hydration, including comment documents, without relying on list ordering or updated-time windows. - Implement
gitcrawl refreshandgitcrawl embedso synced repositories can generate OpenAI embeddings and rebuild durable clusters end to end. - Add
gitcrawl sync --state open|closed|allso incremental backups can refresh recently closed issues and pull requests. - Default
gitcrawl syncto--state all, keeping closed issue and pull request state fresh unless a narrower state is requested. - Let
gitcrawl searchfall back to compact thread title/body data when portable stores have pruned generated document indexes. - Refresh clean portable-store checkouts before read-only commands so
search,threads, clusters, and the TUI see freshly published GitHub backup data automatically. - Refresh portable-store status and clear stale SQLite sidecars so
doctorand local queries report freshly pulled backup data instead of stale sync metadata. - Open writable runtime mirrors for portable-store configs so
gitcrawl embed,refresh, and semantic neighbor generation can persist local vectors without mutating the GitHub backup checkout. - Show active primary cluster memberships by default in
clusters,durable-clusters, and the TUI, with--include-closedreserved for historical audit views. - Split generated clusters with bounded nearest-neighbor graph safeguards, GitHub reference evidence, and cross-kind score pruning so weak similarity bridges stop merging unrelated reports into one mega-cluster.
- Tighten clustering precision by ignoring ambiguous one-digit prose references and requiring weak embedding edges to share concrete title tokens unless they have high similarity or direct GitHub reference evidence.
- Treat later body-only issue references as weak evidence unless they share title overlap, while still preserving title and lead-body references for canonical issue/PR fix clusters.
- Hide GitHub-closed members from latest-run cluster summaries and details by default;
--include-closedstill shows the full historical cluster. - Add release plumbing for GitHub release archives via GoReleaser.