feat(workspace): FTS5 foundation + pluggable indexer + agent tools by alt-glitch · Pull Request #11796 · NousResearch/hermes-agent

alt-glitch · 2026-04-17T20:58:35Z

Summary

Stacked PR: builds a pluggable workspace indexing + search stack on top of the SQLite FTS5 foundation. Originally split across three PRs, now landed as one.

Layer 1 — FTS5 Foundation

Core storage, chunking, and CLI — the base layer that everything else sits on.

workspace/ package: SQLite FTS5 store, file discovery, config loading, search, CLI commands
Chunking via Chonkie Pipeline — three pre-built pipelines dispatched by file suffix: markdown (MarkdownChef + RecursiveChunker), code (CodeChunker), plain (RecursiveChunker). All three end with OverlapRefinery.
MarkdownChef splits markdown into prose / code / tables / images — each stored as its own kind (markdown_text / markdown_code / markdown_table / markdown_image) with per-modality metadata (language on code, alias text on images)
OverlapRefinery: suffix-method overlap stored in a separate context column, indexed into FTS via the retrieval_text trigger so queries find matches that only appear in the overlap region
.hermesignore support with full gitignore semantics via pathspec (precedence: .hermesignore > .gitignore > built-in defaults). The .hermesignore file itself is never indexed.
CLI: hermes workspace roots add/remove/list, hermes workspace index, hermes workspace search <query> with --path, --glob, --limit filters. JSON output by default, --human for Rich
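A minimal sketch of how a separate context column can be made searchable through an FTS5 trigger, using Python's built-in sqlite3. The table and trigger names (chunks, chunks_fts, chunks_ai, chunks_ad) and the trimmed-down columns are assumptions for illustration, not the PR's actual v2 schema. It also assumes the local SQLite build ships with FTS5.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema; the PR's real SQLiteFTS5Store schema is not shown here.
SCHEMA = """
CREATE TABLE chunks (
    id      INTEGER PRIMARY KEY,
    path    TEXT NOT NULL,              -- absolute path, used as the file key
    content TEXT NOT NULL,              -- the chunk body
    context TEXT NOT NULL DEFAULT ''    -- overlap text copied from a neighbouring chunk
);
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    retrieval_text,
    tokenize = 'porter unicode61'       -- porter stemming on top of unicode61 word splitting
);
-- Index content plus overlap context, so terms that only occur in the overlap still match.
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, retrieval_text)
    VALUES (new.id, new.content || ' ' || new.context);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    DELETE FROM chunks_fts WHERE rowid = old.id;
END;
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=5.0)  # timeout doubles as SQLite's busy timeout
    conn.executescript(SCHEMA)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[Tuple[str, str, float]]:
    # bm25() gives lower (more negative) scores to better matches, so sort ascending.
    return conn.execute(
        """
        SELECT c.path, c.content, bm25(chunks_fts) AS score
        FROM chunks_fts JOIN chunks AS c ON c.id = chunks_fts.rowid
        WHERE chunks_fts MATCH ?
        ORDER BY score
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


conn = open_store()
conn.execute(
    "INSERT INTO chunks(path, content, context) VALUES (?, ?, ?)",
    ("/repo/docs/a.md", "We index every file under the root.", "retries on lock contention"),
)
print(search(conn, "indexing"))    # porter reduces 'indexing' and 'index' to the same term
print(search(conn, "contention"))  # found only through the overlap context
```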

Design decisions:

FTS5-only, no vector search yet — BM25 with porter stemmer handles the common case well
RecursiveChunker is the only prose chunker; semantic/neural were considered and dropped (BM25 doesn't benefit from topical coherence, and the pinned model deps + _neural_enforce_size wart weren't paying for themselves)
Absolute paths as primary keys, word-based tokenizer, schema v2 with context + chunk_metadata columns
Per-file error isolation with SAVEPOINT rollback — one bad file never aborts the run or corrupts previously indexed data
Concurrent-safe: sqlite3.connect(timeout=5.0) plus bounded retry on the one-shot schema bootstrap (WAL-pragma init doesn't honor busy_timeout)
Config signature includes chunk_size, overlap, overlap mode/method — any change triggers reindex
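The per-file SAVEPOINT isolation described above can be sketched with the standard sqlite3 module as follows. `index_files`, the `chunk_file` callback and the two-column table are hypothetical stand-ins, not the PR's DefaultIndexer internals. The sketch assumes the connection uses `isolation_level=None` so that transactions are controlled explicitly.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple


def index_files(
    conn: sqlite3.Connection,
    paths: Iterable[str],
    chunk_file: Callable[[str], List[str]],
) -> Tuple[int, List[Tuple[str, str]]]:
    """Re-index each path inside its own SAVEPOINT (conn opened with isolation_level=None)."""
    indexed = 0
    errors: List[Tuple[str, str]] = []
    conn.execute("BEGIN")
    try:
        for path in paths:
            conn.execute("SAVEPOINT file_sp")
            try:
                # Replace this file's previous chunks with freshly computed ones.
                conn.execute("DELETE FROM chunks WHERE path = ?", (path,))
                for text in chunk_file(path):
                    conn.execute("INSERT INTO chunks(path, content) VALUES (?, ?)", (path, text))
            except Exception as exc:
                # Undo only this file's writes, so its previously indexed chunks survive.
                conn.execute("ROLLBACK TO file_sp")
                errors.append((path, str(exc)))
            else:
                indexed += 1
            conn.execute("RELEASE file_sp")
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return indexed, errors


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE chunks (path TEXT NOT NULL, content TEXT NOT NULL)")
conn.execute("INSERT INTO chunks VALUES ('bad.txt', 'old chunk')")


def fake_chunker(path: str) -> List[str]:
    if path == "bad.txt":
        raise ValueError("undecodable bytes")
    return ["chunk one", "chunk two"]


result = index_files(conn, ["good.txt", "bad.txt"], fake_chunker)
print(result)  # (1, [('bad.txt', 'undecodable bytes')])
```

The failing file is reported as an error, and its old chunk is still present after the run.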

Salvaged from: PRs #1324 (original design spec by @teknium1) and #5840 (modularized pipeline, salvaged from #5619). Trimmed to FTS5-only with clean code in a flat package, no plugin system.

Layer 2 — Pluggable Indexer Architecture

The FTS5 implementation becomes the DefaultIndexer behind a swappable interface. Community can write plugins that replace the entire backend (semantic search, vector store, etc.) without touching core code.

BaseIndexer ABC (workspace/base.py) — plugin contract with index(), search(), status(), list_files(), retrieve(), delete(). Two required methods + four optional with no-op defaults.
DefaultIndexer (workspace/default.py) — restructured indexer.py + search.py into a class implementing the ABC. Internal methods (_build_pipelines, _process_file, etc.) are overridable for subclass-and-tweak patterns.
Plugin discovery at plugins/workspace/<name>/, mirroring the context_engine plugin pattern: a register(ctx) function plus filesystem discovery. The get_indexer(config) factory picks the plugin named by the indexer: config key; the sentinel value "default" skips plugin lookup.
Pydantic v2 migration for all workspace config models with field validators, replacing frozen dataclasses.
Semtools plugin (plugins/workspace/semtools/): real-world validation of the abstractions. It wraps the semtools Rust CLI (model2vec semantic search) with an idempotent npm i -g bootstrap. It implements five of the six BaseIndexer methods; retrieve() is intentionally left as the no-op default (semtools uses embeddings, not chunked content storage).
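A rough sketch of the contract and factory described above. It has two abstract methods, four no-op defaults, and a "default" sentinel that bypasses plugin lookup. The method return types, `PluginContext.register_indexer`, the module-level `_CTX` registry and the `EchoIndexer` plugin are all hypothetical. In the PR, filesystem discovery under plugins/workspace/ would call each plugin's register(ctx).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type

Config = Dict[str, Any]


class BaseIndexer(ABC):
    """Sketch of the plugin contract: index() and search() required, the rest optional."""

    def __init__(self, config: Config) -> None:
        self.config = config

    @abstractmethod
    def index(self) -> Dict[str, Any]: ...

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]: ...

    # Optional methods: no-op defaults, so a minimal plugin implements only two methods.
    def status(self) -> Dict[str, Any]:
        return {}

    def list_files(self) -> List[str]:
        return []

    def retrieve(self, path: str) -> List[Dict[str, Any]]:
        return []

    def delete(self, path: str) -> bool:
        return False


class DefaultIndexer(BaseIndexer):
    """Placeholder for the FTS5-backed implementation."""

    def index(self) -> Dict[str, Any]:
        return {"files_indexed": 0}

    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        return []


class PluginContext:
    """Hypothetical object passed to each plugin's register(ctx)."""

    def __init__(self) -> None:
        self.indexers: Dict[str, Type[BaseIndexer]] = {}

    def register_indexer(self, name: str, cls: Type[BaseIndexer]) -> None:
        self.indexers[name] = cls


_CTX = PluginContext()  # hypothetical registry filled by plugin discovery


def get_indexer(config: Config) -> BaseIndexer:
    name = config.get("indexer", "default")
    if name == "default":  # sentinel: never consult plugins
        return DefaultIndexer(config)
    if name not in _CTX.indexers:
        raise ValueError(f"unknown workspace indexer: {name!r}")
    return _CTX.indexers[name](config)


# What a plugin module might expose (hypothetical).
class EchoIndexer(BaseIndexer):
    def index(self) -> Dict[str, Any]:
        return {"files_indexed": 0}

    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        return [{"path": "<echo>", "text": query}][:limit]


def register(ctx: PluginContext) -> None:
    ctx.register_indexer("echo", EchoIndexer)


register(_CTX)
print(type(get_indexer({})).__name__)  # DefaultIndexer
print(get_indexer({"indexer": "echo"}).search("bm25"))
```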

Layer 3 — CLI + Agent Tools + Slash Commands (all backends)

Full surface coverage — same backend exposed through four channels. All four route through get_indexer(config) so switching the indexer: config key flips the whole stack transparently.

CLI subcommands (hermes workspace ...):

status — file count, chunk count, DB size, DB path
list — all indexed files with size + chunk count
retrieve <path> — all chunks for a specific file (JSON or human)
delete <path> — remove a file + its chunks from the index
These join the existing index, search, and roots commands.

Agent tools (6 separate tools, one-job-each design):

| Tool | Default | Purpose |
|---|---|---|
| `workspace_search` | Yes (core) | FTS5 BM25 search across chunks |
| `workspace_index` | Yes (core) | Rebuild the workspace index |
| `workspace_status` | Opt-in | Index stats |
| `workspace_list` | Opt-in | List indexed files |
| `workspace_retrieve` | Opt-in | Get all chunks for a path |
| `workspace_delete` | Opt-in | Remove a file from the index |

Each tool has its own focused schema (no monolithic action-dispatch). All gated on workspace.enabled via check_fn. Full set available via the workspace toolset for opt-in.
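The sketch below shows one way to combine per-tool schemas, a shared check_fn gate and a core-versus-opt-in split. `ToolDef`, `_schema`, `available_tools` and the parameter names in the schemas are hypothetical. They do not reproduce the registry in tools/workspace_tools.py, whose exact shape is not shown in this PR description.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional

Config = Dict[str, Any]


def workspace_enabled(config: Config) -> bool:
    """check_fn: a workspace tool is exposed only when workspace.enabled is true."""
    return bool(config.get("workspace", {}).get("enabled", False))


def _schema(required: Dict[str, str], optional: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build a small JSON-Schema object from {param: json_type} maps."""
    props = {**required, **(optional or {})}
    return {
        "type": "object",
        "properties": {name: {"type": typ} for name, typ in props.items()},
        "required": list(required),
    }


@dataclass(frozen=True)
class ToolDef:
    name: str
    schema: Dict[str, Any]  # one focused schema per tool, no action dispatch
    check_fn: Callable[[Config], bool] = workspace_enabled


WORKSPACE_TOOLS = [
    ToolDef("workspace_search", _schema({"query": "string"}, {"limit": "integer", "path": "string"})),
    ToolDef("workspace_index", _schema({})),
    ToolDef("workspace_status", _schema({})),
    ToolDef("workspace_list", _schema({})),
    ToolDef("workspace_retrieve", _schema({"path": "string"})),
    ToolDef("workspace_delete", _schema({"path": "string"})),
]
CORE_TOOLS = {"workspace_search", "workspace_index"}


def available_tools(config: Config, enabled_toolsets: Iterable[str] = ()) -> List[str]:
    wanted = set(CORE_TOOLS)
    if "workspace" in set(enabled_toolsets):  # opt-in: the whole workspace toolset
        wanted |= {tool.name for tool in WORKSPACE_TOOLS}
    return [t.name for t in WORKSPACE_TOOLS if t.name in wanted and t.check_fn(config)]


print(available_tools({"workspace": {"enabled": True}}))  # ['workspace_search', 'workspace_index']
```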

Interactive slash commands (CLI REPL, /workspace ...):

Registered in COMMAND_REGISTRY with tab-completion for all subcommands
Rich-formatted output in hermes_cli/workspace_slash.py
Dispatches through the same get_indexer() factory as the CLI

Gateway slash commands (Telegram, Discord, Slack, Signal, WhatsApp, Matrix, etc.):

_handle_workspace_command() in gateway/run.py — centralized dispatch (works across all platforms, no per-platform wiring)
Blocking SQLite I/O runs via run_in_executor to avoid blocking the event loop
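A small sketch of the run_in_executor pattern used by the gateway handler. `BlockingIndexer` simulates slow SQLite calls with `time.sleep`. The command parsing and reply strings are hypothetical; this is not the code in gateway/run.py. Because each blocking call runs in the default thread pool, two chats issuing commands at once overlap instead of stalling the event loop.

```python
import asyncio
import time
from typing import Any, Dict, List


class BlockingIndexer:
    """Stand-in for an indexer whose calls block on SQLite I/O (simulated with sleep)."""

    def status(self) -> Dict[str, Any]:
        time.sleep(0.3)
        return {"files": 3, "chunks": 42}

    def search(self, query: str) -> List[str]:
        time.sleep(0.3)
        return [f"hit for {query!r}"]


async def handle_workspace_command(text: str, indexer: BlockingIndexer) -> str:
    """Parse '/workspace <sub> [args]' and run the blocking call in the default executor."""
    _, _, args = text.partition(" ")
    sub, _, rest = args.strip().partition(" ")
    loop = asyncio.get_running_loop()
    if sub == "status":
        stats = await loop.run_in_executor(None, indexer.status)
        return f"{stats['files']} files, {stats['chunks']} chunks"
    if sub == "search" and rest:
        hits = await loop.run_in_executor(None, indexer.search, rest)
        return "\n".join(hits) or "no results"
    return "usage: /workspace status | search <query>"


async def demo() -> None:
    indexer = BlockingIndexer()
    start = time.perf_counter()
    replies = await asyncio.gather(
        handle_workspace_command("/workspace status", indexer),
        handle_workspace_command("/workspace search bm25 ranking", indexer),
    )
    # Roughly 0.3s total rather than 0.6s, because the two calls ran in parallel threads.
    print(replies, f"{time.perf_counter() - start:.2f}s")


asyncio.run(demo())
```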

Layer 4 — File Parsing (PDF, DOCX, PPTX → Markdown)

In-memory parsing layer that converts non-text binary formats to markdown before chunking. Parsed content flows through the existing markdown chunking pipeline (headings, tables, code blocks extracted properly).

FileParser ABC (workspace/parsers.py) — template-method pattern: abstract _convert(path) -> str + concrete parse(path) -> str|None with uniform error handling and logging
Two built-in backends:
- MarkitdownParser — wraps markitdown library (lazy import, cached instance across calls)
- PandocParser — wraps pandoc CLI via subprocess with 120s timeout
CompositeParser — routes file extensions to the correct backend. build_parser(config) factory builds the routing table from config.
ParsingConfig — Pydantic model under knowledgebase.parsing with default backend + per-extension overrides (e.g., use markitdown for PDF but pandoc for DOCX)
PARSEABLE_SUFFIXES — .pdf, .docx, .pptx carved out of BINARY_SUFFIXES in file discovery so they pass through to the parser instead of being skipped
DefaultIndexer integration — parsed files get suffix = ".md" override to route through the markdown chunking pipeline. Config signature includes parsing config so backend changes trigger reindex.
Graceful degradation — if markitdown isn't installed, parser logs a warning and skips the file (counted as error). Unknown backend names in config also log a warning.
[parsing] extra in pyproject.toml: markitdown[pdf,docx,pptx]>=0.1.0. Wired into the [workspace] extra via hermes-agent[parsing].
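A sketch of the template-method and routing structure described above, with graceful degradation when a backend fails. The PandocParser here shells out to `pandoc <file> -t markdown` with the 120 s timeout mentioned in the PR. `TextAsMarkdownParser` is a toy backend standing in for MarkitdownParser, since markitdown may not be installed. Logging text and exact signatures are assumptions, not the code in workspace/parsers.py.

```python
import logging
import subprocess
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class FileParser(ABC):
    """Template method: subclasses implement _convert(); parse() owns error handling."""

    @abstractmethod
    def _convert(self, path: Path) -> str: ...

    def parse(self, path: Path) -> Optional[str]:
        try:
            return self._convert(path)
        except Exception as exc:  # one unparseable file is skipped, never fatal
            logger.warning("could not parse %s with %s: %s", path, type(self).__name__, exc)
            return None


class PandocParser(FileParser):
    def _convert(self, path: Path) -> str:
        # pandoc infers the input format from the extension; -t markdown selects the output.
        proc = subprocess.run(
            ["pandoc", str(path), "-t", "markdown"],
            capture_output=True, text=True, timeout=120, check=True,
        )
        return proc.stdout


class CompositeParser(FileParser):
    """Routes files to a backend by lower-cased suffix."""

    def __init__(self, routes: Dict[str, FileParser]) -> None:
        self._routes = {suffix.lower(): parser for suffix, parser in routes.items()}

    def can_parse(self, path: Path) -> bool:
        return path.suffix.lower() in self._routes

    def _convert(self, path: Path) -> str:
        # An unknown suffix raises KeyError, which parse() logs before returning None.
        return self._routes[path.suffix.lower()]._convert(path)


class TextAsMarkdownParser(FileParser):
    """Toy backend for the demo, standing in for MarkitdownParser."""

    def _convert(self, path: Path) -> str:
        return "# " + path.read_text(encoding="utf-8").strip()


parser = CompositeParser({".TXT": TextAsMarkdownParser(), ".docx": PandocParser()})

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.TXT"
    sample.write_text("quarterly report", encoding="utf-8")
    print(parser.parse(sample))  # routed to the toy backend despite the upper-case suffix
```

In the indexer, the returned markdown would then be chunked as if the file had a ".md" suffix, as the DefaultIndexer integration bullet describes.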

Backend Parity

The same four surfaces work with any backend, with zero coupling between them. Verified end-to-end:

| Surface | DefaultIndexer (FTS5) | SemtoolsIndexer (plugin) |
|---|---|---|
| `hermes workspace ...` CLI | Yes | Yes |
| `workspace_*` agent tools | Yes (14/14 assertions) | Yes (dispatches through plugin) |
| `/workspace` CLI REPL slash | Yes (tmux-verified) | Yes |
| `/workspace` gateway slash | Yes | Yes |

Flip indexer: "semtools" in config.yaml and every surface transparently routes through the semtools plugin. Zero changes needed in the tool/slash/CLI layers.

What comes next

Reranking: Add optional re-ranker support for search results.
More plugins: Validate abstractions with additional indexer plugins (vector stores, hybrid search).

File layout

| File | Purpose |
|---|---|
| `workspace/__init__.py` | `get_indexer()` factory + public API |
| `workspace/base.py` | `BaseIndexer` ABC |
| `workspace/default.py` | `DefaultIndexer` — Chonkie + SQLite FTS5 |
| `workspace/config.py` | Pydantic `WorkspaceConfig` / `KnowledgebaseConfig` / `ParsingConfig` |
| `workspace/constants.py` | Ignore patterns, file type sets, `PARSEABLE_SUFFIXES`, path helpers |
| `workspace/types.py` | `FileRecord`, `ChunkRecord`, `SearchResult`, `IndexingError`, `IndexSummary` |
| `workspace/store.py` | `SQLiteFTS5Store` — schema, CRUD, BM25 search, FTS5 triggers, WAL-safe bootstrap |
| `workspace/files.py` | File discovery with pathspec ignore parsing + parseable file passthrough |
| `workspace/parsers.py` | `FileParser` ABC, `MarkitdownParser`, `PandocParser`, `CompositeParser`, `build_parser()` |
| `workspace/commands.py` | CLI command handlers (all 7 actions) |
| `plugins/workspace/__init__.py` | Plugin discovery |
| `plugins/workspace/semtools/` | Semtools plugin (external tool validation; all 6 BaseIndexer methods wired, with `retrieve()` left as the no-op default) |
| `tools/workspace_tools.py` | 6 agent tools with schemas + handlers |
| `hermes_cli/workspace_slash.py` | `/workspace` interactive CLI REPL slash |

Modified files

hermes_cli/main.py — registers hermes workspace subcommand tree
hermes_cli/commands.py — adds CommandDef("workspace", ...) to COMMAND_REGISTRY
cli.py — dispatch branch for CLI REPL slash command
gateway/run.py — dispatch branch + _handle_workspace_command() for messaging platforms
hermes_cli/config.py — adds knowledgebase section to DEFAULT_CONFIG
toolsets.py — adds workspace tools to _HERMES_CORE_TOOLS + standalone workspace toolset
pyproject.toml — adds pathspec, charset-normalizer to core deps; chonkie[code] to [workspace] extra; markitdown[pdf,docx,pptx] to [parsing] extra

Test plan

github-actions · 2026-04-17T20:58:49Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Install hook files modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…ad code

- Rename IndexError → IndexingError (shadows Python builtin)
- Use cached CodeChunker instead of per-block instantiation
- Collapse identical strategy branches in _process_plain
- Remove unused suffix param from _process_code/_process_plain
- Remove dead iter_workspace_files function
- Expose overlap property on _ChunkerCache (was accessing private _config)
- Rewrite _build_line_offsets with regex (was O(n) char loop)
- Fix duplicate --human argparse registration
- Fix LIKE %/_ semantic bug in path prefix search (use substr)
- Pre-compile FTS token regex at module level
- Use dataclasses.replace for frozen record updates
- Remove decisions.md and workspace-findings.md from PR
- ruff format + lint clean on workspace/
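The LIKE bug mentioned in the list above is easy to reproduce. In SQLite, `%` and `_` inside a LIKE pattern are wildcards, and ASCII LIKE is case-insensitive by default. A substr() comparison matches the prefix character for character instead. The helper names and the `files` table are hypothetical; only the substr idea comes from the commit.

```python
import sqlite3
from typing import List


def files_under_like(conn: sqlite3.Connection, prefix: str) -> List[str]:
    # Buggy: '_' in the prefix matches any single character, and ASCII LIKE ignores case.
    rows = conn.execute("SELECT path FROM files WHERE path LIKE ? || '%' ORDER BY path", (prefix,))
    return [r[0] for r in rows]


def files_under(conn: sqlite3.Connection, prefix: str) -> List[str]:
    # Exact prefix comparison: no wildcard interpretation, case-sensitive.
    rows = conn.execute(
        "SELECT path FROM files WHERE substr(path, 1, length(?)) = ? ORDER BY path",
        (prefix, prefix),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("/src/my_app/main.py",), ("/src/myXapp/main.py",), ("/src/MY_APP/util.py",)],
)
print(files_under_like(conn, "/src/my_app/"))  # all three rows match
print(files_under(conn, "/src/my_app/"))       # ['/src/my_app/main.py']
```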

github-actions · 2026-04-17T21:11:37Z

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Install hook files modified

Files:

hermes_cli/setup.py

⚠️ WARNING: Dependency manifest files modified

Files:

pyproject.toml

chonknick · 2026-04-17T22:00:33Z

Hey! A few things that could simplify this significantly.

1. Use `Pipeline` instead of manual wiring

The _process_markdown function + _apply_overlap + _ChunkerCache (~300 lines) manually chains MarkdownChef → chunker → OverlapRefinery. Chonkie has a built-in Pipeline that handles this:

```python
from pathlib import Path

from chonkie import Pipeline

text = Path("notes.md").read_text()  # hypothetical input file

# Markdown files
doc = (Pipeline()
    .process_with("markdown")
    .chunk_with("recursive", tokenizer="word", chunk_size=512)
    .refine_with("overlap", tokenizer="word", context_size=32)
    .run(texts=text))

# doc.chunks  -> text chunks with overlap context
# doc.tables  -> MarkdownTable objects
# doc.code    -> MarkdownCode objects (with .language)
# doc.images  -> MarkdownImage objects (with .alias, .content, .link)

# Plain text / code files
doc = (Pipeline()
    .chunk_with("recursive", tokenizer="word", chunk_size=512)
    .refine_with("overlap", tokenizer="word", context_size=32)
    .run(texts=text))
```

This also eliminates the getattr fallback chains guessing at attribute names (getattr(block, "content", None) or getattr(block, "text", "") etc.) — the actual API is stable: MarkdownCode.content, MarkdownCode.language, MarkdownImage.alias, MarkdownImage.content, MarkdownTable.content.

One caveat: the pipeline only re-chunks doc.chunks (prose text segments). Code blocks, tables, and images pass through from MarkdownChef as-is — they're not further split by the chunker. For FTS indexing this is probably fine since markdown code blocks are usually reasonably sized, but if you need to split large code blocks with CodeChunker, you'd need to handle that as a post-pipeline step over doc.code.

2. Drop semantic/neural strategies

For FTS5/BM25 search, semantic and neural chunking don't provide meaningful benefit — BM25 is keyword-based, so topically coherent chunk boundaries don't improve match quality. RecursiveChunker handles this well. Dropping those strategies removes:

Two pinned model dependencies (potion-base-32M, chonky_modernbert_base_1)
The _neural_enforce_size workaround (which also mutates chunk objects directly — fragile)
The extra chonkie[semantic] / chonkie[neural] install paths
The _STRATEGY_DEPS validation logic

If you want a faster option, FastChunker is SIMD-accelerated (~100+ GB/s) and works in the pipeline too:

```python
doc = (Pipeline()
    .process_with("markdown")
    .chunk_with("fast", chunk_size=2048)  # byte-based, no tokenizer needed
    .refine_with("overlap", tokenizer="word", context_size=32)
    .run(texts=text))
```

3. Minor issues

_apply_overlap reconstructs Chunk objects: You're converting ChunkRecord → chonkie.Chunk → back to ChunkRecord just for the refinery. The Pipeline handles this natively.
CodeChunker ignores known language: In markdown code blocks, you already have the language from MarkdownChef but use a single cached CodeChunker(language="auto"). The Pipeline dispatches this correctly.

Happy to help if you have questions about the Pipeline API!

Pins the target behavior for the Chonkie Pipeline migration:

- Markdown files produce one ChunkRecord per modality with clean metadata
- Small markdown files remain multimodal (no _single_chunk short-circuit)
- Overlap context propagates and is a suffix of the prior chunk's content
- Deprecated strategy/threshold keys load without error
- Signature change forces re-indexing

Marked xfail(strict=True) until the migration lands in the next commit.

Replaces the manual MarkdownChef → chunker → OverlapRefinery wiring (and the _ChunkerCache / _apply_overlap / _group_overlap_runs / _neural_enforce_size layer) with three pre-built chonkie.Pipeline instances dispatched by file suffix.

- Drop 'semantic' and 'neural' chunking strategies and their model pins; RecursiveChunker is the only prose chunker. BM25/FTS5 doesn't benefit from topical coherence.
- Drop ChunkingConfig.strategy and ChunkingConfig.threshold. No legacy compatibility: old keys in user configs are silently ignored.
- Drop block_index / src / link / row_count / column_count metadata as not load-bearing for keyword search.
- Drop _single_chunk short-circuit so small files still flow through the Pipeline (preserves prose/code split for tiny multimodal markdown).
- Drop manual _apply_overlap; OverlapRefinery.refine_document handles prose chunks inside the Pipeline, populating Chunk.context directly.
- Bump CHUNKING_PLAN_VERSION from v1 to v2 so existing indexes get rebuilt cleanly on upgrade.
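The plan-version bump works because the index stores a signature of every setting that affects its contents. A minimal sketch of such a signature, assuming a SHA-256 hash over sorted JSON: the key names (`chunk_size`, `overlap_mode`, `overlap_method`, the parsing fields) and `needs_reindex` are hypothetical, not the PR's exact fields.

```python
import hashlib
import json
from typing import Any, Dict, Optional

CHUNKING_PLAN_VERSION = "v2"  # bumped from v1 so older indexes are rebuilt


def config_signature(chunking: Dict[str, Any], parsing: Dict[str, Any]) -> str:
    """Stable hash over every setting that changes what ends up in the index."""
    payload = {"plan": CHUNKING_PLAN_VERSION, "chunking": chunking, "parsing": parsing}
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def needs_reindex(stored: Optional[str], chunking: Dict[str, Any], parsing: Dict[str, Any]) -> bool:
    return stored != config_signature(chunking, parsing)


chunking = {"chunk_size": 512, "overlap": 32, "overlap_mode": "token", "overlap_method": "suffix"}
parsing = {"default": "markitdown", "overrides": {".docx": "pandoc"}}
sig = config_signature(chunking, parsing)
print(needs_reindex(sig, dict(chunking), parsing))                   # False
print(needs_reindex(sig, {**chunking, "chunk_size": 256}, parsing))  # True
```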

Code-review follow-up to the chonkie Pipeline migration:

- Delete _extract_first_heading and _kind_from_suffix (only callers were _single_chunk, which was removed in the migration).
- Drop unused `import pytest` in test_indexer_pipeline.py (all xfail markers removed post-migration).
- Update test_indexer_pipeline.py docstrings to describe post-migration invariants rather than "current impl" behavior that no longer exists.

Follow-up to the chonkie Pipeline migration. Parallel black-box verification surfaced four bugs and three minor follow-ups:

- Add PRAGMA busy_timeout via sqlite3.connect(timeout=5.0). This fixes concurrent index crashes exposed when _build_pipelines removed the lazy-init skew that previously hid the race.
- Resolve path_prefix in search_workspace (Python API entry) so it matches the indexer's resolved stored paths, mirroring what the CLI already does in commands.py.
- Hardcode .hermesignore exclusion in discovery, and add it to DEFAULT_IGNORE_PATTERNS belt-and-suspenders.
- Extend DiscoveryResult with filtered_count and roll it into files_skipped so empty/oversized files stop vanishing from summaries.
- Tighten pipelines dict type from dict[str, Any] to dict[str, Pipeline].
- Drop dead {"language": lang} branch in _process_code; CodeChunker language="auto" never populates the attribute.
- Tighten test_small_markdown_file_is_split_into_modalities to assert prose doesn't swallow the code fence.

Adds four regression tests covering each fix.

- Narrow markdown pipeline result with isinstance assert (static type correctness; .code/.tables/.images access no longer leaks Document abstraction).
- Simplify _execute_with_lock_retry to 5 linear-backoff attempts; the helper was tuned for WAL schema bootstrap, not repeated retry.
- Add Literal['markdown','code','plain'] alias for pipeline keys so typos at the dispatch site become type errors.
- Fix misleading stage="discover" label on post-discovery errors; relabel as "read" where it actually applies.
- Extract _make_config / _write into tests/workspace/conftest.py fixtures so the two test files share one source.
- Factor str(Path(raw).resolve()) into workspace.constants.resolve_path_prefix and call from both search_workspace and the CLI command.
- Drop a stale WHAT-comment on the retry backoff line.
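A possible shape for a bounded, linear-backoff lock retry like the one described above. The function name, signature and the "locked" message check are assumptions, not the PR's `_execute_with_lock_retry`. `FlakyConn` is a fake connection used only to demonstrate the retries.

```python
import sqlite3
import time
from typing import Any, Sequence


def execute_with_lock_retry(
    conn: Any, sql: str, params: Sequence[Any] = (), attempts: int = 5, base_delay: float = 0.05
) -> Any:
    """Run one statement, retrying only on 'database is locked' with linear backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(base_delay * attempt)  # 0.05s, 0.10s, 0.15s, 0.20s with the defaults
    raise AssertionError("unreachable")


class FlakyConn:
    """Fake connection: raises a lock error `failures` times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def execute(self, sql: str, params: Sequence[Any] = ()) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise sqlite3.OperationalError("database is locked")
        return "ok"


conn = FlakyConn(failures=2)
print(execute_with_lock_retry(conn, "PRAGMA journal_mode=WAL", base_delay=0.01), conn.calls)  # ok 3
```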

The MarkdownDocument narrow in _process_markdown addressed one of three call sites. _process_code and _process_plain had the same Pyright gap — .chunks access on Document | list[Document]. Narrow with isinstance assert, consistent with the markdown path.

- Move PipelineKind = Literal[...] below all imports in indexer.py
- Add 'from __future__ import annotations' to store.py so the SQLiteFTS5Store self-reference in __enter__'s return annotation resolves without the explicit string quote.

Drop 'from __future__ import annotations' from store.py and use quoted forward refs instead. Add matching quotes in config.py for KnowledgebaseConfig.from_dict and WorkspaceConfig.from_dict — these were raising NameError at class-definition time after an earlier lint pass stripped the future import that had been masking them.

Plugin contract for workspace backends. Implementations must define __init__(config), index(), and search(). status() is optional.

…xer everywhere

Delete the old workspace/indexer.py and workspace/search.py modules. All test imports now use workspace.default.DefaultIndexer directly. Backwards-compat re-exports removed from workspace/__init__.py.

- Remove `from __future__ import annotations` from all new workspace files
- Convert TYPE_CHECKING imports to real imports in base.py (no circular deps)
- Quote self-referential forward ref in config.py model_validator
- Add null checks on spec.loader in plugin discovery to satisfy ty

Adds a workspace indexer plugin backed by @llamaindex/semtools, a Rust CLI that does semantic search via model2vec. The plugin auto-installs semtools on first use and delegates indexing to semtools' lazy approach (embed-on-search). Includes 23 tests covering subclass contract, plugin discovery, factory integration, CLI invocation, result parsing, filtering, error handling, and edge cases.

Semtools is an external CLI tool — integration tests would require npm + network access and add CI complexity for minimal value.

Complete workspace CLI coverage with status, list, retrieve, delete commands. Add 6 separate agent tools (workspace_search, workspace_index, workspace_status, workspace_list, workspace_retrieve, workspace_delete) with per-tool schemas and check_fn gating on workspace.enabled. Wire /workspace slash command in interactive REPL with Rich formatting.

- workspace_search and workspace_index are default-enabled in core tools
- Full workspace toolset available for opt-in via /tools enable
- BaseIndexer ABC extended with list_files(), retrieve(), delete()
- SQLiteFTS5Store gains list_files() and get_chunks_for_file()

# Conflicts:
#   toolsets.py
#   uv.lock

github-actions · 2026-04-19T19:30:11Z

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Dependency manifest files modified

Files:

pyproject.toml

Implements list_files() via workspace file discovery, delete() via semtools workspace prune, and enriches status() with root_dir and total_documents from 'semtools workspace status --json'. retrieve() intentionally left as no-op default — semtools uses embeddings, not chunked content storage. All 6 agent tools now route through plugin dispatch for both default and semtools backends with zero tool-layer changes. Also fix pyproject.toml merge: include both tui_gateway and workspace packages.

Adds _handle_workspace_command() to GatewayRunner with centralized dispatch for status, search, list, retrieve, delete, index, and roots subcommands. Works across Telegram, Discord, Slack, Signal, WhatsApp, Matrix, and all other platforms (dispatch is centralized in gateway/run.py, not per-platform). Uses run_in_executor for blocking SQLite I/O. Reuses workspace.commands helpers for roots add/remove to stay DRY with the CLI surface.

github-actions · 2026-04-19T19:40:39Z

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Dependency manifest files modified

Files:

pyproject.toml

New build_workspace_guidance(available_tools) returns a single guidance block that grows with which workspace tools are present. Core paragraph appears when workspace_search is available; retrieve/list/index add their own paragraphs when those tools are also available. workspace_delete is intentionally not prompted (destructive). Follows the existing MEMORY_GUIDANCE/SESSION_SEARCH_GUIDANCE/SKILLS_GUIDANCE pattern in agent/prompt_builder.py.

Wires build_workspace_guidance() into AIAgent._build_system_prompt() alongside the existing memory/session_search/skills hooks. Guidance is assembled dynamically from self.valid_tool_names and appended to the tool_guidance section. Closes the 0/6 workspace_search discoverability gap measured in the 2026-04-20 A/B dogfood.

- M1: self.valid_tool_names is already a set[str], no need to wrap.
- M3: Add 'assert out is not None' guards so pyright correctly narrows the str|None return type in tests that then use in/index()/lower().

Code review cleanup from Task 2 review.

Makes the tool schema description explicitly tell the LLM to prefer workspace_search over grep/find/cat. Pairs with the new system-prompt guidance from build_workspace_guidance() — belt-and-suspenders on tool discoverability.

github-actions · 2026-04-19T21:12:14Z

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Dependency manifest files modified

Files:

pyproject.toml

Adds markitdown[pdf,docx,pptx] as a new [parsing] optional extra and wires it into the [workspace] extra via hermes-agent[parsing].

- Cache MarkItDown instance across _convert() calls instead of recreating per file (perf during batch indexing)
- Log warning when configured backend name is unknown in build_parser()
- Normalize suffix casing in CompositeParser.can_parse()
- Remove unused import in test file

github-actions · 2026-04-19T22:41:01Z

⚠️ Supply Chain Risk Detected

⚠️ WARNING: Dependency manifest files modified

Files:

pyproject.toml

alt-glitch added 4 commits April 17, 2026 07:25

Add workspace FTS indexing and search

228a231

Harden workspace indexing and document verification

3db1a7e

chore(workspace): remove committed planning docs

9f42086

fix(workspace): harden indexing and CLI edge cases

066d285

alt-glitch changed the title ~~Sid/workspace salvage~~ feat(workspace): FTS5 indexing, search, and layered chunking foundation Apr 17, 2026

alt-glitch added 20 commits April 18, 2026 07:36

lint fixes et al.

5d99a78

feat(workspace): add BaseIndexer ABC

742cb55

refactor(workspace): migrate config to Pydantic models

b40075e

feat(workspace): add DefaultIndexer class

577e273

refactor(workspace): wire CLI through get_indexer() factory

09c50eb

feat(workspace): add plugin discovery for workspace indexers

d934eba

test(workspace): add plugin architecture integration tests

9f8ca18

chore(workspace): remove semtools plugin tests

d9672ba

alt-glitch changed the title ~~feat(workspace): FTS5 indexing, search, and layered chunking foundation~~ feat(workspace): pluggable indexer architecture + CLI/tool/slash coverage Apr 19, 2026

Merge remote-tracking branch 'origin/main' into sid/workspace-salvage

15f623f

alt-glitch marked this pull request as ready for review April 19, 2026 19:30

alt-glitch changed the title ~~feat(workspace): pluggable indexer architecture + CLI/tool/slash coverage~~ feat(workspace): FTS5 foundation + pluggable indexer + agent tools Apr 19, 2026

alt-glitch added 2 commits April 20, 2026 01:06

alt-glitch added 4 commits April 20, 2026 01:51

alt-glitch added 8 commits April 20, 2026 03:35

feat(workspace): add PARSEABLE_SUFFIXES and ParsingConfig

ec0fa5a

feat(workspace): FileParser ABC and MarkitdownParser

5008f12

feat(workspace): PandocParser backend

e9ab7c7

feat(workspace): CompositeParser and build_parser() factory

c938e28

feat(workspace): wire parser into discovery and DefaultIndexer

d431a20

fix(tests): stabilize parser tests under xdist parallelism

91b2c19

feat(workspace): add [parsing] extra with markitdown deps

11618e9

alt-glitch requested a review from teknium1 April 19, 2026 22:43

alt-glitch added the labels type/feature (New feature or request), P2 (Medium: degraded but workaround exists), and comp/tools (Tool registry, model_tools, toolsets) Apr 24, 2026

nepenth mentioned this pull request Apr 30, 2026

feat(gateway): guard repo side effects with workspace binding #17938

Open

alt-glitch closed this May 13, 2026


alt-glitch wants to merge 40 commits into main from sid/workspace-salvage
