Summary
validateSemanticQuery() treats any token containing - as a negation operator (-term) and aborts the entire structured query. This breaks any vec/hyde sub-query that legitimately contains hyphenated terms (auto-archive, multi-session, personal-documenter, etc.) — extremely common in technical writing.
Reproduce
qmd query --no-rerank "$(printf 'lex: anything\nhyde: A passage about auto-archive features and multi-session support.')" -n 5
Observed
error: Line 2 (hyde): Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.
The query aborts. No results returned.
Expected
Hyphens inside words (auto-archive, multi-session) should be passed through to the embedder as part of the natural-language passage. Only a leading - followed by a token (or -"phrase") on a token boundary should be parsed as a negation operator — and only in lex queries, where negation is documented behavior. Vec and hyde queries don't support negation at all per the docs, so the validator probably shouldn't be looking for -term tokens there in the first place.
Affected
- v2.1.0 release tag and current main HEAD (commit e8de7ca at time of filing).
- Any agent / script generating hyde passages from natural-language source where hyphenated terms are common (technical docs, software architecture writing, API names).
Workaround
Strip hyphens before constructing the hyde/vec query:
hyde = passage.replace(/-/g, ' ');
But this hurts embedder quality — auto archive is a less specific signal than auto-archive would be if it reached the model.
Suggested fix
Either:
- Skip the negation check entirely for vec/hyde types (the error message already says negation is unsupported there, so the validator has no operator to parse and can pass the text through unchanged).
- Tighten the negation regex to only match a leading - at a token boundary (e.g., (?:^|\s)-\S), so auto-archive doesn't trigger but query -baseball still does.
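A minimal sketch of the second option, assuming the check can be isolated into a small helper. The names hasNegationOperator and NEGATION_AT_TOKEN_BOUNDARY are hypothetical and do not come from src/store.ts; only the regex itself is taken from the suggestion above.

```typescript
// Hypothetical helper, not the actual code in validateSemanticQuery().
// Matches "-" only at the start of the text or right after whitespace,
// and only when a non-whitespace character follows it.
const NEGATION_AT_TOKEN_BOUNDARY = /(?:^|\s)-\S/;

function hasNegationOperator(text: string): boolean {
  // No "g" flag on the regex, so test() keeps no state between calls.
  return NEGATION_AT_TOKEN_BOUNDARY.test(text);
}

const samples: string[] = [
  "A passage about auto-archive features and multi-session support.", // hyphens inside words
  "query -baseball",                                                  // leading "-" after a space
  '-"exact phrase" other terms',                                      // leading "-" before a quoted phrase
];

for (const sample of samples) {
  console.log(hasNegationOperator(sample), JSON.stringify(sample));
}
```

With this pattern the first sample is not flagged and the other two are. A passage that contains a negative number or a CLI flag after whitespace (for example " -5" or " --no-rerank") would still match, which is one reason skipping the check for vec/hyde may be the simpler fix.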
Source: src/store.ts validateSemanticQuery() (called from structuredSearch at line ~3425 in main HEAD).
Environment
- qmd v2.1.0 (commit e8de7ca)
- Linux x86_64, bun 1.3.9