validateSemanticQuery rejects intra-word hyphens in vec/hyde queries (false negation match) #618

@okabadayi

Summary

validateSemanticQuery() treats any token containing - as a negation operator (-term) and aborts the entire structured query. This breaks any vec/hyde sub-query that legitimately contains hyphenated terms (auto-archive, multi-session, personal-documenter, etc.), which are extremely common in technical writing.

Reproduce

qmd query --no-rerank "$(printf 'lex: anything\nhyde: A passage about auto-archive features and multi-session support.')" -n 5

Observed

error: Line 2 (hyde): Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.

The query aborts. No results returned.

Expected

Hyphens inside words (auto-archive, multi-session) should be passed through to the embedder as part of the natural-language passage. Only a leading - followed by a token (or -"phrase") on a token boundary should be parsed as a negation operator — and only in lex queries, where negation is documented behavior. Vec and hyde queries don't support negation at all per the docs, so the validator probably shouldn't be looking for -term tokens there in the first place.

Affected

  • v2.1.0 release tag and current main HEAD (commit e8de7ca at time of filing).
  • Any agent / script generating hyde passages from natural-language source where hyphenated terms are common (technical docs, software architecture writing, API names).

Workaround

Strip hyphens before constructing the hyde/vec query:

hyde = passage.replace(/-/g, ' ');

But this hurts embedding quality: "auto archive" is a less specific signal than "auto-archive" would be if it reached the model.
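
If the workaround has to be applied to a whole structured query rather than a single passage, it can be limited to the vec/hyde lines so that legitimate lex negation survives. The following is a minimal sketch, assuming one sub-query per line with a `type:` prefix as in the reproduce command above; `stripHyphensInSemanticLines` is a hypothetical helper name, not part of qmd.

```typescript
// Hypothetical helper (not part of qmd): applies the hyphen-stripping
// workaround only to vec/hyde lines of a structured query.
// Assumes one sub-query per line, each starting with "lex:", "vec:" or "hyde:".
function stripHyphensInSemanticLines(structuredQuery: string): string {
  return structuredQuery
    .split("\n")
    .map((line) =>
      // Only vec/hyde lines are rewritten; lex lines keep their "-term" exclusions.
      /^\s*(vec|hyde):/.test(line) ? line.replace(/-/g, " ") : line
    )
    .join("\n");
}
```

This keeps a lex line such as `lex: cats -dogs` untouched while still avoiding the validator error on the semantic lines. It has the same quality cost as the one-line workaround.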

Suggested fix

Either:

  1. Skip the negation check entirely for vec/hyde types. The error message already says negation is unsupported there, so the validator is scanning for an operator those query types do not support.
  2. Tighten the negation regex to only match a leading - at token boundary (e.g., (?:^|\s)-\S), so auto-archive doesn't trigger but query -baseball still does.

Source: src/store.ts validateSemanticQuery() (called from structuredSearch at line ~3425 in main HEAD).
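
For illustration, option 2 could look roughly like the sketch below. It is not the actual code in src/store.ts: the `QueryType` type and the `hasNegation` and `checkNegation` names are assumptions made for this example, and only the regex and the error text come from this report.

```typescript
// Hypothetical sketch; the real validateSemanticQuery() may be structured differently.
type QueryType = "lex" | "vec" | "hyde";

// A "-" counts as negation only at the start of the string or after whitespace,
// and only when a non-space character follows it.
const NEGATION_AT_TOKEN_START = /(?:^|\s)-\S/;

function hasNegation(query: string): boolean {
  return NEGATION_AT_TOKEN_START.test(query);
}

// Returns an error message, or null when the query is acceptable.
function checkNegation(type: QueryType, query: string): string | null {
  if (type === "lex") return null; // negation is documented behavior for lex
  return hasNegation(query)
    ? "Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions."
    : null;
}
```

With this pattern, intra-word hyphens such as auto-archive pass through, while query -baseball and a leading -"phrase" are still flagged in vec/hyde queries.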

Environment

  • qmd v2.1.0 (commit e8de7ca)
  • Linux x86_64, bun 1.3.9
