codedb 0.2.5823: a few correctness/UX findings (non-ASCII outline, codedb_find false hits, kind labels, search cap, snapshot staleness)

## codedb v0.2.5823 — a few correctness/UX findings from a controlled audit

Hi — Pro user here, big fan of codedb (symbol recall was effectively 100% to the exact file:line in our tests, `codedb_word` is exhaustive and fast, and `codedb_outline` matched Python `ast` exactly on real files). While auditing we hit a few smaller things and would like to know if they're bugs, intended, or our misuse. Environment: codedb 0.2.5823, macOS arm64; ground truth computed with Python `ast`/`re`.

### 1. `codedb_outline` / `codedb_symbol` return nothing for non-ASCII (e.g. Korean) identifiers
```sh
printf 'def \xed\x95\x9c():\n    return 1\n' > /tmp/uni.py    # "def 한():"
# index a folder containing it, then:
# codedb_outline uni.py        -> header only, 0 symbols
# codedb_symbol  한            -> "no results"
# codedb_search/word for the bytes DOES find it
```
Python `ast` parses the function fine (valid Python 3 identifier). Is non-ASCII identifier support intended for the structural layer?
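For reference, the ground-truth check can be reproduced with the standard library alone. This is a minimal sketch that parses the same bytes the `printf` repro writes, so it does not depend on codedb at all:

```python
import ast

# The same bytes written by the printf repro above, decoded as UTF-8.
source = b"def \xed\x95\x9c():\n    return 1\n".decode("utf-8")

# Collect (name, line) for every function definition that ast sees.
functions = [
    (node.name, node.lineno)
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.FunctionDef)
]

print(functions)                       # [('한', 1)]
print(functions[0][0].isidentifier())  # True
```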

### 2. `codedb_find` returns confident hits for queries that match no filename (no score floor)
```
codedb_find "zzznosuchfilexyz"  -> notrail.py (32.79), oracle.json (31.36), unicode.py (25.30) ...
codedb_find "Widget"            -> empty.py (28.29), crlf.txt (24.96) ...   (no file named Widget)
```
The fuzzy subsequence matcher never seems to return an empty result, so a non-match looks like a ranked hit. A score floor (or an explicit "no match") would help callers tell the two apart.
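Until there is a floor, a caller-side guard is possible. The sketch below is our own workaround, not codedb's matcher: `is_subsequence` and `filter_hits` are hypothetical helper names, and it assumes hits arrive as `(filename, score)` pairs and that matching is meant to be against the filename only.

```python
def is_subsequence(query, name):
    """True if the characters of query appear in name in order (case-insensitive)."""
    remaining = iter(name.lower())
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in query.lower())


def filter_hits(query, hits):
    """Keep only (filename, score) hits whose filename contains query as a subsequence."""
    return [(name, score) for name, score in hits if is_subsequence(query, name)]


# Filenames and scores copied from the report above.
hits = [("empty.py", 28.29), ("crlf.txt", 24.96)]
print(filter_hits("Widget", hits))  # []
```

This drops both false hits for `Widget`, but it is stricter than a typo-tolerant matcher would be, so it is only a stopgap.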

### 3. Kind labeling: Python `class` shown as `struct_def` / uniform `fn` header
`codedb_symbol Widget` on a Python `class Widget:` reports kind `struct_def`, and the status header labels every symbol `fn` regardless of type. `ast` says `class`. Cosmetic, but it confuses class-vs-function navigation in a Python codebase.
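This is roughly how we derived the expected kinds. The label strings in `KIND_BY_NODE` are our own choice for illustration, not codedb's kind vocabulary, and `symbol_kinds` is a hypothetical helper name.

```python
import ast

# Hypothetical label names for illustration; not codedb's kind vocabulary.
KIND_BY_NODE = {
    ast.ClassDef: "class",
    ast.FunctionDef: "function",
    ast.AsyncFunctionDef: "async function",
}


def symbol_kinds(source):
    """Return (name, kind, line) for each class/function definition, in line order."""
    found = [
        (node.name, KIND_BY_NODE[type(node)], node.lineno)
        for node in ast.walk(ast.parse(source))
        if type(node) in KIND_BY_NODE
    ]
    return sorted(found, key=lambda item: item[2])


print(symbol_kinds("class Widget:\n    def render(self):\n        pass\n"))
# [('Widget', 'class', 1), ('render', 'function', 2)]
```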

### 4. `codedb_search` caps at 50 results — is `codedb_word` the intended exhaustive tool?
For a token occurring 180x, `codedb_search` returns at most 50 (default 20), while `codedb_word` returned all 180+ uncapped. We've adopted `codedb_word` for exhaustive single-identifier lookups — just confirming that's the intended pattern, and whether the `codedb_search` cap is configurable.
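The occurrence counts we compared against came from a whole-word regex count along these lines. It is a sketch: `count_token` is a hypothetical helper, and `\b` word boundaries are our assumption about what counts as one occurrence, which may not match how `codedb_word` tokenizes.

```python
import re


def count_token(text, token):
    """Count whole-word occurrences of token in text."""
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    return len(pattern.findall(text))


# Synthetic input: two whole-word hits per line; make_widget does not count.
sample = "widget = make_widget(widget)\n" * 60
print(count_token(sample, "widget"))  # 120
```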

### 5. Snapshot staleness + `/tmp` refusal — intended? (please confirm the supported workflow)
- After an out-of-band edit (appending a function) without re-indexing, `codedb_find`, `codedb_search`, and `codedb_outline` all miss the new symbol; they appear to serve the precomputed snapshot. We assume re-indexing is required. Is there an auto-refresh or a "stale" indicator?
- `codedb index` refuses paths under `/tmp` ("refusing to index temporary root"). We worked around it by copying under `$HOME`; just confirming this is by design.
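In the meantime we detect staleness ourselves by comparing file modification times with the time of our last index run. A minimal sketch, assuming the caller records that timestamp itself (we do not know whether codedb exposes one); `files_newer_than` is a hypothetical helper:

```python
import os


def files_newer_than(root, indexed_at):
    """Return paths under root whose mtime is later than indexed_at (epoch seconds)."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getmtime(path) > indexed_at:
                stale.append(path)
    return sorted(stale)
```

A non-empty result means a re-index is due. Deleted files are not detected this way, so it only catches edits and additions.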

### 6. Benchmark question: "63us per op (p50)"
We measured ~46-49ms/op cold from the CLI (each invocation reloads the ~4069-file snapshot, ~28-34ms) and ~1-5ms/op via the warm MCP daemon; the internal "⚡" tick is ~451us. Does the 63us figure refer to the internal in-memory lookup (excluding snapshot load + process/transport)? Is there a way to keep the snapshot warm across CLI calls?
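Our cold CLI numbers are wall-clock medians over fresh processes, measured roughly as below. This is a sketch: `p50_ms` is a hypothetical helper, and the command shown is a placeholder so the snippet runs anywhere. Each sample includes process start-up, which is why it cannot be compared directly with an in-memory figure.

```python
import statistics
import subprocess
import sys
import time


def p50_ms(cmd, runs=20):
    """Median wall-clock time in milliseconds of running cmd as a fresh process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Placeholder command; substitute the real CLI invocation being measured.
print(f"{p50_ms([sys.executable, '-c', 'pass'], runs=5):.1f} ms")
```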

Thanks for codedb — genuinely useful. Happy to share our corpus generator/repros if helpful.


codedb 0.2.5823: a few correctness/UX findings (non-ASCII outline, codedb_find false hits, kind labels, search cap, snapshot staleness) #518
