Skip to content

fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports#107

Merged
DeusData merged 1 commit intoDeusData:mainfrom
halindrome:fix/parse-imports-quadratic
Mar 23, 2026
Merged

fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports#107
DeusData merged 1 commit intoDeusData:mainfrom
halindrome:fix/parse-imports-quadratic

Conversation

@halindrome
Copy link
Copy Markdown
Contributor

@halindrome halindrome commented Mar 21, 2026

Fixes #106.

Problem

parse_generic_imports iterated top-level AST nodes with indexed ts_node_child(root, i). This is O(i) per call internally (walks the child linked list from 0 to i via ts_node_child_with_descendant), making the loop O(N²) total. ts_node_next_sibling() has the same problem — it also calls ts_node_child_with_descendant internally.

On repos with generated TypeScript .d.ts files or large Perl files with 5,000–50,000 top-level AST nodes, this causes the indexer to spin at 100% CPU indefinitely.

Fix

Replace with TSTreeCursor. ts_tree_cursor_goto_next_sibling() maintains traversal state and is O(1) per step. Only the iteration mechanism changes — the loop body logic is identical.

// Before (O(N²))
uint32_t count = ts_node_child_count(ctx->root);
for (uint32_t i = 0; i < count; i++) {
    TSNode node = ts_node_child(ctx->root, i);
    ...
}

// After (O(N))
TSTreeCursor cursor = ts_tree_cursor_new(ctx->root);
if (ts_tree_cursor_goto_first_child(&cursor)) {
    do {
        TSNode node = ts_tree_cursor_current_node(&cursor);
        ...
    } while (ts_tree_cursor_goto_next_sibling(&cursor));
}
ts_tree_cursor_delete(&cursor);

Test Results

  • Unit tests: 2042/2042 passed
  • Live test: a monorepo with 6 submodules (TypeScript, Perl, Java, HTML, CSS) that previously hung indefinitely now indexes in 71 seconds — 142,908 nodes, 218,685 edges

…c_imports

ts_node_child(root, i) and ts_node_next_sibling() both call
ts_node_child_with_descendant() internally, making sibling iteration
O(N²) total on roots with many children (e.g. generated TypeScript
.d.ts files or large Perl files with 5,000–50,000 top-level nodes).

Replace with TSTreeCursor: ts_tree_cursor_goto_next_sibling() maintains
traversal state across calls and is O(1) per step, reducing total
complexity from O(N²) to O(N).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@halindrome halindrome changed the title fix(cbm): replace O(N²) indexed loop with O(N) sibling-chain in parse_generic_imports fix(cbm): replace O(N²) ts_node_child loop with O(N) TSTreeCursor in parse_generic_imports Mar 21, 2026
@halindrome
Copy link
Copy Markdown
Contributor Author

QA Round 1

Reviewer: claude-opus-4-6
Date: 2026-03-21
Verdict: ✅ PASS (one MINOR pre-existing note, not a regression)

Findings

[PASS] Cursor traversal correctness
The TSTreeCursor-based loop visits exactly the same set of nodes as the old indexed loop. ts_tree_cursor_new(ctx->root) positions at root, goto_first_child() moves to the first child, and the do { ... } while (goto_next_sibling) pattern iterates all siblings exhaustively. No nodes are skipped or double-visited. The continue on line 591 correctly invokes the while condition, advancing to the next sibling.

[PASS] Resource management
ts_tree_cursor_delete(&cursor) is called on both exit paths: early return when root has no children (lines 585–586), and loop exhaustion (line 629). No leak.

[PASS] Thread safety
TSTreeCursor cursor is stack-allocated within parse_generic_imports(). No shared state.

[PASS] Cursor root vs. child semantics
Confirmed against tree-sitter API header: ts_tree_cursor_new(node) positions on that node; goto_first_child() moves to the first child of root — matching the old ts_node_child(root, 0) behavior. Correct.

[MINOR] Pre-existing fallback logic (not a regression)
Line 609: if (ctx->result->imports.count == 0) checks the cumulative count across all nodes, not whether the current node produced an import. Once any node succeeds via field-path extraction, the text-stripping fallback is permanently disabled for all subsequent nodes. This bug existed in the old indexed-loop code and is not introduced by this PR. Recommend tracking count_before per node in a follow-up.

Summary

The cursor conversion is correct and complete. The pre-existing fallback logic issue is out of scope for this fix. No changes needed for Round 1.

@halindrome
Copy link
Copy Markdown
Contributor Author

QA Round 2

Reviewer: claude-opus-4-6
Date: 2026-03-21
Verdict: ✅ PASS

Findings

[PASS] All call sites accounted for
parse_generic_imports is static with 21 call sites, all within cbm_extract_imports() in the same file. No signature change needed. No external callers.

[PASS] Edge cases correct

  • Empty root: goto_first_child returns false → cursor deleted → early return. Correct.
  • Single child: do-while processes one child, goto_next_sibling returns false → exits cleanly. Correct.
  • do-while is semantically equivalent to the old for loop — same node set, same order, same continue behavior.

[PASS] Test coverage
Tests exist for 13+ languages dispatched through parse_generic_imports (Kotlin, Scala, C#, Dart, Groovy, Perl, Elixir, Haskell, OCaml, Erlang, Zig, Lean, Magma). No large-file performance test exists, but that is a pre-existing gap.

[INFO] Other O(N²) extractors (out of scope)
Seven language-specific extractors (parse_go_imports, parse_python_imports, parse_java_imports, parse_rust_imports, parse_c_imports, parse_ruby_imports, parse_lua_imports) still use the indexed ts_node_child(ctx->root, i) pattern. These are out of scope for this PR but could be converted in a follow-up.

Summary

No issues found. The cursor conversion is correct on all edge cases and all call sites are accounted for. The seven remaining O(N²) extractors are noted for a follow-up issue.

@halindrome
Copy link
Copy Markdown
Contributor Author

QA Complete

Both rounds returned PASS with no confirmed critical or major findings. The PR is ready for review.

Summary of QA rounds:

  • Round 1: ✅ PASS — cursor traversal correct, resource management clean, one pre-existing minor note (fallback count check is per-file not per-node; existed before this PR, out of scope)
  • Round 2: ✅ PASS — all call sites accounted for, edge cases correct, 7 other O(N²) extractors noted as follow-up candidates

No fix commits were needed. The implementation is correct as submitted.

@DeusData DeusData added the stability/performance Server crashes, OOM, hangs, high CPU/memory label Mar 21, 2026
@DeusData DeusData merged commit 605cd92 into DeusData:main Mar 23, 2026
@DeusData
Copy link
Copy Markdown
Owner

Merged to main. Fixes #106. Thanks @halindrome — correct and minimal fix for a real indexer hang. TSTreeCursor is the right approach for sibling traversal in tree-sitter. Verified on Linux arm64 (ASan+LeakSan), Windows (cross-compile), and macOS arm64.

@halindrome
Copy link
Copy Markdown
Contributor Author

Thanks @DeusData. Note that QA round 2 suggested also fixing this in 7 other places as a follow-up task. I have this on my todo list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stability/performance Server crashes, OOM, hangs, high CPU/memory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parse_generic_imports: O(N²) ts_node_child() loop hangs on large AST roots

3 participants