Skip to content

fix(query): correct last_child_step_index in cases where a new step wasn't created.#4422

Merged
WillLillis merged 1 commit intotree-sitter:masterfrom
WillLillis:query_steps
May 3, 2025
Merged

fix(query): correct last_child_step_index in cases where a new step wasn't created.#4422
WillLillis merged 1 commit intotree-sitter:masterfrom
WillLillis:query_steps

Conversation

@WillLillis
Copy link
Member

@WillLillis WillLillis commented May 2, 2025

The Problem

Not 100% confident I have the root cause, but here's the basic idea:

The full query:

(call_expression
  function: (scoped_identifier
    path: (scoped_identifier (identifier) @_regex (#any-of? @_regex "Regex" "RegexBuilder") .))
  (#set! injection.language "regex"))

For the stack frame that causes the OOB access, we're inside the for loop here:

tree-sitter/lib/src/query.c

Lines 2477 to 2487 in bfc5d11

for (;;) {
// Parse a negated field assertion
if (stream->next == '!') {
stream_advance(stream);
stream_skip_whitespace(stream);
if (!stream_is_ident_start(stream)) {
capture_quantifiers_delete(&child_capture_quantifiers);
return TSQueryErrorSyntax;
}
const char *field_name = stream->input;
stream_scan_identifier(stream);

and the start of the stream is pointing to (#any-of? @_regex "Regex" "RegexBuilder")...... Further down in the loop's body, step_index stores the value of self-steps.size here:

tree-sitter/lib/src/query.c

Lines 2518 to 2525 in bfc5d11

uint16_t step_index = self->steps.size;
TSQueryError e = ts_query__parse_pattern(
self,
stream,
depth + 1,
child_is_immediate,
&child_capture_quantifiers
);

The recursive call to ts_query__parse_pattern immediately after parses the predicate (#any-of? @_regex "Regex" "RegexBuilder"). Parsing the predicate did not push new entries to self->steps, so when last_child_step_index is assigned to step_index following that call, it now indexes past the end of self->steps. The loop runs again, and this time it parses the anchor, here:

tree-sitter/lib/src/query.c

Lines 2511 to 2516 in bfc5d11

// Parse a sibling anchor
if (stream->next == '.') {
child_is_immediate = true;
stream_advance(stream);
stream_skip_whitespace(stream);
}

After this, another recursive call to ts_query__parse_pattern returns immediately with PARENT_DONE, since the stream now points to ). The first 3 if conditions following the recursive call are satisfied, so we then use the out of bounds last_child_step_index to index into self->steps.contents, causing the UB.

The Solution

Add a simple check for this case, decrementing step_index before its assignment to last_child_step_index if necessary.

@WillLillis WillLillis requested a review from amaanq May 2, 2025 04:35
@WillLillis WillLillis added c-library query Related to query execution ci:backport release-0.25 Backport label labels May 2, 2025
wasn't created.

This fixes an OOB access to `self.steps` when a last child anchor
immediately follows a predicate.
@WillLillis WillLillis merged commit b1d2b7c into tree-sitter:master May 3, 2025
18 checks passed
@WillLillis WillLillis deleted the query_steps branch May 3, 2025 21:27
@tree-sitter-ci-bot
Copy link

Successfully created backport PR for release-0.25:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c-library ci:backport release-0.25 Backport label query Related to query execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Undefined behaviour/out of bounds array access in query parser

3 participants