Skip to content

fix(lib): address strict aliasing violations with Array type#5242

Merged
WillLillis merged 2 commits intotree-sitter:masterfrom
WillLillis:array_aliasing
Jan 24, 2026
Merged

fix(lib): address strict aliasing violations with Array type#5242
WillLillis merged 2 commits intotree-sitter:masterfrom
WillLillis:array_aliasing

Conversation

@WillLillis
Copy link
Member

@WillLillis WillLillis commented Jan 17, 2026

Altering the Array type itself isn't feasible, as this causes unacceptable breakage with existing parsers that depend on it. Instead, pass in individual Array fields for to various _array__* functions.

Any time the contents of an array may be modified (freed, reallocd, etc), return the potentially new address out by value. This prevents any strict aliasing violations as we're no longer writing to a type-casted pointer.

Note that the array.h included in a grammar's src/tree-sitter/ directory is updated upon running tree-sitter generate.

This is a rework of #5194, thanks @nwf for taking an initial pass and discussing solutions!

@WillLillis
Copy link
Member Author

WillLillis commented Jan 18, 2026

I believe the CI failures are now due to this line in tree-sitter-php's scanner:

https://github.com/tree-sitter/tree-sitter-php/blob/master/common/scanner.h#L468

(Here's the snippet since GH won't render it inline)

array_delete(&array_pop(&scanner->heredocs).word);

Taking the address of a non-lvalue is undefined behavior, and this seems to be it. I've opened a PR to fix the issue here (tree-sitter/tree-sitter-php#293), so once that's merged and a new release is published, we can update the tag in test/fixtures/fixtures.json, and then (barring any other issues with other corpus grammars) this PR can proceed.

@WillLillis WillLillis marked this pull request as ready for review January 18, 2026 06:08
@clason
Copy link
Contributor

clason commented Jan 19, 2026

Once the php scanner is fixed, I'd like to run a crater test with Neovim built against this PR. (I'm sure that's not the only parser with... questionable behavior out there...)

WillLillis and others added 2 commits January 20, 2026 23:53
Set tree-sitter-php fixture to `upstream_test_fixture` branch.
Also remove the `--update` flag, as it does nothing.
Altering the `Array` type itself isn't feasible, as this causes
unacceptable breakage with existing parsers that depend on it. Instead,
pass in individual `Array` fields for to various `_array__*` functions.

Any time the `contents` of an array may be modified (`free`d, `realloc`d,
etc), return the potentially new address out by value. This prevents any
strict aliasing violations as we're no longer writing to a type-casted
pointer.

Co-authored-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
@WillLillis
Copy link
Member Author

I've added 29281b1, which allows fixtures.json to specify an override branch for a test fixture grammar to be checked out to. I've created the upstream_test_fixture branch in the tree-sitter-php repo, and have pointed the fixtures.json file to use that branch instead of the v0.24.2 tag. The changes in this branch include fixing the undefined behavior described in this comment, as well as regenerating the parser in order to update the repo's array.h copies.

CC @clason I think this is a better solution than disabling the PHP tests, especially given how hard to predict/reason about some of these crashes are.

@nwf
Copy link
Contributor

nwf commented Jan 21, 2026

I am not entirely sure that this will close #4960, though it is surely a huge improvement, since, as I noted over in #4960 (comment), there is also some strict-aliasing violation going on in tree_cursor.c. I wouldn't be surprised if that isn't biting anyone yet, but, strictly speaking...

@WillLillis
Copy link
Member Author

I am not entirely sure that this will close #4960, though it is surely a huge improvement, since, as I noted over in #4960 (comment), there is also some strict-aliasing violation going on in tree_cursor.c. I wouldn't be surprised if that isn't biting anyone yet, but, strictly speaking...

I was planning on extracting the violations in tree_cursor.c to a separate issue and keep #4960 scoped to the crashes observed on ppc64el (I still need to fight with docker in order to test these changes on there again.). I'm planning on taking a look at tree_cursor.c after we get this resolved. 🙂

@nwf
Copy link
Contributor

nwf commented Jan 21, 2026

Sounds like a plan! Thank you!

@WillLillis
Copy link
Member Author

I've gone through most of the grammars with external scanners in nvim-treesitter to check for further fallout from this change. Besides tree-sitter-php, The following repos crashed when regenerated and tested:

@clason
Copy link
Contributor

clason commented Jan 24, 2026

also

All three PRs are now merged, so this is good to go?

@WillLillis WillLillis merged commit 5177b3d into tree-sitter:master Jan 24, 2026
20 checks passed
@WillLillis WillLillis deleted the array_aliasing branch January 24, 2026 19:18
@tree-sitter-ci-bot
Copy link

Successfully created backport PR for release-0.26:

@nwf
Copy link
Contributor

nwf commented Jan 24, 2026

Belatedly (sorry!), can say that cargo xtask test looks to have only unrelated failures on my powerpc64le Debian box. (Not quite sure why "tests::corpus_test::test_feature_corpus_files" is failing with "called Result::unwrap() on an Err value: ScannerSymbols(ScannerSymbolError { missing: ["tree_sitter_external_unicode_column_alignment_external_scanner_create", ...", but it is, but it doesn't seem relevant).

@WillLillis
Copy link
Member Author

Belatedly (sorry!), can say that cargo xtask test looks to have only unrelated failures on my powerpc64le Debian box. (Not quite sure why "tests::corpus_test::test_feature_corpus_files" is failing with "called Result::unwrap() on an Err value: ScannerSymbols(ScannerSymbolError { missing: ["tree_sitter_external_unicode_column_alignment_external_scanner_create", ...", but it is, but it doesn't seem relevant).

That shouldn't be relevant to these changes, but I'm not sure why you'd get that error. Have you cargo xtask fetch-fixtures and cargo xtask generate-fixtures?

@WillLillis
Copy link
Member Author

WillLillis commented Jan 28, 2026

Following up on this, the fixture corpus tests are now passing for me in a ppcle64 ubuntu docker container. I have ran into the same missing symbol issue raised by @nwf above. The problem seems to be here:

let section = if cfg!(all(target_arch = "powerpc64", target_os = "linux")) {
" D "
} else {
" T "
};

For some background, on non-Windows platforms we use the system installed nm to verify symbols in the generated parser shared library. There is a special case for powerpc64 to use the D section of nm's output, and T for all other architectures. It looks like with the a more recent version of binutils, ld now places the functions in the correct section, or something in nm changed to indicate the correct section. Potentially related discussion:

https://lists.gnu.org/archive/html/bug-binutils/2021-11/msg00024.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"tree-sitter test" crashes on ppc64el since v0.25.7

3 participants