Issues such as #11937 and #11929 suggest that token ranges are overlapping, which causes panics in downstream tools.
I think it would be useful to validate the parsed output in the test cases to make sure that token ranges don't overlap, similar to the validation done on the AST.
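A minimal sketch of what such a check could look like, written in Rust. It assumes the tokens from a parse can be collected, in order, into a slice of start/end byte offsets. `Range`, `find_overlap`, and `assert_no_overlap` are hypothetical names used for illustration, not an existing API. The check requires each range to be well-formed (`start <= end`) and to begin at or after the end of the previous token, which also enforces source order. Zero-width tokens are allowed.

```rust
/// Hypothetical stand-in for a token's byte range (not an existing library type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: u32,
    pub end: u32,
}

/// Returns the index of the first token whose range is inverted (start > end)
/// or starts before the previous token ends; `None` if all ranges are
/// well-formed, in source order, and non-overlapping.
pub fn find_overlap(ranges: &[Range]) -> Option<usize> {
    let mut prev_end = 0u32;
    for (i, r) in ranges.iter().enumerate() {
        if r.start > r.end || r.start < prev_end {
            return Some(i);
        }
        // Adjacent tokens may touch: the next token may start exactly here.
        prev_end = r.end;
    }
    None
}

/// Hypothetical test helper: panics with a descriptive message on the first
/// invalid or overlapping range.
pub fn assert_no_overlap(ranges: &[Range]) {
    if let Some(i) = find_overlap(ranges) {
        panic!("token {i} has an invalid or overlapping range: {:?}", ranges[i]);
    }
}
```

In a test case, one would collect each token's range from the parsed output and call `assert_no_overlap` on it, next to the existing AST validation. Reporting the offending index (rather than a bare boolean) makes a failing test point directly at the token that broke the invariant.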