Skip to content

[clang] Improve nested name specifier AST representation#147835

Merged
mizvekov merged 2 commits intomainfrom
users/mizvekov/name-qualification-refactor
Aug 9, 2025
Merged

[clang] Improve nested name specifier AST representation#147835
mizvekov merged 2 commits intomainfrom
users/mizvekov/name-qualification-refactor

Conversation

@mizvekov
Copy link
Contributor

@mizvekov mizvekov commented Jul 9, 2025

This is a major change on how we represent nested name qualifications in the AST.

  • The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy.
  • The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present.
  • TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for stdexec. For one datapoint, for test_on2.cpp in that project, which is the slowest compiling test, this patch improves -c compilation time by about 7.2%, with the -fsyntax-only improvement being at ~12%.

This has great results on compile-time-tracker as well:
image

This patch also further enables other optimziations in the future, and will reduce the performance impact of template specialization resugaring when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started by the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in clang/include/clang/AST and clang/lib/AST, with also important changes in clang/lib/Sema/TreeTransform.h.

The rest and bulk of the changes are mostly consequences of the changes in API.

PS: TagType::getDecl is renamed to getOriginalDecl in this patch, just for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes #43179
Fixes #68670
Fixes #92757

@shafik
Copy link
Collaborator

shafik commented Sep 3, 2025

This feels like a regression, unless you think the bug is elsewhere and the assert just catches it now: #156458 (comment)

@katzdm
Copy link
Contributor

katzdm commented Sep 5, 2025

"And on that day, every maintainer of a nontrivial clang fork cried out to the sky in unison."

@joanahalili
Copy link

joanahalili commented Sep 10, 2025

@mizvekov there is one more instance of a clang crash that we are encountering during our testing at google. Here is the reproducer https://godbolt.org/z/jfeW9d9Ts. Please take a look.

@alexfh
Copy link
Contributor

alexfh commented Sep 10, 2025

For convenience copying the code and clang stdout here:

class A;
using T = __attribute__((swiftasynccall)) void(
    __attribute__((swift_async_context)) A *);
T R;
class A;
__attribute__((swiftasynccall)) void
fa(__attribute__((swift_async_context))
                                  A *) {
  R == fa;
}
clang++: /root/llvm-project/llvm/tools/clang/lib/AST/ASTContext.cpp:14198: clang::QualType getCommonNonSugarTypeNode(const clang::ASTContext&, const clang::Type*, clang::Qualifiers&, const clang::Type*, clang::Qualifiers&): Assertion `EPIX.ExtParameterInfos == EPIY.ExtParameterInfos' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/clang++ -g -o /app/output.s -mllvm --x86-asm-syntax=intel -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -std=gnu++20 <source>
1.	<source>:9:10: current parser token ';'
2.	<source>:8:40: parsing function body 'fa'
3.	<source>:8:40: in compound statement ('{}')
 #0 0x0000000004083a38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4083a38)
 #1 0x0000000004080e64 llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4080e64)
 #2 0x0000000003fc51e8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007af017a42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007af017a969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x00007af017a42476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x00007af017a287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x00007af017a2871b (/lib/x86_64-linux-gnu/libc.so.6+0x2871b)
 #8 0x00007af017a39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
 #9 0x00000000076059d8 getCommonNonSugarTypeNode(clang::ASTContext const&, clang::Type const*, clang::Qualifiers&, clang::Type const*, clang::Qualifiers&) ASTContext.cpp:0:0
#10 0x0000000007605f3b clang::ASTContext::getCommonSugaredType(clang::QualType, clang::QualType, bool) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x7605f3b)
#11 0x0000000006d4390d clang::Sema::FindCompositePointerType(clang::SourceLocation, clang::Expr*&, clang::Expr*&, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6d4390d)
#12 0x0000000006c38cd1 convertPointersToCompositeType(clang::Sema&, clang::SourceLocation, clang::ActionResult<clang::Expr*, true>&, clang::ActionResult<clang::Expr*, true>&) SemaExpr.cpp:0:0
#13 0x0000000006c6c125 clang::Sema::CheckCompareOperands(clang::ActionResult<clang::Expr*, true>&, clang::ActionResult<clang::Expr*, true>&, clang::SourceLocation, clang::BinaryOperatorKind) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6c6c125)
#14 0x0000000006c7454d clang::Sema::CreateBuiltinBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6c7454d)
#15 0x0000000006c75f98 clang::Sema::BuildBinOp(clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6c75f98)
#16 0x0000000006c76606 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6c76606)
#17 0x00000000067af1da clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult<clang::Expr*, true>, clang::prec::Level) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x67af1da)
#18 0x00000000067b27f9 clang::Parser::ParseExpression(clang::TypoCorrectionTypeBehavior) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x67b27f9)
#19 0x00000000068363f9 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x68363f9)
#20 0x000000000682d784 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 24u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::ParsedAttributes&, clang::ParsedAttributes&, clang::LabelDecl*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x682d784)
#21 0x000000000682e5f9 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 24u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::LabelDecl*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x682e5f9)
#22 0x0000000006836b0e clang::Parser::ParseCompoundStatementBody(bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6836b0e)
#23 0x000000000683732a clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x683732a)
#24 0x000000000673e82b clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x673e82b)
#25 0x0000000006774c8d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6774c8d)
#26 0x00000000067315fe clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x67315fe)
#27 0x0000000006731d9f clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6731d9f)
#28 0x0000000006739bda clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x6739bda)
#29 0x000000000673ab75 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x673ab75)
#30 0x000000000671b5ea clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x671b5ea)
#31 0x00000000049bbfa8 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x49bbfa8)
#32 0x0000000004cb65c5 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4cb65c5)
#33 0x0000000004c31b8e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4c31b8e)
#34 0x0000000004dabd81 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4dabd81)
#35 0x0000000000db57bf cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xdb57bf)
#36 0x0000000000dac46a ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#37 0x0000000004a25019 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
#38 0x0000000003fc5684 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x3fc5684)
#39 0x0000000004a2562f clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
#40 0x00000000049e619d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x49e619d)
#41 0x00000000049e722e clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x49e722e)
#42 0x00000000049ef1c5 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x49ef1c5)
#43 0x0000000000db1c75 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xdb1c75)
#44 0x0000000000c647d4 main (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xc647d4)
#45 0x00007af017a29d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#46 0x00007af017a29e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#47 0x0000000000dabf15 _start (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xdabf15)

@alexfh
Copy link
Contributor

alexfh commented Sep 10, 2025

And a bit cleaned up (https://godbolt.org/z/q9fGPj54d):

class A;
using T = void(__attribute__((swift_async_context)) A *);
T R;
class A;
void fa(__attribute__((swift_async_context)) A *);

bool f() {
  return R == fa;
}
clang++: /root/llvm-project/llvm/tools/clang/lib/AST/ASTContext.cpp:14198: clang::QualType getCommonNonSugarTypeNode(const clang::ASTContext&, const clang::Type*, clang::Qualifiers&, const clang::Type*, clang::Qualifiers&): Assertion `EPIX.ExtParameterInfos == EPIY.ExtParameterInfos' failed.
...

@mizvekov
Copy link
Contributor Author

FWIW that crash is pre-existing, this PR just adds another way to reproduce it.

Here is a reproducer for clang-21, using Typedefs instead of RecordTypes to induce the problem: https://compiler-explorer.com/z/E89P31brT

class A;
using B = A;
using T = void(__attribute__((swift_async_context)) B *);
T R;

using B = A;
void fa(__attribute__((swift_async_context)) B *);

bool f() {
  return R == fa;
}

@mizvekov
Copy link
Contributor Author

@joanahalili @alexfh that's fixed in #157925

@zmodem
Copy link
Collaborator

zmodem commented Sep 11, 2025

On a positive note, we just updated Chromium's toolchain and saw a 5% reduction in build time, which I'm assuming is due to this change :-)

@emaxx-google
Copy link
Contributor

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

@mizvekov
Copy link
Contributor Author

mizvekov commented Oct 9, 2025

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

I think the documentation is right and matches what's happening.
It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type.
The type was previously split-up, each chunk being its own nested name specifier.
Now a Type is always a NestedNameSpecifier component by itself.

Sounds like you want to get the beginning of the range as something like:

if (auto TL = NNSL->getAsTypeLoc())
  return TL.getNonPrefixBeginLoc();
else
  return NNSL->getLocalBeginLoc();

I didn't add a helper for this because there was just a single user in the llvm codebase using this pattern.
See the clang-tools-extra/clangd/FindTarget.cpp change from this patch.

@hokein
Copy link
Collaborator

hokein commented Oct 9, 2025

I think the documentation is right and matches what's happening.
It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type.
The type was previously split-up, each chunk being its own nested name specifier.
Now a Type is always a NestedNameSpecifier component by itself.

Just my two cents.

I think the term "local" is ambiguous, and my mental model has always been that it refers to the last specifier in a qualified name. This seems to have been the behavior, and the public API doc implies it.

This recent change seems to break that model and introduces behavior that feels inconsistent:

  • For namespace specifier ns1::ns2::, getLocalSourceRange returns the range of "ns2::";
  • For type specifier ns1::C1::, getLocalSourceRange returns the range of "ns1::C1::";

This difference is confusing. While I understand the reasoning behind the internal implementation, API users aren't likely to be aware of it, and the "last piece" model seems more intuitive.

If this new behavior is intended, could we update the comments for getLocalSourceRange() to clarify these new rules and provide examples for each case? or even would it make sense to preserve the old behavior for the sake of API consistency?

@mizvekov
Copy link
Contributor Author

Yeah, all except one user of getLocalSourceRange expected the behavior to be what it is now.

I propose adding a note about this in the documentation, add a new helper which does what getLocalSourceRange used to do, and migrate the single existing user to use that.

@emaxx-google
Copy link
Contributor

emaxx-google commented Oct 12, 2025

Hello, did these changes modify the semantics of NestedNameSpecifierLoc::getLocalSourceRange()? For a code like void n::C::f() {}, previous versions of Clang returned C from getLocalSourceRange(), but now it's returning n::C. The new behavior seems to contradict the documentation of this method.

I think the documentation is right and matches what's happening. It's just that what's considered local has changed in the case of a NestedNameSpecifier which is a Type. The type was previously split-up, each chunk being its own nested name specifier. Now a Type is always a NestedNameSpecifier component by itself.

Sounds like you want to get the beginning of the range as something like:

Thanks, this workaround works for me. It's just that such subtle changes of the API (the function interface wasn't changed, its documentation wasn't changed and changelog wasn't clear on this backwards-incompatible change) are very hard to pinpoint&address in downstream projects.

@mizvekov
Copy link
Contributor Author

After another look at this, we can remove that single potential user and that's a simplification over the status quo.

See #163206

Also removes TypeLoc::getNonPrefixBeginLoc, which becomes unused after this change.

@emaxx-google see if your problem can be solved with a change similar to the above patch, that would be better than the solution I posted earlier.

I added the note on this patch, but also see that the example in that documentation already reflected that change.

@mizvekov
Copy link
Contributor Author

mizvekov commented Oct 13, 2025

I just posted #163271 which will rename getOriginalDecl back to getDecl, as I think enough time has passed and there has been a big enough window to facilitate any rebases.

Let me know if anyone needs more time.

Ideally we would like to get this in before the release, so that upstream users that need to support multiple clang versions don't need a bunch of extra #if #elses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 backend:AMDGPU backend:ARC backend:ARM backend:CSKY backend:Hexagon backend:Lanai backend:loongarch backend:MIPS backend:PowerPC backend:RISC-V backend:Sparc backend:SystemZ backend:WebAssembly backend:X86 clang:analysis clang:as-a-library libclang and C++ API clang:bytecode Issues for the clang bytecode constexpr interpreter clang:codegen IR generation bugs: mangling, exceptions, etc. clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:openmp OpenMP related changes to Clang clang:static analyzer clang Clang issues not falling into any other category clang-tidy clang-tools-extra clangd coroutines C++20 coroutines debuginfo HLSL HLSL Language Support libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb

Projects

None yet