Skip to content

codegen: Unify printing of and add some more internal IR validity errors#58429

Merged
Keno merged 1 commit intomasterfrom
kf/irerrorprint
May 16, 2025
Merged

codegen: Unify printing of and add some more internal IR validity errors#58429
Keno merged 1 commit intomasterfrom
kf/irerrorprint

Conversation

@Keno
Copy link
Copy Markdown
Member

@Keno Keno commented May 15, 2025

What should codegen do when it detects invalid IR? There are a few reasonable options. One is to just assert - this is relatively straightforward but also not very friendly because it immediately takes down your session, so you can't inspect what values may have caused the issue. Additionally, often allow invalid IR in dead code (because otherwise transformation passes would have to check whether the transformation that they're doing makes some code dead, which can be expensive or not necessarily visible).

Thus we generally try to keep codegen going, trying to bail back to valid code as quickly as possible. In a few places, we additionally insert an unmodeled emit_error("Internal Error"). These are a bit weird because the earlier stages of the compiler do not know that they can exist, but in practice they mostly work fine (although they can cause additional crashes if there are try/catches in the way). If we don't, then we generally just stop codegening, so if execution ever were to reach there, it'll just run into whatever code comes after, likely crashing fairly soon.

Because of this consideration, the emit_error is generally preferred. However, the tradeoff is that it increases the size of the code and LLVM can no longer optimize under the assumption that a particular code branch doesn't happen. We thus need to be judicious in where we use it. The general guidance is that it's fine to use in situations where the IR itself is obviously invalid, but should not be arbitrarily added to all instances of unreachable (putting behind NDEBUG is fine).

This PR cleans this up a bit by changing all these error locations to print INTERNAL ERROR - IR Validity, as well as adding a new one to emit_invoke when there's a manifest mismatch in type of the arguments and the signature. This new case is particularly useful after the recent addition of ABIOverride, as that makes it more likely that external AbstractInterpreters are doing ABI shenanigans.

What should codegen do when it detects invalid IR?
There are a few reasonable options. One is to just assert -
this is relatively straightforward but also not very friendly
because it immediately takes down your session, so you can't
inspect what values may have caused the issue. Additionally,
often allow invalid IR in dead code (because otherwise transformation
passes would have to check whether the transformation that they're
doing makes some code dead, which can be expensive or not necessarily
visible).

Thus we generally try to keep codegen going, trying to bail back to
valid code as quickly as possible. In a few places, we additionally
insert an unmodeled `emit_error("Internal Error")`. These are a bit
weird because the earlier stages of the compiler do not know that
they can exist, but in practice they mostly work fine (although
they can cause additional crashes if there are try/catches in the
way). If we don't, then we generally just stop codegening, so if
execution ever were to reach there, it'll just run into whatever
code comes after, likely crashing fairly soon.

Because of this consideration, the `emit_error` is generally preferred.
However, the tradeoff is that it increases the size of the code and LLVM
can no longer optimize under the assumption that a particular code
branch doesn't happen. We thus need to be judicious in where we use it.
The general guidance is that it's fine to use in situations where the
IR itself is obviously invalid, but should not be arbitrarily added
to all instances of `unreachable` (putting behind NDEBUG is fine).

This PR cleans this up a bit by changing all these error locations
to print `INTERNAL ERROR - IR Validity`, as well as adding a new
one to `emit_invoke` when there's a manifest mismatch in type
of the arguments and the signature. This new case is particularly
useful after the recent addition of ABIOverride, as that makes it
more likely that external AbstractInterpreters are doing ABI shenanigans.
@Keno Keno force-pushed the kf/irerrorprint branch from 491b9bd to 64fdba4 Compare May 15, 2025 23:47
@oscardssmith oscardssmith added the compiler:codegen Generation of LLVM IR and native code label May 15, 2025
@Keno Keno merged commit a483d90 into master May 16, 2025
6 of 8 checks passed
@Keno Keno deleted the kf/irerrorprint branch May 16, 2025 19:52
Keno added a commit that referenced this pull request Jun 4, 2025
See #58429 for general discussion of these kinds of errors.
We don't verify this one in the IR verifier, because it's currently
not really part of the structural invariants of the IR. One could
imagine using IRCode with a non-codegen backend that uses something
else here. Nevertheless, codegen needs to complain of course if
someone put something here it doesn't understand.
Keno added a commit that referenced this pull request Jun 4, 2025
See #58429 for general discussion of these kinds of errors. We don't
verify this one in the IR verifier, because it's currently not really
part of the structural invariants of the IR. One could imagine using
IRCode with a non-codegen backend that uses something else here.
Nevertheless, codegen needs to complain of course if someone put
something here it doesn't understand.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

compiler:codegen Generation of LLVM IR and native code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants