Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/jit-contrib |
There was a problem hiding this comment.
Pull request overview
This PR targets CodeQL/static-analysis findings in the CoreCLR ilasm tool by addressing integer type correctness and potential unaligned UTF-16 access patterns in the parser/helpers.
Changes:
- Use an unsigned 16-bit loop index when iterating metadata streams during PE file creation.
- Avoid unaligned
WCHAR*dereferences inSymWby reading viamemcpy. - Add/adjust alignment handling for Unicode parsing helpers (aligned buffers; enforce even-byte lengths for UTF-16 token copies).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/coreclr/ilasm/writer.cpp |
Updates stream-iteration index type to avoid signed/size issues when walking metadata streams. |
src/coreclr/ilasm/grammar_after.cpp |
Hardens Unicode parsing helpers against unaligned access and odd-length UTF-16 token handling (plus aligned scratch buffers). |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| uCount &= 0x1FFF; // limit: 8K wchars | ||
| // CodeQL [SM02986] this cast is safe because buff is WCHAR aligned and we null terminate below | ||
| WCHAR* wzBuff = (WCHAR*)buff; | ||
| memcpy(buff,pLine,uCount); | ||
| wzBuff[uCount >> 1] = 0; |
There was a problem hiding this comment.
In the Unicode path, buff is a char[] that is cast to WCHAR* and written through (wzBuff[uCount >> 1] = 0;). alignas(WCHAR) fixes alignment but not the strict-aliasing/effective-type issue of writing a WCHAR into storage declared as char[]. Consider using a WCHAR-typed buffer for the Unicode path (or a union/std::aligned_storage approach) so the terminator write is well-defined without relying on a suppression.
src/coreclr/ilasm/grammar_after.cpp
Outdated
| alignas(WCHAR) static char dbuff[256]; | ||
| char* pdummy = NULL; | ||
| if(L > 254) L = 254; | ||
| // L must be even in the Unicode case, so the null | ||
| // termination isn't split across two WCHARs | ||
| L &= ~(size_t)1; | ||
| memcpy(dbuff,begNum,L); | ||
| dbuff[L] = 0; | ||
| dbuff[L+1] = 0; | ||
| // CodeQL [SM02986] Cast to WCHAR is safe because dbuff is properly aligned and we ensure proper null termination above | ||
| *ppRes = new double(u16_strtod((const WCHAR*)dbuff, (WCHAR**)&pdummy)); | ||
| return ((unsigned)(pdummy - dbuff)); |
There was a problem hiding this comment.
In GetDoubleW, dbuff is a char[] that is later reinterpreted as const WCHAR* for u16_strtod. Even with alignas(WCHAR), accessing a char[] through a WCHAR* can violate C++ strict-aliasing/effective-type rules and is vulnerable to miscompilation under optimization. Consider changing the temporary buffer to a WCHAR[] (and tracking L/pdummy in WCHAR units) to avoid the cast and the CodeQL suppression entirely.
| alignas(WCHAR) static char dbuff[256]; | |
| char* pdummy = NULL; | |
| if(L > 254) L = 254; | |
| // L must be even in the Unicode case, so the null | |
| // termination isn't split across two WCHARs | |
| L &= ~(size_t)1; | |
| memcpy(dbuff,begNum,L); | |
| dbuff[L] = 0; | |
| dbuff[L+1] = 0; | |
| // CodeQL [SM02986] Cast to WCHAR is safe because dbuff is properly aligned and we ensure proper null termination above | |
| *ppRes = new double(u16_strtod((const WCHAR*)dbuff, (WCHAR**)&pdummy)); | |
| return ((unsigned)(pdummy - dbuff)); | |
| static WCHAR dbuff[128]; | |
| WCHAR* pdummy = NULL; | |
| // Limit L (in bytes) to leave room for a terminating WCHAR null. | |
| const unsigned maxBytes = static_cast<unsigned>(sizeof(dbuff) - sizeof(WCHAR)); | |
| if (L > maxBytes) | |
| { | |
| L = maxBytes; | |
| } | |
| // L must be even in the Unicode case, so the null | |
| // termination isn't split across two WCHARs. | |
| L &= ~(size_t)1; | |
| memcpy(dbuff, begNum, L); | |
| const size_t wcharCount = L / sizeof(WCHAR); | |
| dbuff[wcharCount] = 0; | |
| *ppRes = new double(u16_strtod(dbuff, &pdummy)); | |
| const size_t consumedWChars = static_cast<size_t>(pdummy - dbuff); | |
| return static_cast<unsigned>(consumedWChars * sizeof(WCHAR)); |
There was a problem hiding this comment.
I believe we compile with fno strict aliasing
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
I think focus should move to the managed ilasm in |
Until that is working, which is a long ways off, we still need to satisfy tooling and all the requirements for the existing code. |
It is just pending test infra support, everything else is in place; all TODOs addressed. |
Co-authored-by: Aaron R Robinson <arobins@microsoft.com>
Fix two issues that got flagged
1 - Fix a mistyped loop counter.
2 - One of the files frequently casts CHAR*->WCHAR*. Retyping isn't reasonable, because these methods are called through function pointers using char*. These casts are probably safe since they all point into MapViewOfFile, but I add alignment constraints and bailouts if somehow we have an odd pointer. The analyzer still complains however, so I also need an explicit suppression of the warning.
I'm kicking off another analysis pipeline in a bit to make sure the suppressions get picked up