-
Notifications
You must be signed in to change notification settings - Fork 506
Name Mangling Robustness for Parameterized Types #2175
Conversation
458e3d5 to
d2c251a
Compare
| // restricted to that use only. Replace them if they happened to be used in any identifiers in | ||
| // the compilation input. | ||
| return _mangleForCplusPlus | ||
| ? santizedName.Replace(EnterNameScopeSequence, "_AA_").Replace(ExitNameScopeSequence, "_VV_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be easier to use a single character that is also acceptable as part of a C++ identifier? E.g. acute accent and cedilla are allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Or use < and > for when we're writing an object file, and the weird characters for C++ - basically I'm trying to see if we can get rid of the double String.Replace. If you look at the GC allocation profile of the compiler right now, we allocate a ton of strings because of how inefficient NameMangler already is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to keep the C++ to ASCII as much as possible, partially because different compilers have different levels of support for unicode character identifiers and this emitted C++ code is supposed to be ultra portable. I'll do a quick survey of the current support in gcc / llvm without requiring -fextended-identifiers switch and see what we can expect to work broadly.
Agree ... the name mangling needs revamp, for both correctness and overall efficiency (do not allocate as much to construct the mangled names; and make the mangled names shorter): Here are some things that I have been wondering about:
I assume that this PR is fixing a problem that Simon is running into. I would be ok with it to get things unblocked; and make the overall cleanup separately later. @nattress Could you please add a case to |
|
Opened #2178 |
MichalStrehovsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to keep SanitizeName allocation free probably doesn't matter given we're going to dig through everything anyway. LGTM
Fix issue 1964 and provide robustness for mangling type names. Previously, there was no scoping around generic instantiation argument lists in a mangled type name. That would allow situations where the following two types would have the same mangled name: [test]Gen1<[Test.CoreLib]System.Object[]>[] [test]Gen1<[Test.CoreLib]System.Object[][]> These would both mangle as test_Foo_Gen_1__Test_CoreLib_System_Object__Array__Array. To fix this, use reserved character sequences to denote a range of generic instantiation arguments and for the appended marker for arrays, byrefs, and pointer types. For RyuJIT, use standard angle brackets since object files allow them. For C++, we need to generate valid identifiers, so use "_A_" to denote a beginning marker, and "_V_" for an ending marker. If you tilt your head sideways, they sort of look like angle brackets. To prevent input code from clashing with these markers, when sanitizing names, replace them with something else: _A_ => _AA_; _V_ => _VV_. We don't need to worry about that with RyuJIT since angle brackets are already stripped during sanitization.
d2c251a to
dbdba48
Compare
|
|
||
| // Emit the Unicode byte order mark to prevent the MS C++ compiler treating the emitted | ||
| // C++ code as codepage 1251 | ||
| _out.Write('\uFEFF'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just use the other constructor of StreamWriter here? The one that takes an explicit Encoding parameter and pass it Encoding.Utf8 - that one should emit a BOM.
You might also be fixing #663 with this.
dbdba48 to
c49f5e6
Compare
Fix issue #1964 and provide robustness for mangled type names.