Name Mangling Robustness for Parameterized Types #2175

nattress · 2016-11-10T01:18:29Z

Fix issue #1964 and provide robustness for mangled type names.

MichalStrehovsky · 2016-11-10T17:13:52Z

src/ILCompiler.Compiler/src/Compiler/NameMangler.cs

+            // restricted to that use only. Replace them if they happened to be used in any identifiers in 
+            // the compilation input.
+            return _mangleForCplusPlus
+                ? santizedName.Replace(EnterNameScopeSequence, "_AA_").Replace(ExitNameScopeSequence, "_VV_")


Would it be easier to use a single character that is also acceptable as part of a C++ identifier? E.g. acute accent and cedilla are allowed.

(Or use < and > for when we're writing an object file, and the weird characters for C++ - basically I'm trying to see if we can get rid of the double String.Replace. If you look at the GC allocation profile of the compiler right now, we allocate a ton of strings because of how inefficient NameMangler already is)

nattress · 2016-11-11

I was trying to keep the emitted C++ as close to ASCII as possible, partly because compilers differ in how well they support Unicode characters in identifiers, and this emitted C++ code is supposed to be ultra portable. I'll do a quick survey of the current support in gcc / llvm without requiring the -fextended-identifiers switch and see what we can expect to work broadly.
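As an illustration of the allocation concern above, here is a minimal sketch of escaping both markers in one pass instead of two chained String.Replace calls. The class and method names are hypothetical, and the marker values "_A_" and "_V_" are assumed from the PR description; the real constants live in NameMangler. This is not the code in the PR.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of ILCompiler.
public static class SanitizeSketch
{
    // Assumed marker values, taken from the PR description.
    private const string Enter = "_A_";
    private const string Exit = "_V_";

    public static string EscapeScopeMarkers(string name)
    {
        // Fast path: hand back the same instance when neither marker occurs.
        if (name.IndexOf(Enter, StringComparison.Ordinal) < 0 &&
            name.IndexOf(Exit, StringComparison.Ordinal) < 0)
        {
            return name;
        }

        var sb = new StringBuilder(name.Length + 8);
        int i = 0;
        while (i < name.Length)
        {
            if (string.CompareOrdinal(name, i, Enter, 0, Enter.Length) == 0)
            {
                sb.Append("_AA");
                i += 2; // leave the trailing '_' so it can also start the next marker
            }
            else if (string.CompareOrdinal(name, i, Exit, 0, Exit.Length) == 0)
            {
                sb.Append("_VV");
                i += 2;
            }
            else
            {
                sb.Append(name[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, EscapeScopeMarkers("Foo_A_Bar") yields "Foo_AA_Bar", and a name without markers is returned without building a new string. Its output can differ from the chained Replace calls when markers overlap (for example "_A_A_"), so it is a starting point for discussion rather than a drop-in replacement.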

jkotas · 2016-11-10T17:36:30Z

"If you look at the GC allocation profile of the compiler right now, we allocate a ton of strings"

Agree ... the name mangling needs a revamp, for both correctness and overall efficiency (do not allocate as much to construct the mangled names, and make the mangled names shorter):

Here are some things that I have been wondering about:

Do we need to include the namespaces in the type names? What if we instead just use the simple type name and disambiguate it if there is a conflict, similar to what we do for methods?
Use a shortcut for the system module or frequently used system types. E.g. the reference to the System.String type today is something like System_Private_CoreLib__System__String. What if it were just something like #String?
CppCodeGen and native codegen name mangling can be quite different
Store the mangled strings as UTF8
...
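To make the first idea in the list above concrete, here is a minimal sketch of handing out simple type names and adding a suffix only when two types collide. Everything in it (the class name, the string keys, the "_1" suffix scheme) is hypothetical; the thread does not say how methods are disambiguated today, so this is not a description of the existing compiler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical allocator for illustration only; not part of ILCompiler.
public sealed class SimpleNameAllocatorSketch
{
    // Full type identity (e.g. "[module]Namespace.Type") -> name already assigned to it.
    private readonly Dictionary<string, string> _assigned =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Every mangled name handed out so far.
    private readonly HashSet<string> _used = new HashSet<string>(StringComparer.Ordinal);

    public string GetMangledName(string fullName, string simpleName)
    {
        if (_assigned.TryGetValue(fullName, out string existing))
            return existing;

        // Try the bare simple name first, then simpleName_1, simpleName_2, ...
        string candidate = simpleName;
        for (int suffix = 1; !_used.Add(candidate); suffix++)
            candidate = simpleName + "_" + suffix;

        _assigned.Add(fullName, candidate);
        return candidate;
    }
}
```

One design consequence worth noting: the suffix a type receives depends on the order in which types are first requested, so a scheme like this would need a stable visitation order to keep mangled names deterministic across compilations.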

I assume that this PR is fixing a problem that Simon is running into. I would be OK with it to get things unblocked, and do the overall cleanup separately later.

@nattress Could you please add a case to tests\src\Simple\Generics\Generics.cs that fails before this change and passes with it?

jkotas · 2016-11-10T17:53:23Z

Opened #2178

MichalStrehovsky

Trying to keep SanitizeName allocation free probably doesn't matter given we're going to dig through everything anyway. LGTM

Fix issue 1964 and provide robustness for mangling type names.

Previously, there was no scoping around generic instantiation argument lists in a mangled type name. That would allow situations where the following two types would have the same mangled name:

[test]Gen1<[Test.CoreLib]System.Object[]>[]
[test]Gen1<[Test.CoreLib]System.Object[][]>

These would both mangle as test_Foo_Gen_1__Test_CoreLib_System_Object__Array__Array.

To fix this, use reserved character sequences to denote a range of generic instantiation arguments and for the appended marker for arrays, byrefs, and pointer types. For RyuJIT, use standard angle brackets since object files allow them. For C++, we need to generate valid identifiers, so use "_A_" to denote a beginning marker, and "_V_" for an ending marker. If you tilt your head sideways, they sort of look like angle brackets.

To prevent input code from clashing with these markers, when sanitizing names, replace them with something else: _A_ => _AA_; _V_ => _VV_. We don't need to worry about that with RyuJIT since angle brackets are already stripped during sanitization.
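The collision described above can be reproduced with a toy mangler. The sketch below uses a hypothetical miniature type model (TypeSketch, NamedType, ArrayType) and simplified name strings; it is not the ILCompiler type system, and the exact output format of the real NameMangler is not shown in this thread. It only illustrates why wrapping the argument list in enter/exit markers separates the two types.

```csharp
using System;
using System.Linq;

// Hypothetical miniature type model for illustration; not the ILCompiler type system.
public abstract class TypeSketch { }

public sealed class NamedType : TypeSketch
{
    public readonly string Name;
    public readonly TypeSketch[] Args;
    public NamedType(string name, params TypeSketch[] args) { Name = name; Args = args; }
}

public sealed class ArrayType : TypeSketch
{
    public readonly TypeSketch Element;
    public ArrayType(TypeSketch element) { Element = element; }
}

public static class ManglingSketch
{
    // Unscoped: generic arguments and array suffixes are appended with no delimiters.
    public static string Unscoped(TypeSketch t)
    {
        if (t is ArrayType a)
            return Unscoped(a.Element) + "__Array";
        var n = (NamedType)t;
        return n.Name + string.Concat(n.Args.Select(arg => "__" + Unscoped(arg)));
    }

    // Scoped: the generic argument list is wrapped in enter/exit markers
    // ("_A_" / "_V_" assumed from the PR description for the C++ case).
    public static string Scoped(TypeSketch t, string enter = "_A_", string exit = "_V_")
    {
        if (t is ArrayType a)
            return Scoped(a.Element, enter, exit) + "__Array";
        var n = (NamedType)t;
        if (n.Args.Length == 0)
            return n.Name;
        return n.Name + enter
            + string.Join("__", n.Args.Select(arg => Scoped(arg, enter, exit)))
            + exit;
    }
}
```

Building Gen1<Object[]>[] and Gen1<Object[][]> in this model gives the same string from Unscoped, because the trailing array suffix cannot be attributed to either the argument or the outer type. Scoped yields two different strings, since the exit marker records where the argument list ends.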

MichalStrehovsky · 2016-11-11T23:18:14Z

src/ILCompiler.Compiler/src/CppCodeGen/CppWriter.cs


+            // Emit the Unicode byte order mark to prevent the MS C++ compiler treating the emitted
+            // C++ code as codepage 1251
+            _out.Write('\uFEFF');


Can you just use the other constructor of StreamWriter here? The one that takes an explicit Encoding parameter; pass it Encoding.UTF8, which should emit a BOM.

You might also be fixing #663 with this.
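A minimal sketch of the suggestion above, assuming the writer is created over a fresh stream. The BomSketch class and the use of a MemoryStream are for illustration only; how CppWriter actually opens its output is not shown in this thread.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of CppWriter.
public static class BomSketch
{
    public static byte[] WriteWithUtf8Bom(string text)
    {
        using (var stream = new MemoryStream())
        {
            // Encoding.UTF8 supplies the UTF-8 preamble (EF BB BF), which
            // StreamWriter writes first when the stream starts out empty.
            using (var writer = new StreamWriter(stream, Encoding.UTF8))
            {
                writer.Write(text);
            }
            // ToArray is still valid after the stream has been closed.
            return stream.ToArray();
        }
    }
}
```

With this approach the explicit _out.Write('\uFEFF') call becomes unnecessary, because the encoding passed to the constructor takes care of the byte order mark.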

dnfclas added the cla-already-signed label Nov 10, 2016

nattress force-pushed the namemangling branch from 458e3d5 to d2c251a Compare November 10, 2016 01:21

MichalStrehovsky reviewed Nov 10, 2016

jkotas mentioned this pull request Nov 10, 2016

Namemangling revamp #2178

Open

MichalStrehovsky approved these changes Nov 11, 2016

nattress force-pushed the namemangling branch from d2c251a to dbdba48 Compare November 11, 2016 23:10

MichalStrehovsky reviewed Nov 11, 2016

Add regression test

c49f5e6

nattress force-pushed the namemangling branch from dbdba48 to c49f5e6 Compare November 12, 2016 00:26

nattress merged commit bfdf0e0 into dotnet:master Nov 12, 2016

nattress deleted the namemangling branch November 12, 2016 01:24

MichalStrehovsky mentioned this pull request Jan 25, 2017

Constructed type name mangling robustness #1964

Closed

PaulusParssinen mentioned this pull request Jul 22, 2024

wip: ILC & R2R UTF-8 name mangling. PaulusParssinen/runtime#3

Closed

4 participants
