wip: ILC & R2R UTF-8 name mangling. #3

PaulusParssinen · 2024-05-17T17:04:22Z

wip: Previewing diffs

todo: desc
todo: r2r correctness

Background

dotnet/corert/issues/2178
dotnet/corert/pull/2200

todo: write motivation

This PR

Introduce ref struct Utf8StringBuilder
- Mirrors the already familiar internal ValueStringBuilder, but for UTF-8.
- Pooling now happens inside Utf8StringBuilder, using ArrayPool<byte>.Shared.
  - Previously the builder instance itself was pooled (and consequently the underlying per-thread buffer, which worked very well). There are pros and cons to both approaches, as always, but control over the initialBuffer allows us to avoid allocating or renting altogether.
  - Previously only the extension method GetName(this ISymbolNode ..) had pooling logic, but now we can fall back to the pool anywhere we build mangled symbol name strings.
Avoid intermediate string allocations
- For the existing GetMangled[Type/Method/Field/String]Name methods, introduce AppendMangled[Type/Method/Field/String]Name overloads which can be used to append directly to an existing builder (a very common operation).
  - This removes a lot of back-and-forth transcoding between UTF-8 <-> UTF-16 caused by the implicit string -> Utf8String cast, along with its overhead.
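
To make the two points above concrete, here is a minimal sketch of the pattern, not the code in this PR. `Utf8BuilderSketch` and `AppendDemoName` are hypothetical names invented for illustration, and the growth policy is an assumption. It shows a ref struct builder that starts on a caller-supplied scratch buffer, falls back to `ArrayPool<byte>.Shared` only when it outgrows that buffer, and is passed by `ref` so helpers append in place. It assumes C# 11 or later for the `u8` literal.

```csharp
using System;
using System.Buffers;
using System.Text;

// The caller supplies stack scratch space, so short names never touch the pool.
Span<byte> scratch = stackalloc byte[64];
var sb = new Utf8BuilderSketch(scratch);
try
{
    sb.Append("__demo_"u8);             // already UTF-8, copied as-is
    AppendDemoName(ref sb, "MyType");   // appended in place, no intermediate string
    Console.WriteLine(Encoding.UTF8.GetString(sb.AsSpan()));
}
finally
{
    sb.Dispose();                       // returns the rented array, if any
}

// Hypothetical stand-in for an Append* overload: writes into the caller's builder.
static void AppendDemoName(ref Utf8BuilderSketch sb, string name) => sb.Append(name);

// Hypothetical sketch type; the real Utf8StringBuilder may differ in every detail.
ref struct Utf8BuilderSketch
{
    private Span<byte> _bytes;   // current storage: caller scratch or a rented array
    private byte[] _rented;      // non-null only after falling back to the pool
    private int _length;

    public Utf8BuilderSketch(Span<byte> initialBuffer)
    {
        _bytes = initialBuffer;
        _rented = null;
        _length = 0;
    }

    public ReadOnlySpan<byte> AsSpan() => _bytes.Slice(0, _length);

    // Bytes that are already UTF-8 are copied without transcoding.
    public void Append(ReadOnlySpan<byte> utf8)
    {
        EnsureCapacity(utf8.Length);
        utf8.CopyTo(_bytes.Slice(_length));
        _length += utf8.Length;
    }

    // UTF-16 text is transcoded once, directly into the buffer.
    public void Append(ReadOnlySpan<char> text)
    {
        EnsureCapacity(Encoding.UTF8.GetByteCount(text));
        _length += Encoding.UTF8.GetBytes(text, _bytes.Slice(_length));
    }

    private void EnsureCapacity(int additional)
    {
        if (_length + additional <= _bytes.Length)
            return;

        // Assumed growth policy: at least double, and at least what is needed.
        int needed = Math.Max(_length + additional, _bytes.Length * 2);
        byte[] grown = ArrayPool<byte>.Shared.Rent(needed);
        _bytes.Slice(0, _length).CopyTo(grown);
        Dispose();               // give back the previous rented array, if any
        _bytes = grown;
        _rented = grown;
    }

    public void Dispose()
    {
        byte[] toReturn = _rented;
        if (toReturn != null)
        {
            _rented = null;
            ArrayPool<byte>.Shared.Return(toReturn);
        }
    }
}
```

Because the builder is a mutable struct, it has to travel by `ref`: passing it by value would copy the length and buffer fields, and appends made by the callee would be lost to the caller. That is also why a fluent `return this` style does not carry over.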

Problems and questions

This PR was intended to be as unintrusive as possible for easier review, but that turned out to be very hard due to the viral nature of symbol names. In my opinion, it was more straightforward to go all in with Utf8String, so this PR flows it through all the way to the ObjectWriter logic. Choosing a cut-off point for Utf8String anywhere earlier would nullify the transcoding/allocation savings.
Pre-existing inconsistent name sanitization. Fix now? Is it still relevant, given that it was primarily for the Cpp backend?
- Name Mangling Robustness for Parameterized Types dotnet/corert#2175

Potential future improvements

If we get custom UTF-8 interpolated string handlers, they'll allow tidying the mangling logic.
IPrefixMangled[Method/Type] could be optimized to append mangled names directly.

Results

`dotnet new webapiaot`

Stage 2 Goldilocks (TodosApi)

-100MB of GC heap allocations removed?

Compile ILC with ILC

todo:

Bench using the NAOT-compiled ILC

todo: how to measure NAOT GC perf

PaulusParssinen changed the title ~~wip: ILC & R2R name mangling.~~ wip: ILC & R2R UTF-8 name mangling. May 17, 2024

PaulusParssinen force-pushed the ilc-name-mangling-alloc branch 2 times, most recently from 933391b to b789b7c Compare May 29, 2024 23:05

PaulusParssinen marked this pull request as draft May 29, 2024 23:05

PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from b789b7c to 730ab37 Compare May 31, 2024 19:31

PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from f1b73e7 to 3753bef Compare June 30, 2024 23:47

PaulusParssinen added 9 commits July 20, 2024 22:54

Make Utf8StringBuilder a ref struct

b71f0c7

* And pass it around as ref. This required dropping the fluent pattern.

name mangling revamp wip: backup progress

3834d51

wip: object writer u8 speedrun any %

16d9f57

repro project compiles

0915e5f

accidentally fixed bugs maybe, a lot of todos left

One name diff left against repro.csproj

9369a21

* The last diff is inconsistent usage of AppendMangledTypeName in InterfaceDispatchMap. Other nodes append unmangled < & > characters but in this type it was sanitized, causing this diff.

Use custom interpolation more

4ec7c87

adjustments

69141d4

* more consistent u8str initialization
* simplify StringTableBuilder.CreateIndex

Adjust some code to its original form

71dfc6c

* Also reduce some allocations in unwind utf-8 paths

Initialize more Utf8Strings with stackalloc scratch buffer

2a934ca

PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from 3753bef to 2a934ca Compare July 20, 2024 19:55

PaulusParssinen commented Jul 20, 2024

src/coreclr/tools/Common/Compiler/NameMangler.cs (outdated)

PaulusParssinen commented Jul 20, 2024

...ools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/InterfaceDispatchCellSectionNode.cs (outdated)

PaulusParssinen commented Jul 20, 2024

src/coreclr/tools/Common/Compiler/DependencyAnalysis/EmbeddedDataContainerNode.cs (outdated)

PaulusParssinen added 4 commits July 20, 2024 21:21

Adjust comment and restore couple calls to original form

0e7fc3e

Update pulled upstream changes to use Utf8String

b4e9359

Use custom interpolation in crossgen2 name mangling

bfc5d56

Merge branch 'main' into ilc-name-mangling-alloc

0fdf418

PaulusParssinen mentioned this pull request Sep 8, 2025

Switch managed type system and compilers to utf-8 dotnet/runtime#119385

Merged

PaulusParssinen closed this Dec 18, 2025
