Skip to content

Conversation

@PaulusParssinen
Copy link
Owner

@PaulusParssinen PaulusParssinen commented May 17, 2024

wip: Previewing diffs

todo: desc
todo: r2r correctness

Background

dotnet/corert/issues/2178
dotnet/corert/pull/2200

todo: write motivation

This PR

  • Introduce ref struct Utf8StringBuilder
    • To mirror already familiar internal ValueStringBuilder but UTF-8.
    • Pooling now happens in the Utf8StringBuilder using the ArrayPool<byte>.Shared
      • Previously the builder instance was pooled (and consequently the underlying per-thread buffer, which did very well). There's pros and cons on both approaches as always but the control over the initialBuffer allows us to avoid allocating or renting altogether
      • Previously only the extension method GetName(this ISymbolNode ..) had pooling logic, but now we can fallback to pool anywhere where we build mangled symbol name strings.
  • Avoid intermediate string allocations
    • For the existing a GetMangled[Type/Method/Field/String]Name methods,
      introduce AppendMangled[Type/Method/Field/String]Name overload which can be used to append directly to the existing builder (very common operation)
      • This removes a lot of "back-and-forth transcoding between UTF-8 <-> UTF-16 due to implicit string -> Utf8String cast and its overhead.

Problems and questions

  • This PR was inteded to keep as unintrusive as possible for easier review but that turned out to be very hard due to viral nature of symbol names. In my opinion, it was more straightforward to go all in with the Utf8String so this PR flows it through all the way to the ObjectWriter logic. Choosing a cut off point for the Utf8String elsewhere would render null the transcoding/allocation savings.
  • pre-existing inconsistent name sanitization. Fix now? Still relevant as this was for primarily for the Cpp backend?

Potential future improvements

  • If we get custom UTF-8 string interpolated handlers, they'll allow tidying the mangling logic.
  • IPrefixMangled[Method/Type] could be optimized to append mangled names

Results

dotnet new webapiaot

Stage 2 Goldilocks (TodosApi)

  • -100MB of GC heap allocations removed?
    250601_todosapi

Compile ILC with ILC

  • todo:

Bench with using the NAOT compiled ILC

  • todo: how to measure NAOT GC perf

@PaulusParssinen PaulusParssinen changed the title wip: ILC & R2R name mangling. wip: ILC & R2R UTF-8 name mangling. May 17, 2024
@PaulusParssinen PaulusParssinen force-pushed the ilc-name-mangling-alloc branch 2 times, most recently from 933391b to b789b7c Compare May 29, 2024 23:05
@PaulusParssinen PaulusParssinen marked this pull request as draft May 29, 2024 23:05
@PaulusParssinen PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from b789b7c to 730ab37 Compare May 31, 2024 19:31
@PaulusParssinen PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from f1b73e7 to 3753bef Compare June 30, 2024 23:47
* And pass it around as ref. This required dropping the fluent pattern.
accidentally fixed bugs maybe, a lot of todos left
* The last diff is inconsistent usage of AppendMangledTypeName in InterfaceDispatchMap. Other nodes append unmangled < & > characters but in this type it was sanitized, causing this diff.
* more consistent of u8str initialization

* simplify StringTableBuilder.CreateIndex
* Also reduce some allocations in unwind utf-8 paths
@PaulusParssinen PaulusParssinen force-pushed the ilc-name-mangling-alloc branch from 3753bef to 2a934ca Compare July 20, 2024 19:55
PaulusParssinen pushed a commit that referenced this pull request Nov 17, 2024
* bug #1: don't allow for values out of the SerializationRecordType enum range

* bug #2: throw SerializationException rather than KeyNotFoundException when the referenced record is missing or it points to a record of different type

* bug #3: throw SerializationException rather than FormatException when it's being thrown by BinaryReader (or sth else that we use)

* bug dotnet#4: document the fact that IOException can be thrown

* bug dotnet#5: throw SerializationException rather than OverflowException when parsing the decimal fails

* bug dotnet#6: 0 and 17 are illegal values for PrimitiveType enum

* bug dotnet#7: throw SerializationException when a surrogate character is read (so far an ArgumentException was thrown)
PaulusParssinen pushed a commit that referenced this pull request Oct 13, 2025
Currently, offsets are incorrectly treated as indices which is
leading to incorrect code being emitted.

e.g., `ScatterWithByteOffsets<long>` emits
`ST1D Zdata.D, Pg, [Xbase, Zoffsets.D, lsl #3]`
instead of,
`ST1D Zdata.D, Pg, [Xbase, Zoffsets.D]`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants