Breaking change: Limit the total size of all user strings within an assembly to 2^24B #9852

@tmat

The #UserStrings metadata heap is addressed by a 24-bit index. That means that all user strings (i.e. arguments to ldstr instructions) in the assembly have to fit into 2^24 bytes after they are serialized to the heap, except for the last string, which has to start below 2^24 but can extend beyond it.
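To make the numbers concrete, here is a minimal Python sketch of how much heap space a single user string takes and how an ldstr operand refers to it. It assumes the ECMA-335 encoding (a compressed length prefix, the UTF-16LE characters, and one trailing flag byte) and the 0x70 token type for user strings. The function names are hypothetical and are not part of System.Reflection.Metadata.

```python
# Sketch only: helper names are illustrative, not a real library API.

USER_STRING_TOKEN_TYPE = 0x70        # high byte of an ldstr operand token
MAX_HEAP_OFFSET = (1 << 24) - 1      # largest offset a 24-bit index can express


def compressed_length_size(n: int) -> int:
    """Bytes needed for the ECMA-335 compressed unsigned integer n."""
    if n < 0x80:
        return 1
    if n < 0x4000:
        return 2
    if n < 0x20000000:
        return 4
    raise ValueError("value too large for a compressed integer")


def encoded_user_string_size(s: str) -> int:
    """Bytes one string occupies in the heap: length prefix + UTF-16LE + flag byte."""
    payload = len(s.encode("utf-16-le", errors="surrogatepass")) + 1
    return compressed_length_size(payload) + payload


def user_string_token(offset: int) -> int:
    """Build the ldstr operand for a string starting at the given heap offset."""
    if not 0 <= offset <= MAX_HEAP_OFFSET:
        raise ValueError("heap offset does not fit into 24 bits")
    return (USER_STRING_TOKEN_TYPE << 24) | offset
```

Only the start offset of a string goes into the token, which is why the last string may extend past 2^24 as long as it begins below it.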

The order of strings is pretty much determined by the location of the string constant in source and the ordering of source files. In extreme cases, when a user project has a lot of strings and they happen to be ordered such that the last string is quite long, one may just switch the order of two method definitions and get a compilation error, because the heap index of the string that is now last no longer fits into 24 bits.
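The order sensitivity can be sketched as follows. This is a simplified model, not the compiler's actual layout code: it takes already-encoded string sizes in bytes, assumes the heap begins with the single empty entry at offset 0, and ignores de-duplication of identical strings. The sizes in the example are made up.

```python
# Simplified model: sizes are encoded byte counts, listed in emission order.

HEAP_INDEX_LIMIT = 1 << 24  # every string must start below this offset


def first_unaddressable(sizes):
    """Return the position of the first string whose start offset needs more
    than 24 bits, or None if every string is addressable."""
    offset = 1  # offset 0 is taken by the empty entry
    for position, size in enumerate(sizes):
        if offset >= HEAP_INDEX_LIMIT:
            return position
        offset += size
    return None


# Same three strings, two orders. Only the position of the long string differs.
long_last = [HEAP_INDEX_LIMIT - 100, 50, 10_000]
long_in_middle = [HEAP_INDEX_LIMIT - 100, 10_000, 50]

print(first_unaddressable(long_last))       # None: the long string starts below 2^24
print(first_unaddressable(long_in_middle))  # 2: the short string now starts past 2^24
```

Both orderings contain exactly the same data, yet only one of them compiles under the current rule.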

The native compiler uses the native IMetaDataEmit metadata writer. Experiments show that this writer has a rather arbitrary limit on the total size of the encoded #UserStrings heap (something around 0x0E000000). This seems to be an implementation detail.

I propose we cap the total encoded length of all user strings emitted into an assembly at 2^24 bytes to avoid surprises. It would be a breaking change; however, it would only affect projects that are already on the very edge of what we can compile today and are very fragile.
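A rough sketch of the proposed check, under the same simplified model of encoded sizes in bytes. Whether the leading empty entry counts toward the cap is an assumption made here for illustration; the function name is hypothetical.

```python
# Sketch of the proposed rule: the whole heap must fit into 2^24 bytes.

USER_STRING_HEAP_CAP = 1 << 24


def fits_proposed_cap(sizes):
    """True if the leading empty entry plus all encoded strings fit the cap."""
    return 1 + sum(sizes) <= USER_STRING_HEAP_CAP


sizes = [USER_STRING_HEAP_CAP - 100, 50, 10_000]

# The result depends only on the total, so reordering cannot change it.
print(fits_proposed_cap(sizes))
print(fits_proposed_cap(list(reversed(sizes))))
```

Because the check is a plain sum, a project either fits or does not, regardless of how its source files and methods are ordered.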

The enforcement would be implemented in System.Reflection.Metadata.

Unfortunately, the ECMA spec doesn't say anything about the total size of heaps. In our errata (https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/Ecma-335-Issues.md#heap-sizes) we limit the sizes of the #Blob, #Guid and #String heaps to 2^29. A limit on the #UserStrings heap would be added. It should also be clarified that blobs, strings and guids can't extend beyond the heap size limit even though their index is within the limit.
