Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Avoid substring allocations in WebUtility.HtmlDecode#29402

Merged
stephentoub merged 3 commits intodotnet:masterfrom
juliushardt:remove-substring-allocations-in-WebUtility
May 1, 2018
Merged

Avoid substring allocations in WebUtility.HtmlDecode#29402
stephentoub merged 3 commits intodotnet:masterfrom
juliushardt:remove-substring-allocations-in-WebUtility

Conversation

@juliushardt
Copy link
Contributor

Fixes #13960. In contrast to #27250, this PR focuses exclusively on the substring allocations and not on the StringBuilder/TextWriter part.

There are three substring allocations in WebUtility: L199, L203 and L234.

While the first two ones can be avoided by simply changing a few API calls since uint.TryParse now accepts Span<char>, the third one involves a bit more work as it requires the html entity lookup table to be modified. @jamesqo described a possible solution in #13960, however, as he pointed out, it might have an initialization impact. The one implemented here is less flexible, but comes without additional initialization costs. The idea is that because all supported html entity strings are 8 characters or less and are ASCII-only (and hence each character of the entity string can be represented by a single byte), they can be squeezed into an UInt64, which serves as the key in the lookup table. Instructions to compute the key for possible future entries are included in the code comments. A disadvantage of this approach is that potential future HTML entity strings of 9 characters or more would require an additional code path. That being said, HtmlDecode now runs up to 35% faster:

Method Mean Error StdDev Gen 0 Allocated
TestOldImplementation 2.804 us 0.0558 us 0.0933 us 0.4425 1864 B
TestNewImplementation 1.798 us 0.0354 us 0.0363 us 0.2365 1000 B

I think that the performance improvements justify the added complexity and decreased readability.

@dnfclas
Copy link

dnfclas commented Apr 29, 2018

CLA assistant check
All CLA requirements met.

["clubs"] = '\x2663',
["hearts"] = '\x2665',
["diams"] = '\x2666',
[1953461617] = '\x0022', // "quot"
Copy link
Member

@stephentoub stephentoub Apr 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very hard to maintain. Since this table is computed once, I'd rather see a helper added that converts string to ulong, and then change each entry to just use that helper, e.g.

 [ToUInt64Key("quot")] = '\x0022',

You can then use the same helper in the lookup.

// maps entity strings => unicode chars
private static readonly LowLevelDictionary<string, char> s_lookupTable =
new LowLevelDictionary<string, char>(Count, StringComparer.Ordinal)
private static readonly LowLevelDictionary<ulong, char> s_lookupTable =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be faster if this is switched to regular Dictionary ?

LowLevelDictionary is left-over from the layering effort in the early .NET Core days. It is not as optimized as regular Dictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of this change it seems we could aim to remove LLD from CoreFX entirely. It is only used here and in UriSyntax.cs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas worth me opening an issue to remove LLD from CoreFX? I am not sure how involved perf verification would have to be of such a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that replacing LowLevelDictionary with Dictionary does not affect the performance negatively. Here are some numbers (30000 calls to HtmlDecode per run):

LowLevelDictionary

Run Mean Error StdDev Gen 0 Allocated
1 410.4 ms 16.52 ms 0.9337 ms 59125.0000 177.61 MB
2 412.0 ms 32.60 ms 1.842 ms 59125.0000 177.61 MB
3 411.4 ms 36.08 ms 2.039 ms 59125.0000 177.61 MB
4 410.2 ms 1.824 ms 0.1030 ms 59125.0000 177.61 MB
5 413.7 ms 6.160 ms 0.3480 ms 59125.0000 177.61 MB

Regular Dictionary

Run Mean Error StdDev Gen 0 Allocated
1 413.5 ms 18.43 ms 1.041 ms 56375.0000 169.37 MB
2 406.3 ms 18.16 ms 1.026 ms 56375.0000 169.37 MB
3 406.9 ms 45.48 ms 2.570 ms 56375.0000 169.37 MB
4 410.8 ms 84.31 ms 4.763 ms 56375.0000 169.37 MB
5 411.4 ms 35.25 ms 1.992 ms 56375.0000 169.37 MB

return default;
}

key |= (ulong)entity[i] << 8 * i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shift by variable amount tends to be more expensive than shift by fixed amount. It may be better to do it other way: key = (key << 8) | entity[i]

else
{
string entity = value.Substring(entityOffset, entityLength);
ReadOnlySpan<char> entity = valueSpan.Slice(entityOffset, entityLength);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than storing value.AsSpan() into a local and then using valueSpan.Slice(offset, count) in these three places, you can just use value.AsSpan(offset, count) in these three place.

- Use regular Dictionary instead of LowLevelDictionary
- Use AsSpan overload instead of AsSpan() and Slice
- Avoid shift by variable amount
- Use helper method to generate keys to make the code easier to maintain
{
if (entity[i] > 0xFF)
{
return default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

return 0;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I also changed L1046 to return (char)0; to make it consistent with L238 where it checks if (entityChar != (char)0).


private static ulong ToUInt64Key(ReadOnlySpan<char> entity)
{
// The ulong key is the reversed single-byte character representation of the actual entity string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to assert here that the length is <= 8. If that ever doesn't hold, this in combination with the lookup table could give incorrect answers.

- Add assert to ToUInt64Key
- Replace default with 0
@stephentoub
Copy link
Member

Thanks!

@stephentoub stephentoub merged commit 1681f77 into dotnet:master May 1, 2018
@karelz karelz added this to the 2.2.0 milestone May 1, 2018
@juliushardt juliushardt deleted the remove-substring-allocations-in-WebUtility branch May 2, 2018 17:39
morganbr pushed a commit to morganbr/corefx that referenced this pull request Jul 7, 2018
* Avoid substring allocations in WebUtility.HtmlDecode

* Update changes to HtmlDecode based on feedback

- Use regular Dictionary instead of LowLevelDictionary
- Use AsSpan overload instead of AsSpan() and Slice
- Avoid shift by variable amount
- Use helper method to generate keys to make the code easier to maintain

* Assert that entity length is <= 8 in ToUInt64Key

- Add assert to ToUInt64Key
- Replace default with 0
zacharycmontoya pushed a commit that referenced this pull request Aug 14, 2018
* Manually update the version of Microsoft.TargetingPack.Private.NETNative

* Set Microsoft.TargetPack.Private.NETNative version to track live TFS version

* Move Hashtable & friends to shared parition

Include serialization info table for Hashtable

Remove hashtable & hashprovider src link

* Compile hashtable & friends only on uapaot

* Fix build breaks

* Remove Opcodes due to Opcodes moved to S.P.Corelib

* Move Hashtable & friends to shared parition (dotnet/coreclr#17316) (#29307)

* Move Hashtable & friends to shared parition

* Move HashHelper serialization logic into its own file

* Remove unchecked keyword in Hashtable

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

* Moving ConcurrentQueue to shared folder (dotnet/coreclr#17956)

* Moving ConcurrentQueue to shared folder

Fixes: #17751

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

* Moving ConcurrentQueue to shared (dotnet/coreclr#18024)

Making IProducerConsumerCollectionDebugView apis public

Fixes: #17751

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

* Modify corefx to use ConcurrentQueue from shared

* Delete dead code after Hashtable.cs move (#29432)

* Replace LowLevelDictionary with Dictionary (#29411)

* Avoid substring allocations in WebUtility.HtmlDecode (#29402)

* Avoid substring allocations in WebUtility.HtmlDecode

* Update changes to HtmlDecode based on feedback

- Use regular Dictionary instead of LowLevelDictionary
- Use AsSpan overload instead of AsSpan() and Slice
- Avoid shift by variable amount
- Use helper method to generate keys to make the code easier to maintain

* Assert that entity length is <= 8 in ToUInt64Key

- Add assert to ToUInt64Key
- Replace default with 0

* Remove CoreSetup, External, and Sni dependencies

* Update the CoreFx dependency to release/2.1 so that Microsoft.NETCore.Platforms is defined correctly. Manually invoke the dependency update

* Removing Fedora 26 and adding 28 as appropriate.  Fix a couple README.md typos.

* Remove Ubuntu 17.10 from CoreFX official runs
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…9402)

* Avoid substring allocations in WebUtility.HtmlDecode

* Update changes to HtmlDecode based on feedback

- Use regular Dictionary instead of LowLevelDictionary
- Use AsSpan overload instead of AsSpan() and Slice
- Avoid shift by variable amount
- Use helper method to generate keys to make the code easier to maintain

* Assert that entity length is <= 8 in ToUInt64Key

- Add assert to ToUInt64Key
- Replace default with 0


Commit migrated from dotnet/corefx@1681f77
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants