Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Optimize TextEncoder.FindFirstCharacterToEncodeUtf8#42177

Closed
gfoidl wants to merge 43 commits intodotnet:masterfrom
gfoidl:TextEncoder_ImproveEscapingCheck
Closed

Optimize TextEncoder.FindFirstCharacterToEncodeUtf8#42177
gfoidl wants to merge 43 commits intodotnet:masterfrom
gfoidl:TextEncoder_ImproveEscapingCheck

Conversation

@gfoidl
Copy link
Member

@gfoidl gfoidl commented Oct 28, 2019

This PR is based on #42073 (that's why it's marked as draft, to see the changes after #42073 got merged: https://github.com/gfoidl/corefx/compare/ImproveEscapingCheck...gfoidl:TextEncoder_ImproveEscapingCheck Edit: #42073 is now merged, so this one is ready for review)

Description

Only the ASCII-path was touched.

Benchmarks

Benchmarks are based on the ones from #41845 (https://gist.github.com/ahsonkhan/c566f5e7d65c1fde5a83a67be290c4ee#file-test_escapingbenchmark-cs got adapted for this)

with SSSE3 enabled
| Method |       TestStringData |      Mean |     Error |    StdDev |    Median | Ratio | RatioSD |
|------- |--------------------- |----------:|----------:|----------:|----------:|------:|--------:|
| Master | (1, -(...)te[]) [22] |  21.41 ns | 0.0309 ns | 0.0274 ns |  21.40 ns |  1.00 |    0.00 |
|     PR | (1, -(...)te[]) [22] |  19.23 ns | 0.1229 ns | 0.1089 ns |  19.19 ns |  0.90 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (1, 0(...)te[]) [21] |  21.60 ns | 0.3552 ns | 0.2966 ns |  21.50 ns |  1.00 |    0.00 |
|     PR | (1, 0(...)te[]) [21] |  18.79 ns | 0.0220 ns | 0.0184 ns |  18.79 ns |  0.87 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (100,(...)te[]) [24] | 112.40 ns | 0.4321 ns | 0.3373 ns | 112.30 ns |  1.00 |    0.00 |
|     PR | (100,(...)te[]) [24] |  39.95 ns | 0.3994 ns | 0.3335 ns |  39.82 ns |  0.35 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (15, (...)te[]) [23] |  42.50 ns | 0.2383 ns | 0.1990 ns |  42.41 ns |  1.00 |    0.00 |
|     PR | (15, (...)te[]) [23] |  40.09 ns | 0.0848 ns | 0.0708 ns |  40.08 ns |  0.94 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (16, (...)te[]) [23] |  34.57 ns | 0.1952 ns | 0.1630 ns |  34.50 ns |  1.00 |    0.00 |
|     PR | (16, (...)te[]) [23] |  22.14 ns | 0.0277 ns | 0.0216 ns |  22.14 ns |  0.64 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (17, (...)te[]) [23] |  35.92 ns | 0.2825 ns | 0.2359 ns |  35.81 ns |  1.00 |    0.00 |
|     PR | (17, (...)te[]) [23] |  23.30 ns | 0.1098 ns | 0.0917 ns |  23.26 ns |  0.65 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (2, -(...)te[]) [22] |  23.01 ns | 0.1779 ns | 0.1577 ns |  22.93 ns |  1.00 |    0.00 |
|     PR | (2, -(...)te[]) [22] |  20.69 ns | 0.0429 ns | 0.0358 ns |  20.68 ns |  0.90 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (32, (...)te[]) [23] |  49.03 ns | 0.3464 ns | 0.2892 ns |  48.93 ns |  1.00 |    0.00 |
|     PR | (32, (...)te[]) [23] |  24.76 ns | 0.0560 ns | 0.0468 ns |  24.75 ns |  0.50 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (4, -(...)te[]) [22] |  25.90 ns | 0.0300 ns | 0.0250 ns |  25.90 ns |  1.00 |    0.00 |
|     PR | (4, -(...)te[]) [22] |  23.83 ns | 0.2902 ns | 0.2573 ns |  23.68 ns |  0.92 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (7, -(...)te[]) [22] |  30.38 ns | 0.0407 ns | 0.0318 ns |  30.39 ns |  1.00 |    0.00 |
|     PR | (7, -(...)te[]) [22] |  28.26 ns | 0.1200 ns | 0.1064 ns |  28.28 ns |  0.93 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 0(...)te[]) [21] |  21.42 ns | 0.0438 ns | 0.0366 ns |  21.42 ns |  1.00 |    0.00 |
|     PR | (7, 0(...)te[]) [21] |  18.97 ns | 0.4146 ns | 0.3462 ns |  18.80 ns |  0.89 |    0.02 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 1(...)te[]) [21] |  23.00 ns | 0.2232 ns | 0.1864 ns |  22.90 ns |  1.00 |    0.00 |
|     PR | (7, 1(...)te[]) [21] |  20.33 ns | 0.0969 ns | 0.0809 ns |  20.30 ns |  0.88 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 2(...)te[]) [21] |  24.41 ns | 0.0599 ns | 0.0531 ns |  24.40 ns |  1.00 |    0.00 |
|     PR | (7, 2(...)te[]) [21] |  21.84 ns | 0.1300 ns | 0.1085 ns |  21.79 ns |  0.89 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 3(...)te[]) [21] |  26.00 ns | 0.2157 ns | 0.1801 ns |  25.91 ns |  1.00 |    0.00 |
|     PR | (7, 3(...)te[]) [21] |  23.52 ns | 0.3503 ns | 0.3277 ns |  23.30 ns |  0.90 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 4(...)te[]) [21] |  27.40 ns | 0.0437 ns | 0.0341 ns |  27.40 ns |  1.00 |    0.00 |
|     PR | (7, 4(...)te[]) [21] |  24.78 ns | 0.0283 ns | 0.0221 ns |  24.78 ns |  0.90 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 5(...)te[]) [21] |  29.10 ns | 0.4427 ns | 0.3924 ns |  29.00 ns |  1.00 |    0.00 |
|     PR | (7, 5(...)te[]) [21] |  28.39 ns | 2.4063 ns | 3.5271 ns |  26.35 ns |  1.05 |    0.13 |
|        |                      |           |           |           |           |       |         |
| Master | (7, 6(...)te[]) [21] |  30.41 ns | 0.2024 ns | 0.1893 ns |  30.31 ns |  1.00 |    0.00 |
|     PR | (7, 6(...)te[]) [21] |  27.78 ns | 0.0993 ns | 0.0880 ns |  27.74 ns |  0.91 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (8, -(...)te[]) [22] |  32.05 ns | 0.4349 ns | 0.3632 ns |  31.93 ns |  1.00 |    0.00 |
|     PR | (8, -(...)te[]) [22] |  29.80 ns | 0.3755 ns | 0.3512 ns |  29.63 ns |  0.93 |    0.02 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 0(...)te[]) [21] |  21.42 ns | 0.0466 ns | 0.0389 ns |  21.42 ns |  1.00 |    0.00 |
|     PR | (8, 0(...)te[]) [21] |  18.79 ns | 0.0309 ns | 0.0241 ns |  18.80 ns |  0.88 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 1(...)te[]) [21] |  22.88 ns | 0.0246 ns | 0.0205 ns |  22.88 ns |  1.00 |    0.00 |
|     PR | (8, 1(...)te[]) [21] |  20.31 ns | 0.0312 ns | 0.0292 ns |  20.30 ns |  0.89 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 2(...)te[]) [21] |  24.49 ns | 0.1688 ns | 0.1579 ns |  24.41 ns |  1.00 |    0.00 |
|     PR | (8, 2(...)te[]) [21] |  21.81 ns | 0.0416 ns | 0.0369 ns |  21.79 ns |  0.89 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 3(...)te[]) [21] |  25.90 ns | 0.0334 ns | 0.0296 ns |  25.90 ns |  1.00 |    0.00 |
|     PR | (8, 3(...)te[]) [21] |  23.36 ns | 0.1907 ns | 0.1593 ns |  23.29 ns |  0.90 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 4(...)te[]) [21] |  27.38 ns | 0.0476 ns | 0.0371 ns |  27.38 ns |  1.00 |    0.00 |
|     PR | (8, 4(...)te[]) [21] |  24.84 ns | 0.0851 ns | 0.0754 ns |  24.82 ns |  0.91 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 5(...)te[]) [21] |  28.92 ns | 0.1522 ns | 0.1349 ns |  28.87 ns |  1.00 |    0.00 |
|     PR | (8, 5(...)te[]) [21] |  26.30 ns | 0.0560 ns | 0.0467 ns |  26.29 ns |  0.91 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 6(...)te[]) [21] |  30.33 ns | 0.1331 ns | 0.1112 ns |  30.28 ns |  1.00 |    0.00 |
|     PR | (8, 6(...)te[]) [21] |  27.80 ns | 0.0974 ns | 0.0814 ns |  27.78 ns |  0.92 |    0.00 |
|        |                      |           |           |           |           |       |         |
| Master | (8, 7(...)te[]) [21] |  31.89 ns | 0.1779 ns | 0.1577 ns |  31.82 ns |  1.00 |    0.00 |
|     PR | (8, 7(...)te[]) [21] |  29.41 ns | 0.3828 ns | 0.3196 ns |  29.28 ns |  0.92 |    0.01 |
|        |                      |           |           |           |           |       |         |
| Master | (9, -(...)te[]) [22] |  33.34 ns | 0.0397 ns | 0.0310 ns |  33.33 ns |  1.00 |    0.00 |
|     PR | (9, -(...)te[]) [22] |  31.16 ns | 0.0385 ns | 0.0300 ns |  31.16 ns |  0.93 |    0.00 |

with SSSE3 disabled
| Method |       TestStringData |      Mean |     Error |    StdDev | Ratio | RatioSD |
|------- |--------------------- |----------:|----------:|----------:|------:|--------:|
| Master | (1, -(...)te[]) [22] |  21.65 ns | 0.4649 ns | 0.4774 ns |  1.00 |    0.00 |
|     PR | (1, -(...)te[]) [22] |  19.51 ns | 0.0188 ns | 0.0147 ns |  0.90 |    0.02 |
|        |                      |           |           |           |       |         |
| Master | (1, 0(...)te[]) [21] |  21.42 ns | 0.0343 ns | 0.0267 ns |  1.00 |    0.00 |
|     PR | (1, 0(...)te[]) [21] |  18.88 ns | 0.1829 ns | 0.1621 ns |  0.88 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (100,(...)te[]) [24] | 112.47 ns | 0.5272 ns | 0.4674 ns |  1.00 |    0.00 |
|     PR | (100,(...)te[]) [24] |  99.12 ns | 1.4151 ns | 1.1816 ns |  0.88 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (15, (...)te[]) [23] |  42.37 ns | 0.2029 ns | 0.1694 ns |  1.00 |    0.00 |
|     PR | (15, (...)te[]) [23] |  40.50 ns | 0.0637 ns | 0.0497 ns |  0.96 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (16, (...)te[]) [23] |  34.43 ns | 0.0591 ns | 0.0493 ns |  1.00 |    0.00 |
|     PR | (16, (...)te[]) [23] |  31.39 ns | 0.3907 ns | 0.3463 ns |  0.91 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (17, (...)te[]) [23] |  37.43 ns | 0.7463 ns | 0.7986 ns |  1.00 |    0.00 |
|     PR | (17, (...)te[]) [23] |  32.30 ns | 0.2275 ns | 0.2017 ns |  0.87 |    0.03 |
|        |                      |           |           |           |       |         |
| Master | (2, -(...)te[]) [22] |  22.92 ns | 0.0272 ns | 0.0227 ns |  1.00 |    0.00 |
|     PR | (2, -(...)te[]) [22] |  21.27 ns | 0.4480 ns | 0.3498 ns |  0.93 |    0.02 |
|        |                      |           |           |           |       |         |
| Master | (32, (...)te[]) [23] |  48.86 ns | 0.0676 ns | 0.0599 ns |  1.00 |    0.00 |
|     PR | (32, (...)te[]) [23] |  42.96 ns | 0.2491 ns | 0.2080 ns |  0.88 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (4, -(...)te[]) [22] |  25.94 ns | 0.0771 ns | 0.0644 ns |  1.00 |    0.00 |
|     PR | (4, -(...)te[]) [22] |  31.73 ns | 0.7657 ns | 1.9901 ns |  1.12 |    0.16 |
|        |                      |           |           |           |       |         |
| Master | (7, -(...)te[]) [22] |  30.79 ns | 0.7730 ns | 1.0051 ns |  1.00 |    0.00 |
|     PR | (7, -(...)te[]) [22] |  28.45 ns | 0.0284 ns | 0.0222 ns |  0.92 |    0.03 |
|        |                      |           |           |           |       |         |
| Master | (7, 0(...)te[]) [21] |  21.46 ns | 0.0657 ns | 0.0582 ns |  1.00 |    0.00 |
|     PR | (7, 0(...)te[]) [21] |  19.01 ns | 0.3988 ns | 0.3535 ns |  0.89 |    0.02 |
|        |                      |           |           |           |       |         |
| Master | (7, 1(...)te[]) [21] |  23.00 ns | 0.2197 ns | 0.1715 ns |  1.00 |    0.00 |
|     PR | (7, 1(...)te[]) [21] |  20.29 ns | 0.0572 ns | 0.0478 ns |  0.88 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (7, 2(...)te[]) [21] |  24.38 ns | 0.0477 ns | 0.0398 ns |  1.00 |    0.00 |
|     PR | (7, 2(...)te[]) [21] |  21.80 ns | 0.0297 ns | 0.0248 ns |  0.89 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (7, 3(...)te[]) [21] |  25.99 ns | 0.2287 ns | 0.1909 ns |  1.00 |    0.00 |
|     PR | (7, 3(...)te[]) [21] |  24.08 ns | 1.2012 ns | 1.2335 ns |  0.93 |    0.06 |
|        |                      |           |           |           |       |         |
| Master | (7, 4(...)te[]) [21] |  27.39 ns | 0.0266 ns | 0.0222 ns |  1.00 |    0.00 |
|     PR | (7, 4(...)te[]) [21] |  24.80 ns | 0.0294 ns | 0.0245 ns |  0.91 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (7, 5(...)te[]) [21] |  29.02 ns | 0.1999 ns | 0.1772 ns |  1.00 |    0.00 |
|     PR | (7, 5(...)te[]) [21] |  26.43 ns | 0.2822 ns | 0.2204 ns |  0.91 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (7, 6(...)te[]) [21] |  30.30 ns | 0.0366 ns | 0.0305 ns |  1.00 |    0.00 |
|     PR | (7, 6(...)te[]) [21] |  27.75 ns | 0.1068 ns | 0.0892 ns |  0.92 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (8, -(...)te[]) [22] |  31.95 ns | 0.2510 ns | 0.2096 ns |  1.00 |    0.00 |
|     PR | (8, -(...)te[]) [22] |  30.01 ns | 0.1453 ns | 0.1213 ns |  0.94 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (8, 0(...)te[]) [21] |  21.55 ns | 0.2247 ns | 0.1992 ns |  1.00 |    0.00 |
|     PR | (8, 0(...)te[]) [21] |  18.83 ns | 0.0913 ns | 0.0763 ns |  0.87 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (8, 1(...)te[]) [21] |  23.27 ns | 0.3067 ns | 0.2719 ns |  1.00 |    0.00 |
|     PR | (8, 1(...)te[]) [21] |  20.37 ns | 0.0895 ns | 0.0837 ns |  0.88 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (8, 2(...)te[]) [21] |  24.34 ns | 0.1790 ns | 0.1587 ns |  1.00 |    0.00 |
|     PR | (8, 2(...)te[]) [21] |  21.80 ns | 0.0488 ns | 0.0381 ns |  0.90 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (8, 3(...)te[]) [21] |  25.97 ns | 0.1587 ns | 0.1407 ns |  1.00 |    0.00 |
|     PR | (8, 3(...)te[]) [21] |  23.31 ns | 0.0357 ns | 0.0279 ns |  0.90 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (8, 4(...)te[]) [21] |  27.46 ns | 0.1939 ns | 0.1619 ns |  1.00 |    0.00 |
|     PR | (8, 4(...)te[]) [21] |  24.79 ns | 0.0462 ns | 0.0386 ns |  0.90 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (8, 5(...)te[]) [21] |  29.19 ns | 0.5000 ns | 0.4432 ns |  1.00 |    0.00 |
|     PR | (8, 5(...)te[]) [21] |  26.49 ns | 0.5548 ns | 0.5189 ns |  0.91 |    0.02 |
|        |                      |           |           |           |       |         |
| Master | (8, 6(...)te[]) [21] |  30.33 ns | 0.0411 ns | 0.0321 ns |  1.00 |    0.00 |
|     PR | (8, 6(...)te[]) [21] |  27.72 ns | 0.0359 ns | 0.0300 ns |  0.91 |    0.00 |
|        |                      |           |           |           |       |         |
| Master | (8, 7(...)te[]) [21] |  31.87 ns | 0.1597 ns | 0.1415 ns |  1.00 |    0.00 |
|     PR | (8, 7(...)te[]) [21] |  29.37 ns | 0.1656 ns | 0.1549 ns |  0.92 |    0.01 |
|        |                      |           |           |           |       |         |
| Master | (9, -(...)te[]) [22] |  33.41 ns | 0.0572 ns | 0.0447 ns |  1.00 |    0.00 |
|     PR | (9, -(...)te[]) [22] |  39.11 ns | 0.8235 ns | 2.1550 ns |  1.07 |    0.13 |

/cc: @ahsonkhan

ahsonkhan and others added 30 commits October 16, 2019 18:11
…ith CreateEscapingMaskSse2 (renamed from CreateEscapingMask)
Somehow this reference was left there, it's not needed anymore.
System\Text\Encodings\Web\Ssse3Helper.cs(101,13): error SA1115: (NETCORE_ENGINEERING_TELEMETRY=Build) The parameter should begin on the line after the previous parameter.
Copy link
Member Author

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes for review

Vector128<sbyte> bitMaskLookup,
Vector128<sbyte> bitPosLookup,
Vector128<sbyte> nibbleMaskSByte,
Vector128<sbyte> nullMaskSByte)
Copy link
Member Author

@gfoidl gfoidl Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need this arguments in order to be able to hoist these vectors outside the loop (in TextEncoder) as the JIT won't do it here.

Codegen for DefaultJavaScriptEncoderBasicLatin.FindFirstCharacterToEncode is affected only in minimal way:

G_M60692_IG11:      ;; bbWeight=4
       C4C17A6F27           vmovdqu  xmm4, xmmword ptr [r15]
       C4C159636710         vpacksswb xmm4, xmm4, xmmword ptr [r15+16]
       C5F828E8             vmovaps  xmm5, xmm0
       C5F828F1             vmovaps  xmm6, xmm1
       C5F828FA             vmovaps  xmm7, xmm2
       C57828C3             vmovaps  xmm8, xmm3
       C5B172D404           vpsrld   xmm9, xmm4, 4
       C531DBCF             vpand    xmm9, xmm9, xmm7
       C5D9DBE7             vpand    xmm4, xmm4, xmm7
       C4E25100E4           vpshufb  xmm4, xmm5, xmm4
       C4C24900E9           vpshufb  xmm5, xmm6, xmm9
       C5D1DBE4             vpand    xmm4, xmm5, xmm4
       C4C15964E8           vpcmpgtb xmm5, xmm4, xmm8
       C5B964E4             vpcmpgtb xmm4, xmm8, xmm4
       C5D1EBE4             vpor     xmm4, xmm5, xmm4
       C5F9D7C4             vpmovmskb eax, xmm4
       85C0                 test     eax, eax
       0F851B010000         jne      G_M60692_IG17
       4983C720             add      r15, 32
       4D3BFD               cmp      r15, r13
       76A8                 jbe      SHORT G_M60692_IG11

Those vmovaps aren't needed at all*...
In the benchmarks this is just above noise. For larger inputs it's not measurable, for smaller one (just so that vectorization kicks in):

|         Method |       TestStringData |      Mean |     Error |    StdDev | Ratio |
|--------------- |--------------------- |----------:|----------:|----------:|------:|
|       PR_42073 | (16, (...)dcsv) [26] |  8.911 ns | 0.0505 ns | 0.0447 ns |  1.00 |
| PR_TextEncoder | (16, (...)dcsv) [26] |  9.085 ns | 0.0178 ns | 0.0158 ns |  1.02 |
|                |                      |           |           |           |       |
|       PR_42073 |   (9, -1, rddnegsne) | 10.245 ns | 0.0445 ns | 0.0372 ns |  1.00 |
| PR_TextEncoder |   (9, -1, rddnegsne) | 10.656 ns | 0.0152 ns | 0.0135 ns |  1.04 |

I think this is acceptable (especially if the JIT will be fixed so it doesn't emit these moves...I believe there is already on issue out, but I don't find it rigth now (or is that issue only for scalar-registers?)).
If not code for CreateEscapingMask has to be duplicated.


* if the register allocator would make these actually zero-latency, then there's still the instructions that need to be fetched, decoded, ... and it makes the loop larger.

// casted to signed byte.
int index = Sse2.MoveMask(sourceValue);

if (index == 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flipped the if, as this results in better code-gen for the vectorized path (SSSE3).

{
byte* p = (byte*)startingAddress;
if (DoesAsciiNeedEncoding(p[0])) goto Return;
if (DoesAsciiNeedEncoding(p[1])) goto Return1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These multiple returns break the dependency-chain on idx -- and it's inspired from SpanHelpers<T>.IndexOf.

Although I have no strong opinion whether to keep it or not.
With SSSE3 available this won't be executed anyway, but I kept it for other platforms where SSSE3 isn't available.

}
startingAddress = (sbyte*)ptr + idx;
}
while (utf8Text.Length - 16 >= idx);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf8Text.Length - 16 is kept in a register 😉


private readonly int[] _asciiNeedsEscaping = new int[0x80];
private bool _isAsciiCacheInitialized;
private readonly bool[] _asciiNeedsEscaping = new bool[0x80];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to bool[] as I didn't see any benefit from using int[] with 1 or -1 (and forgot to ask in the review in that PR).

@GrabYourPitchforks
Copy link
Member

The pshufb trick seems reasonable. I didn't review the code in detail though. Are you looking for the other PR to be approved first before completing this one, or do you intend on committing both PRs at the same time?

@gfoidl
Copy link
Member Author

gfoidl commented Oct 28, 2019

Are you looking for the other PR to be approved first before completing this one

This way (as this PR builds upon the other PRs, so I think it's easier -- although this PR is complete so far).

Although it's a one-time initialization (per instance of TextEncoder), this can be done O(n) instead of O(2n) (where n = 128) and with calling WillEncode just once per ASCII-value instead of two times.
Won't be a huge improvement, but the first call to FindFirstCharacterToEncodeUtf8 will benefit a bit from this change.

Cf. https://github.com/dotnet/corefx/pull/42177/files#r339950018
I got trapped by corefx not zero-initializing locals...
@gfoidl gfoidl marked this pull request as ready for review November 3, 2019 01:04
@gfoidl
Copy link
Member Author

gfoidl commented Nov 3, 2019

@GrabYourPitchforks this PR is now ready for review.
I don't know whether github sent a notification for that or not. If one was sent, excuse this "push".

@GrabYourPitchforks
Copy link
Member

Ok, I'll recover a bit from your last PR and get to this next week. :)

@maryamariyan
Copy link

maryamariyan commented Nov 6, 2019

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your corefx repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/libraries <path to the patch created in step 1>

@danmoseley
Copy link
Member

Just curious, what's the test strategy for this and #42073 - how do we cover the various paths given our x64 machines are presumably homogenous?

Another refcount for https://github.com/dotnet/corefx/issues/36113 ?

@gfoidl
Copy link
Member Author

gfoidl commented Nov 7, 2019

Another refcount for #36113 ?

Ideally yes.

@ViktorHofer ViktorHofer added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 14, 2019
@gfoidl
Copy link
Member Author

gfoidl commented Nov 26, 2019

--> dotnet/runtime#284

@gfoidl gfoidl closed this Nov 26, 2019
@gfoidl gfoidl deleted the TextEncoder_ImproveEscapingCheck branch November 26, 2019 12:34
@karelz karelz added this to the 5.0 milestone Dec 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Text.Encodings.Web * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants