Port System.Net.WebUtility missing members#13875
Conversation
|
|
||
| public static void HtmlEncode(string value, TextWriter output) | ||
| { | ||
| // Since we already have a private method that decodes the value using a StringBuilder we don't want to have duplicate code in order to have a better mantainance. |
There was a problem hiding this comment.
Nit, I wouldn't bother with the comments, since they only make sense if you know there was previously an explicit implementation in terms of TextWriter. Up to you.
| public static void HtmlEncode(string value, TextWriter output) | ||
| { | ||
| // Since we already have a private method that decodes the value using a StringBuilder we don't want to have duplicate code in order to have a better mantainance. | ||
| output.Write(HtmlEncode(value)); |
There was a problem hiding this comment.
This is sufficient to bring the functionality to .NET Core, but it can be improved... In desktop, the encoded chars are written directly to the TextWriter, whereas with this, a (potentially large) string is being allocated and then that string is written to the TextWriter. It'd be nice to avoid the string allocation.
In desktop, the shared implementation is in HtmlEncode(string, TextWriter) and HtmlEncode(string) allocates a StringWriter to pass to HtmlEncode(string, TextWriter). However, in .NET Core, since the other overload wasn't present, HtmlEncode(string) was changed to use StringBuilder (acquired from a StringBuilderCache) instead of using StringWriter.
This could be reworked to share an implementation, without having to switch back to the desktop implementation.
Something like...
public static string HtmlEncode(string value)
{
...
StringBuilder sb = StringBuilderCache.Acquire(value.Length);
HtmlEncode(value, index, sb, (w, s) => w.Append(s), (w, c) => w.Append(c));
return StringBuilderCache.GetStringAndRelease(sb);
}
public static void HtmlEncode(string value, TextWriter output)
{
...
HtmlEncode(value, index, output, (w, s) => w.Write(s), (w, c) => w.Write(c));
}
private static unsafe void HtmlEncode<TWriter>(string value, int index, TWriter writer, Action<TWriter, string> writeString, Action<TWriter, char> writeChar)
{
...
// Use writeString(writer, "whatever") or writeChar(writer, 'w')
}Of course, this could be done subsequently in a separate PR.
There was a problem hiding this comment.
This approach works. @safern and I discussed this before the PR went up.
The downside seems to be that it would make the existing code slower and would introduce a couple of allocations for the 2 lambdas.
There was a problem hiding this comment.
would introduce a couple of allocations for the 2 lambdas.
The compiler will lazily allocate and cache/reuse the lambda delegate instances.
There was a problem hiding this comment.
This would be better than the approach we discussed earlier, right? @AlexGhiondea
Just to give you a little bit of context @justinvp, we were thinking of creating an internal abstract class WriterBaseClass (this is just a random name), and making two other internal classes that would inherit from this one, those to classes would be:
SBWrapperand TextWriterWrapperwhere them would override virtual write(char)and write(string) methods from WriterBaseClassthat would write either to the StringBuilderor the TextWriter.
SBWrapper would override ToString and call StringBuilderCache.GetStringAndRelease(sb).
But I think your approach works better cause this one would have extra allocations as well and would have to make virtual calls.
There was a problem hiding this comment.
By the way, I noticed the same problem being solved using delegates in a recent PR:
corefx/src/System.Runtime.Extensions/src/System/Security/SecurityElement.cs
Lines 546 to 560 in 768e18c
There was a problem hiding this comment.
@safern I think they are just as good. Splitting into base classes makes it easier if you have more than 2 methods you would need to wrap. Otherwise you end up with a lot of lambdas you need to write :).
There should not be a lot of performance difference between calling via a delegate and calling via callvirt.
And if we don't think the original code is used on the hot path, then this becomes a question of what is the nicest way to write the code and @justinvp came up with an elegant one.
There was a problem hiding this comment.
Ok I see! Thanks for explaining :) @AlexGhiondea
Then I'll go for @justinvp solution!
Will update this PR.
There was a problem hiding this comment.
@justinvp Your solution seems like it would be slower for the main path, since an extra function call would be added for every char/string appended. Since it's more common to call the overload that returns a string rather than one that writes it to a TextWriter, it seems like what's currently checked in is the best solution.
The nicest (=most compact) way to write the code (using what's available) is what @safern has done.
If you want to make this fast, you do not want to do either of these. Both delegate call or virtual call per character sucks performance wise. If you want to make this fast, I believe that the best way is to introduce a new helper class to building strings that gets the underlying storage from ArrayPool. This may actually improve the string case as well in some cases because the cached StringBuilder is only up to 256 characters, and there is a good chance that encoded URLs length will be beyond this limit. We may want to eventually get rid StringBuilderCache completely, and just switch everything to PooledStringBuilder. It has number of better characteristics compared to StringBuilderCache: works for any lengths; the buffer cache is central - there is no one StringBuilder cached per thread per each .dll. Something like: |
Nice! |
|
I'm guessing the STringBuilder overload is more important than the TextWriter overload? My preference is to submit the way @safern already did, which does not affect the existing code. Open an issue for someone to optimize it later probably along the lines of Jan's recommendation. It is reasonable for now to check it in-as is. |
Agree. |
| { | ||
| public partial class WebUtilityTests | ||
| { | ||
| [Theory] |
There was a problem hiding this comment.
Nit: Indentation looks off here. Looks like there is a tab that should be replaced with spaces.
|
I merged with the first approach as @danmosemsft said, and will go ahead and open a new issue pointing to this PR in order to optimize this suggesting to use what @jkotas suggested. Thanks for the reviews guys :) @AlexGhiondea @jkotas @justinvp @danmosemsft |
* Port System.Net.WebUtility missing members Commit migrated from dotnet/corefx@3cb0550
Fixes #12142
cc @danmosemsft @AlexGhiondea @stephentoub @geoffkizer