Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Port System.Net.WebUtility missing members#13875

Merged
safern merged 4 commits intodotnet:masterfrom
safern:WebUtility
Nov 22, 2016
Merged

Port System.Net.WebUtility missing members#13875
safern merged 4 commits intodotnet:masterfrom
safern:WebUtility

Conversation

@safern
Copy link
Member

@safern safern commented Nov 22, 2016

Fixes #12142

cc @danmosemsft @AlexGhiondea @stephentoub @geoffkizer


public static void HtmlEncode(string value, TextWriter output)
{
// Since we already have a private method that decodes the value using a StringBuilder we don't want to have duplicate code in order to have a better mantainance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I wouldn't bother with the comments, since they only make sense if you know there was previously an explicit implementation in terms of TextWriter. Up to you.

public static void HtmlEncode(string value, TextWriter output)
{
// Since we already have a private method that decodes the value using a StringBuilder we don't want to have duplicate code in order to have a better mantainance.
output.Write(HtmlEncode(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sufficient to bring the functionality to .NET Core, but it can be improved... In desktop, the encoded chars are written directly to the TextWriter, whereas with this, a (potentially large) string is being allocated and then that string is written to the TextWriter. It'd be nice to avoid the string allocation.

In desktop, the shared implementation is in HtmlEncode(string, TextWriter) and HtmlEncode(string) allocates a StringWriter to pass to HtmlEncode(string, TextWriter). However, in .NET Core, since the other overload wasn't present, HtmlEncode(string) was changed to use StringBuilder (acquired from a StringBuilderCache) instead of using StringWriter.

This could be reworked to share an implementation, without having to switch back to the desktop implementation.

Something like...

public static string HtmlEncode(string value)
{
    ...

    StringBuilder sb = StringBuilderCache.Acquire(value.Length);
    HtmlEncode(value, index, sb, (w, s) => w.Append(s), (w, c) => w.Append(c));
    return StringBuilderCache.GetStringAndRelease(sb);
}

public static void HtmlEncode(string value, TextWriter output)
{
    ...

    HtmlEncode(value, index, output, (w, s) => w.Write(s), (w, c) => w.Write(c));
}

private static unsafe void HtmlEncode<TWriter>(string value, int index, TWriter writer, Action<TWriter, string> writeString, Action<TWriter, char> writeChar)
{
    ...

    // Use writeString(writer, "whatever") or writeChar(writer, 'w')
}

Of course, this could be done subsequently in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works. @safern and I discussed this before the PR went up.

The downside seems to be that it would make the existing code slower and would introduce a couple of allocations for the 2 lambdas.

Copy link
Contributor

@justinvp justinvp Nov 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would introduce a couple of allocations for the 2 lambdas.

The compiler will lazily allocate and cache/reuse the lambda delegate instances.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better than the approach we discussed earlier, right? @AlexGhiondea

Just to give you a little bit of context @justinvp, we were thinking of creating an internal abstract class WriterBaseClass (this is just a random name), and making two other internal classes that would inherit from this one, those to classes would be:

SBWrapperand TextWriterWrapperwhere them would override virtual write(char)and write(string) methods from WriterBaseClassthat would write either to the StringBuilderor the TextWriter.

SBWrapper would override ToString and call StringBuilderCache.GetStringAndRelease(sb).

But I think your approach works better cause this one would have extra allocations as well and would have to make virtual calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I noticed the same problem being solved using delegates in a recent PR:

public override string ToString()
{
StringBuilder sb = new StringBuilder();
ToString("", sb, (obj, str) => ((StringBuilder)obj).Append(str));
return sb.ToString();
}
internal void ToWriter(StreamWriter writer)
{
ToString("", writer, (obj, str) => ((StreamWriter)obj).Write(str));
}
private void ToString(string indent, object obj, Action<object, string> write)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@safern I think they are just as good. Splitting into base classes makes it easier if you have more than 2 methods you would need to wrap. Otherwise you end up with a lot of lambdas you need to write :).

There should not be a lot of performance difference between calling via a delegate and calling via callvirt.

And if we don't think the original code is used on the hot path, then this becomes a question of what is the nicest way to write the code and @justinvp came up with an elegant one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see! Thanks for explaining :) @AlexGhiondea
Then I'll go for @justinvp solution!

Will update this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinvp Your solution seems like it would be slower for the main path, since an extra function call would be added for every char/string appended. Since it's more common to call the overload that returns a string rather than one that writes it to a TextWriter, it seems like what's currently checked in is the best solution.

@safern safern self-assigned this Nov 22, 2016
@safern safern added this to the 1.2.0 milestone Nov 22, 2016
@jkotas
Copy link
Member

jkotas commented Nov 22, 2016

the nicest way to write the code

The nicest (=most compact) way to write the code (using what's available) is what @safern has done.

There should not be a lot of performance difference between calling via a delegate and calling via callvirt.

If you want to make this fast, you do not want to do either of these. Both delegate call or virtual call per character sucks performance wise.

If you want to make this fast, I believe that the best way is to introduce a new helper class to building strings that gets the underlying storage from ArrayPool. This may actually improve the string case as well in some cases because the cached StringBuilder is only up to 256 characters, and there is a good chance that encoded URLs length will be beyond this limit.

We may want to eventually get rid StringBuilderCache completely, and just switch everything to PooledStringBuilder. It has number of better characteristics compared to StringBuilderCache: works for any lengths; the buffer cache is central - there is no one StringBuilder cached per thread per each .dll.

Something like:

private struct PooledStringBuilder
{
    private char _buffer;
    private int _length = 0;

    PooledBuilder(int estimatedSize)
    {
        _buffer = ArrayPool<char>.Shared.Rent(estimatedSize);
    }

    public void Relese()
    {
        if (_buffer != null)
        {
            ArrayPool<char>.Shared.Return(_buffer);
            _buffer = null;
        }
    }

    public byte[] UnderlyingArray => _buffer;
    public int Length => _length;

    public string GetString()
    {
        return new string(_buffer, 0, _length);
    }

    public string GetStringAndRelease(sb)
    {
        string ret = GetString();
        Relese();
        return ret;
    }

    public void Append(char c)
    {
        Ensure(1);
        _buffer[_length++] = c;
    }

    public void Append(string s)
    {
        Ensure(s.Length);
        for (int i = 0; i < s.Length; i++)
             _buffer[_length++] = s[i];
    }

    private void Ensure(int extraSpace)
    {
        if ((uint)(_length + extraSpace) > (uint)_buffer.Length)
            Grow(extraSpace);
    }

    private void Grow(int extraSpace)
    {
        byte[] oldBuffer = _buffer;
        int newSize = Math.Max(2 * oldBuffer.Length, _length + extraSpace);
        byte[] newBuffer = ArrayPool<char>.Shared.Rent(newSize);
        Buffer.BlockCopy(oldBuffer, 0, newBuffer, 0, _length);
        _buffer = newBuffer;
        ArrayPool<char>.Shared.Return(oldBuffer);
    }
}

...
    PooledStringBuilder sb = new PooledStringBuilder(2 * value.Length);
    HtmlEncode(value, index, sb);
    return sb.GetStringAndRelease(sb);
...

...
    PooledStringBuilder sb = new PooledStringBuilder(2 * value.Length);
    HtmlEncode(value, index, sb);
    output.Write(sb.UnderlyingArray, 0, sb.Length);
    sb.Relese();
...

@justinvp
Copy link
Contributor

If you want to make this fast, I believe that the best way is to introduce a new helper class to building strings that gets the underlying storage from ArrayPool.

Nice!

@danmoseley
Copy link
Member

I'm guessing the STringBuilder overload is more important than the TextWriter overload? My preference is to submit the way @safern already did, which does not affect the existing code. Open an issue for someone to optimize it later probably along the lines of Jan's recommendation. It is reasonable for now to check it in-as is.

@jkotas
Copy link
Member

jkotas commented Nov 22, 2016

submit the way @safern already did

Agree.

{
public partial class WebUtilityTests
{
[Theory]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Indentation looks off here. Looks like there is a tab that should be replaced with spaces.

@safern safern merged commit 3cb0550 into dotnet:master Nov 22, 2016
@safern safern deleted the WebUtility branch November 22, 2016 19:46
@safern
Copy link
Member Author

safern commented Nov 22, 2016

I merged with the first approach as @danmosemsft said, and will go ahead and open a new issue pointing to this PR in order to optimize this suggesting to use what @jkotas suggested.

Thanks for the reviews guys :)

@AlexGhiondea @jkotas @justinvp @danmosemsft

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants