Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Almost double the performance of parsing Guids#3965

Closed
hughbe wants to merge 1 commit intodotnet:masterfrom
hughbe:guid-performance
Closed

Almost double the performance of parsing Guids#3965
hughbe wants to merge 1 commit intodotnet:masterfrom
hughbe:guid-performance

Conversation

@hughbe
Copy link

@hughbe hughbe commented Mar 29, 2016

Remove substring and other allocations
Bring in a utility helper to avoid calling native code

Average improvement of -40% + for parsing each type of Guid.

Benchmark Results:

  • Time is measured in milliseconds
  • Formula for counting the change in time: (newTime - oldTime) / oldTime * 100
    benchmarks

Benchmark Code:

Click Here

public static void TimeAction(string prefix, Action action)
{
    var sw = new Stopwatch();
    for (int iter = 0; iter < 5; iter++)
    {
        int gen0 = GC.CollectionCount(0);
        sw.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine($"{prefix}Time: {sw.Elapsed.TotalSeconds}\tGC0: {GC.CollectionCount(0) - gen0}");
    }
}

static void Main(string[] args)
{
    Console.WriteLine("**** Measure \"00000000000000000000000000000000\" ****");
    TimeAction("Old: ", () => new Guid("a8a110d5fc4943c5bf46802db8f843ff"));
    TimeAction("New: ", () => new GuidNew("a8a110d5fc4943c5bf46802db8f843ff"));

    Console.WriteLine("**** Measure \"00000000-0000-0000-0000-000000000000\" ****");
    TimeAction("Old: ", () => new Guid("a8a110d5-fc49-43c5-bf46-802db8f843ff"));
    TimeAction("New: ", () => new GuidNew("a8a110d5-fc49-43c5-bf46-802db8f843ff"));

    Console.WriteLine("**** Measure (00000000-0000-0000-0000-000000000000) ****");
    TimeAction("Old: ", () => new Guid("(a8a110d5-fc49-43c5-bf46-802db8f843ff)"));
    TimeAction("New: ", () => new GuidNew("(a8a110d5-fc49-43c5-bf46-802db8f843ff)"));

    Console.WriteLine("**** Measure {00000000-0000-0000-0000-000000000000} ****");
    TimeAction("Old: ", () => new Guid("{a8a110d5-fc49-43c5-bf46-802db8f843ff}"));
    TimeAction("New: ", () => new GuidNew("{a8a110d5-fc49-43c5-bf46-802db8f843ff}"));

    Console.WriteLine("**** Measure {0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}} ****");
    TimeAction("Old: ", () => new Guid("{0xa8a110d5,0xfc49,0x43c5,{0xbf,0x46,0x80,0x2d,0xb8,0xf8,0x43,0xff}}"));
    TimeAction("New: ", () => new GuidNew("{0xa8a110d5,0xfc49,0x43c5,{0xbf,0x46,0x80,0x2d,0xb8,0xf8,0x43,0xff}}"));

    Console.ReadLine();
}

@hughbe hughbe changed the title Considerably improve the performance of parsing Guids Almost double the performance of parsing Guids Mar 29, 2016
else {
if ((flags & GuidStyles.RequireDashes) != 0) {
// dashes are required
for (int i = end; i > start; i--)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be
for (int i = end - 1; i > start; i--)
instead, since guidString[end] is guaranteed to be '\0'.
However, I think it would be more elegant to write these two loops as

int start = 0;
while (start < g.Length && char.IsWhitespace(guidString[start]))
  start++;

int end = g.Length;
while (end > start && char.IsWhitespace(guidString[end - 1]))
  end--;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I pushed a commit adopting your more elegant code

@mattwarren
Copy link

@hughbe Just to back up your findings, here's a benchmark of

new Guid(0x12345678, 0xb, 0xc, 0, 1, 2, 3, 4, 5, 6, 7)
v
new Guid("{12345678-000b-000c-0001-020304050607}")

image

For reference, here is the full benchmark code (note that it's using BenchmarkDotNet)

length = newLength;
fixed (char* newGuidString = new string(chars, 0, newLength))
{
return newGuidString;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is unsafe, after the fixed block exits, there's nothing preventing the GC from relocating the managed object backing newGuidString and preventing the pointer from dangling.

@ellismg
Copy link

ellismg commented Apr 7, 2016

First, it's great to see the improvements here, so thanks a lot for starting the conversation. Looking over the diff, I suspect some of the wins come from being smarter about the sorts of allocations we do (I see a bunch of places where calls to .Substring have been removed).

I don't like how much unsafe code we end up having to add to the system. Seeing char* everywhere makes me nervous. Can we get a majority of the wins without having to use char* across most of these methods (e.g. could we continue to use the parse methods that take a String but provide offsets that we apply after we do the fixed in there?).

If we do need to introduce more places we use char*, It would also be ideal if we could scope unsafe usage to either small methods or limited blocks in larger methods, just to reduce the area of code we need to audit where we risk breaking type safety. I think this will help with the long term maintenance of the code.

@hughbe
Copy link
Author

hughbe commented Apr 7, 2016

Thanks for the review. I recall that plenty of those char* were used because I think that indexing into a char* is quicker than indexing into a string. Could you let me know there is no change or in fact it's better to index into a string?
I'll take a look at places to remove unsafe code

}
}
catch(IndexOutOfRangeException ex) {
catch (IndexOutOfRangeException ex)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellismg Is catching this IndexOutOfRangeException and the ArgumentException below necessary, or actually hiding a bug that has the potential to happen?

@hughbe
Copy link
Author

hughbe commented Apr 7, 2016

I've updated the PR just squashing the old commits (so thats easier to review) and shortening the diff for you.
I've added benchmarks too - looks like we're not suffering (maybe we're even benefitting) from removing unsafe code

@hughbe
Copy link
Author

hughbe commented Apr 8, 2016

Just updated the PR to not allocate anything if a hex string contains whitespace - this more than doubled the perf of creating a hex guid

{
chArr[newLength++] = curChar;
}
char c = guidString[start];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? It looks like c is never used.

Use a bit of (safe) unsafe code to speed up code and remove allocations
Bring in a utility helper to avoid calling native code

Average improvement of 40% + for parsing each type of Guid
@hughbe
Copy link
Author

hughbe commented Sep 26, 2016

Closing for now, fails corefx tests - will send in an updated PR soon

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants