Skip to content

Conversation

@logiclrd
Copy link
Contributor

This PR makes some performance improvements based on profiling with #1419's changes, testing DataTable objects with many records:

  • Updated TypeExtensions.cs to short-circuit GetClosedGenericInterfaces if the type isn't an object type.
  • Updated AssertionScope.cs to support lazily resolving the context string, storing a Lazy promise instead for as long as needed. Added internal method ResolveContext() to be used by AssertionRuleEquivalencyStep.cs before recursing, so that the rule doesn't get picked up as the context. Updated EquivalencyValidator.cs to set a promise rather than actually computing the Context value on-the-spot every time a new scope is created.
  • Updated ContextDataItems.cs to replace DataItems inline rather than removing/re-adding list elements. This means that the list won't always have the most recently added item at the back, but that doesn't seem to be important. Didn't break any tests, anyway.
  • Updated CyclicReferenceDetector.cs to use a Dictionary keyed on the target object instead of a List for its object references, so that when the number of objects gets extremely large, performance doesn't suffer due to linear scans. Added an internal Object accessor to ObjectReference.cs.

IMPORTANT

  • The code complies with the Coding Guidelines for C#.
  • The changes are covered by a new or existing set of unit tests which follow the Arrange-Act-Assert syntax such as is used in this example.
  • If the contribution adds a feature or fixes a bug, please update the release notes, which are published on the website.
  • If the contribution changes the public API the changes needs to be included by running AcceptApiChanges.ps1/AcceptApiChanges.sh.
  • If the contribution affects the documentation, please include your changes in this pull request so the documentation will appear on the website.

@logiclrd
Copy link
Contributor Author

logiclrd commented Nov 17, 2020

Oops, I picked the changes onto an out-of-date develop. Sigh. Fixing.

@lg2de
Copy link
Contributor

lg2de commented Nov 17, 2020

Would be great to have some measurement on the improvements, e.g. using BenchmarkDotNet with example data before and after these changes.

@logiclrd
Copy link
Contributor Author

That's reasonable. These changes were originally a part of #1419, in which I ran into severe performance issues when trying to work with very large DataTable objects. The two main performance killers were that:

  1. Even though it was never meant to, the code is rediscovering the source line of the context every time the context changes, not only when it's about to report a failure. This meant that if the object graph had a million items in it, it was parsing the stack trace and then scanning the source code to find the expression at the line/column in the first non-FluentAssertions frame over and over, exactly the same result each time, a million times.

  2. The cyclic reference checking code uses a linear scan for finding matching objects. This causes an O(n^2) performance characteristic, which becomes noticeable with very large object graphs.

I can't remember exactly what DataTable size I did the calculations with, I believe it was 100,000 rows, but with the changes here I observed a 60-fold increase in performance and also confirmed a performance characteristic better than O(n^2).

@logiclrd
Copy link
Contributor Author

The branch behind #1419 includes a profiling fixture based on DataTable (which isn't supported properly without those changes). From that profiling fixture:

|         Method | RowCount |          Mean |       Error |      StdDev |        Gen 0 |        Gen 1 | Gen 2 |   Allocated |
|--------------- |--------- |--------------:|------------:|------------:|-------------:|-------------:|------:|------------:|
| BeEquivalentTo |       10 |      2.445 ms |   0.0322 ms |   0.0269 ms |     289.0625 |       3.9063 |     - |     1.76 MB |
| BeEquivalentTo |      100 |      7.733 ms |   0.1338 ms |   0.2412 ms |    1007.8125 |      23.4375 |     - |     6.08 MB |
| BeEquivalentTo |     1000 |     55.819 ms |   1.0056 ms |   0.8915 ms |    8111.1111 |     222.2222 |     - |       49 MB |
| BeEquivalentTo |    10000 |    539.920 ms |   5.4939 ms |   4.5877 ms |   79000.0000 |   11000.0000 |     - |   478.04 MB |
| BeEquivalentTo |   100000 |  5,586.120 ms | 105.3597 ms | 112.7337 ms |  793000.0000 |  114000.0000 |     - |  4767.03 MB |
| BeEquivalentTo |  1000000 | 54,524.512 ms | 447.0123 ms | 418.1356 ms | 7953000.0000 | 1137000.0000 |     - | 47793.88 MB |

I feel like this could be even faster, but I don't know the codebase well enough to do it. But, this is a heck of a lot better than literally half an hour to compare two large DataTable instances. :-P

@logiclrd
Copy link
Contributor Author

logiclrd commented Nov 17, 2020

I have added a performance benchmark that sets up large object graphs with a range of sizes and passes them to .BeEquivalentTo. I am running it now on the pre-improvements code. I'm not sure how far it'll get, though, because at a graph size of 2,583 objects, it's taking approximately 1.8 seconds per BeEquivalentTo call. This benchmark runs up to object graph size nearing a million objects. :-P

I'll let it run for a bit and then show what results it has managed to achieve, then I'll run it with the performance changes.

@logiclrd
Copy link
Contributor Author

Old code:

N = 16 ; Graph size: 2583 objects

Mean = 1.807 s, StdErr = 0.009 s (0.50%), N = 59, StdDev = 0.070 s
Min = 1.707 s, Q1 = 1.761 s, Median = 1.788 s, Q3 = 1.820 s, Max = 2.059 s
IQR = 0.059 s, LowerFence = 1.672 s, UpperFence = 1.909 s
ConfidenceInterval = [1.775 s; 1.838 s] (CI 99.9%), Margin = 0.031 s (1.74% of Mean)
Skewness = 1.6, Kurtosis = 5.58, MValue = 2

N = 18 ; Graph size: 6764 objects

Mean = 5.152 s, StdErr = 0.026 s (0.51%), N = 19, StdDev = 0.114 s
Min = 4.985 s, Q1 = 5.062 s, Median = 5.140 s, Q3 = 5.220 s, Max = 5.364 s
IQR = 0.158 s, LowerFence = 4.825 s, UpperFence = 5.457 s
ConfidenceInterval = [5.050 s; 5.254 s] (CI 99.9%), Margin = 0.102 s (1.99% of Mean)
Skewness = 0.42, Kurtosis = 1.97, MValue = 2

N = 20 ; Graph size: 17710 objects

WorkloadWarmup   1: 1 op, 16910015000.00 ns, 16.9100 s/op
WorkloadWarmup   2: 1 op, 16080022500.00 ns, 16.0800 s/op
WorkloadWarmup   3: 1 op, 16481630700.00 ns, 16.4816 s/op
WorkloadWarmup   4: 1 op, 16336174700.00 ns, 16.3362 s/op
WorkloadWarmup   5: 1 op, 15959460200.00 ns, 15.9595 s/op
...

At these scales, the O(n^2) characteristic of the cyclic reference check isn't terribly obvious because the results are utterly dominated by the repeated computation of the subject (with FluentAssertions does by locating & inspecting the caller's source code). But, when I crunch the numbers further down, it can be seen to be growing noticeably faster than the number of objects.

New code:

|         Method |  N |         Mean |      Error |     StdDev |        Gen 0 |       Gen 1 |     Gen 2 |   Allocated |
|--------------- |--- |-------------:|-----------:|-----------:|-------------:|------------:|----------:|------------:|
| BeEquivalentTo | 16 |     52.18 ms |   1.018 ms |   0.952 ms |    8000.0000 |   2000.0000 |  571.4286 |    48.59 MB |
| BeEquivalentTo | 18 |    139.02 ms |   2.726 ms |   3.348 ms |   21250.0000 |   5250.0000 | 1000.0000 |   128.38 MB |
| BeEquivalentTo | 20 |    370.64 ms |   6.778 ms |   6.008 ms |   57000.0000 |  15000.0000 | 1000.0000 |   339.28 MB |
| BeEquivalentTo | 24 |  2,603.46 ms |  50.658 ms |  62.212 ms |  394000.0000 | 100000.0000 | 1000.0000 |  2379.35 MB |
| BeEquivalentTo | 28 | 22,344.94 ms | 442.011 ms | 590.072 ms | 2757000.0000 | 691000.0000 | 1000.0000 | 16608.43 MB |

Test sizes:

N = 16 ; Graph size: 2583 objects
N = 18 ; Graph size: 6764 objects
N = 20 ; Graph size: 17710 objects
N = 24 ; Graph size: 121392 objects
N = 28 ; Graph size: 832039 objects

With the new code, the performance proportional to the number of objects is:

|  N | Objects | Time/Object |
|----|---------|-------------|
| 16 |    2583 |     20.2 us |
| 18 |    6764 |     20.5 us |
| 20 |   17710 |     20.9 us |
| 24 |  121392 |     21.4 us |
| 28 |  832039 |     26.9 us |

The average time/object for the old code, for the duration of testing I let it complete, was:

|  N | Objects | Time/Object | Ratio |
|----|---------|-------------|-------|
| 16 |    2583 |    699.6 us |  35x  |
| 18 |    6764 |    761.7 us |  37x  |
| 20 |   17710 |    923.4 us |  44x  |

@logiclrd
Copy link
Contributor Author

I am getting unit test failures on develop currently. The same tests are failing after my changes here, but hopefully the main line will see fixes onto which this branch can be rebased.

@dennisdoomen
Copy link
Member

I am getting unit test failures on develop currently. The same tests are failing after my changes here, but hopefully the main line will see fixes onto which this branch can be rebased.

I've just tried on AppVeyor and locally, but both work without problems. Are you sure? Which one are failing?

@logiclrd
Copy link
Contributor Author

I am getting unit test failures on develop currently. The same tests are failing after my changes here, but hopefully the main line will see fixes onto which this branch can be rebased.

I've just tried on AppVeyor and locally, but both work without problems. Are you sure? Which one are failing?

I'm not currently at the workstation where I've been doing my development. I just ran tests on another machine, and develop completed without failures. So, false alarm I guess. I'll check into it later when I'm at the other computer.

This "clean" run did show some failures inside my branch, though. Sigh. :-P

@logiclrd
Copy link
Contributor Author

There we go :-)

@dennisdoomen
Copy link
Member

I'm afraid to ask, but because of all the review comment and rework, it looks like we need to recombine the relevant changes into dedicated commits.

@logiclrd
Copy link
Contributor Author

logiclrd commented Dec 4, 2020

You'd like to see it squashed into a single commit? Or should separate along logical subdivisions, if I can find them?

@dennisdoomen
Copy link
Member

What I would do is to regroup the relevant changes into a new set of commits. See my blog post if you need some inspiration for that.

- Updated Benchmarks/NestedClass.cs to have two child references of type Nested instead of one. Updated Create to take a ref parameter for counting the objects, since with two recursive calls it is no longer as straightforward. Updated calls from Benchmarks/BeEquivalentTo.cs correspondingly.
- Added new benchmark Benchmarks/LargeObjectGraph.cs which sets up object graphs of various sizes and benchmarks BeEquivalentTo between them.
- Updated Program.cs to run the new benchmark.
…t to avoid linear lookups.

Updated the Equals method in ObjectReference.cs be symmetrical. Added corresponding unit tests.
Updated FluentAssertions.Specs.csproj to be strong-named. Updated its reference to Chill to version 4.1.0 and updated AssemblyA.csproj and AssemblyB.csproj to strong name their output as well. Deleted TypeExtensions.cs rfom FluentAssertions.Specs, since it can now use the version internal to FluentAssertions.
Updated FluentAssertions.csproj to generate an InternalsVisibleTo attribute granting access to its internals from FluentAssertions.Specs.
Reran AcceptApiChanges.ps1, since the API surface area report captures the InternalsVisibleTo attribute.
@logiclrd
Copy link
Contributor Author

logiclrd commented Dec 5, 2020

How does this look? (b7a1fe7 as compared to 954d109)

@dennisdoomen
Copy link
Member

How does this look? (b7a1fe7 as compared to 954d109)

Looks nice. Only the signing of the specs project could have been separate, but that's me nitpicking.

@dennisdoomen dennisdoomen requested a review from jnyrup December 5, 2020 19:20
@logiclrd
Copy link
Contributor Author

logiclrd commented Dec 5, 2020

Looks nice. Only the signing of the specs project could have been separate, but that's me nitpicking.

I saw it as being sort intrinsically a "part of" the other changes -- a change with no motivation of its own, driven by the need to unit test an internal detail. Each of the commits achieves an outcome, and that would be a commit that doesn't actually achieve an outcome of its own, but merely reworks things to allow an actual change.

I could split it though if you think it would make a difference :-)

@dennisdoomen
Copy link
Member

Nah, don't bother. Let's wait for the review from @jnyrup

@logiclrd logiclrd force-pushed the JDG_Performance branch 2 times, most recently from 017232c to b7a1fe7 Compare December 8, 2020 18:32
…ck from the end instead of from the start, removing the need to cache context values immediately.

- Updated DetermineCallerIdentity.cs to search backward from the end. Added helper method IsCompilerServices to ignore async machinery.
- Added a mechanism to allow some stack frames to be skipped, so that constructs that re-entrantly invoke FluentAssertions and which should find the nested call skip over the outer call.
  * Added method OverrideStackSearchUsingCurrentScope() to CallerIdentifier.cs, returning an IDisposable that stores a count of stack frames to skip that remains in effect until it is disposed.
  * Updated call sites in SelfReferencingCollectionAssertions.cs, AsyncFunctionAssertions.cs and DelegateAssertions.cs to use this mechanism.
…until it is actually needed.

- Retyped Context to a Lazy<string> in AssertionScope.cs, and updated assignments to Context in EquivalencyValidator.cs and ExceptionAssertions.cs correspondingly.
…ypes that could not possibly satisfy the checks.

- Updated GenericDictionaryEquivalencyStep.cs and GenericEnumerableEquivalencyStep.cs to fast-path calls where the supplied type has a TypeCode other than TypeCode.Object.
… when the supplied key is already in the list.
@jnyrup jnyrup merged commit a0262e3 into fluentassertions:develop Dec 9, 2020
jnyrup added a commit to jnyrup/fluentassertions that referenced this pull request Jan 9, 2021
`InternalObjectExtensions` and `InternalTypeExtensions` were only public for the sake of unit testing.
Since fluentassertions#1434 FluentAssertions.Specs is also strong named, so these classes can now be truly internal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants