Add benchmarks that measure KeyPath read and write performance.#60383
Add benchmarks that measure KeyPath read and write performance.#60383BradLarson merged 13 commits intoswiftlang:mainfrom
Conversation
|
@swift-ci smoke test |
|
@swift-ci please benchmark |
|
@swift-ci please benchmark |
|
@swift-ci please benchmark |
1 similar comment
|
@swift-ci please benchmark |
… them above 20 us.
|
@swift-ci please benchmark |
|
@swift-ci please smoke test |
|
@eeckstein - You provided great feedback on the PR that was the predecessor to this one. Would you still be the right person to review these keypath benchmark additions? |
| let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e | ||
|
|
||
| // Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess(). | ||
| struct FixedSizeArray100<Element>: Sequence, IteratorProtocol { |
There was a problem hiding this comment.
What's the purpose of 100 elements if only a small subset is used in the benchmarks?
There was a problem hiding this comment.
Hello, thanks for your feedback! We're wanting to introduce a suite of benchmarks that highlight some of the performance issues we've been experience with KeyPaths. What we've seen so far can be largely placed into three categories.
The first category involves a collection of KeyPaths not being inlined if they're spread too far apart in memory. The magic number, after some experimentation, happened to be 14. As soon as you have a 15th KeyPath, the inlining in release mode does not occur, and performance falls off a cliff -- being some 100 times worse or more. This is part of the reason why we chose a struct with 100 elements instead of just 10. (I do agree that asking the optimizer to inline KeyPaths pointing to an arbitrarily-wide range is wishful thinking, but perhaps the value of 14 is smaller than some might expect.) I've added a comment right above run_testKeyPathsInlined() which explains this in a bit more detail.
The second category involves nested structs or classes. Any sort of KeyPath projection operation is O(N) in the depth of the object. The benchmark KeyPathNestedStructs uses this.
The third category involves accessing a constant-sized value within structs of various sizes. Namely, performance decreases as more and more variables are added to the struct. I've added a benchmark called KeyPathsSmallStruct to showcase the difference in performance between a struct with 10 elements and one with 100 (KeyPathReadPerformance).
There was a problem hiding this comment.
You can use @inline(never) to ensure you are testing the non-inlined case.
There was a problem hiding this comment.
Do you still need the 100 elements when using @inline(never)
There was a problem hiding this comment.
I'd like to keep it if possible: I've added the @inline(never) directive to both versions ofgetKeypathToElement() -- for the 10-element struct and also for the 100-element struct -- and compared the times for each iteration. In both Debug and Release mode (i.e., after compiling via swift build and swift build --configuration release), the 100- element struct needs about 1.5X more time per iteration when compared with the 10-element struct.
| tags: [.validation, .api], | ||
| setUpFunction: setupSingleton | ||
| ), | ||
| ] |
There was a problem hiding this comment.
It would also be very useful to add benchmarks for other keypath variants, like getters+setters.
There was a problem hiding this comment.
I'm wondering what is meant by this: does this mean adding benchmarks featuring ReferenceWritableKeyPaths, PartialKeyPaths, and others?
Or, benchmarks for KeyPath syntax like this?
get { ... #keyPath(...) }
There was a problem hiding this comment.
By the way, this benchmark is already getting somewhat long (over 1300 lines). Would it be a good candidate for multi-source?
There was a problem hiding this comment.
Example for a gettable property:
struct S {
var x: Int { return 27 }
}
let kp = \S.x
Would it be a good candidate for multi-source?
No, keeping it in a single file is fine.
There was a problem hiding this comment.
There are several kind of keypath components:
- stored property (what you are testing)
- gettable property (my previous example)
- settable property (the same, just for writing)
- tuple element
- optional chain
- optional force
- optional wrap
Those are handled differently in the runtime. Therefore it would be nice if the benchmarks would cover all of the keypath component types + combinations of them, e.g.
struct T {
var y: Int
}
struct S {
var x: (Int, T?) { return (27, T(y: 42)) }
}
let kp = \S.x.1?.y
There was a problem hiding this comment.
Ah yes, I see that they're handled differently, especially inside _projectReadOnly<CurValue, NewValue, LeafValue>(...) in KeyPath.swift.
I'll create a few new benchmarks and verify I get all the cases listed above (and that all the cases are being hit inside that function). These'll be coming soon.
There was a problem hiding this comment.
I've added benchmarks that'll test the performance of each type of KeyPathComponent. There is now a MarkDown table near the top of the file that indicates which benchmark stresses which type of KeyPathComponent.
I did find another error in the CI output indicating that the setup time for KeyPathWritePerformance was too high; I've moved its KeyPaths, and the ones for KeyPathReadPerformance, into the singleton.
There was a problem hiding this comment.
I've updated the main message associated with the PR to discuss the benchmarks that have been added since then.
What is the convention for resolving conversations (via the Resolve conversation button) during workflows like this?
There was a problem hiding this comment.
As you like. Just let me somehow know when the PR is ready for another round of review.
There was a problem hiding this comment.
Sound good!
…. Removed benchmarks dealing with an inlining issue.
…p and documentation.
|
@swift-ci please benchmark |
…enamed GetSet to Getset.
|
@swift-ci please benchmark |
eeckstein
left a comment
There was a problem hiding this comment.
Looks already pretty good! I have a few more comments.
| let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e | ||
|
|
||
| // Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess(). | ||
| struct FixedSizeArray100<Element>: Sequence, IteratorProtocol { |
There was a problem hiding this comment.
Do you still need the 100 elements when using @inline(never)
| var kp50: WritableKeyPath<FixedSizeArray100<ElementType>, ElementType> | ||
|
|
||
| static let shared = FixedSizeArrayHolder() | ||
| init() { |
There was a problem hiding this comment.
Why are the keypaths stored as class members?
If I understand correctly, for the purpose of this benchmark it would be sufficient to create the relevant keypaths in the benchmark functions - wrapped into identity()
There was a problem hiding this comment.
I've addressed the 100-element issue in another post -- it boils down to 100 elements needing 1.5X more time per iteration that 10, at least as far as reads are concerned.
I've moved these KeyPaths into the class due to a single CI run where it was complaining about a high setup time in KeyPathWritePerformance. It was this one here:
https://ci.swift.org/job/swift-PR-macos-perf/260/console
The error in question was: ERROR runtime: 'KeyPathWritePerformance' has setup overhead of 68 μs (8.3%).
So the idea here was to move everything but the kitchen sink into the class.
| tags: [.validation, .api], | ||
| setUpFunction: setupSingleton | ||
| ), | ||
| ] |
There was a problem hiding this comment.
As you like. Just let me somehow know when the PR is ready for another round of review.
| let elementCount = FixedSizeArrayHolder.shared.mainArrayForNestedStructs.count | ||
| for _ in 1 ... iterationMultipier * n { | ||
| let element = FixedSizeArrayHolder.shared.mainArrayForNestedStructs[index] | ||
| sum += element[keyPath: identity(appendedKeyPath)] |
There was a problem hiding this comment.
You need to add identity outside of the loop, because it adds considerable runtime overhead:
let appendedKeyPath = identity(singleHopKeyPath0.appending(path: singleHopKeyPath1)
.appending(path: singleHopKeyPath2).appending(path: singleHopKeyPath3)
.appending(path: singleHopKeyPath4))
Please also correct this in all other functions.
|
@swift-ci please test |
|
@swift-ci please benchmark |
|
It looks like there's a CI failure caused by this test: |
… long setup overhead errors.
|
@swift-ci please benchmark |
…t() to try to reduce setup time errors.
|
@swift-ci please benchmark |
|
@swift-ci please smoke test |
|
@swift-ci please smoke test Windows platform |
1 similar comment
|
@swift-ci please smoke test Windows platform |
…tlang#60383) * Add benchmarks that measure KeyPath read and write performance. * Added setUpFunctions. Revised number of iterations per benchmark. * Include n as a factor in the number of iterations. * Increased number of iterations for KeyPathDirectAccess by a factor of 25. * One last tweak to the number of iterations on testDirectAccess to get them above 20 us. * Made revisions based on feedback. Added three new benchmarks. * Added benchmarks to exhaustively benchmark all KeyPathComponent types. Removed benchmarks dealing with an inlining issue. * Wrapped additional keypaths with identity() where needed. More cleanup and documentation. * Moved KeyPaths for KeyPathRead and Write into FixedSizeArrayHolder. Renamed GetSet to Getset. * Added inline(never) to both versions of getKeypathToElement(). * Moved identity() wraps so that they're called once per variable per benchmark. * Moving destinationKeyPaths into FixedSizeArrayHolder to try to reduce long setup overhead errors. * Additional moving of the identity() wrapping into the singleton's init() to try to reduce setup time errors.
This is the first in a series of pull requests aimed at improving the speed of read and write operations via KeyPaths.
Intended to supersede #36451.
Although this PR doesn't change anything related to KeyPath performance or functionality, it does feature four benchmarks that can be used to establish a baseline in terms of timing.
The benchmark called
run_testDirectAccess()establishes a baseline for how quickly read and write operations can be performed with direct use of dot notation. (That said,nwould need to be equal inside the benchmark used for comparison.)The benchmark called
run_KeyPathNestedStructs()measures the performance of read operations via KeyPaths that traverse through structs only; it is anticipated that the memory layout of such structs is predictable. One potential optimization in this case would be to jump to the finalValuespecified by the KeyPath without having to perform a full projection from theRootto theValue. (Recalculation of the array offset would also need to be performed during any KeyPathappend()operations, of course).The benchmarks called
run_testKeyPathReadPerformance()andrun_testKeyPathWritePerformance()have 25 keypath read and write operations, respectively, per iteration. Other elements are accessed via dot notation. This is done so that the the performance of reads can be tracked separately from writes. It also enables comparison of relative performance between read and write operations.The benchmark called
run_testKeyPathSmallStruct()is the same asrun_testKeyPathReadPerformance(), except the struct used has 10 elements in it instead of 100. It was found that reading a value of a given size from larger struct takes longer than reading the value from a smaller struct.The remainder of the benchmarks explicitly benchmark the types of
KeyPathComponentsnot featured in any of the other benchmarks. The reason this is done is that the different components get handled slightly differently at runtime. SeeKeyPath.swift.