Noticed while working on #4495. When I switched to a size-based `if constexpr` dispatch, instead of using the same version for all element sizes, I observed significant perf degradation for small element sizes. Part of it is due to the dispatcher not being inlined.
The benchmark is built with `/Ob1`. It looks like this is implied by the CMake `RelWithDebInfo` configuration, as opposed to `Release`.
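To make the problem concrete, here is a minimal sketch of a size-based `if constexpr` dispatcher. All names (`sketch::count_kernel`, `sketch::count_dispatch`) are hypothetical and are not the STL's actual internals; the scalar kernel only stands in for the separately compiled vectorized functions. The assumption being illustrated is that under `/Ob1` a function template that is not marked `inline` is not an inlining candidate, so the dispatcher becomes an extra call in front of the kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch {
    // Hypothetical stand-in for a per-size kernel. A real library would call a
    // separately compiled, vectorized function here instead of this scalar loop.
    template <class UInt>
    std::size_t count_kernel(const void* first, const void* last, UInt val) noexcept {
        auto p         = static_cast<const unsigned char*>(first);
        const auto end = static_cast<const unsigned char*>(last);
        std::size_t n  = 0;
        for (; p != end; p += sizeof(UInt)) {
            UInt cur;
            std::memcpy(&cur, p, sizeof(UInt)); // load without aliasing problems
            if (cur == val) {
                ++n;
            }
        }
        return n;
    }

    // Size-based dispatcher: exactly one branch is instantiated per element type.
    // The explicit `inline` is the point of this sketch: the body is a single
    // forwarding call, so leaving it out of line adds call overhead that matters
    // for small inputs.
    template <class T>
    inline std::size_t count_dispatch(const T* first, const T* last, T val) noexcept {
        static_assert(std::is_integral_v<T>, "sketch handles integral elements only");
        if constexpr (sizeof(T) == 1) {
            return count_kernel(first, last, static_cast<std::uint8_t>(val));
        } else if constexpr (sizeof(T) == 2) {
            return count_kernel(first, last, static_cast<std::uint16_t>(val));
        } else if constexpr (sizeof(T) == 4) {
            return count_kernel(first, last, static_cast<std::uint32_t>(val));
        } else {
            static_assert(sizeof(T) == 8, "unsupported element size");
            return count_kernel(first, last, static_cast<std::uint64_t>(val));
        }
    }
} // namespace sketch
```

A call such as `sketch::count_dispatch(a, a + n, 1)` on an `int` array instantiates only the 4-byte branch.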
What are our takeaways?
I see the following options:
- Mark vector algorithms dispatchers `inline`, consider making other STL functions `inline`.
  - This helps other projects with `RelWithDebInfo` to inline STL, though it would obfuscate the debugger
- Override the option in the benchmark
- Make the benchmark `Release` by default, instead of `RelWithDebInfo`
  - I wouldn't like that. `RelWithDebInfo` is convenient for profiling
- Accept the cost of dispatching as a penalty for vector algorithms that use dispatching
  - I don't think this is fair
- Use specializations instead of `if constexpr`
  - Throughput?
- Manually inline the dispatch like for `__std_reverse_copy_trivially_copyable`...
  - Copypasta
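For comparison, the "specializations instead of `if constexpr`" option could look roughly like the sketch below. The names (`sketch_spec::count_by_size`, `sketch_spec::count_spec`) are hypothetical, and the scalar kernel again only stands in for real vectorized functions. Member functions defined inside a class body are implicitly `inline`, which, as far as I understand the `/Ob1` rules, makes them inlining candidates without extra annotation. The cost is one specialization per element size, so the throughput question above still applies.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch_spec {
    // Hypothetical scalar stand-in for the real per-size vectorized kernels.
    template <class UInt>
    std::size_t count_kernel(const void* first, const void* last, UInt val) noexcept {
        auto p         = static_cast<const unsigned char*>(first);
        const auto end = static_cast<const unsigned char*>(last);
        std::size_t n  = 0;
        for (; p != end; p += sizeof(UInt)) {
            UInt cur;
            std::memcpy(&cur, p, sizeof(UInt));
            if (cur == val) {
                ++n;
            }
        }
        return n;
    }

    // Primary template is intentionally left undefined, so an unsupported
    // element size fails to compile.
    template <std::size_t Size>
    struct count_by_size;

    template <>
    struct count_by_size<1> {
        using type = std::uint8_t;
        static std::size_t call(const void* f, const void* l, type v) noexcept {
            return count_kernel(f, l, v);
        }
    };

    template <>
    struct count_by_size<2> {
        using type = std::uint16_t;
        static std::size_t call(const void* f, const void* l, type v) noexcept {
            return count_kernel(f, l, v);
        }
    };

    template <>
    struct count_by_size<4> {
        using type = std::uint32_t;
        static std::size_t call(const void* f, const void* l, type v) noexcept {
            return count_kernel(f, l, v);
        }
    };

    template <>
    struct count_by_size<8> {
        using type = std::uint64_t;
        static std::size_t call(const void* f, const void* l, type v) noexcept {
            return count_kernel(f, l, v);
        }
    };

    // Front end: selects the specialization from sizeof(T) at compile time.
    template <class T>
    inline std::size_t count_spec(const T* first, const T* last, T val) noexcept {
        static_assert(std::is_integral_v<T>, "sketch handles integral elements only");
        using impl = count_by_size<sizeof(T)>;
        return impl::call(first, last, static_cast<typename impl::type>(val));
    }
} // namespace sketch_spec
```

The near-identical specializations show that this option trades the `if constexpr` chain for repetition of its own.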