Conversation
dotnet/roslyn#37596 includes a breaking change in the 5.0 branch. This function is intended to ensure reads happen from the array, so the test isn't strictly necessary. Fixes dotnet#865
| ret = item; | ||
| } | ||
| return count; | ||
| return ret; |
There was a problem hiding this comment.
I don't know how much of a concern it is but I think Mono+LLVM can eliminate this kind of loops and possibly screw up the results. Wouldn't it be possible to just change the condition to item == ret here? (ie. save the default value into variable and compare with that)
There was a problem hiding this comment.
You can't use the equality operator on Ts. The correct way to compare like that is with EqualityComparer<T>.Default.Equals(), which turns this into a test for EqualityComparer<T>.Default.Equals(). :)
Chatting with @stephentoub the actual goal of this loop is just to ensure the entire array is read. We're pretty sure the CoreCLR compilers are not smart enough to skip the enumeration. I don't have a Mono+LLVM setup handy - can you test this?
I'm going to check this in for now because we need to unblock the lab. If this is unsuitable for Mono we can figure out another strategy.
There was a problem hiding this comment.
The current default optimizations in Mono + LLVM JIT don't eliminate it. However, the proposed ones and the ones used for AOT compilation do eliminate it:
*** ASM for HelloWorld.Program:IterateAll<byte> ***
/var/folders/z9/3dmjh44j775b2_wsv97s8fd80000gn/T/.9W4MZO:
(__TEXT,__text) section
loWorld_Program_IterateAll_byte_:
0000000000000000 movl 0x18(%rdi), %eax
0000000000000003 testl %eax, %eax
0000000000000005 jle 0x11
0000000000000007 addl $-0x1, %eax
000000000000000a cltq
000000000000000c movb 0x20(%rdi,%rax), %al
0000000000000010 retq
0000000000000011 xorl %eax, %eax
0000000000000013 retq
***
There was a problem hiding this comment.
In the future we should consider using Consumer that would prevent from the elimination:
private readonly Consumer _consumer = new Consumer();
foreach (var item in collection)
_consumer.Consume(item);Or just redesign the benchmark. I am not sure if I understand correctly why the ArrayPool benchmark is trying to access every item in the array.
There was a problem hiding this comment.
In the future we should consider using Consumer that would prevent from the elimination
Have you measured the impact of that when running on arm? I suspect you'll find the overhead incurred from the volatile writes impacts the benchmarks.
I am not sure if I understand correctly why the ArrayPool benchmark is trying to access every item in the array.
To simulate what code actually does with the arrays, impact caches accordingly, take a corresponding amount of time between rent and return, etc.
There was a problem hiding this comment.
Have you measured the impact of that when running on arm?
Not yet. BTW @janvorli why volatile writes are so expensive on ARM?
I suspect you'll find the overhead incurred from the volatile writes impacts the benchmarks.
👍
To simulate what code actually does with the arrays, impact caches accordingly, take a corresponding amount of time between rent and return, etc.
Thanks for the explanation. I need to wrap my head around these benchmarks. We should most probably consider adding some ArrayPool benchmarks to @sebastienros aspnet/benchmarks that would allow us to test the perf under high parallel load.
There was a problem hiding this comment.
why volatile writes are so expensive on ARM?
Jan can correct me if something's changed recently, but last I knew we generate a full barrier for volatile writes on arm.
There was a problem hiding this comment.
It looks like we still do that for ARM32. Arm64 uses load-acquire instructions for reading and store-release instructions for writing 1, 4 and 8 bytes. Other sizes of reads / writes use the full memory barrier too (since there are no load-acquire / store-release instructions for other data sizes).
Full memory barrier is represented by the dmb instruction on both arm32 and arm64. It is actually not clear to me why we don't use dmb ld for loads and dmb st for stores on arm64. @sdmaclea do you happen to know that? On arm32, there is also a special flavor of dmb for stores, but no separate one for loads.
There was a problem hiding this comment.
In the future we should consider using Consumer that would prevent from the elimination:
Yeah, I'm aware of the feature. I tried it in this case and the impact to the existing benchmark was quite a lot (even on x64). My goal was to keep the numbers as close to where they were while fixing the problem.
There was a problem hiding this comment.
do you happen to know that?
Looks like https://github.com/dotnet/coreclr/issues/11545.
dotnet/roslyn#37596 includes a breaking change in the 5.0 branch. This function is
intended to ensure reads happen from the array, so the test isn't strictly necessary.
Fixes #865