Wait and lock free alternatives to LongAdder and AtomicLong by qwwdfsad · Pull Request #93 · JCTools/JCTools

qwwdfsad · 2016-03-13T02:10:16Z

As part of #36.

Alternative of LongAdder, which uses static-sized inlined stripes and doesn't try to resize adaptively.
So it benefits from less padding and dereferencing overhead (compared to @contended Cell class from Striped64), no complicated adaptive-resizing control flow and uses much more performant lock xadd (or lock addq) instruction (via Unsafe intrinsic) instead of lock cmpxchg (which is used in LongAdder instead of xadd only to indicate contention fact). Tradeoff between inc/get performance could be tuned via stripes count constructor parameter. Less performant version of counter for Java 6 included.
Assembly of inc method is here.

Some interesting benchmarks results are here (different cores amount/cross-socket modes, but always single reader/multiple writers). Benchmarks were performed on 32 cores Xeon with disabled hyper-threading (so in fact there was 16 cores), 8u72, 32 stripes (not cores * 4 as stated in current benchmark code) and with a little bit slower index method (it was 'dec' + two 'and' ops instead of single 'and' as now). Similar results are reproducible on my i7.

If you are going to benchmark this code manually, then be aware of JDK-6515172 (affects LongAdder measurements)

guidomedina · 2016-03-13T14:10:27Z

Do you think it is worth trying to compete with atomic counters after JDK 6? I know the assembler instructions have changed a lot after JDK 6.

Doug Lea have been doing a great job with CHM which you could witness at JDK 7 & 8, to the point of rendering any other alternative to be a hassle rather than improving anything.

Now queues are a different story, until they aren't anymore.

Interesting article: http://brettwooldridge.github.io/HikariCP/ludicrous.html

NOTE: Not trying to discourage you as this problem for JDK 6 and 7 is still real (Not so sure for JDK 8 after xadd), just be aware in the future and know when to pull the switch, it might be a long time before that ;-)

nitsanw · 2016-03-13T14:47:57Z

@qwwdfsad thanks for your contribution and the considerable effort you put into benchmarking it.
Looking at your results and assuming we care only for the inc() speed the results for the FixedSizeStripedV8 are encouraging. It's reasonable to compare the V6 results with a LongAdder backport perhaps, if one can be copied from somewhere.
A note on the implementation: You can replace the factory with factory methods returning the FixedSizeStripedLongCounter (and constructing the appropriate subclass) and make the implementations package private. This limits the class exposure and allows the users to make their ref type the super class. Because OpenJDK CHA analysis doesn't cover interfaces this is a minor saving for the performance crazed :-). They can always choose the interface if they like.
There's something to be said for resolving this issue in JCTools since it is Java 1.6 compliant and I know there has been some interest in this in the past. I'm happy to give it a home in the experimental asylum for now and think on it (once we've worked through the review comments).

guidomedina · 2016-03-13T14:53:42Z

@qwwdfsad After I read my own comment I think I presented it wrong, my comment only applies for Java 8 and that is a maybe, the problem you are trying to solve is very real and valid for JDK 6 and 7 which is where most users are at the moment so please ignore my previous comment, take it a further research for future of JDK 8, 9 and so on.

…ory new methods added

qwwdfsad · 2016-03-13T16:00:42Z

@nitsanw I've changed return type of CountersFactory and extended it a bit. Factory seems fine (if you don't mind), especially if other counters (thread local / statistical) will be added in the future, plus it's common pattern in JCTools.
Maybe it's a good idea not to expose create..V6/V8 methods and let benchmarks and users use most performant implementation?

qwwdfsad · 2016-03-13T16:09:01Z

@guidomedina as you can see in benchmark results for JDK 8, my counter is already faster. I'm not really competing with Doug Lea: LongAdder is general-purpose counter with dynamic resizing (to avoid excess allocations (256 bytes with @contended per Cell instance) when counter is not contended), that's why xadd can't be used (xadd doesn't indicate contention fact, but cmpxchg fail does) here, and my implementation is much more specific and doesn't try to adapt, so I can use xadd and avoid contention/resize checks.

That's why difference between my counter and LongAdder in JDK 6 shouldn't be very noticeable (both rely on same cmpxchg op), it's added for compatibility with JDK 6 only.

nitsanw · 2016-03-16T08:15:23Z

@qwwdfsad sorry for taking long, I'm having a rather crowded week... I hope to get to reviewing and merging this in a few days

nitsanw · 2016-04-02T12:00:23Z

OK, merged, with some minor rework.
In rudimentary benchmarks the benefit seems marginal, but that's on a laptop with 2 cores(4 threads) and so probably not very meaningful. I'll do more benchmarking and think on making this part of core.
@qwwdfsad Thanks allot for your efforts, please have a look at my changes and comment.

qwwdfsad · 2016-06-20T18:37:07Z

Oh, sorry for such a long pause, I completely missed github notification :(
Changes looks good, thanks for cleanup.

Firstly, I've found article from concurrency freaks with almost the same counter: http://concurrencyfreaks.blogspot.ru/2013/08/concurrency-pattern-distributed-cache.html (and comparison with LongAdder: http://concurrencyfreaks.blogspot.co.uk/2013/09/longadder-and-dclc.html), so my idea isn't new (eh). Their implementation shows significant speed-up on 32 core macnine.

Secondly, note that my benchmark configuration was pretty specific: tickless kernel with disabled RCU and hyper-threading, so it's not very "common case". It was done not to show something, but because at that moment it was the only available linux machine without load I had access to. Benefit on 2/4/8 cores shouldn't be very sufficient, but its worth measuring on larger machines.

Currently I have no multi-socket machine, but I'm ready to invest some time into benchmarking on 8-cores i7 (where difference shouldn't be huge) if you are still interested.

nitsanw · 2016-06-21T08:34:08Z

@qwwdfsad don't worry :-) original ideas are few and far between, I have a single original algorithm in the whole of JCTools (the chunked MPSC). The rest are ports/mashups/variations on existing work.
I'll have a look at that post, certainly it would be good to benchmark on larger machines, I have a large lab to experiment with (but all intels, so if you got an interesting ARM machine that would be cool).
I'm likely to put this into core at some point, since I'm putting in NHBM which also brings in a counter type.

qwwdfsad · 2016-06-24T20:59:00Z

Thank you, I will try not to :) Alas, I have no ARM machines.
Glad to hear about core integration! Hope you will share benchmark results :)

qwwdfsad added 3 commits March 13, 2016 05:06

Introducing Counter interface

c5e19de

Wait and lock free implementations of static sized striped counter

8c37c55

Counters benchmark

14ce5a5

qwwdfsad added 2 commits March 13, 2016 18:34

Don't reload cells field in FixedSizeStripedLongCounterV6

3c94b8c

Striped counters implementations became package-private, CountersFact…

e5a9dd4

…ory new methods added

nitsanw merged commit e5a9dd4 into JCTools:master Apr 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wait and lock free alternatives to LongAdder and AtomicLong#93

Wait and lock free alternatives to LongAdder and AtomicLong#93
nitsanw merged 5 commits into
JCTools:masterfrom
qwwdfsad:master

qwwdfsad commented Mar 13, 2016

Uh oh!

guidomedina commented Mar 13, 2016

Uh oh!

nitsanw commented Mar 13, 2016

Uh oh!

guidomedina commented Mar 13, 2016

Uh oh!

qwwdfsad commented Mar 13, 2016

Uh oh!

qwwdfsad commented Mar 13, 2016

Uh oh!

nitsanw commented Mar 16, 2016

Uh oh!

nitsanw commented Apr 2, 2016

Uh oh!

qwwdfsad commented Jun 20, 2016

Uh oh!

nitsanw commented Jun 21, 2016

Uh oh!

qwwdfsad commented Jun 24, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qwwdfsad commented Mar 13, 2016

Uh oh!

guidomedina commented Mar 13, 2016

Uh oh!

nitsanw commented Mar 13, 2016

Uh oh!

guidomedina commented Mar 13, 2016

Uh oh!

qwwdfsad commented Mar 13, 2016

Uh oh!

qwwdfsad commented Mar 13, 2016

Uh oh!

nitsanw commented Mar 16, 2016

Uh oh!

nitsanw commented Apr 2, 2016

Uh oh!

qwwdfsad commented Jun 20, 2016

Uh oh!

nitsanw commented Jun 21, 2016

Uh oh!

qwwdfsad commented Jun 24, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants