Source: https://juejin.cn/post/6921595303460241415
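| Preface
This article does not jump straight into why LongAdder outperforms AtomicLong. It starts with volatile, for two reasons: it lets me organize what I have been studying recently, and I see AtomicLong as the answer to the cases volatile cannot handle, so volatile serves as groundwork. After that comes AtomicLong, then LongAdder, and finally a performance comparison of the two. If you only want the explanation, skip to the section near the end: Why the performance differs.

| volatile
The volatile keyword can be thought of as a lightweight synchronized. Using it does not cause thread context switches or scheduling, so it is cheaper than synchronized. However, volatile guarantees visibility (and ordering), not atomicity. Visibility means that when one thread modifies a volatile variable, the new value is immediately visible to all other threads. volatile is therefore unsuitable for computations such as i++, where the result depends on the variable's current value. Here is an example: VolatileTest.java.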
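public class VolatileTest {
    private static final int THREAD_COUNT = 20;
    private static volatile int race = 0;

    public static void increase() {
        race++; // not atomic: read, add, write back
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait for all incrementing threads to finish.
        // join() is used instead of polling Thread.activeCount(), which can
        // loop forever when the runtime or IDE starts extra threads.
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("race: " + race);
    }
}

The program is simple: each of the 20 threads increments race 1000 times, so the result should be 20 * 1000 = 20000. In practice, however, repeated runs typically print a value smaller than 20000. The cause lies in the increase method. Although it is a single line of Java, decompiling it shows that this one line consists of four bytecode instructions (getstatic, iconst_1, iadd, putstatic), and another thread can run between any two of them.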
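To make the lost update concrete, the sketch below replays on a single thread the interleaving in which two threads both read race before either writes it back. The class LostUpdateDemo and the local variable names are hypothetical and only illustrate the four-instruction sequence described above.

```java
// Hypothetical illustration; LostUpdateDemo is not part of the original article.
class LostUpdateDemo {
    static int race = 0;

    // Replays, deterministically, the steps of two overlapping race++ operations.
    static int replayInterleaving() {
        race = 0;
        int seenByA = race;   // thread A: getstatic (reads 0)
        int seenByB = race;   // thread B: getstatic (also reads 0)
        race = seenByA + 1;   // thread A: iconst_1, iadd, putstatic (writes 1)
        race = seenByB + 1;   // thread B: iconst_1, iadd, putstatic (writes 1 again)
        return race;
    }

    public static void main(String[] args) {
        // Two increments were performed, but only one is reflected in the value.
        System.out.println("after two increments: " + replayInterleaving());
    }
}
```

Both "threads" did their work, yet the counter ends at 1 rather than 2. volatile makes each read and write visible, but it cannot stop the two sequences from overlapping.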
| AtomicLong
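Locking the increase method does guarantee a correct result, but synchronized and ReentrantLock are both mutual-exclusion locks: only one thread may run the critical section at a time while the others wait, so throughput suffers under contention. Fortunately, the JDK provides atomic classes for this kind of computation. Changing the volatile int variable race above to an AtomicLong gives the following code: AtomicLongTest.java.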
import java.util.concurrent.atomic.AtomicLong;

public class AtomicLongTest {
    private static final int THREAD_COUNT = 20;
    // The reference never changes, so final is enough; volatile is not needed.
    private static final AtomicLong race = new AtomicLong(0);

    public static void increase() {
        race.getAndIncrement(); // atomic read-modify-write
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait for all incrementing threads to finish.
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("race: " + race);
    }
}

Running it produces the expected result: 20000.
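One way to see why the atomic version is correct where race++ is not: an atomic increment can be written as a compare-and-set retry loop using only AtomicLong's public API. The sketch below is a hand-written equivalent for illustration, not the JDK's internal code. The class CasLoopDemo and the failedAttempts counter are hypothetical additions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not from the original article.
class CasLoopDemo {
    private static final AtomicLong value = new AtomicLong();
    // Counts compareAndSet calls that lost the race to another thread.
    private static final AtomicLong failedAttempts = new AtomicLong();

    // Retry until the value we read is still the current value when we write.
    static long incrementWithCas() {
        for (;;) {
            long current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current;
            }
            failedAttempts.incrementAndGet();
        }
    }

    static long runOnce(int threadCount, int incrementsPerThread) {
        value.set(0);
        failedAttempts.set(0);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    incrementWithCas();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println("value: " + runOnce(20, 1000));
        System.out.println("failed CAS attempts: " + failedAttempts.get());
    }
}
```

The final value is always 20000 because a write only succeeds if no other thread changed the value in between. The number of failed attempts varies from run to run and tends to grow with the number of competing threads.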
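AtomicLong guarantees a correct result, but under high concurrency its performance is poor. To address this, JDK 1.8 introduced LongAdder.

| LongAdder
LongAdder is used in much the same way as AtomicLong. Replacing AtomicLong with LongAdder in the code above gives the following test: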
import java.util.concurrent.atomic.LongAdder;

public class LongAdderTest {
    private static final int THREAD_COUNT = 20;
    // A new LongAdder starts with a sum of 0.
    private static final LongAdder race = new LongAdder();

    public static void increase() {
        race.increment();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait for all incrementing threads to finish.
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("race: " + race);
    }
}

The result is again the expected 20000.
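The test above reads the total implicitly through string concatenation (LongAdder's toString returns the current sum). As a small sketch of the explicit read methods, the example below uses add, sum, and sumThenReset. The class name LongAdderReadDemo and the thread and iteration counts are arbitrary choices for illustration.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class, not from the original article.
class LongAdderReadDemo {
    // Returns {sum after all updates, value drained by sumThenReset, sum afterwards}.
    static long[] runOnce() {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    hits.increment();   // same as add(1)
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        hits.add(10);                        // add an arbitrary delta
        long total = hits.sum();             // read without resetting
        long drained = hits.sumThenReset();  // read and reset to zero
        return new long[] { total, drained, hits.sum() };
    }

    public static void main(String[] args) {
        long[] r = runOnce();
        System.out.println("total=" + r[0] + " drained=" + r[1] + " after=" + r[2]);
    }
}
```

Here all writers have finished before the reads, so the values are exact. When updates are still in flight, sum() is not an atomic snapshot, which is part of the trade-off LongAdder makes for throughput.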
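| AtomicLong vs. LongAdder performance
With volatile, AtomicLong, and LongAdder covered, it is time to measure AtomicLong against LongAdder. The two offer roughly the same functionality, so the choice should be driven by data. The benchmark uses JMH, and the test code is as follows: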
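import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class PerformaceTest {
    // Shared by all benchmark threads so that they contend on the same counter.
    private static AtomicLong atomicLong = new AtomicLong();
    private static LongAdder longAdder = new LongAdder();

    @Benchmark
    @Threads(10)
    public void atomicLongAdd() {
        atomicLong.getAndIncrement();
    }

    @Benchmark
    @Threads(10)
    public void longAdderAdd() {
        longAdder.increment();
    }

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(PerformaceTest.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}

Notes: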
@BenchmarkMode(Mode.Throughput) => measure throughput
@OutputTimeUnit(TimeUnit.MILLISECONDS) => time unit of the output
@Threads(10) => number of test threads in each fork

Test results:

Threads = 1:
Benchmark                       Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd   thrpt  200  153824.699 ±  137.947  ops/ms
PerformaceTest.longAdderAdd    thrpt  200  124087.220 ±   81.015  ops/ms

Threads = 5:
Benchmark                       Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd   thrpt  200   56392.136 ± 1165.361  ops/ms
PerformaceTest.longAdderAdd    thrpt  200  605501.870 ± 4140.190  ops/ms

Threads = 10:
Benchmark                       Mode  Cnt       Score      Error   Units
PerformaceTest.atomicLongAdd   thrpt  200   53286.334 ±  957.765  ops/ms
PerformaceTest.longAdderAdd    thrpt  200  713884.602 ± 3950.884  ops/ms

With a single thread AtomicLong is somewhat faster, but by 5 threads LongAdder already clearly outperforms AtomicLong.
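To put the gap in numbers, the small sketch below divides the LongAdder score by the AtomicLong score for each thread count, using the figures reported above. The class ThroughputRatioDemo and its method are hypothetical helpers, and the ratios describe only this one benchmark run.

```java
import java.util.Locale;

// Hypothetical helper; the scores are copied from the results above (ops/ms).
class ThroughputRatioDemo {
    static double ratio(double longAdderScore, double atomicLongScore) {
        return longAdderScore / atomicLongScore;
    }

    public static void main(String[] args) {
        int[] threads = { 1, 5, 10 };
        double[] atomicLong = { 153824.699, 56392.136, 53286.334 };
        double[] longAdder = { 124087.220, 605501.870, 713884.602 };
        for (int i = 0; i < threads.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "threads=%d LongAdder/AtomicLong = %.2f",
                    threads[i], ratio(longAdder[i], atomicLong[i])));
        }
    }
}
```

The ratios come out at roughly 0.8 with one thread, 10.7 with five, and 13.4 with ten. AtomicLong's throughput drops as threads are added, while LongAdder's keeps rising.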
| Why the performance differs
Explaining the difference requires reading the source. Start with AtomicLong's getAndIncrement method.

AtomicLong#getAndIncrement analysis

// AtomicLong#getAndIncrement
public final long getAndIncrement() {
    return unsafe.getAndAddLong(this, valueOffset, 1L);
}

// Unsafe#getAndAddLong
public final long getAndAddLong(Object var1, long var2, long var4) {
    long var6;
    do {
        var6 = this.getLongVolatile(var1, var2);
    } while (!this.compareAndSwapLong(var1, var2, var6, var6 + var4));
    return var6;
}

Underneath, this is the CAS algorithm. The JVM implements CAS with instructions provided by the processor, such as CMPXCHG on x86. The basic idea of spin CAS is to retry the CAS in a loop until it succeeds, and that same idea is what causes the performance problem under high concurrency: if the CAS keeps failing, the loop runs for a long time and consumes a great deal of processor time. With N threads spinning at once, many attempts fail and retry, which is why LongAdder beats AtomicLong in the tests above once the thread count grows. HotSpot may replace this Java-level loop with an intrinsic on some platforms, but every thread still updates the same memory location, so contention on that location remains the bottleneck.

LongAdder#increment analysis

public void increment() {
    add(1L);
}

public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
            (a = as[getProbe() & m]) == null ||
            !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended);
    }
}

final void longAccumulate(long x, LongBinaryOperator fn,
                          boolean wasUncontended) {
    int h;
    if ((h = getProbe()) == 0) {
        ThreadLocalRandom.current(); // force initialization
        h = getProbe();
        wasUncontended = true;
    }
    boolean collide = false;                // True if last slot nonempty
    for (;;) {
        Cell[] as; Cell a; int n; long v;
        if ((as = cells) != null && (n = as.length) > 0) {
            if ((a = as[(n - 1) & h]) == null) {
                if (cellsBusy == 0) {       // Try to attach new Cell
                    Cell r = new Cell(x);   // Optimistically create
                    if (cellsBusy == 0 && casCellsBusy()) {
                        boolean created = false;
                        try {               // Recheck under lock
                            Cell[] rs; int m, j;
                            if ((rs = cells) != null &&
                                (m = rs.length) > 0 &&
                                rs[j = (m - 1) & h] == null) {
                                rs[j] = r;
                                created = true;
                            }
                        } finally {
                            cellsBusy = 0;
                        }
                        if (created)
                            break;
                        continue;           // Slot is now non-empty
                    }
                }
                collide = false;
            }
            else if (!wasUncontended)       // CAS already known to fail
                wasUncontended = true;      // Continue after rehash
            else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                         fn.applyAsLong(v, x))))
                break;
            else if (n >= NCPU || cells != as)
                collide = false;            // At max size or stale
            else if (!collide)
                collide = true;
            else if (cellsBusy == 0 && casCellsBusy()) {
                try {
                    if (cells == as) {      // Expand table unless stale
                        Cell[] rs = new Cell[n << 1];
                        for (int i = 0; i < n; ++i)
                            rs[i] = as[i];
                        cells = rs;
                    }
                } finally {
                    cellsBusy = 0;
                }
                collide = false;
                continue;                   // Retry with expanded table
            }
            h = advanceProbe(h);
        }
        else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
            boolean init = false;
            try {                           // Initialize table
                if (cells == as) {
                    Cell[] rs = new Cell[2];
                    rs[h & 1] = new Cell(x);
                    cells = rs;
                    init = true;
                }
            } finally {
                cellsBusy = 0;
            }
            if (init)
                break;
        }
        else if (casBase(v = base, ((fn == null) ? v + x :
                                    fn.applyAsLong(v, x))))
            break;                          // Fall back on using base
    }
}

The code is long; the original post illustrates it with a diagram (figure not available here).
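LongAdder is fast because it uses a Cell array, trading space for speed to avoid contention on a single shared variable. Internally, LongAdder keeps a long value in the base field. When there is no contention, it updates base with CAS. When there is contention, a thread whose CAS on base failed instead applies a CAS to an element of the Cell array, adding its delta to that cell (cell[i] += x) rather than to base. Reading the count sums every cell[i] and adds base, which gives the final result. The sum code is as follows:

public long sum() {
    Cell[] as = cells; Cell a;
    long sum = base;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}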
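The base-plus-cells idea can be sketched in a few lines. The class below is a deliberately simplified, hypothetical illustration and not the JDK implementation: the cell array has a fixed size, there is no lazy initialization, resizing, or cache-line padding, and a random index stands in for the per-thread probe hash.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified, hypothetical sketch of the striping idea; NOT java.util.concurrent.atomic.LongAdder.
class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells; // fixed size here; the JDK grows its table on collisions

    StripedCounterSketch(int cellCount) {
        cells = new AtomicLongArray(cellCount);
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return; // uncontended fast path: one CAS on base
        }
        // Contended: add the delta to one cell instead of retrying on base.
        // Assumption: a random index replaces the JDK's per-thread probe.
        int index = ThreadLocalRandom.current().nextInt(cells.length());
        cells.addAndGet(index, x);
    }

    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    // Runs threadCount threads that each add 1 perThread times and returns the total.
    static long runOnce(int threadCount, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(8);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println("sum: " + runOnce(8, 10000));
    }
}
```

Every delta lands in exactly one place, either base or a single cell, so the sum is exact once all writers have finished. Spreading contended updates across several locations is what relieves the pressure on any one of them.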
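| Choosing between AtomicLong and LongAdder
Under high concurrency, choose LongAdder. Under low concurrency, or when each update must return the exact current value (LongAdder's increment returns nothing), choose AtomicLong.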





