
Conversation

@kiril-me (Contributor) commented Jan 29, 2017:

Motivation:

64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data structures over 64 bytes.
Padding to a multiple of 64 bytes allows SIMD instructions to be used consistently in loops without additional conditional checks, which should allow for simpler and more efficient code.
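
As a rough illustration of the padding this implies (a sketch only, not this PR's code; the method name is made up):

static int alignTo64(int reqCapacity) {
    // Round a requested capacity up to the next multiple of 64 bytes.
    // This works because 64 is a power of two: the low 6 bits hold the remainder.
    int mask = 64 - 1;
    int delta = reqCapacity & mask;          // bytes past the last 64-byte boundary
    return delta == 0 ? reqCapacity : reqCapacity + 64 - delta;
}

For example, alignTo64(100) returns 128.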

Modification:

At the moment the cache alignment must be set up manually, but it could potentially be read from the system. The original code was introduced by @normanmaurer in https://github.com/netty/netty/pull/4726/files

buffer/src/main/java/io/netty/buffer/PoolArena.java
buffer/src/main/java/io/netty/buffer/PoolChunk.java
buffer/src/main/java/io/netty/buffer/PooledByteBuf.java
buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java
buffer/src/test/java/io/netty/buffer/AbstractByteBufTest.java
buffer/src/test/java/io/netty/buffer/PoolArenaTest.java
buffer/src/test/java/io/netty/buffer/PooledByteBufAllocatorTest.java
microbench/src/main/java/io/netty/microbench/buffer/ByteBufAllocatorBenchmark.java

microbench/src/main/java/io/netty/microbench/buffer/PooledByteBufAllocatorAlignBenchmark.java

Result:
Benchmark                                       (cacheAlign)   (size)  Mode  Cnt   Score   Error  Units
PooledByteBufAllocatorAlignBenchmark.read                  0    01024  avgt   25   0.013 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    04096  avgt   25   0.050 ± 0.004  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    16384  avgt   25   0.202 ± 0.018  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    65536  avgt   25   0.840 ± 0.065  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0  1048576  avgt   25  23.778 ± 4.068  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    01024  avgt   25   0.012 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    04096  avgt   25   0.047 ± 0.003  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    16384  avgt   25   0.200 ± 0.022  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    65536  avgt   25   0.749 ± 0.079  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64  1048576  avgt   25  13.331 ± 1.396  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    01024  avgt   25   0.013 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    04096  avgt   25   0.050 ± 0.004  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    16384  avgt   25   0.220 ± 0.027  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    65536  avgt   25   0.830 ± 0.067  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0  1048576  avgt   25  16.060 ± 0.484  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    01024  avgt   25   0.012 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    04096  avgt   25   0.045 ± 0.003  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    16384  avgt   25   0.177 ± 0.011  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    65536  avgt   25   0.746 ± 0.076  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64  1048576  avgt   25  14.150 ± 0.619  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    01024  avgt   25   0.023 ± 0.002  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    04096  avgt   25   0.094 ± 0.007  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    16384  avgt   25   0.380 ± 0.028  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    65536  avgt   25   1.477 ± 0.127  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0  1048576  avgt   25  27.154 ± 2.389  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    01024  avgt   25   0.021 ± 0.002  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    04096  avgt   25   0.087 ± 0.009  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    16384  avgt   25   0.353 ± 0.037  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    65536  avgt   25   1.367 ± 0.112  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64  1048576  avgt   25  22.501 ± 1.552  ms/op

The benchmarks show better read and write performance for large buffer sizes.

int alignCapacity(int reqCapacity) {
    int delta = reqCapacity & cacheAlignmentMask;
    if (delta == 0) {
        return reqCapacity;

Member:

please fix formatting.

// testInternalNioBuffer(128);
// testInternalNioBuffer(1024);
// testInternalNioBuffer(4 * 1024);
// testInternalNioBuffer(64 * 1024);

Member:

Why did you do this?

PooledByteBufAllocator pool = new PooledByteBufAllocator(true, 2, 2, 8192, 11, 1000, 1000, 1000, true, 64);
ByteBuf buff = pool.directBuffer(4096);
for (int i = 0; i < 4096; i++) {
    buff.writeByte(100);

Member:

please fix formatting.

public void testArenaMetricsCacheAlign() {
    testArenaMetrics0(new PooledByteBufAllocator(true, 2, 2, 8192, 11, 1000, 1000, 1000, true, 64), 100, 1, 1, 0);
}
@Test

Member:

Add empty line above

    }
    buff.release();
}

Member:

remove empty line

 public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder,
                               int tinyCacheSize, int smallCacheSize, int normalCacheSize,
-                              boolean useCacheForAllThreads) {
+                              boolean useCacheForAllThreads, int cacheAlignment) {

Member:

@kiril-me we also need to keep the old constructor to not break the API.

        return reqCapacity;
    } else {
        return alignCapacity(reqCapacity);
    }

Member:

make this:

return cacheAlignment == 0 ? reqCapacity : alignCapacity(reqCapacity);

 @Override
 protected PoolChunk<byte[]> newUnpooledChunk(int capacity) {
-    return new PoolChunk<byte[]>(this, new byte[capacity], capacity);
+    return new PoolChunk<byte[]>(this, new byte[capacity], capacity, 0);

Member:

so as we only do this for direct buffers why not rename it to directMemoryCacheAlignment or something like this. This also is true for the system property etc.

        memory = allocateDirect(capacity + cacheAlignment);
        offset = offsetCacheLine(memory, cacheAlignmentMask);
    }
    return new PoolChunk<ByteBuffer>(this, memory, capacity, offset);

Member:

consider changing this to:

return cacheAlignment == 0 ? new PoolChunk<ByteBuffer>(this, allocateDirect(capacity), capacity, 0) :
        new PoolChunk<ByteBuffer>(this, allocateDirect(capacity + cacheAlignment), capacity, offsetCacheLine(memory, cacheAlignmentMask));
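
For context, the arithmetic a helper like offsetCacheLine has to perform can be sketched as follows (illustrative only; the signature and parameter names are assumptions, not Netty's actual code):

static int cacheLineOffset(long address, int alignment) {
    // How many bytes to skip so that (address + offset) lands on the next
    // cache-line boundary; alignment must be a power of two.
    int mask = alignment - 1;
    return (int) ((alignment - (address & mask)) & mask); // 0 if already aligned
}

Since the buffer is over-allocated by cacheAlignment bytes above, skipping up to alignment - 1 bytes still leaves the full requested capacity available.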

protected PoolChunk<ByteBuffer> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize) {
    final ByteBuffer memory;
    final int offset;
    if (cacheAlignment == 0) {

Member:

same as below

        pooledDirectBuffers[i].writeBytes(bytes);
    }
}

Member:

remove empty line

int block = size / 128;
for (int i = 0; i < pooledDirectBuffers.length; i++) {
    byte[] bytes = new byte[block];
    rand.nextBytes(bytes);

Member:

@kiril-me the allocating and filling of bytes[] should not happen in the benchmark itself, but should be part of the @Setup, otherwise it will affect the benchmark. The same goes for everything else that is not pooledDirectBuffers[i].writeBytes(bytes). Even better would be to also remove the array access here.
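
A minimal sketch of the structure being suggested (class name, size, and allocator settings are assumptions, not the PR's actual benchmark):

import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

@State(Scope.Benchmark)
public class AlignedWriteBenchmarkSketch {

    private static final int SIZE = 64 * 1024;

    private ByteBuf pooledDirectBuffer;
    private byte[] bytes;

    @Setup
    public void doSetup() {
        // Allocation and filling happen once here, so the benchmark body
        // measures only the writeBytes(...) call itself.
        pooledDirectBuffer = new PooledByteBufAllocator(true).directBuffer(SIZE);
        bytes = new byte[SIZE];
        new Random(42).nextBytes(bytes);
    }

    @Benchmark
    public void write() {
        pooledDirectBuffer.clear();
        pooledDirectBuffer.writeBytes(bytes);
    }

    @TearDown
    public void doTearDown() {
        pooledDirectBuffer.release();
    }
}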

}

@Benchmark
public void writeRead() {

Member:

same comment as below.


import java.util.HashMap;
import java.util.Map;
/*

Member:

move the copyright to the top of the file and also change the year to 2017

 public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder,
                               int tinyCacheSize, int smallCacheSize, int normalCacheSize,
-                              boolean useCacheForAllThreads) {
+                              boolean useCacheForAllThreads, int cacheAlignment) {

Member:

verify cacheAlignment is >= 0
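
A sketch of the kind of check meant here (the exact message text is an assumption):

if (cacheAlignment < 0) {
    throw new IllegalArgumentException("cacheAlignment: " + cacheAlignment + " (expected: >= 0)");
}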

@normanmaurer (Member):

@kiril-me also please re-run benchmarks and update here once you are done


private PooledByteBufAllocator pooledAllocator;

private ByteBuf pooledDirectBuffers;

Member:

nit: pooledDirectBuffers -> pooledDirectBuffer

@Benchmark
public void write() {
pooledDirectBuffers.writeBytes(bytes);
}

Member:

@kiril-me also add a benchmark which just reads? For that you will need to do the writing in the doSetup() method, though.

@normanmaurer (Member):

@kiril-me also ensure you show the new numbers after the changes are in.

@normanmaurer (Member):

@kiril-me please rebase on top of current 4.1 so it only includes your commit.

@kiril-me (Contributor, Author):

@normanmaurer I made the changes and reworked the benchmarks. I still need to research how to make the benchmarks stable.

@normanmaurer (Member):

@kiril-me let me know once I should check again

@kiril-me (Contributor, Author) commented Feb 1, 2017:

@normanmaurer I changed the benchmarks. I now have two direct buffers. The first is the default direct buffer, for which I calculate an offset in case it happens to be aligned, because we want a misaligned buffer to compare against. The second is 64-byte aligned. The performance difference is visible for large buffer sizes.

@normanmaurer (Member):

@kiril-me please share the new results.

@kiril-me (Contributor, Author) commented Feb 1, 2017:

Result:

Benchmark                                       (cacheAlign)   (size)  Mode  Cnt   Score   Error  Units
PooledByteBufAllocatorAlignBenchmark.read                  0    01024  avgt   25   0.013 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    04096  avgt   25   0.050 ± 0.004  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    16384  avgt   25   0.202 ± 0.018  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0    65536  avgt   25   0.840 ± 0.065  ms/op
PooledByteBufAllocatorAlignBenchmark.read                  0  1048576  avgt   25  23.778 ± 4.068  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    01024  avgt   25   0.012 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    04096  avgt   25   0.047 ± 0.003  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    16384  avgt   25   0.200 ± 0.022  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64    65536  avgt   25   0.749 ± 0.079  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64  1048576  avgt   25  13.331 ± 1.396  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    01024  avgt   25   0.013 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    04096  avgt   25   0.050 ± 0.004  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    16384  avgt   25   0.220 ± 0.027  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0    65536  avgt   25   0.830 ± 0.067  ms/op
PooledByteBufAllocatorAlignBenchmark.write                 0  1048576  avgt   25  16.060 ± 0.484  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    01024  avgt   25   0.012 ± 0.001  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    04096  avgt   25   0.045 ± 0.003  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    16384  avgt   25   0.177 ± 0.011  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64    65536  avgt   25   0.746 ± 0.076  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64  1048576  avgt   25  14.150 ± 0.619  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    01024  avgt   25   0.023 ± 0.002  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    04096  avgt   25   0.094 ± 0.007  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    16384  avgt   25   0.380 ± 0.028  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0    65536  avgt   25   1.477 ± 0.127  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead             0  1048576  avgt   25  27.154 ± 2.389  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    01024  avgt   25   0.021 ± 0.002  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    04096  avgt   25   0.087 ± 0.009  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    16384  avgt   25   0.353 ± 0.037  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64    65536  avgt   25   1.367 ± 0.112  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64  1048576  avgt   25  22.501 ± 1.552  ms/op

@normanmaurer (Member):

@kiril-me didn't you say that there is a performance win with cache alignment? All the numbers in your results suggest otherwise.

@kiril-me (Contributor, Author) commented Feb 1, 2017:

I used the average-time benchmark mode, so the lower the value the better. Here are the results for the 1048576-byte buffer.

PooledByteBufAllocatorAlignBenchmark.read                  0  1048576  avgt   25  23.778 ± 4.068  ms/op
PooledByteBufAllocatorAlignBenchmark.read                 64  1048576  avgt   25  13.331 ± 1.396  ms/op

PooledByteBufAllocatorAlignBenchmark.write                 0  1048576  avgt   25  16.060 ± 0.484  ms/op
PooledByteBufAllocatorAlignBenchmark.write                64  1048576  avgt   25  14.150 ± 0.619  ms/op

PooledByteBufAllocatorAlignBenchmark.writeRead             0  1048576  avgt   25  27.154 ± 2.389  ms/op
PooledByteBufAllocatorAlignBenchmark.writeRead            64  1048576  avgt   25  22.501 ± 1.552  ms/op
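
For reads at this size that is 23.778 ms vs 13.331 ms, roughly 1.8x faster with 64-byte alignment.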

@normanmaurer (Member):

@kiril-me ah, doh! I did not notice you used avgt!

@kiril-me (Contributor, Author) commented Feb 1, 2017:

Throughput measurements (higher is better):

Benchmark                                       (cacheAlign)   (size)   Mode  Cnt   Score   Error   Units
PooledByteBufAllocatorAlignBenchmark.read                  0    01024  thrpt   25  77.005 ± 7.958  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                  0    04096  thrpt   25  19.920 ± 1.757  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                  0    16384  thrpt   25   5.038 ± 0.544  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                  0    65536  thrpt   25   1.170 ± 0.156  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                  0  1048576  thrpt   25   0.052 ± 0.003  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                 64    01024  thrpt   25  81.833 ± 6.673  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                 64    04096  thrpt   25  21.035 ± 1.704  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                 64    16384  thrpt   25   5.465 ± 0.443  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                 64    65536  thrpt   25   1.296 ± 0.112  ops/ms
PooledByteBufAllocatorAlignBenchmark.read                 64  1048576  thrpt   25   0.076 ± 0.007  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                 0    01024  thrpt   25  77.216 ± 7.052  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                 0    04096  thrpt   25  19.165 ± 1.373  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                 0    16384  thrpt   25   4.969 ± 0.332  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                 0    65536  thrpt   25   1.241 ± 0.087  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                 0  1048576  thrpt   25   0.062 ± 0.003  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                64    01024  thrpt   25  85.550 ± 6.341  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                64    04096  thrpt   25  21.650 ± 1.796  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                64    16384  thrpt   25   5.365 ± 0.455  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                64    65536  thrpt   25   1.323 ± 0.096  ops/ms
PooledByteBufAllocatorAlignBenchmark.write                64  1048576  thrpt   25   0.074 ± 0.004  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead             0    01024  thrpt   25  42.563 ± 4.060  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead             0    04096  thrpt   25  10.743 ± 0.958  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead             0    16384  thrpt   25   2.688 ± 0.190  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead             0    65536  thrpt   25   0.670 ± 0.042  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead             0  1048576  thrpt   25   0.040 ± 0.003  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead            64    01024  thrpt   25  44.415 ± 4.095  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead            64    04096  thrpt   25  11.130 ± 0.854  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead            64    16384  thrpt   25   2.896 ± 0.182  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead            64    65536  thrpt   25   0.717 ± 0.047  ops/ms
PooledByteBufAllocatorAlignBenchmark.writeRead            64  1048576  thrpt   25   0.043 ± 0.004  ops/ms

Member:

couldn't you share almost all the code, while the only difference would be the alignOffset in some cases?

Author:

What do you mean? I use alignOffset for the misaligned case. Yes, it will only be used in some cases. I didn't find a way to change the offset inside a buffer once it has been created.

Member:

never mind...

Member:

can you add a comment that explains the 1137?

Member:

This can be static final; also please add a comment explaining why you used 4.

@normanmaurer (Member):

@kiril-me also thanks for all the effort! Looks very good :)

Comment:

MAJOR: Constructor has 9 parameters, which is greater than 7 authorized. (rule)

@Scottmitch (Member) left a comment:

changes look good! few small comments.

Member:

does this have to be a power of 2 for the mask below to work? If so, should we enforce that somewhere, for example: warn and go to the next positive power of 2, or set to 0?

MathUtil.safeFindNextPositivePowerOfTwo may be useful here.
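
The mask trick only behaves like a modulo when the alignment is a power of two; a small sketch of why:

int reqCapacity = 100;
int alignment = 64;              // power of two: 0b100_0000
int mask = alignment - 1;        // 0b011_1111
int delta = reqCapacity & mask;  // 36, same as 100 % 64
// With a non-power-of-two alignment such as 48, mask = 47 = 0b10_1111 and
// 100 & 47 == 36 while 100 % 48 == 4, so the mask no longer computes the remainder.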

Author:

I added a check inside PooledByteBufAllocator. Should I add it in PoolArena too?

Member:

PooledByteBufAllocator is good enough IMHO

Member:

nit: could be a ternary for slightly less code:

return delta == 0 ? reqCapacity : reqCapacity + directMemoryCacheAlignment - delta;

Member:

nit: else is not necessary because you return in the if statement above

Member:

nit: else is not necessary because you return in the if statement above

Member:

you make a temporary for size and sizeMask but not for alignOffset ... do we need any temporaries?

Author:

yes, I made it temporary as well.

Member:

just curious why the temporaries are necessary ... is this just preference or habit from dealing with volatiles/mutable state?

@kiril-me (Contributor, Author) commented Feb 4, 2017:

it's a habit to be sure that the data is mutable

Member:

+1

Member:

+1

@normanmaurer normanmaurer self-assigned this Feb 3, 2017
@normanmaurer normanmaurer added this to the 4.0.45.Final milestone Feb 3, 2017
@normanmaurer (Member):

@kiril-me please squash

    Motivation:

    64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data structures over 64 bytes.
    Padding to a multiple of 64 bytes allows SIMD instructions to be used consistently in loops without additional conditional checks, which should allow for simpler and more efficient code.

    Modification:

    At the moment the cache alignment must be set up manually, but it could potentially be read from the system. The original code was introduced by @normanmaurer in https://github.com/netty/netty/pull/4726/files

    Result:

    Aligned buffers perform better than misaligned ones.
@normanmaurer (Member) commented Feb 6, 2017:

@kiril-me once you have signed the ICLA I can merge this one... Thanks!

http://netty.io/s/icla

@kiril-me (Contributor, Author) commented Feb 6, 2017:

Done. When are you planning to release 4.0.45.Final?

@normanmaurer (Member):

@kiril-me thanks... within the next two weeks.

@normanmaurer (Member):

Cherry-picked into 4.1 (66b9be3) and 4.0 (2f0b079)

@kiril-me thanks a lot!
