Vectored IO and batching #3
Conversation
We can pick a smaller number, as it doesn't make much sense with only 128 as a maximum value. However, I've been testing with 128 events per core without adverse results. The kernel allocates n_cores x 8 events per AIO context. We should look into reducing the number of AIO contexts that we use; for this we have DB-1463.
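For context, a minimal sketch (using the raw Linux AIO syscalls rather than the project's actual JNI wrapper) of creating one AIO context sized to a per-core event budget; `MAX_EVENTS_PER_CORE` is an illustrative constant, not the project's real setting.

```c
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

/* Hypothetical per-core budget matching the number discussed above. */
#define MAX_EVENTS_PER_CORE 128

/* Create one AIO context for an event loop. The kernel reserves internal
 * event slots for the context, counted against /proc/sys/fs/aio-max-nr. */
static int create_aio_context(aio_context_t *ctx)
{
    *ctx = 0;  /* must be zeroed before io_setup */
    if (syscall(SYS_io_setup, MAX_EVENTS_PER_CORE, ctx) < 0) {
        perror("io_setup");
        return -1;
    }
    return 0;
}
```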
If you've verified it then I'm happy to go with your number.
Scylla seems to run a test script that finds the optimal settings per machine.
2017-12-06 22:18:06,032 [FJP:12] INFO server-1 - STDOUT: Recommended --max-io-requests: 68
2017-12-06 22:18:06,034 [FJP:12] INFO server-1 - STDOUT: Recommended --num-io-queues: 17
We set a maximum number of buffers to avoid allocations and because the kernel code is also optimized to use inline arrays for less than 8 buffers. However, it's probably not that bad to remove this maximum and allocate arrays dynamically, if this is a concern (note the array in netty_iocb).
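To illustrate the inline-array point, here is a minimal sketch against the raw Linux AIO ABI (not the project's actual JNI code) of building one vectored read request from a small fixed-size iovec array; `MAX_VECTORED_BUFFERS` and the function name are hypothetical.

```c
#include <linux/aio_abi.h>
#include <sys/uio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap: keeping it at or below 8 stays on the kernel's
 * inline-iovec fast path mentioned above and avoids per-request mallocs. */
#define MAX_VECTORED_BUFFERS 8

/* Fill an iocb describing a single vectored read (IOCB_CMD_PREADV) whose
 * completion is signalled through an eventfd. */
static void prepare_preadv(struct iocb *cb, int fd, const struct iovec *iov,
                           int nr_buffers, long long offset, int event_fd)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes     = (uint32_t)fd;
    cb->aio_lio_opcode = IOCB_CMD_PREADV;
    cb->aio_buf        = (uint64_t)(uintptr_t)iov;  /* pointer to the iovec array */
    cb->aio_nbytes     = (uint64_t)nr_buffers;      /* number of iovec entries */
    cb->aio_offset     = offset;
    cb->aio_flags      = IOCB_FLAG_RESFD;           /* completion wakes the eventfd */
    cb->aio_resfd      = (uint32_t)event_fd;
}
```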
tjake left a comment
Overall I really like it! I left some specific comments but nothing major.
Kind of a nit, but in general we should be much more defensive in the C/JNI code, e.g. checking that concurrency isn't <= 0 or > X, and that calloc and malloc return successfully.
Done. I've set a limit of 1024 since it must fit on the stack. We can revisit this in DB-1463; if we have fewer contexts and a larger concurrency then we should go back to dynamic allocation.
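A minimal sketch of the defensive checks being discussed; the JNI function name, class path, and `MAX_CONCURRENCY` constant are illustrative only and are not the actual symbols from this PR.

```c
#include <jni.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound so per-request arrays still fit on the stack. */
#define MAX_CONCURRENCY 1024

static void throw_by_name(JNIEnv *env, const char *name, const char *msg)
{
    jclass clazz = (*env)->FindClass(env, name);
    if (clazz != NULL) {
        (*env)->ThrowNew(env, clazz, msg);
    }
}

/* Illustrative JNI entry point: validate arguments and allocation results
 * before touching any native state. */
JNIEXPORT jlong JNICALL
Java_example_NativeAio_createQueue(JNIEnv *env, jclass clazz, jint concurrency)
{
    (void)clazz;
    if (concurrency <= 0 || concurrency > MAX_CONCURRENCY) {
        throw_by_name(env, "java/lang/IllegalArgumentException",
                      "concurrency out of range");
        return 0;
    }

    /* Always check that calloc/malloc succeeded before using the memory. */
    void *slots = calloc((size_t)concurrency, sizeof(void *));
    if (slots == NULL) {
        throw_by_name(env, "java/lang/OutOfMemoryError", "calloc failed");
        return 0;
    }
    return (jlong)(uintptr_t)slots;
}
```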
Don't we know from epoll exactly how many events are ready? Shouldn't we wait until we read that many?
We save a call to eventfd_read by reading up to concurrency. It's fine at the moment because the concurrency is small, see point above w.r.t. revisiting this for DB-1463. The timeout is zero, so even if there are no events it returns immediately.
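Roughly, the pattern looks like the sketch below (raw syscalls and hypothetical names, not the PR's actual code): once epoll reports the AIO eventfd readable, completions are drained directly with a zero timeout and a cap of `max_concurrency`, so no `eventfd_read` call is needed.

```c
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* After epoll signals the AIO eventfd, fetch up to max_concurrency completed
 * events. min_nr = 0 and a zero timeout mean the call never blocks: if no
 * events are ready it returns 0 immediately. */
static int drain_completions(aio_context_t ctx, struct io_event *events,
                             long max_concurrency)
{
    struct timespec zero = { 0, 0 };
    return (int)syscall(SYS_io_getevents, ctx, 0L, max_concurrency,
                        events, &zero);
}
```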
I wanted to verify this was still the case. Technically, I think if it's happening then it's a bug, because we should never re-use a scratch buffer until the read is finished by the ChunkReader... Would it help parallelism if we were able to submit these just after we get the epoll event, before we get the io_events?
I've removed the comment, thanks. We shouldn't have multiple outstanding requests on the same buffer; that would be a serious problem and a bug in the chunk cache, chunk readers, or buffer pool. Technically a buffer could be reused by another thread as soon as the request is completed, but this should be fine since requests are already completed in the loop.
It would help parallelism to submit a request immediately, but then we would no longer be able to exploit batching. By submitting them at the end, we only do one native call for a batch of requests. The tests are inconclusive as to which approach is preferable.
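For illustration, a minimal sketch of that batching shape (raw syscalls and hypothetical names): requests prepared during the completion loop are accumulated and then handed to the kernel with a single `io_submit` call.

```c
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Submit `count` prepared iocbs in one go. Batching means the whole group
 * costs a single native call / syscall instead of one per request. The
 * kernel may accept only part of the batch, so loop until done. */
static int submit_batch(aio_context_t ctx, struct iocb **batch, long count)
{
    long submitted = 0;
    while (submitted < count) {
        long rc = syscall(SYS_io_submit, ctx, count - submitted,
                          &batch[submitted]);
        if (rc < 0)
            return -1;        /* caller inspects errno */
        submitted += rc;
    }
    return (int)submitted;
}
```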
Thanks for the review, see 4464570.
I've picked the number out of thin air for the time being. I think we need DB-1463 first. I don't think having at least 64 requests per core would hurt; I normally run my Fallout benchmarks with 128 per core and they indicate comparable or better results than master. The risk is having too few requests on machines with too many cores, in case for some reason we have several sstables associated with one core.
Approved +1
Great work!
- submit multiple buffers per request (vectored IO)
- submit multiple requests together (batching)
- use fixed-size array with slots to avoid mallocs
- avoid syscall on eventfd, just query up to maxConcurrency
- increase concurrency factor per loop on machines with many cores
Thanks for the review, merging!
Motivation:
ReferenceCountedOpenSslEngine is careful to lock access to `ssl`
almost everywhere (manually verified) *except* in the constructor.
Since `ssl` is non-final, the code that uses it does not get automatic
thread-safety (safe-publication) guarantees. Specifically, that means
the netty-tcnative code is not thread safe.
Modifications:
Ensure that all SSL engine initialization and the variables related to
it are properly synchronized by adding synchronization in the constructor.
Result:
Less noisy race detector.
Notes:
The specific racing threads are:
```
Read of size 8 at 0x7b5400019ff8 by thread T52 (mutexes: write M215300):
#0 ssl_do_info_callback .../src/ssl/ssl_lib.c:2602:24 (f077793ecd812aeebb37296c987f655c+0x23c6834)
#1 ssl_process_alert .../src/ssl/tls_record.c:473:3 (f077793ecd812aeebb37296c987f655c+0x23a5346)
#2 tls_open_record .../src/ssl/tls_record.c:338:12 (f077793ecd812aeebb37296c987f655c+0x23a5289)
#3 ssl3_get_record .../src/ssl/s3_pkt.c:146:7 (f077793ecd812aeebb37296c987f655c+0x23a3da0)
#4 ssl3_read_app_data .../src/ssl/s3_pkt.c:388:17 (f077793ecd812aeebb37296c987f655c+0x23a368f)
#5 ssl_read_impl .../src/ssl/ssl_lib.c:722:15 (f077793ecd812aeebb37296c987f655c+0x23c0895)
#6 SSL_read .../src/ssl/ssl_lib.c:743:10 (f077793ecd812aeebb37296c987f655c+0x23c075b)
#7 netty_internal_tcnative_SSL_readFromSSL .../netty_tcnative/openssl-dynamic/src/main/c/ssl.c:946:12 (f077793ecd812aeebb37296c987f655c+0x23827f7)
#8 <null> <null> (0x7fc0760193be)
#9 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.readPlaintextData(Ljava/nio/ByteBuffer;)I (ReferenceCountedOpenSslEngine.java:449)
#10 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:882)
#11 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:985)
#12 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:1028)
#13 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;IILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (SslHandler.java:206)
#14 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;II)Z (SslHandler.java:1162)
#15 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (SslHandler.java:1084)
#16 io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ByteToMessageDecoder.java:489)
#17 io.netty.handler.codec.ByteToMessageDecoder.callDecode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ByteToMessageDecoder.java:428)
#18 io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (ByteToMessageDecoder.java:265)
#19 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:362)
#20 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:348)
#21 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Ljava/lang/Object;)Lio/netty/channel/ChannelHandlerContext; (AbstractChannelHandlerContext.java:340)
#22 io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (DefaultChannelPipeline.java:1334)
#23 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:362)
#24 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:348)
#25 io.netty.channel.DefaultChannelPipeline.fireChannelRead(Ljava/lang/Object;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:926)
#26 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V (AbstractNioByteChannel.java:134)
#27 io.netty.channel.nio.NioEventLoop.processSelectedKey(Ljava/nio/channels/SelectionKey;Lio/netty/channel/nio/AbstractNioChannel;)V (NioEventLoop.java:644)
#28 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized()V (NioEventLoop.java:579)
#29 io.netty.channel.nio.NioEventLoop.processSelectedKeys()V (NioEventLoop.java:496)
#30 io.netty.channel.nio.NioEventLoop.run()V (NioEventLoop.java:458)
#31 io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V (SingleThreadEventExecutor.java:858)
#32 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V (DefaultThreadFactory.java:138)
#33 java.lang.Thread.run()V (Thread.java:745)
#34 (Generated Stub)
Previous write of size 8 at 0x7b5400019ff8 by thread T97:
#0 SSL_CTX_set_info_callback .../ssl/ssl_session.c:1136:22 (f077793ecd812aeebb37296c987f655c+0x23bd621)
#1 netty_internal_tcnative_SSL_newSSL .../netty_tcnative/openssl-dynamic/src/main/c/ssl.c:830:5 (f077793ecd812aeebb37296c987f655c+0x2382306)
#2 <null> <null> (0x7fc0760193be)
#3 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.<init>(Lio/netty/handler/ssl/ReferenceCountedOpenSslContext;Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;IZ)V (ReferenceCountedOpenSslEngine.java:237)
#4 io.netty.handler.ssl.OpenSslEngine.<init>(Lio/netty/handler/ssl/OpenSslContext;Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)V (OpenSslEngine.java:31)
#5 io.netty.handler.ssl.OpenSslContext.newEngine0(Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)Ljavax/net/ssl/SSLEngine; (OpenSslContext.java:49)
#6 io.netty.handler.ssl.ReferenceCountedOpenSslContext.newEngine(Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)Ljavax/net/ssl/SSLEngine; (ReferenceCountedOpenSslContext.java:409)
#7 io.netty.handler.ssl.ReferenceCountedOpenSslContext.newEngine(Lio/netty/buffer/ByteBufAllocator;)Ljavax/net/ssl/SSLEngine; (ReferenceCountedOpenSslContext.java:423)
#8 io.grpc.netty.ProtocolNegotiators$ServerTlsHandler.handlerAdded(Lio/netty/channel/ChannelHandlerContext;)V (ProtocolNegotiators.java:133)
#9 io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:597)
#10 io.netty.channel.DefaultChannelPipeline.addLast(Lio/netty/util/concurrent/EventExecutorGroup;Ljava/lang/String;Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:226)
#11 io.netty.channel.DefaultChannelPipeline.addLast(Lio/netty/util/concurrent/EventExecutorGroup;[Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:392)
#12 io.netty.channel.DefaultChannelPipeline.addLast([Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:379)
#13 io.grpc.netty.NettyServerTransport.start(Lio/grpc/internal/ServerTransportListener;)V (NettyServerTransport.java:99)
#14 io.grpc.netty.NettyServer$1.initChannel(Lio/netty/channel/Channel;)V (NettyServer.java:164)
#15 io.netty.channel.ChannelInitializer.initChannel(Lio/netty/channel/ChannelHandlerContext;)Z (ChannelInitializer.java:113)
#16 io.netty.channel.ChannelInitializer.handlerAdded(Lio/netty/channel/ChannelHandlerContext;)V (ChannelInitializer.java:105)
#17 io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:597)
#18 io.netty.channel.DefaultChannelPipeline.access$000(Lio/netty/channel/DefaultChannelPipeline;Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:44)
#19 io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute()V (DefaultChannelPipeline.java:1387)
#20 io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers()V (DefaultChannelPipeline.java:1122)
#21 io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded()V (DefaultChannelPipeline.java:647)
#22 io.netty.channel.AbstractChannel$AbstractUnsafe.register0(Lio/netty/channel/ChannelPromise;)V (AbstractChannel.java:506)
#23 io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(Lio/netty/channel/AbstractChannel$AbstractUnsafe;Lio/netty/channel/ChannelPromise;)V (AbstractChannel.java:419)
#24 io.netty.channel.AbstractChannel$AbstractUnsafe$1.run()V (AbstractChannel.java:478)
#25 io.netty.util.concurrent.AbstractEventExecutor.safeExecute(Ljava/lang/Runnable;)V (AbstractEventExecutor.java:163)
#26 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (SingleThreadEventExecutor.java:403)
#27 io.netty.channel.nio.NioEventLoop.run()V (NioEventLoop.java:462)
#28 io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V (SingleThreadEventExecutor.java:858)
#29 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V (DefaultThreadFactory.java:138)
#30 java.lang.Thread.run()V (Thread.java:745)
#31 (Generated Stub)
```
…tor (netty#10890)

Motivation:
A race detector discovered a data race in GlobalEventExecutor present in netty 4.1.51.Final:
```
Write of size 4 at 0x0000cea08774 by thread T103:
#0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
#1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
#2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
#3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
#4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
#5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
#6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
#7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
#8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
#9 java.lang.Thread.run()V Thread.java:835
#10 (Generated Stub) <null>

Previous read of size 4 at 0x0000cea08774 by thread T110:
#0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
#1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
#2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
#3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
#4 java.lang.Thread.run()V Thread.java:835
#5 (Generated Stub) <null>
```
The race is legit, but benign. To trigger it requires a TaskRunner to begin exiting and set 'started' to false, more work to be scheduled which starts a new TaskRunner, and that work then needs to schedule additional work which modifies 'scheduledTaskQueue' before the original TaskRunner checks 'scheduledTaskQueue'. But there is no danger to this race, as it can only produce a false negative in the condition that causes the code to CAS 'started', which is thread-safe.

Modifications:
Delete the problematic references to scheduledTaskQueue. The only way scheduledTaskQueue could have been modified since the last check is if another TaskRunner is running, in which case the current TaskRunner doesn't care.

Result:
Data-race-free code, and a bit less code to boot.
This is the Netty pull request for DB-1349.
It supports batching, which consists of submitting multiple independent requests via a single io_submit call, and vectored IO, which consists of reading multiple sequential buffers in a single request.
@tjake, without urgency, would you mind reviewing this?