
Vectored IO and batching #3

Merged
stef1927 merged 2 commits into riptano:dse-netty-4.1.13.Final from stef1927:db-1349
Jan 2, 2018

Conversation


@stef1927 stef1927 commented Dec 8, 2017

This is the Netty pull request for DB-1349.

It supports batching, which consists of submitting multiple independent requests via a single io_submit call, and vectored IO, which consists of reading multiple sequential buffers in a single request.
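For illustration, here is a minimal, hedged sketch of the two techniques using the Linux libaio API; the file descriptors, buffer sizes and function names are hypothetical and this is not the JNI code in this pull request:

```c
#include <libaio.h>
#include <sys/uio.h>

/* Batching: several independent requests handed to the kernel in one io_submit call.
 * Vectored IO: a single request that reads into several sequential buffers. */
static int submit_batch(io_context_t ctx, int fd1, int fd2)
{
    /* The data buffers must stay valid until the requests complete (static here);
     * alignment requirements for O_DIRECT are omitted. */
    static char a[4096], b[4096], c[4096];

    /* Request 1: vectored read of two sequential buffers from fd1, starting at offset 0. */
    struct iovec iov[2] = { { a, sizeof a }, { b, sizeof b } };
    struct iocb cb1;
    io_prep_preadv(&cb1, fd1, iov, 2, 0);

    /* Request 2: an independent read from fd2. */
    struct iocb cb2;
    io_prep_pread(&cb2, fd2, c, sizeof c, 0);

    /* One system call submits both requests. */
    struct iocb *batch[2] = { &cb1, &cb2 };
    return io_submit(ctx, 2, batch);
}
```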

@tjake, without urgency, would you mind reviewing this?

stef1927 (Author):

We can pick a smaller number, as it doesn't make much sense with only 128 as a maximum value. However, I've been testing with 128 events per core without adverse results. The kernel allocates n_cores x 8 events per AIO context. We should look into reducing the number of AIO contexts that we use; that work is tracked by DB-1463.
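For context, the event count being discussed is fixed when the AIO context is created. A minimal sketch (the 128 figure mirrors the per-core value mentioned above; everything else is illustrative):

```c
#include <libaio.h>

/* nr_events caps how many requests this AIO context can have in flight at once;
 * 128 mirrors the per-core value discussed above. */
static int create_aio_context(io_context_t *ctx)
{
    *ctx = 0;                  /* io_setup requires the context handle to be zero-initialized */
    return io_setup(128, ctx); /* returns 0 on success, a negative errno on failure */
}
```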

tjake:

If you've verified it then I'm happy to go with your number.

Scylla seems to run a test script that finds the optimal settings per machine.

```
2017-12-06 22:18:06,032 [FJP:12] INFO server-1 - STDOUT: Recommended --max-io-requests: 68
2017-12-06 22:18:06,034 [FJP:12] INFO server-1 - STDOUT: Recommended --num-io-queues: 17
```

stef1927 (Author):

We set a maximum number of buffers to avoid allocations, and because the kernel code is also optimized to use inline arrays for fewer than 8 buffers. However, it's probably not that bad to remove this maximum and allocate arrays dynamically if this is a concern (note the array in netty_iocb).
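A hedged sketch of that design: `request_slot` and `MAX_VECTORS` are illustrative names, not the actual structures, but the idea mirrors the fixed array kept in netty_iocb and the small inline limit mentioned above.

```c
#include <libaio.h>
#include <sys/uio.h>
#include <stddef.h>

#define MAX_VECTORS 8  /* fixed maximum, matching the small inline limit discussed above */

/* Hypothetical per-request slot: owning a fixed iovec array avoids a malloc per request. */
struct request_slot {
    struct iocb  cb;
    struct iovec iov[MAX_VECTORS];
};

static int prep_vectored_read(struct request_slot *slot, int fd, long long offset,
                              void *bufs[], size_t buf_len, int nbufs)
{
    if (nbufs <= 0 || nbufs > MAX_VECTORS)
        return -1;  /* beyond this, a dynamically allocated iovec array would be needed */

    for (int i = 0; i < nbufs; i++) {
        slot->iov[i].iov_base = bufs[i];
        slot->iov[i].iov_len  = buf_len;
    }
    io_prep_preadv(&slot->cb, fd, slot->iov, nbufs, offset);
    return 0;
}
```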

tjake left a comment

Overall I really like it! I left some specific comments but nothing major.

tjake:

Kind of a nit, but in general we should be much more defensive in C/JNI code, e.g. checking that concurrency isn't <= 0 or > X, and also that calloc and malloc return successfully.
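A minimal sketch of the kind of defensive checks being asked for; `concurrency`, `MAX_CONCURRENCY` and `alloc_slots` are illustrative names rather than the actual JNI symbols:

```c
#include <libaio.h>
#include <stdlib.h>

#define MAX_CONCURRENCY 1024  /* illustrative upper bound */

static struct iocb **alloc_slots(int concurrency)
{
    /* Reject nonsensical values coming across the JNI boundary. */
    if (concurrency <= 0 || concurrency > MAX_CONCURRENCY)
        return NULL;

    /* Never assume calloc succeeds: dereferencing NULL here would segfault the JVM.
     * The caller would translate NULL into a Java exception instead of crashing. */
    struct iocb **slots = calloc((size_t) concurrency, sizeof *slots);
    if (slots == NULL)
        return NULL;

    return slots;
}
```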

tjake:

Reason being we don't want a segfault

stef1927 (Author):

Done. I've set a limit of 1024 since it must fit on the stack. We can revisit this in DB-1463: if we have fewer contexts and a larger concurrency, we should go back to dynamic allocation.
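A sketch of the bounded stack allocation described here, under the assumption that the slot array holds iocb pointers; the names are hypothetical:

```c
#include <libaio.h>

#define MAX_SLOTS 1024  /* hard limit so the array below always fits on the stack (~8 KB of pointers) */

static int submit_pending(io_context_t ctx, int concurrency)
{
    if (concurrency <= 0 || concurrency > MAX_SLOTS)
        return -1;  /* with more contexts/concurrency we would fall back to dynamic allocation */

    struct iocb *pending[MAX_SLOTS];  /* stack allocated: no malloc on the hot path */
    int n = 0;

    /* ... fill pending[0..n) with the requests queued during this event-loop iteration ... */

    return n > 0 ? io_submit(ctx, n, pending) : 0;
}
```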

tjake:

Don't we know from epoll exactly how many events are ready? Shouldn't we wait until we read that many?

stef1927 (Author):

We save a call to eventfd_read by reading up to `concurrency` events. It's fine at the moment because the concurrency is small; see the point above about revisiting this for DB-1463. The timeout is zero, so even if there are no events it returns immediately.
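A sketch of the reaping described here, with illustrative names and sizes: instead of first reading the eventfd to learn the exact count, ask for up to the configured concurrency with a zero timeout, so the call returns immediately even when nothing is ready.

```c
#include <libaio.h>
#include <string.h>
#include <time.h>

#define MAX_EVENTS 128  /* illustrative; matches the small per-context concurrency */

static int reap_completions(io_context_t ctx, int concurrency)
{
    struct io_event events[MAX_EVENTS];
    if (concurrency > MAX_EVENTS)
        concurrency = MAX_EVENTS;

    struct timespec zero;
    memset(&zero, 0, sizeof zero);  /* zero timeout: never block, even with no completed events */

    /* min_nr = 0 and nr = concurrency: one call drains whatever is ready, no eventfd_read needed. */
    int n = io_getevents(ctx, 0, concurrency, events, &zero);

    for (int i = 0; i < n; i++) {
        /* events[i].obj / events[i].res identify and complete the corresponding request. */
    }
    return n;
}
```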

tjake:

I wanted to verify this was still the case. Technically, I think if it's happening then it's a bug, because we should never re-use a scratch buffer until the read is finished by the ChunkReader... Would it help parallelism if we were able to submit these just after we get the epoll event, before we get the io_events?

stef1927 (Author):

I've removed the comment, thanks. We shouldn't have multiple outstanding requests on the same buffer; that would be a serious problem and a bug in the chunk cache, chunk readers, or buffer pool. Technically, a buffer could be reused by another thread as soon as the request is completed, but this should be fine, and requests are already completed in the loop.

It would help parallelism to submit a request immediately, but then we would no longer be able to exploit batching. By submitting them at the end, we only make one native call for a batch of requests. The tests are inconclusive as to which is preferable.
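A brief, hypothetical illustration of that trade-off: the requests produced during one event-loop iteration are either submitted one by one as they are created, or queued and flushed with a single io_submit at the end of the iteration, which is the batched approach taken here.

```c
#include <libaio.h>

/* 'queued' holds the requests created during this iteration (names are illustrative). */
static int flush_batch(io_context_t ctx, struct iocb **queued, int nqueued)
{
    /* Immediate submission would instead do, per request:
     *     io_submit(ctx, 1, &queued[i]);   // one native call per request, lower latency
     * Deferred batching makes a single native call for the whole batch: */
    return nqueued > 0 ? io_submit(ctx, (long) nqueued, queued) : 0;
}
```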



stef1927 commented Dec 21, 2017

Thanks for the review, see 4464570.

> If you've verified it then I'm happy to go with your number.
> Scylla seems to run a test script that finds the optimal settings per machine.

I've picked it out of thin air for the time being; I think we need DB-1463 first. I don't think having at least 64 requests per core would hurt: I normally run my Fallout benchmarks with 128 per core and they indicate comparable or better results than master. The risk is having too few requests on machines with too many cores, in case for some reason we have several sstables associated with one core.


tjake commented Dec 29, 2017

Approved +1


tjake commented Dec 29, 2017

Great work!

Stefania Alborghetti added 2 commits January 2, 2018 12:18
- submit multiple buffers per request (vectored IO)
- submit multiple requests together (batching)
- use fixed size array with slots to avoid mallocs
- avoid syscall on eventfd, just query up to maxConcurrency.
- increase concurrency factor per loop on machines with many cores

stef1927 commented Jan 2, 2018

Thanks for the review, merging!

@stef1927 stef1927 merged commit 9e5580f into riptano:dse-netty-4.1.13.Final Jan 2, 2018
stef1927 pushed a commit that referenced this pull request May 15, 2018
Motivation:
ReferenceCountedOpenSslEngine is careful to lock access to `ssl`
almost everywhere (manually verified) *except* in the constructor.
Since `ssl` is non-final, it does not enjoy automatic thread safety
of the code that uses it.  Specifically, that means netty tcnative
code is not thread safe.

Modifications:

Ensure that all ssl engine initialization and the variables related to
it are properly synchronized by adding synchronization in the constructor.

Result:
Less noisy race detector.

Notes:
The specific racing threads are:
```
  Read of size 8 at 0x7b5400019ff8 by thread T52 (mutexes: write M215300):
    #0 ssl_do_info_callback .../src/ssl/ssl_lib.c:2602:24 (f077793ecd812aeebb37296c987f655c+0x23c6834)
    #1 ssl_process_alert .../src/ssl/tls_record.c:473:3 (f077793ecd812aeebb37296c987f655c+0x23a5346)
    #2 tls_open_record .../src/ssl/tls_record.c:338:12 (f077793ecd812aeebb37296c987f655c+0x23a5289)
    #3 ssl3_get_record .../src/ssl/s3_pkt.c:146:7 (f077793ecd812aeebb37296c987f655c+0x23a3da0)
    #4 ssl3_read_app_data .../src/ssl/s3_pkt.c:388:17 (f077793ecd812aeebb37296c987f655c+0x23a368f)
    #5 ssl_read_impl .../src/ssl/ssl_lib.c:722:15 (f077793ecd812aeebb37296c987f655c+0x23c0895)
    #6 SSL_read .../src/ssl/ssl_lib.c:743:10 (f077793ecd812aeebb37296c987f655c+0x23c075b)
    #7 netty_internal_tcnative_SSL_readFromSSL .../netty_tcnative/openssl-dynamic/src/main/c/ssl.c:946:12 (f077793ecd812aeebb37296c987f655c+0x23827f7)
    #8 <null> <null> (0x7fc0760193be)
    #9 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.readPlaintextData(Ljava/nio/ByteBuffer;)I (ReferenceCountedOpenSslEngine.java:449)
    #10 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;II[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:882)
    #11 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap([Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:985)
    #12 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)Ljavax/net/ssl/SSLEngineResult; (ReferenceCountedOpenSslEngine.java:1028)
    #13 io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(Lio/netty/handler/ssl/SslHandler;Lio/netty/buffer/ByteBuf;IILio/netty/buffer/ByteBuf;)Ljavax/net/ssl/SSLEngineResult; (SslHandler.java:206)
    #14 io.netty.handler.ssl.SslHandler.unwrap(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;II)Z (SslHandler.java:1162)
    #15 io.netty.handler.ssl.SslHandler.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (SslHandler.java:1084)
    #16 io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ByteToMessageDecoder.java:489)
    #17 io.netty.handler.codec.ByteToMessageDecoder.callDecode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ByteToMessageDecoder.java:428)
    #18 io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (ByteToMessageDecoder.java:265)
    #19 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:362)
    #20 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:348)
    #21 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(Ljava/lang/Object;)Lio/netty/channel/ChannelHandlerContext; (AbstractChannelHandlerContext.java:340)
    #22 io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (DefaultChannelPipeline.java:1334)
    #23 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:362)
    #24 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Lio/netty/channel/AbstractChannelHandlerContext;Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:348)
    #25 io.netty.channel.DefaultChannelPipeline.fireChannelRead(Ljava/lang/Object;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:926)
    #26 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V (AbstractNioByteChannel.java:134)
    #27 io.netty.channel.nio.NioEventLoop.processSelectedKey(Ljava/nio/channels/SelectionKey;Lio/netty/channel/nio/AbstractNioChannel;)V (NioEventLoop.java:644)
    #28 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized()V (NioEventLoop.java:579)
    #29 io.netty.channel.nio.NioEventLoop.processSelectedKeys()V (NioEventLoop.java:496)
    #30 io.netty.channel.nio.NioEventLoop.run()V (NioEventLoop.java:458)
    #31 io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V (SingleThreadEventExecutor.java:858)
    #32 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V (DefaultThreadFactory.java:138)
    #33 java.lang.Thread.run()V (Thread.java:745)
    #34 (Generated Stub)

  Previous write of size 8 at 0x7b5400019ff8 by thread T97:
    #0 SSL_CTX_set_info_callback .../ssl/ssl_session.c:1136:22 (f077793ecd812aeebb37296c987f655c+0x23bd621)
    #1 netty_internal_tcnative_SSL_newSSL .../netty_tcnative/openssl-dynamic/src/main/c/ssl.c:830:5 (f077793ecd812aeebb37296c987f655c+0x2382306)
    #2 <null> <null> (0x7fc0760193be)
    #3 io.netty.handler.ssl.ReferenceCountedOpenSslEngine.<init>(Lio/netty/handler/ssl/ReferenceCountedOpenSslContext;Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;IZ)V (ReferenceCountedOpenSslEngine.java:237)
    #4 io.netty.handler.ssl.OpenSslEngine.<init>(Lio/netty/handler/ssl/OpenSslContext;Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)V (OpenSslEngine.java:31)
    #5 io.netty.handler.ssl.OpenSslContext.newEngine0(Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)Ljavax/net/ssl/SSLEngine; (OpenSslContext.java:49)
    #6 io.netty.handler.ssl.ReferenceCountedOpenSslContext.newEngine(Lio/netty/buffer/ByteBufAllocator;Ljava/lang/String;I)Ljavax/net/ssl/SSLEngine; (ReferenceCountedOpenSslContext.java:409)
    #7 io.netty.handler.ssl.ReferenceCountedOpenSslContext.newEngine(Lio/netty/buffer/ByteBufAllocator;)Ljavax/net/ssl/SSLEngine; (ReferenceCountedOpenSslContext.java:423)
    #8 io.grpc.netty.ProtocolNegotiators$ServerTlsHandler.handlerAdded(Lio/netty/channel/ChannelHandlerContext;)V (ProtocolNegotiators.java:133)
    #9 io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:597)
    #10 io.netty.channel.DefaultChannelPipeline.addLast(Lio/netty/util/concurrent/EventExecutorGroup;Ljava/lang/String;Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:226)
    #11 io.netty.channel.DefaultChannelPipeline.addLast(Lio/netty/util/concurrent/EventExecutorGroup;[Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:392)
    #12 io.netty.channel.DefaultChannelPipeline.addLast([Lio/netty/channel/ChannelHandler;)Lio/netty/channel/ChannelPipeline; (DefaultChannelPipeline.java:379)
    #13 io.grpc.netty.NettyServerTransport.start(Lio/grpc/internal/ServerTransportListener;)V (NettyServerTransport.java:99)
    #14 io.grpc.netty.NettyServer$1.initChannel(Lio/netty/channel/Channel;)V (NettyServer.java:164)
    #15 io.netty.channel.ChannelInitializer.initChannel(Lio/netty/channel/ChannelHandlerContext;)Z (ChannelInitializer.java:113)
    #16 io.netty.channel.ChannelInitializer.handlerAdded(Lio/netty/channel/ChannelHandlerContext;)V (ChannelInitializer.java:105)
    #17 io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:597)
    #18 io.netty.channel.DefaultChannelPipeline.access$000(Lio/netty/channel/DefaultChannelPipeline;Lio/netty/channel/AbstractChannelHandlerContext;)V (DefaultChannelPipeline.java:44)
    #19 io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute()V (DefaultChannelPipeline.java:1387)
    #20 io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers()V (DefaultChannelPipeline.java:1122)
    #21 io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded()V (DefaultChannelPipeline.java:647)
    #22 io.netty.channel.AbstractChannel$AbstractUnsafe.register0(Lio/netty/channel/ChannelPromise;)V (AbstractChannel.java:506)
    #23 io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(Lio/netty/channel/AbstractChannel$AbstractUnsafe;Lio/netty/channel/ChannelPromise;)V (AbstractChannel.java:419)
    #24 io.netty.channel.AbstractChannel$AbstractUnsafe$1.run()V (AbstractChannel.java:478)
    #25 io.netty.util.concurrent.AbstractEventExecutor.safeExecute(Ljava/lang/Runnable;)V (AbstractEventExecutor.java:163)
    #26 io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (SingleThreadEventExecutor.java:403)
    #27 io.netty.channel.nio.NioEventLoop.run()V (NioEventLoop.java:462)
    #28 io.netty.util.concurrent.SingleThreadEventExecutor$5.run()V (SingleThreadEventExecutor.java:858)
    #29 io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()V (DefaultThreadFactory.java:138)
    #30 java.lang.Thread.run()V (Thread.java:745)
    #31 (Generated Stub)

```
thobe pushed a commit that referenced this pull request Jun 29, 2022
…tor (netty#10890)

Motivation:

A race detector discovered a data race in GlobalEventExecutor present in
netty 4.1.51.Final:

```
  Write of size 4 at 0x0000cea08774 by thread T103:
    #0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
    #1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
    #2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
    #3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
    #4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
    #5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
    #6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
    #7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #9 java.lang.Thread.run()V Thread.java:835
    #10 (Generated Stub) <null>

  Previous read of size 4 at 0x0000cea08774 by thread T110:
    #0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
    #1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
    #2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
    #3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
    #4 java.lang.Thread.run()V Thread.java:835
    #5 (Generated Stub) <null>
```

The race is legitimate, but benign. To trigger it, a TaskRunner must
begin exiting and set 'started' to false; more work must be scheduled,
which starts a new TaskRunner; that work must then schedule additional
work, which modifies 'scheduledTaskQueue'; and finally the original
TaskRunner must check 'scheduledTaskQueue'. But there is no danger in
this race, as it can only produce a false negative in the condition
that causes the code to CAS 'started', which is thread-safe.

Modifications:

Delete problematic references to scheduledTaskQueue. The only way
scheduledTaskQueue could be modified since the last check is if another
TaskRunner is running, in which case the current TaskRunner doesn't
care.

Result:

Data-race free code, and a bit less code to boot.