Skip to content

io: add io_uring wrapper#19339

Merged
wrowe merged 20 commits intoenvoyproxy:mainfrom
rojkov:io-uring-wrapper
Apr 8, 2022
Merged

io: add io_uring wrapper#19339
wrowe merged 20 commits intoenvoyproxy:mainfrom
rojkov:io-uring-wrapper

Conversation

@rojkov
Copy link
Copy Markdown
Member

@rojkov rojkov commented Dec 22, 2021

Commit Message: io: add io_uring wrapper
Additional Description:
The wrapper can be used for asynchronous Block IO and Network IO in Linux
as an alternative to the standard POSIX system calls.

Risk Level: low, new code
Testing: unit tests added
Docs Changes: N/A
Release Notes: N/A, no changes visible to users
Platform Specific Features: io_uring wrapper for Linux

Contributes to #17922

The wrapper can be used for asynchronous Block IO and Network IO in Linux
as an alternative to the standard POSIX system calls.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Dec 22, 2021
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @wrowe

🐱

Caused by: #19339 was opened by rojkov.

see: more, trace.

@rojkov
Copy link
Copy Markdown
Member Author

rojkov commented Dec 22, 2021

This is a smaller part of #19082.

/assign @antoniovicente

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
This reverts commit 60d32a7.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Jan 5, 2022

@wrowe @antoniovicente it looks like this is ready for another round of review?

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting stuff. I have some questions about the ultimate use of this new sophisticated mechanism.

*/
virtual void unregisterEventfd() PURE;

virtual bool isEventfdRegistered() const PURE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

enum class IoUringResult { Ok, Busy, Failed };

/**
* Abstract wrapper around `io_uring`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the threading model assumed by IoUring? Which methods can be called from any thread and which can only be called from the IoRing thread?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io_uring docs don't state any requirement regarding thread-safety though it probably should. In case of Envoy every IoUring is coupled with one libevent loop (it doesn't run its own) which is rarely posted from threads other than the one running it. So, I assumed that making IoUring thread local and calling its methods only from the thread it lives in simplifies everything. But perhaps it makes more sense to extend Dispatcher to own IoUrings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may make sense for the dispatcher to own the io_urings eventually instead of providing a getOrCreate factory method. We can get to that as development continues.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From examples I believe that uring is thread safe. However conceptually there should only one thread emptying the queue. But adding more I/O operations (preparing) into the ring can be done from any thread.

* and IoUringResult::Ok otherwise.
*/
virtual IoUringResult prepareAccept(os_fd_t fd, struct sockaddr* remote_addr,
socklen_t* remote_addr_len, void* user_data) PURE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of void* in the interface seems a bit too low level. What is the ultimate purpose of this interface? Implement an alternate dispatcher.h implementation? I wonder if use of an user_data interface would improve the usability of the resulting class as it would allow for the removal of the CompletionCb argument to forEveryCompletion

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Particularly this interface hides the io_uring syscalls and makes unit testing of the new iohandle (not in this PR) easier a bit. Originally I used a custom type containing the network iohandle instead of *void, but in case of Block IO a different type might be needed. Probably this *void in the interface could be avoided with a template.

I haven't thought of the possibility to implement an alternative Dispatcher, because its interface implies the file readiness model (i.e. Dispatcher::createFileEvent()), whereas io_uring is all about completeness of system operations. CompletionCb is an analog of FileReadyCb, but the former isn't called for a known file descriptor. A system operation and its context (including the arguments) are parts of a completeness event.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to use void* for now, we may revisit this later as our understanding of this module improves.

I'ld like to better understand how io_uring and dispatcher should interact. I could see some threads being exclusively one or the other. I'm not sure about how they would coexist in a thread while avoiding starvation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no problem of multiple io_urings coexisting in one dispatcher: all updates in their completion queues are reflected in their respective event notification file descriptors (man eventfd). So, different callback instances are called for different eventfds, only one registered callback per io_uring. The dispatcher is supposed to call these callbacks in the same manner as in the case of connection callbacks.

I have rewritten the tests to run the callbacks through Dispatcher::run().

Though I fail to come up with a simple enough API for Dispatcher to accommodate multiple io_urings if go this way. If we assume only one io_uring per Dispatcher then extending Dispatcher with something like void Dispatcher::registerCompletionEvent(CompletionCb cb) and IoUring& Dispatcher::getIoUring() would be enough, IoUring::registerEventfd() could be dropped, since all manipulations with eventfd would be incapsulated in Dispatcher.

For the case of multiple io_urings Dispatcher::getIoUring() and Dispatcher::registerCompletionEvent() should distinguish io_urings somehow. Maybe something like the following could work:

using IoUringId = uint32_t;

class DispatcherImpl {
public:
  ...
  IoUringId addIoUring(const IoUringFactory& factory);
  void registerCompletionEvent(IoUringId id, CompletionCb cb);
  IoUring& getIoUring(IoUringId id);
  ...

private:
  std::vector<IoUring> io_urings_;
};

* `io_uring_enter()` system call.
* Returns IoUringResult::Ok in case of success and may return
* IoUringResult::Busy if we over commit the number of requests. In the latter
* case the application should wait for some completions and try again.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should the application do that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a clarification

"//source/server:server_lib",
"//source/server:listener_hooks_lib",
] + envoy_all_core_extensions() +
# TODO(rojkov): drop io_uring dependency when it's fully integrated.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow this TODO.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is this wrapper doesn't get linked in Envoy's binary. It's not an extension and is not used by the new iohandle currently. If dropped it results in

Invalid build graph structure. deps(//source/...) != deps(//source/exe:envoy_main_common_with_core_extensions_lib) union deps(//source/extensions/...)

in ci/run_envoy_docker.sh 'ci/do_ci.sh deps'. The original PR didn't need this dependency added.

}
}

static thread_local const IoUringFactoryImpl factory_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does IoUringFactoryImpl need to be thread local? It seems to me that it doesn't need to be given the thread local behavior of its getOrCreate accessor.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This was a mistake. Fixed.

res = uring.submit();
EXPECT_EQ(res, IoUringResult::Ok);

int ret = epoll_wait(epoll_fd, events, 10, -1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of epoll_wait to decide when to drive the IoUring seems like generic code that most uring users will needed. Should this use of epoll_wait be moved to IoUringImpl ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users don't need epoll_wait if they rely on libevent for event handling (via Dispatcher).

Copy link
Copy Markdown
Contributor

@antoniovicente antoniovicente Jan 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intended use is together with the dispatcher, perhaps this test should use the dispatcher for event handling and thus serve as an example of how to use this class.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The test doesn't include sys/epoll.h now, event handlers are run by dispatcher instances.

@@ -0,0 +1,239 @@
#include <sys/epoll.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the test failures in CI:

[2021-12-28 13:52:10.547][15][critical][assert] [source/common/io/io_uring_impl.cc:34] assert failure: ret == 0. Details: unable to initialize io_uring: Operation not permitted

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is complicated.

The test passes perfectly when run with Azure's managed workers: https://dev.azure.com/dmitryrojkov/meta-helloworld/_build/results?buildId=13&view=logs&j=8c169225-0ae8-53bd-947f-07cb81846cb5&t=6b0ace90-28b2-51d9-2951-b1fd35e67dec
It fails if RBE is active. I suspect that io_uring syscalls are not allow-listed on the workers used as RBEs. Do you happen to know how to reach out to people running those builders?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @yanavlasov who may have some thoughts.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

@antoniovicente if this isn't waiting can you ping on what happens next or do a review pass? :-)

@antoniovicente
Copy link
Copy Markdown
Contributor

Blocked on tests on RBE. Could use some help from @yanavlasov

/wait

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Copy link
Copy Markdown
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

enum class IoUringResult { Ok, Busy, Failed };

/**
* Abstract wrapper around `io_uring`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From examples I believe that uring is thread safe. However conceptually there should only one thread emptying the queue. But adding more I/O operations (preparing) into the ring can be done from any thread.

@rojkov
Copy link
Copy Markdown
Member Author

rojkov commented Mar 16, 2022

I tried to use syscalls directly instead of liburing and very soon realized that what I come up with is a copy-paste of the lib basically. And rather complex one which we definitely don't want to maintain. So, I'd propose to proceed with liburing.

@rojkov rojkov removed the waiting label Mar 16, 2022
@htuch
Copy link
Copy Markdown
Member

htuch commented Mar 16, 2022

Sounds like a good faith effort to explore alternatives and this is the least bad option, at least if we want to target performance, so no objections.

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Mar 17, 2022

Last call, it sounds like concerns of @htuch , @moderation, @yanavlasov and @antoniovicente are all addressed? If I don't hear otherwise, I'll stamp the deps and merge this Friday morning.

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Mar 17, 2022

This needs an API stamp from someone.

@@ -0,0 +1,35 @@
syntax = "proto3";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this in the bootstrap extension hierarchy? Looking below, I think this is what it is. This hierarchy is introduced in https://github.com/envoyproxy/envoy/pull/19467/files, it seems out bootstrap extensions have been adhoc scattered so far :(

IoUring& IoUringFactoryImpl::getOrCreate() const { return tls_->getTyped<IoUringImpl>(); }

Server::BootstrapExtensionPtr
IoUringFactoryImpl::createBootstrapExtension(const Protobuf::Message& config,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit strange, this is both in core but also an extension?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, right. Originally it was not an extension. It was turned into an extension in a later commit to avoid static thread_local and to use the TLS API instead as suggested.

Now I'm confused. If I keep the code to be an extension io_uring won't be usable in core. Otherwise the TLS API seems to be unavailable. I'd rather drop TLS and bootstrap extension as it was originally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm confused, so you're not the only one :) Why can't we use TLS in extensions? Why does this need to be used directly by core?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be used directly by core?

This comment is the main reason.

Why can't we use TLS in extensions?

That was about (un)availability of TLS in core, not extensions. This concern is moot probably. Still I'm not sure why using TLS is more beneficial than static thread_local in this case, because I bumped into other problems with integration testing.

Copy link
Copy Markdown
Member

@htuch htuch Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then let's make it core and eliminate the use of the extension stuff. We can use TLS from non-extensions, it's used all over core, e.g. see

ThreadLocal::TypedSlot<ThreadLocalClusterManagerImpl> tls_;
.

I think one of the nice things about using Envoy TLS is you get TLS lifecycle management, which should in theory actually work better for testing etc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

So far my problem with Envoy TLS in testing has been that the thread controlling integration tests doesn't initialize its TLS slot hence lacks io_uring which is needed to open a client connection. I'll experiment with enabling TLS in the testing thread.

The bootstrap extension is dropped now.

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Mar 22, 2022

/wait

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@adisuissa
Copy link
Copy Markdown
Contributor

@htuch @yanavlasov PTAL

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than nit on extension, no longer any objection on API / extension side.

Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

@wrowe this has been waiting on your dependency stamp for 8 days?

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Apr 7, 2022

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Apr 7, 2022
@wrowe wrowe enabled auto-merge (squash) April 7, 2022 13:37
@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Apr 7, 2022

@yanavlasov - this is blocked on your review, positive confirmation obtained from moderation, htuch, no further concerns from antoniovicente.

@rojkov - do you want to take a final pass through yanavlasov's feedback and mark what you believe has been addressed as 'resolved'?

@rojkov
Copy link
Copy Markdown
Member Author

rojkov commented Apr 7, 2022

Yep, I've just marked the addressed feedback from Yan as resolved.

@wrowe wrowe requested a review from yanavlasov April 7, 2022 15:59
@wrowe wrowe merged commit 6c12be8 into envoyproxy:main Apr 8, 2022
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
The wrapper can be used for asynchronous Block IO and Network IO in Linux
as an alternative to the standard POSIX system calls.

Commit Message: io: add io_uring wrapper
Additional Description:
The wrapper can be used for asynchronous Block IO and Network IO in Linux
as an alternative to the standard POSIX system calls.

Risk Level: low, new code
Testing: unit tests added
Docs Changes: N/A
Release Notes: N/A, no changes visible to users
Platform Specific Features: io_uring wrapper for Linux

Contributes to envoyproxy#17922
Signed-off-by: Dmitry Rozhkov <dmitry.rozhkov@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants