Skip to content

Add async_files library to support filesystem buffering and filesystem caching#20332

Merged
mattklein123 merged 43 commits intoenvoyproxy:mainfrom
ravenblackx:async_files
Apr 19, 2022
Merged

Add async_files library to support filesystem buffering and filesystem caching#20332
mattklein123 merged 43 commits intoenvoyproxy:mainfrom
ravenblackx:async_files

Conversation

@ravenblackx
Copy link
Copy Markdown
Contributor

Signed-off-by: Raven Black ravenblack@dropbox.com

Commit Message: Add async_files library to support filesystem buffering and filesystem caching.
Additional Description: This is a support library for subsequent PRs for an http filter contrib to buffer to local filesystem (issue #19026) and a cache implementation contrib that caches to local filesystem. Being sent for review separately because each part is already large for review.
Risk Level: None on its own, because it's not used yet.
Testing: I believe there's good unit test coverage; relying on the PR CI to tell me how good, as I wasn't able to extract any information via do_ci.sh bazel.coverage and this documentation suggests using the PR.
Docs Changes: There's a readme for the library, no changes for external docs.
Release Notes: No changes to existing behavior.
Platform Specific Features: Current implementation may be Linux-based only; not sure if other filesystems support unnamed files. Shouldn't need guards since the library is going to be used for extensions only - can simply not load the extension where it's unsupported.

@mattklein123 mattklein123 self-assigned this Mar 14, 2022
@ravenblackx
Copy link
Copy Markdown
Contributor Author

Please don't review yet, still trying to get a good coverage report out of it. I'm going to be amend-pushing which would make a pre-emptive review frustrating.

@mattklein123
Copy link
Copy Markdown
Member

OK just ping me when ready, thanks.

/wait

@ravenblackx
Copy link
Copy Markdown
Contributor Author

Turns out CI isn't useful for getting a usable coverage report for a thing in contrib, but I was able to get one by doing:

ENVOY_DOCKER_BUILD_DIR=/home/ravenblack/envoy_artefacts ./ci/run_envoy_docker.sh 'BAZEL_EXTRA_TEST_OPTIONS="--test_env=ENVOY_IP_TEST_VERSIONS=v4only" ./ci/do_ci.sh bazel.coverage //contrib/common/async_files/...'

genhtml $(find /home/ravenblack/envoy_artefacts/ -name "coverage.dat") --output-directory ~/coverage

(Briefly documented here in case it shows up in search results for how to get a useful coverage report, or so I can refer back to it myself later!)

Which turned up 93% coverage, and the only uncovered lines are ones that I don't think it's possible to provoke without running in different environments (the path taken when the filesystem doesn't support O_TMPFILE, and the paths taken when pread and pwrite somehow do only a partial read/write). So I think this is about as good as it's going to get.

@mattklein123
Copy link
Copy Markdown
Member

Turns out CI isn't useful for getting a usable coverage report for a thing in contrib

I think this should go into core as WiP xDS APIs and extensions. These are features that I think the project should widely support and I'm happy to sponsor and help with reviews. Can you move the code over and then I can review in the new location? Thank you.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy[\w/]*/(v1alpha\d?|v1|v2alpha\d?|v2))|(api/envoy/type/(matcher/)?\w+.proto).
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #20332 was synchronize by ravenblackx.

see: more, trace.

@ravenblackx ravenblackx force-pushed the async_files branch 3 times, most recently from a6027fd to 5a76b3e Compare March 16, 2022 17:45
@ravenblackx
Copy link
Copy Markdown
Contributor Author

Sorry for the accidental CCs, some unnecessary files got modified by check_format.py and slipped into the commit without me noticing, one of the 5+ times it happened. (Partial fix for the cause of that in #20372)

…m caching

Signed-off-by: Raven Black <ravenblack@dropbox.com>
@mattklein123
Copy link
Copy Markdown
Member

High level question before getting going with review. Given that we are about to introduce support for io_uring in #19339, should we actually implement this using io_uring and avoid the thread pool? Granted, that will not work on OSX or Windows, though right now this code doesn't work on Windows anyway. Thoughts? cc @rojkov

@ravenblackx
Copy link
Copy Markdown
Contributor Author

/retest

Per discussion in slack, it's designed to be compile-time-switchable (maybe even run-time-switchable) to other implementations later - the API should be compatible without modification for different implementations. I went with the "lowest common denominator" implementation first, in part because at least some of our deployment environments don't support io_uring yet.

Though it is a little weird that this and the io_uring wrapper are essentially aiming to be the same abstraction (an asynchronous file operations wrapper that's implementation-agnostic), just with different initial implementations. Perhaps ideally they should merge into one later, but I'd rather have something that works and then later migrate to something unified, than try to migrate while the implementation is still a bit of a moving target. I think this might be a higher-level abstraction than the io_uring wrapper, in which case it might end up making sense to have both, and the io_uring one just be a dependency of one of the optional implementations of this one.

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20332 (comment) was created by @ravenblackx.

see: more, trace.

@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Apr 13, 2022
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @mattklein123

🐱

Caused by: #20332 was synchronize by ravenblackx.

see: more, trace.

…_files

Signed-off-by: Raven Black <ravenblack@dropbox.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the runaround, let's get the build stuff sorted and then we can finish this out.

/wait


// [#protodoc-title: AsyncFileManager configuration]

// [#extension: envoy.common.support.async_files]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the runaround, but I think this is where everything went sideways for you. Essentially something in the docs is requiring that this have an extension category. The thing is, there shouldn't be any category for this message. This is just a helper message that will get included in other real config (filters, etc.). So I don't think there should be any extension here or category as it's not a real thing. Can you remove this tag and the one below? What error do you get if you do so? If something in the docs is causing you issues, I would just mark this [not-implemented-hide] (or whatever the tag is) until something actually includes this message that we need documented.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (and yes, once it's de-extensioned the whole cascade of requirements goes away).

PPC_SKIP_TARGETS = ["envoy.filters.http.lua"]

WINDOWS_SKIP_TARGETS = [
"envoy.common.support.async_files",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above we shouldn't need this. What you have now is just helper libraries, not a real extension.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,84 @@
load(
"//bazel:envoy_build_system.bzl",
"envoy_cc_extension",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably where things went wrong. Just use envoy_cc_library, the actual extension will be the thing that actually ends up including this library. Bazel will take care of the rest.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my fault, I was momentarily thinking of extensions like shared objects / dll style plugins, where if you have a library that two extensions are going to want to share then you need that library to also be a plugin, but yeah, of course envoy is not doing it like that.

"envoy.rbac.matchers", "envoy.access_loggers.extension_filters", "envoy.http.stateful_session",
"envoy.matching.http.input", "envoy.matching.network.input",
"envoy.matching.network.custom_matchers")
"envoy.matching.network.custom_matchers", "envoy.common.support")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove per other comments

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Raven Black <ravenblack@dropbox.com>
@ravenblackx
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20332 (comment) was created by @ravenblackx.

see: more, trace.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM to ship and iterate. Cool stuff. Just a few small random comments. Thank you!

/wait

# Access loggers
#

"envoy.access_loggers.file": "//source/extensions/access_loggers/file:config",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: revert changes to this file. I don't think they are needed anymore.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Autoformat! [shakefist]

}

std::function<void()> AsyncFileManagerThreadPool::enqueue(std::shared_ptr<AsyncFileAction> action) {
auto cancelFunc = [action]() { action->cancel(); };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cancel_func, same elsewhere if I missed it (snake case for local vars)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Linter! [shakefist]

std::function<void()> AsyncFileManagerThreadPool::enqueue(std::shared_ptr<AsyncFileAction> action) {
auto cancelFunc = [action]() { action->cancel(); };
absl::MutexLock lock(&queue_mutex_);
if (ThreadIsWorker) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here? This is an optimization if this is being called from within a callback running on a worker thread? Technically couldn't you avoid grabbing the lock in this case and just acquire the lock below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the lock saving. It's not just an optimization, it's an intentional behavior difference, the new action doesn't yield the thread, essentially skipping to the head of the queue - added a comment to that effect.

case AsyncFileManager::Mode::WriteOnly:
return O_WRONLY;
case AsyncFileManager::Mode::ReadWrite:
default:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove default so we have a compiler error if something else is added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +60 to +61
default:
throw EnvoyException("unrecognized AsyncFileManagerConfig::ManagerType");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't happen due to the PGV validation. I'm not sure what we do here now since we removed the GCOV NOT_REACHED macros but I would look at other similar cases and see what Alyssa changed them to.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like throwing an exception is the new way (in at least the first two examples I found, e.g. this one.)

Added a comment about it being expected to be unreachable.

Comment on lines +30 to +34
absl::flat_hash_map<std::string, std::shared_ptr<AsyncFileManager>>
managers_ ABSL_GUARDED_BY(mu_);
absl::flat_hash_map<std::string,
envoy::extensions::common::async_files::v3::AsyncFileManagerConfig>
configs_ ABSL_GUARDED_BY(mu_);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just have a single map with a struct containing both the manager and config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was very lazy of me, sorry. Done.

…and explain, snake locals

Signed-off-by: Raven Black <ravenblack@dropbox.com>
mattklein123
mattklein123 previously approved these changes Apr 18, 2022
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
Signed-off-by: Raven Black <ravenblack@dropbox.com>
@mattklein123 mattklein123 enabled auto-merge (squash) April 19, 2022 16:21
@mattklein123 mattklein123 merged commit 18c779a into envoyproxy:main Apr 19, 2022
@ravenblackx ravenblackx deleted the async_files branch April 19, 2022 21:05
ravenblackx added a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…m caching (envoyproxy#20332)

Signed-off-by: Raven Black <ravenblack@dropbox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deps Approval required for changes to Envoy's external dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants