filesystem: add Windows implementation#5470
filesystem: add Windows implementation#5470sesmith177 wants to merge 1 commit intoenvoyproxy:masterfrom greenhouse-org:filesystem-split-pr
Conversation
|
@sesmith177 can you merge master and then we can find someone to review this? |
|
@mattklein123 just rebased |
|
@envoyproxy/maintainers anyone interested in taking a look at this? |
This commit adds a Windows implementation for code that directly interacts with the filesystem. It does this by replacing the static methods in Envoy::Filesystem with an abstract class with separate Windows / POSIX implementations. This object is then passed through to where it is needed Signed-off-by: Sam Smith <sesmith177@gmail.com> Signed-off-by: Yael Harel <yharel@pivotal.io> Signed-off-by: Arjun Sreedharan <asreedharan@pivotal.io> Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
|
Not surprisingly no takers. 😉 I can take a look. |
|
I can look if you like; just fell off my radar. |
|
@jmarantz thanks please do |
lizan
left a comment
There was a problem hiding this comment.
Some high level comments:
Did we consider use std::experimental::filesystem (and std::filesystem when we upgrade to c++17) for some of the interface? I think we're already using toolchains with support of them?
| /** | ||
| * @return Filesystem::Instance& a reference to the filesystem instance | ||
| */ | ||
| virtual Filesystem::Instance& fileSystem() PURE; |
There was a problem hiding this comment.
Can Filesystem::Instance registered via SingletonManager so they don't need to added to every context, we only have one Instance for whole process right?
There was a problem hiding this comment.
Sure, we can try to rework this PR to use a singleton
There was a problem hiding this comment.
Can we discuss? It's not super-obvious that there should only be one file-system instantiated per process (as opposed to threads, which are fundamentally owned by a process). In particular in tests, in the past, I've seen a memory-file-system instantiated per-test-method.
I could also imagine a server wanting to talk to more than one file-system.
Moreover we already access Filesystem via the API so it may not be necessary to have drastic plumbing. I need to look deeper into this PR.
There was a problem hiding this comment.
We don't have a strong opinion on singleton vs. passing through and are happy to implement this PR in whichever direction is decided to be best
When we were working on this, it looked like some usages (e.g. callers of https://github.com/greenhouse-org/envoy/blob/4734f68d354551dc1e0ded3bc835a5c970622d3a/source/common/config/datasource.cc#L9) didn't appear to have a path to api object, hence passing the filesystem through everywhere. But if we missed something, it would be great to reduce this size of this PR
There was a problem hiding this comment.
In this case I think what we want to do is provide an API& here, and from that we can get the file-system (and thread-system, and probably also the stats symbol table when I eventually merge that).
There was a problem hiding this comment.
Wait I still don't understand. Isn't Filesystem already plumbed wherever it needs to go? Or were we instantiating a new file-system some places where it needed to be plumbed?
There was a problem hiding this comment.
Prior to this PR, all the Filesystem methods were static (and not attached to a class), so no objects had to be passed through anywhere. The Filesystem::Instance object only had the createFile method which was exposed through the api object. This is only ever called by the AccessLogManagerImpl object. Everywhere else, the static methods are being used
There was a problem hiding this comment.
Ah I see, makes sense. Maybe a simpler intermediate PR is just to make those non-static, plumb stuff through, and not make any other changes. In this PR you have:
- making Filesystem static methods non-static
- splitting the Raw vs stats/flusher decorators out
- windows impl
Just doing the first of those (via API) would be large but fairly mindless and easy to review
There was a problem hiding this comment.
That plan works for us. It will probably be easiest just to start fresh with a new PR to make Filesystem static methods non-static as opposed to trying to disentangle the current state of this branch / PR
There was a problem hiding this comment.
sg; btw that's how I typically try to get big changes done; I do the whole thing in one PR but I then break it up into reviewable pieces
| return info.st_size; | ||
| } | ||
|
|
||
| std::string InstanceImplWin32::fileReadToEnd(const std::string& path) { |
There was a problem hiding this comment.
some functions like this is same as the one in posix, extract?
There was a problem hiding this comment.
This one is almost the same -- the difference is that the file needs to be opened with std::ios_base_binary on Windows. If I understand that option correctly, it won't have any effect on posix, so we can probably pull this one out
|
Most of this interface can be implemented using |
jmarantz
left a comment
There was a problem hiding this comment.
flushing comments, only 10% done.
| * @return the opened file. | ||
| */ | ||
| virtual Filesystem::FileSharedPtr createAccessLog(const std::string& file_name) PURE; | ||
| virtual Filesystem::StatsFileSharedPtr createAccessLog(const std::string& file_name) PURE; |
There was a problem hiding this comment.
just to reduce diff explosion I might have kept FileSharedPtr as is, as it's used everywhere., and made the os-level one be OSFilePtr.
| #ifdef _M_AMD64 | ||
| typedef int64_t ssize_t; | ||
| #elif _M_IX86 | ||
| typedef int32_t ssize_t; |
There was a problem hiding this comment.
do we actually build on 32-bit systems? If so we use uint64_in a lot of places where size_t should probably be used. If not, then maybe we should #error out here?
However I'd probably write this anyway as:
#if sizeof(size_t) == 8
....
#elif sizeof(size_t) == 4
...
for less dependence on specific processor types.
| * Open the file with O_RDWR | O_APPEND | O_CREAT | ||
| * The file will be closed when this object is destructed | ||
| */ | ||
| virtual void open() PURE; |
There was a problem hiding this comment.
not returning a boolean here indicating success? Or does it throw? Oh I see there's an isOpen() method; maybe just mention that you need to call it after open() even if it's somewhat obvious :)
There was a problem hiding this comment.
Right now, open throws if it fails. I think right now the semantics of this RawFile class are a bit odd:
- constuctor calls open(), though "closed" is a valid state
- open() throws on failure
- write() returns an error code on failure
- close() swallows any errors
I think it might make more sense for:
- constructor doesn't do anything
- open/write/close return the error codes
And then the user of this class (the file object with stats+flushing) can handle the return codes / throw exceptions as before
There was a problem hiding this comment.
+100 but I also appreciate you don't want to change semantics in a giant PR.
Can you leave a TODO?
There was a problem hiding this comment.
No, I think we should get the semantics correct here -- this is a new object. Since it is really only used by the File object that has stats + flushing at the moment, the important thing to is to maintain those semantics, which should be straightforward with the proposed RawFile semantics
| /** | ||
| * Close the file. | ||
| */ | ||
| virtual void close() PURE; |
There was a problem hiding this comment.
mention in the doc the behavior if the file is not opened?
|
|
||
| /** | ||
| * @return full file content as a string. | ||
| * @throw EnvoyException if the file cannot be read. |
There was a problem hiding this comment.
do we need to throw here? https://github.com/envoyproxy/envoy/blob/master/STYLE.md says:
Apply caution when using exceptions on the data path for general purpose error handling. Exceptions are not caught on the data path and they should not be used for simple error handling, e.g. with shallow call stacks, where explicit error handling provides a more readable and easier to reason about implementation.
There was a problem hiding this comment.
Throwing on error here is the existing behavior, we did not want to change that if possible
| /** | ||
| * @return bool whether a file exists on disk and can be opened for read. | ||
| */ | ||
| virtual bool fileExists(const std::string& path) PURE; |
There was a problem hiding this comment.
consider using absl::string_view at the interface here.
| * @param file_flush_interval_msec Number of milliseconds to delay before flushing. | ||
| */ | ||
| virtual StatsFileSharedPtr | ||
| createStatsFile(const std::string& path, Event::Dispatcher& dispatcher, |
There was a problem hiding this comment.
Yeah I'm doubling down on not calling this a Stats file. Basically there are 2 Envoy semantics being added to raw OS files here: stats and auto-flushing.
I think it's too messy to have them both as wrapper classes, so I think we should just call this part of the Envoy file semantics, rather than biasing toward Stats.
And the raw file can be RawFile, whicih will pretty much only be used in the implementation of File, but makes it easier to support multiple OSs.
There was a problem hiding this comment.
Slightly more thoughts on this: I think it might make sense to have separate decorator classes for file-flushing and stat-collecting, to be determined when constructing the Filesystem that goes into the API, but those don't change the interface; we should still be working with Filesystem everywhere.
Then if we want a new mix-in we can add it, and it can just have one job.
Or I'm fine having them all in Envoy::Filesystem::Instance if @mattklein123 wants to stick with that :)
There was a problem hiding this comment.
I considered that briefly in the course of this PR, but it looks like it may be tricky pull the stats and the file-flushing apart (e.g. the File object keeps a flushed_by_timer_ stat)
There was a problem hiding this comment.
OK simpler to leave them together then. But the API exposed to the rest of the system can be agnostic to that, right?
There was a problem hiding this comment.
To be sure I understand: you're suggesting RawFile and File should share a common interface, and the API filesystem will return that interface (with the implementation being a File)?
| @@ -0,0 +1,39 @@ | |||
| #pragma once | |||
There was a problem hiding this comment.
I'm not sure what's possible, but is there any chance to split some of these changes out into a separate PR? Meaning, can we do all the interface work in one PR, and then come back and add Windows? I think that would be easier to review.
There was a problem hiding this comment.
We can give it a try, though the vast majority of the changes here are interface-related. I'm not sure if it will make it that much smaller
There was a problem hiding this comment.
I think it's worth it to do the interface changes separate from the windows stuff, if you don't mind.
There was a problem hiding this comment.
sure, we can back the windows changes out of this PR
@envoyproxy/maintainers any thoughts? We already use them in tests, so the toolchains we use already have support for them, I'm in favor of using it if it simplifies this PR significantly. |
Do we have an idea of what toolchains support it? People using RHEL derivatives use ancient toolchains, and it's possible some of them don't compile tests. I'm not sure. I would hate to exclude people before we need to. I guess if we were to just use it on the Windows side of things for now that would be fine also. |
|
@mattklein123 Based on my quick research, GCC >= 5.3, LLVM/clang >= 3.9, MSVC >= 2017. In RHEL7, devtoolset-7 works (tested on CentOS7), cc @bdecoste? |
This sounds reasonable to me to use if it helps us. |
|
We might have std::experimental limits at Google as well, last time we tried std::experimental::filesystem we had to use the fallback implementation. I'd suggest using std::filesystem for new platforms/code and retain existing code elsewhere, with some deprecation schedule. |
|
std::filesystem is part of c++17 fwiw. |
| /** | ||
| * Add a file watch. | ||
| * @param path supplies the path to watch. | ||
| * @param events supplies the events to watch. |
There was a problem hiding this comment.
give a pointer to where the possible event values are found.
| /** | ||
| * @return Filesystem::Instance& a reference to the filesystem instance | ||
| */ | ||
| virtual Filesystem::Instance& fileSystem() PURE; |
There was a problem hiding this comment.
In this case I think what we want to do is provide an API& here, and from that we can get the file-system (and thread-system, and probably also the stats symbol table when I eventually merge that).
|
|
||
| ssize_t InstanceImplPosix::fileSize(const std::string& path) { | ||
| struct stat info; | ||
| if (::stat(path.c_str(), &info) != 0) { |
There was a problem hiding this comment.
if the file is open it would be better to use fstat or the iostream equivalent. Maybe we just need this API too on the File object.
| // TODO(htuch): When we are using C++17, switch to std::filesystem::canonical. | ||
| char* resolved_path = ::realpath(path.c_str(), nullptr); | ||
| if (resolved_path == nullptr) { | ||
| throw EnvoyException(fmt::format("Unable to determine canonical path for {}", path)); |
There was a problem hiding this comment.
I realize this code is just moved, but this seems like a good time to use the pattern from Api::SysCallSizeResult and return pair<std::string, errno> here; IMO the throw/catch is not making the code much clearer, and we are not capturing the error message.
At a minimum let's not eat the error message from realpath().
| * @param file_flush_interval_msec Number of milliseconds to delay before flushing. | ||
| */ | ||
| virtual StatsFileSharedPtr | ||
| createStatsFile(const std::string& path, Event::Dispatcher& dispatcher, |
There was a problem hiding this comment.
Slightly more thoughts on this: I think it might make sense to have separate decorator classes for file-flushing and stat-collecting, to be determined when constructing the Filesystem that goes into the API, but those don't change the interface; we should still be working with Filesystem everywhere.
Then if we want a new mix-in we can add it, and it can just have one job.
Or I'm fine having them all in Envoy::Filesystem::Instance if @mattklein123 wants to stick with that :)
| /** | ||
| * @return Filesystem::Instance& a reference to the filesystem instance | ||
| */ | ||
| virtual Filesystem::Instance& fileSystem() PURE; |
There was a problem hiding this comment.
Actually, Filesystem is already part of the API. Are you sure you need any of this extra plumbing in this PR?
| * Open the file with O_RDWR | O_APPEND | O_CREAT | ||
| * The file will be closed when this object is destructed | ||
| */ | ||
| virtual void open() PURE; |
There was a problem hiding this comment.
+100 but I also appreciate you don't want to change semantics in a giant PR.
Can you leave a TODO?
|
I issued #5660 which adds some Api::Api plumbing, which might help this effort as well. |
|
In addition to #5692, we have https://github.com/greenhouse-org/envoy/commit/11a43b89c480b446039205919af4d4f92aec3f28 (split RawFile / RawInstance out from the decorator class) ready to PR once #5692 is merged |
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
@sesmith177 if you are going to come back to this I can re-open. Please ping if so. |
|
no need, once #5772 is merged, we'll submit a fresh PR to add the Windows support |
Description:
This PR adds a Windows implementation for code that directly interacts with the filesystem. It does this by replacing the static methods in Envoy::Filesystem with an abstract class with separate Windows / POSIX implementations. This object is then passed through to where it is needed.
We decided to use the strategy of passing a Filesystem::Instance& through to where it is needed (as opposed to keeping separate static implementations and having Bazel choose the right one) due to feedback received in #5072
It also looks like passing the Filesystem::Instance& through every where may help with #5126. The names
StatsInstanceandStatsFilewere taken from discussion in that PR, but we would be willing to accept a better suggestion for the names of these objects.If we feel that the switch from static methods to passing an object around is too much of a change, we could rework this PR.
Risk Level:
Medium - no change in behavior, but this touches a whole bunch of files
Testing:
bazel build //source/... && bazel test //test/...Docs Changes:
N/A
Release Notes:
N/A