Conversation
When a connect() is performed on a non-blocking socket, it'll return -1/EINPROGRESS immediatly and when select()'ed for writability it'll succeed or fail but when it fails the reason is hidden. `getsockopt(,,SO_ERROR,,)` solves this issue on some sytems but on some cases such as FreeBSD getsockopt returns 0 even on `EAGAIN`. The following trick issues a `getpeername()` which will return -1/ENOTCONN if the socket is not connected and read(,,1) will produce the right errno.
Note that this implementation doesn't follow RFC4122. It's a fully random generated UUID instead of time-based.
osx: invoke fcntl to set O_NONBLOCK (cannot be set via socket(2))
htuch
left a comment
There was a problem hiding this comment.
Just a quick skim so far. This is great, thanks for the contribution.
patches/freebsd11_osymlink.patch
Outdated
| @@ -0,0 +1,95 @@ | |||
| --- ./lib/libc/sys/open.2.orig 2016-09-28 16:25:57.000000000 -0700 | |||
There was a problem hiding this comment.
It was in @Reflejo's original branch. I gather it's a FreeBSD path required for the kqueue WatcherImpl to work. I'm fine to remove it.
There was a problem hiding this comment.
It's a patch for freebsd11's kernel to enable O_SYMLINK
There was a problem hiding this comment.
Can't we watch the directory (similarly to the inotify() implementation on Linux)? I don't think patching FreeBSD kernel and adding custom features to it is a viable prerequisite for Envoy on FreeBSD.
There was a problem hiding this comment.
Yea I didn't intended this to go to upstream. Honestly I don't think there is a way to do what envoy is doing for the symlink observation without this. But I didn't investigate this longer enough for a conclusive answer.
There was a problem hiding this comment.
To spin this differently... @mattklein123: now, that we have LDS and overall better xDS support, do you think that we still need the runtime configuration (and therfore Watcher)?
There was a problem hiding this comment.
Yes, we need it. We will also support loading xDS APIs off of filesystem. Either way, I think we can ignore BSD for the purpose of this PR. Let's just get OSX working and move in. If someone like @Reflejo wants to add in BSD support after and can test on current master, that's fine.
| #include "common/filesystem/watcher_impl.h" | ||
| // NOLINT(namespace-envoy) | ||
| #if defined(__linux__) | ||
| #include "common/filesystem/watcher_impl_linux.cc" |
There was a problem hiding this comment.
Can we use select based Bazel target srcs/hdrs to avoid this kind of compile time conditioning?
| int InstanceBase::flagsFromSocketType(SocketType type) const { | ||
| #if defined(__FreeBSD__) | ||
| int flags = O_NONBLOCK; | ||
| #elif defined(__APPLE__) |
There was a problem hiding this comment.
In various places you condition on both FreeBSD and APPLE. Why is this? Are you adding FreeBSD and OS X support? Is the FreeBSD part of the OS X support (but then why special APPLE conditions?)?
There was a problem hiding this comment.
I think we should keep it since I got freeBSD to pass all tests as well. I'd double check that it's still the case. The delta between FreeBSD support and OSX is actually pretty small
There was a problem hiding this comment.
going to remove the partial FreeBSD support per gitter conversation
| int fd = ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
| #ifdef __APPLE__ | ||
| if (fd != -1) { | ||
| RELEASE_ASSERT(fcntl(fd, F_SETFL, O_NONBLOCK) != -1); |
There was a problem hiding this comment.
Can you add a comment explaining why this is a special case for APPLE?
| // reason is hidden. `getsockopt(,,SO_ERROR,,)` solves this issue on some systems but on some | ||
| // cases such as FreeBSD getsockopt returns 0 even on `EAGAIN`. The following trick issues a | ||
| // `getpeername()` which will return -1/ENOTCONN if the socket is not connected and read(,,1) | ||
| // will produce the right errno. |
There was a problem hiding this comment.
This seems sketchy. Is there some reference to where this is shown to be accurate for all errnos?
There was a problem hiding this comment.
There is a good summary here: https://cr.yp.to/docs/connect.html
There was a problem hiding this comment.
I believe that connection error will be available in kq_errno on OSX & FreeBSD and it should be used instead of getsockopt(SO_ERROR) on those platforms, so this might not be necessary at all.
There was a problem hiding this comment.
I can only offer that it works, empirically. This case is triggered regularly in testing on OS X. I think it makes sense to keep it disabled under Linux since it'll only add overhead and doesn't seem to be a problem there.
| ASSERT(0 == rc); | ||
| UNREFERENCED_PARAMETER(rc); | ||
|
|
||
| #if !defined(__linux__) |
There was a problem hiding this comment.
Is this true for all non-Linux or only for OS X?
There was a problem hiding this comment.
I believe it's true for FreeBSD and OS X. I'll just change it to #ifdef __APPLE__.
| uuid_unparse(uuid, generated_uuid); | ||
| #else | ||
| sprintf(generated_uuid, "%08lx-%04x-%04x-%04x-%04x%08lx", | ||
| static_cast<unsigned long>(arc4random()), static_cast<unsigned>(arc4random() & 0xffff), |
There was a problem hiding this comment.
Probably best to generate the minimum number of random numbers required here before the sprintf. Also, would prefer snprintf if using C formatters for safety.
There was a problem hiding this comment.
I'll just remove this since it's not used for Linux or OS X.
| #else // __linux__ | ||
|
|
||
| /** | ||
| * No-op implementation of HotRestart for everybody else. |
There was a problem hiding this comment.
We (@ Google) would find this useful even on Linux. @mrice32. Maybe there is some way to condition this at build time.
There was a problem hiding this comment.
On this point, can we potentially just revert all the changes to hot_restart_impl.h/cc, and just add a new hot_restart_nop_impl.h/cc, and do select() compile? This will keep existing code cleaner and make it possible to compile out hot restart support. (I think MSFT people will want this too initially).
lizan
left a comment
There was a problem hiding this comment.
I still see build errors like this:
(on my merged HEAD from 1346 and 1348).
source/common/http/async_client_impl.cc:17:73: error: default initialization of an object of const type 'const AsyncStreamImpl::NullRetryPolicy' without a user-provided default constructor
const AsyncStreamImpl::NullRetryPolicy AsyncStreamImpl::RouteEntryImpl::retry_policy_;
^
{}
| # TODO(mattklein123): It's not great that we universally link against the following libs. | ||
| # In particular, -latomic is not needed on all platforms. Make this more granular. | ||
| linkopts = ["-pthread", "-latomic"], | ||
| linkopts = ["-pthread", "-latomic", "-luuid"], |
There was a problem hiding this comment.
You will need to install libuuid-dev for this in build container, and also add this to doc.
| #include "tclap/CmdLine.h" | ||
|
|
||
| #if defined(__APPLE__) | ||
| #define uint64_t unsigned long |
There was a problem hiding this comment.
Why do you need this? uint64_t should just exists.
There was a problem hiding this comment.
This should be in header <cstdint> included above. You might need to do using std::uint64_t; however.
There was a problem hiding this comment.
uint64_t exists but is typedef unsigned long long. Without this you get this error:
In file included from source/server/options_impl.cc:12:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/CmdLine.h:27:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/SwitchArg.h:30:
In file included from external/envoy_deps/thirdparty/tclap/include/tclap/Arg.h:54:
external/envoy_deps/thirdparty/tclap/include/tclap/ArgTraits.h:80:22: error: type 'unsigned long long' cannot be used prior to '::' because it has no members
typedef typename T::ValueCategory ValueCategory;
^
external/envoy_deps/thirdparty/tclap/include/tclap/ValueArg.h:403:37: note: in instantiation of template class 'TCLAP::ArgTraits<unsigned long long>' requested here
ExtractValue(_value, val, typename ArgTraits<T>::ValueCategory());
^
external/envoy_deps/thirdparty/tclap/include/tclap/ValueArg.h:363:5: note: in instantiation of member function 'TCLAP::ValueArg<unsigned long long>::_extractValue' requested here
_extractValue( args[*i] );
^
source/server/options_impl.cc:31:29: note: in instantiation of member function 'TCLAP::ValueArg<unsigned long long>::processArg' requested here
TCLAP::ValueArg<uint64_t> base_id(
^
1 error generated.
Swapping out the typedef I added for using std::uint64_t; produces an identical error.
There was a problem hiding this comment.
Hrrm, this is unfortunate. I don't understand why the define doesn't cause the same problem.
If we can't figure out a better solution please do add a detailed comment explaining why it's done this way.
There was a problem hiding this comment.
Finally figured this out. TCLAP provides implementations of ArgTraits<T> for various types, but the ones for unsigned long long and long long are gated by #ifdef HAVE_LONG_LONG, which isn't set. I think they were expecting users of libraries to use autoconf which would detect this and make the necessary #define. I'll post it with a few other things tomorrow (split out in a new PR).
dnoe
left a comment
There was a problem hiding this comment.
A few comments across stuff I have specific familiarity with.
source/exe/signal_action.cc
Outdated
| #ifdef REG_RIP | ||
| // x86_64 | ||
| error_pc = reinterpret_cast<void*>(ucontext->uc_mcontext.gregs[REG_RIP]); | ||
| #elif defined(__APPLE__) |
There was a problem hiding this comment.
Do you want to add a conditional for x86 32 bit here? No need to implement it now, but I'm guessing a 32 bit compile will fail because there's no __rip.
| } | ||
|
|
||
| #ifdef __APPLE__ | ||
| #define MAP_STACK (0) |
There was a problem hiding this comment.
Does it make sense to add #ifndef MAP_STACK around this in case Darwin adds real map_stack in the future?
|
|
||
| void SignalAction::removeSigHandlers() { | ||
| #if defined(__APPLE__) | ||
| // ss_flags contains SS_DISABLE, but Darwin still checks the size, contrary to the man page |
| SignalAction() | ||
| : guard_size_(sysconf(_SC_PAGE_SIZE)), altstack_size_(guard_size_ * 4), altstack_(nullptr) { | ||
| : guard_size_(sysconf(_SC_PAGE_SIZE)), | ||
| altstack_size_(std::max(guard_size_ * 4, static_cast<size_t>(MINSIGSTKSZ))), |
There was a problem hiding this comment.
I think if the MINSIGSTKSZ case is hit then protections for the immediately adjacent pages still works OK, there's just a portion at the end which is not protected. The second mprotect call above could have the size adjusted to cover this case.
There was a problem hiding this comment.
I think it's correct. We ultimately mmap altstack_size_ + 2 * guard_size_ bytes (see mapSizeWithGuards). altstack_size_ doesn't include the guard pages. The second mprotect calls protects guard_size_ bytes at an address that's guard_size_ + altstack_size_ bytes from the start of the mapped memory.
We call sigaltstack with altstack_ + guard_size_ (to skip the first guard page) and altstack_size_ (which doesn't count the guard pages, and so stops just before the second guard page).
Did I miss something?
There was a problem hiding this comment.
Ah, yes, all of this looks correct. Guard pages are always outside the altstack_size_, and MINSIGSTKSZ should always be a multiple of page size.
| #include "tclap/CmdLine.h" | ||
|
|
||
| #if defined(__APPLE__) | ||
| #define uint64_t unsigned long |
There was a problem hiding this comment.
This should be in header <cstdint> included above. You might need to do using std::uint64_t; however.
|
|
||
| set -e | ||
|
|
||
| if [[ `uname` != "Linux" ]] |
There was a problem hiding this comment.
I think there's probably a way this test could be completely disabled with a Bazel rule instead.
There was a problem hiding this comment.
Yes, but I don't know how.
Normally, I could use select to choose either no test script or a different one. However, the output of select is only usable by bazel rules (because only at the point of rule execution is it possible to inspect the configuration to choose which subset of parameters to select).
envoy_sh_test is a macro and it attempts to access members of its srcs parameter to produce inputs for several rules. This doesn't work if srcs is assigned from select (whose output has no [] operator). To make it work, envoy_sh_test would have to become a rule and at that point I'm in over my head.
|
|
||
| #include "spdlog/spdlog.h" | ||
|
|
||
| #ifdef __APPLE__ |
There was a problem hiding this comment.
This will end up getting used again elsewhere. Can we put this into common/common/byte_order.h or something?
mattklein123
left a comment
There was a problem hiding this comment.
Thanks a ton for doing this. Some quick comments from a skim. Will do a more full review after the next pass of comment fixes once everyone else has looked.
| // Since we are using SOCK_NONBLOCK, there are cases (on FreeBSD for example) where the immediate | ||
| // connect response is EINPROGRESS but the socket failed and the calling code might not know about | ||
| // it yet. | ||
| if (-1 == rc && errno == ECONNRESET) { |
There was a problem hiding this comment.
Can we guard this with free bsd only then?
There was a problem hiding this comment.
I've removed it since it doesn't seem to happen on OSX.
| return ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
| int fd = ::socket(AF_INET, flagsFromSocketType(type), 0); | ||
| #ifdef __APPLE__ | ||
| if (fd != -1) { |
There was a problem hiding this comment.
is there any point in doing fd != -1 check here? (Same in other places). Will just crash anyway in RELEASE_ASSERT.
There was a problem hiding this comment.
I did the check because the original code didn't assert (for AF_INET and AF_UNIX). I'll go ahead and just assert on the fd and then assert on the fcntl call.
| return Address::InstanceConstSharedPtr{ | ||
| new Address::Ipv4Instance(reinterpret_cast<sockaddr_in*>(&orig_addr))}; | ||
|
|
||
| case AF_INET6: |
There was a problem hiding this comment.
Do these new cases and the throw have code coverage on linux CI? If not will need to refactor somehow to get coverage.
| #else // __linux__ | ||
|
|
||
| /** | ||
| * No-op implementation of HotRestart for everybody else. |
There was a problem hiding this comment.
On this point, can we potentially just revert all the changes to hot_restart_impl.h/cc, and just add a new hot_restart_nop_impl.h/cc, and do select() compile? This will keep existing code cleaner and make it possible to compile out hot restart support. (I think MSFT people will want this too initially).
| TEST(Ipv6CidrRange, StringAndLengthCtor) { | ||
| CidrRange rng; | ||
| rng = CidrRange::create("::ffff", 122); // Assignment operator. | ||
| rng = CidrRange::create("ff::ffff", 122); // Assignment operator. |
There was a problem hiding this comment.
Are we losing any linux test coverage w/ these changes? cc @hennna
There was a problem hiding this comment.
Not off the top of my head. I don't think these tests are dependent on the address value. Is the issue with OSX the leading ::?
There was a problem hiding this comment.
Sort of. Wikipedia tells me an old version of the IPv6 spec considered ::/96 addresses (that is addresses with 96 bits of leading zeros) to be IPv6-mapped IPv4 addresses. It's been deprecated for a decade, but when OSX converts, say ::ffff, to a string it comes out as "::0.0.255.255" and the tests fail because they do string comparisons. Linux only provides the "mapped IPv4" style addresses with trailing dotted quads for the ::ffff:0:0/96 range (which replaced ::/96). It seems to me that these test are concerned with how the low order bits of the address are manipulated and that using a different prefix doesn't change the meaning of the tests.
(I wrote a little program to round trip IP address strings to binary and back. https://gist.github.com/zuercher/0d0bc5a433fda423124c71afa2eb29e1)
| TestLoadBalancerContext context(0); | ||
| EXPECT_EQ(cluster_.hosts_[5], lb_.chooseHost(&context)); | ||
| } | ||
| #else |
There was a problem hiding this comment.
Not sure if it's really worth doing this. This is already not great and related to my TODO above. Seems like it might be better to just not run these tests on Apple until someone fixes the TODO to compile murmur3 or city into the code (which is not difficult).
… to select between real and no-op HotRestartImpl code
…P_STACK existing in the future
| void free(Stats::RawStatData& data) override; | ||
|
|
||
| private: | ||
| SharedMemory& shmem_; |
There was a problem hiding this comment.
IMO the NOP hot restart impl should not use shared memory or the process shared mutex stuff. Basically, I would just do exactly what https://github.com/lyft/envoy/blob/master/test/integration/server.cc#L93 does which is to use process local locks, the heap allocator, etc. I think you can probably revert all the changes related to shared memory and just move the test impl into prod code somewhere and use that.
|
@zuercher general comment. Can we start to split out some of this change into self contained PRs? Some of the test fixes can go in and get reviewed independently, as can some of the trivial header changes, etc. That way we can start merging the not controversial stuff and trim down the size of the main PR. Thank you. |
|
Going to close this out in lieu of the other PRs that are ongoing. |
Some highlights:
source/common/filesystem: adds a kqueue-based implementation of WatcherImpl. The conditional compiliation is hacky.
source/common/network: handle odd-ball FreeBSD/osx errors on get/setsockopt. Also, SOCK_NONBLOCK doesn't exist on FreeBSD or OSX. OSX has an extra byte field in
struct sockaddrand derivatives that needs to be accounted for in some places.source/common/runtime: use libuuid to generate UUIDs on Linux and OSX. Freebsd gets a hand-rolled implementation.
source/exe: hot restart is completely disabled for osx and freebsd
test/common/http: on OSX,
struct tmconfigured for a date before 1901-12-13 20:45:52 becomes atime_tthat represents 1969-12-31 23:59:59.test/common/network: OSX doesn't like some of IPv6 addresses used in testing. By default only a single IPv4 loopback address exists in OSX (127.0.0.1).
Various and sundry explicit #include statements to pick up definitions that are obtained transitively on Linux.