Add platform agnostic socket error macros and error details helper#11565
Add platform agnostic socket error macros and error details helper#11565mattklein123 merged 27 commits intoenvoyproxy:masterfrom greenhouse-org:io-error-macros
Conversation
- WSAE* error codes are a disjoint set from POSIX error codes (which are still defined on Windows for non winsock2 errors) so we define macros that map to these error codes on Windows and the usual POSIX error codes on other platforms - do not use strerror to get error details on Windows, instead use FormatMessage and add utility function to replace sterror usage - replace usual error code macros with new SOCKET_ERROR_ macros in applicable locations where they come from socket operations - any differences in error code returned by socket operations continue to be handled with an #ifdef to make the differences between platforms clear (e.g. when SOCKET_ERROR_AGAIN is returned by Windows but SOCKET_ERROR_IN_PROGRESS is returned by other platforms) Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
|
We experimented with and considered other options to address the general issue described below: Problems:
Options:
|
| } else { | ||
| ASSERT(result.rc_ == -1); | ||
| if (result.errno_ == EINPROGRESS) { | ||
| #ifdef WIN32 |
There was a problem hiding this comment.
This is an example of an error code that is different on Windows vs POSIX, personally I think it may be valuable to keep the #ifdef to ensure developers understand the differences between connect() on different platforms, but we could potentially abstract this error check in a different way, open to feedback about this
There was a problem hiding this comment.
I like the approach as long as we don't have too many instances of the same syscall
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
|
Should we also start removing the fail on windows tag with this change to have a sense of where we are at with getting all the tests running? |
|
When you resolve a test failure, remove the corresponding fails_on_windows tag. You can simply change your bazel test command to ignore that flag to review newly passing tests, or only build tests with that flag present to discover newly passing tests. In any case removing that flag without fixing the bug will break CI, so please don't do that. |
|
@junr03 Can you take a pass on this? |
- pre-allocating error message buffer we may fail to fetch error details for errors with long descriptions - revert IoFileError.getErrorDetails() to use strerror, we will take another PR to refactor Filesystem Impl on Windows to use Win32 APIs not POSIX Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
|
|
||
| const std::string errorDetails(int error_code) { | ||
| #ifndef WIN32 | ||
| return strerror(error_code); |
There was a problem hiding this comment.
Can we document why strerror cannot be used?
There was a problem hiding this comment.
added a comment describing below
| * @param int error code | ||
| * @return const std::string error detail description | ||
| */ | ||
| const std::string errorDetails(int error_code); |
There was a problem hiding this comment.
can we somehow enforce that new code does not use strerror? clang-tidy rule?
There was a problem hiding this comment.
I'm not super experienced with clang-tidy and couldn't find a check that could do this, do you know of one? We could add to the format checks potentially instead?
There was a problem hiding this comment.
we're thinking of doing a format check instead, since we also have a couple places where we intentionally left strerror b/c there is another refactor required before we get rid of it (the filesystem impl on Windows uses the POSIX subsystem which is problematic for error handling purely Windows errors), we can use clang-format off/on to guard around those blocks that need it still
There was a problem hiding this comment.
/wait
adding checks for strerror in clang format
There was a problem hiding this comment.
added a new format check, please take a look!
| os_syscalls.recv(socket.ioHandle().fd(), buf_, Config::MAX_INSPECT_SIZE, MSG_PEEK); | ||
| ENVOY_LOG(trace, "http inspector: recv: {}", result.rc_); | ||
| if (result.rc_ == -1 && result.errno_ == EAGAIN) { | ||
| if (SOCKET_FAILURE(result.rc_) && result.errno_ == SOCKET_ERROR_AGAIN) { |
There was a problem hiding this comment.
out of curiosity, do we have any more places with the old-style comparison that should be converted to SOCKET_FAILURE?
There was a problem hiding this comment.
I found a few more and converted them, it was tough to think of a format or other automation to detect it as it is dependent on if the return code comes from a socket function, lmk if you think there is a good way to auto detect this
| #define ENVOY_SHUT_RDWR SD_BOTH | ||
|
|
||
| // Mapping winsock2 errors to common error names | ||
| #define SOCKET_ERROR_AGAIN WSAEWOULDBLOCK |
There was a problem hiding this comment.
I am wondering if we should document this decision a bit more inline, i.e., expand on the thinking about defining envoy-neutral socket errors and maping them to platform specific definitions?
There was a problem hiding this comment.
Added some more content here, lmk if this is sufficient
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
|
/retest envoy-presubmit for DNS failure |
|
🐴 hold your horses - no failures detected, yet. |
Signed-off-by: William A Rowe Jr <wrowe@vmware.com>
cc @junr03 @ggreenway if you have thoughts on this No longer the last clang-tidy error (we will fix those others in a commit soon), but still relevant |
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com> Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
|
envoy/test/integration/integration.cc Line 470 in d0b4b4f The ubsan error in this failure could be fixed by grabbing the counter once instead of each time it is needed, see https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/43031/logs/104 for a raw log that hopefully does not go away with future pushes (AZP build: https://dev.azure.com/cncf/envoy/_build/results?buildId=43031&view=logs&jobId=930b2ee4-0ea7-5b74-a645-6bc26962bc06) |
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com> Co-authored-by: Sunjay Bhatia <sunjayb@vmware.com> Signed-off-by: William A Rowe Jr <wrowe@vmware.com> Co-authored-by: William A Rowe Jr <wrowe@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
Signed-off-by: Sunjay Bhatia <sunjayb@vmware.com>
|
@junr03 @ggreenway can we get this PR merged with master 🚀 This PR unblock various tests that are failing on windows and I have been working on top of it for some time now and it would be awesome if we have it in master. |
junr03
left a comment
There was a problem hiding this comment.
lgtm. @ggreenway do you mind giving a final look?
| __pragma(pack(pop)) | ||
|
|
||
| using ssize_t = ptrdiff_t; | ||
| typedef ptrdiff_t ssize_t; |
There was a problem hiding this comment.
nit: I believe we bias to using using.
There was a problem hiding this comment.
We are aware of this and wrote the original solve. We've run across an edge case where using was insufficient, and these are global C declarations of posix to be emulated correctly on Windows (We did joke about making it a #define, obviously not seriously:)
There was a problem hiding this comment.
I believe we changed those to typedef instead of using to appease clang-tidy in this commit: 6ab3de8, if there is a clang-tidy trick we don't know of (besides just turning tidy off for that section) we could, change to, that would be great to know
| #define ENVOY_SHUT_RDWR SD_BOTH | ||
|
|
||
| // Mapping winsock2 errors to common error names | ||
| #define SOCKET_ERROR_AGAIN WSAEWOULDBLOCK |
I'm a bit backed up with some end-of-quarter stuff. I'm going to unassign myself. @envoyproxy/senior-maintainers PTAL. |
Commit Message:
Add platform agnostic socket error macros and error details helper
still defined on Windows for non winsock2 errors) so we define macros
that map to these error codes on Windows and the usual POSIX error
codes on other platforms
FormatMessage and add utility function to replace strerror usage
applicable locations where they come from socket operations
to be handled with an #ifdef to make the differences between platforms
clear (e.g. when SOCKET_ERROR_AGAIN is returned by Windows but
SOCKET_ERROR_IN_PROGRESS is returned by other platforms)
Additional Description:
Risk Level: Low (should be a 1-1 replacement for Linux etc.)
Testing: Adds a unit test for Windows and modifies some to use the new macros etc.
Docs Changes: N/A
Release Notes: N/A
Fixes #11298