Skip to content

Discard out_channel buffered data on permanent I/O error#12314

Merged
xavierleroy merged 4 commits intoocaml:trunkfrom
xavierleroy:io-error-handling
Jul 12, 2023
Merged

Discard out_channel buffered data on permanent I/O error#12314
xavierleroy merged 4 commits intoocaml:trunkfrom
xavierleroy:io-error-handling

Conversation

@xavierleroy
Copy link
Copy Markdown
Contributor

This PR explores the third approach mentioned here: #12300 (comment) . Implementing it led me to refactor some I/O code in the runtime system, but the new code is an improvement, if I may say so myself.

This PR is best reviewed commit-by-commit.

The third commit changes the flush operation over out_channel to discard the buffered data if the actual write fails with an EBADF ("bad file descriptor", including "the file descriptor was closed") or EPIPE ("pipe was closed at the other end") error. In both cases, retrying the write will fail, it's not a transient error condition like ENOSPC ("no space left on device"). So, before raising Sys_error, we just remove the buffered data.

This way, if the channel is finalized later, it will appear as empty (not containing unflushed data) and it will be freed immediately. This avoids the memory leak reported in #12300. It also avoids keeping the channel around and trying to flush it again when the program exits, which can lead to writing to the wrong file, as shown in #12300.

To implement this behavior, I had to change the interface of caml_read_fd and caml_write_fd. Currently, they return an error code for EINTR and directly raise a Sys_error exception for any other error. The second commit changes them to return -1 on any error and set errno to the error code, like a POSIX system call. It's now the caller's responsibility to act on the error, either by retrying, or by raising Sys_error, or by emptying an out_channel then raising Sys_error.

But there's a snag in the Windows implementation of caml_{read,write}_fd. They either call CRT functions, which leave POSIX-style error codes in errno, or Win32 socket functions, which produce Win32 error codes via WSAGetLastError(). So we need to convert from Win32 error codes to POSIX error codes. There's a _dosmaperr function in the CRT that does just this but is not exported. There's a caml_win32_maperr function in otherlibs/unix that does just this but we need it in the runtime. So, the first commit bites the bullet and moves the core of the caml_win32_maperr function from otherlibs/unix/unixsupport_win32.c to runtime/win32.c, where it can be used by caml_{read,write}_fd and also by caml_win32_rename. Phew!

Comment on lines -827 to -832
case ERROR_CURRENT_DIRECTORY: case ERROR_BUSY:
errno = EBUSY; break;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code maps ERROR_CURRENT_DIRECTORY to EACCES, not EBUSY, for consistency with _dosmaperr and the Win32 Unix library. It's a small change to map it back to EBUSY, but I'm not sure it's worth departing from _dosmaperr here.

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass over the code, could only spot one potential issue; question below.

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could not spot any other issues.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jun 22, 2023

One question is whether there are other error codes that should be handled in the same way as EBADF and EPIPE, but these are probably the most common ones.

@xavierleroy
Copy link
Copy Markdown
Contributor Author

I went through man 2 write and other error codes look like

  • transient errors, just retry: EAGAIN EWOULDBLOCK EIO
  • recoverable conditions, just fix the real issue and retry: EDESTADDRREQ EDQUOT (barely), EFBIG (barely too), ENOSPC EPERM
  • a bug in the runtime system (EFAULT) or in the user's program (EINVAL).

{ ERROR_LOCK_FAILED, 0, EACCES},
{ ERROR_ALREADY_EXISTS, 0, EEXIST},
{ ERROR_FILENAME_EXCED_RANGE, 0, ENOENT},
{ ERROR_NESTING_NOT_ALLOWED, 0, EAGAIN},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning EAGAIN on ERROR_NESTING_NOT_ALLOWED doesn't seem to fit the "resource temporarily unavailable" meaning. Wouldn't this potentially cause an infinite loop, since retrying won't fix the fact that there are nested LoadModule invocations earlier in the callchain. But Python's win32_to_errno function also does the same mapping to EAGAIN, so I'm probably wrong (but not sure why).

(I read through these mappings while reading the rest of the diff; possibly better as a followup PR if a change is needed in these mappings)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _dosmaperr function from the CRT maps ERROR_NESTING_NOT_ALLOWED to EAGAIN. As with ERROR_CURRENT_DIRECTORY, we could choose a different mapping, but do we really want to diverge from the CRT?

@xavierleroy
Copy link
Copy Markdown
Contributor Author

@dra27 do you have an opinion on the Windows part of this PR, esp. the mapping from Win32 error codes to POSIX error codes?

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Jun 28, 2023
In case of error, they just return -1 after setting errno to an appropriate
POSIX error code.  The caller is responsible for processing the error.
Typically, EINTR handles the signal and retries, other errors raise Sys_error.
… I/O error

So that the out_channel can be reclaimed by finalization.
@xavierleroy xavierleroy merged commit 007c040 into ocaml:trunk Jul 12, 2023
@xavierleroy
Copy link
Copy Markdown
Contributor Author

Re-based, re-tested, and merged. The Win32 part, and especially the mapping of Win32 error codes, can still be discussed, I'm all ears.

@xavierleroy xavierleroy deleted the io-error-handling branch July 12, 2023 14:31
NickBarnes pushed a commit to NickBarnes/ocaml that referenced this pull request Jul 14, 2023
@jmid
Copy link
Copy Markdown
Member

jmid commented Jan 19, 2024

I believe this PR introduced a regression, see #12898 for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants