Skip to content

switch to strerror_r for reentrant error string conversion#10969

Closed
avsm wants to merge 3 commits intoocaml:trunkfrom
avsm:strerror_r
Closed

switch to strerror_r for reentrant error string conversion#10969
avsm wants to merge 3 commits intoocaml:trunkfrom
avsm:strerror_r

Conversation

@avsm
Copy link
Copy Markdown
Member

@avsm avsm commented Jan 30, 2022

We currently use strerror in the runtime, which isn't reentrant in the presence of multiple threads (noted by @edwintorok in #10861). This PR switches over to strerror_r, which isn't quite as straightforward as I expected.

In glibc, there are two entirely separate variations of the function: a GNU one and a POSIX one. The memory management of both are entirely distinct, as the GNU one can sometimes return a static string, and sometimes write into the supplied buffer. This PR adopts (I think) the POSIX one, but opts to have an empty error message string rather than figuring out how to deal with a failing strerror_r call (which itself varies across old glibc<2.13).

Then there is the question of "how big should the strerror_r buffer be?". Old glibc used to use a 1024-byte buffer, but now it's all intertwined in locale-specific logic. I just opted for a simple 1024 byte stack buffer.

There are a few places left in the runtime that still use strerror, notably in platform.h which looks a bit performance-sensitive and inline:

Caml_inline void check_err(char* action, int err)
{
  if (err) {
    caml_fatal_error_arg2(
      "Fatal error during %s", action, ": %s\n", strerror(err));
  }
}

Is it ok to use dynamic stack allocation here in the if (err) branch to have a scratch buffer without stack-allocating it for every call of check_err?

@avsm
Copy link
Copy Markdown
Member Author

avsm commented Jan 30, 2022

It looks like strerror should continue to be used on Windows, as it returns a pointer to a thread-local buffer.

Copy link
Copy Markdown
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a nettle to have grasped 🙂

Various thoughts:

  • We pepper around _GNU_SOURCE at various points, so it unnerves me that there's a reliance on the definition of strerror_r that we get in a header. That said, it looks like the whole of sync.h should be guarded with CAML_INTERNALS anyway, which would lessen that worry.
  • Given that paranoia, can we do #ifdef _GNU_SOURCE tests before string.h is included with an #error if it is defined?
  • For platform.h, I think a small refactoring can sort that - caml_fatal_error_arg2 is defined solely for its use (I think that's a multicore addition, but I didn't check the git-blame) - we could instead define caml_fatal_strerror or something which takes the int argument and remove the strerror part from platform.h - the definition of caml_fatal_strerror can then use strerror_r as normal.
  • Nice to see an unusual instance of the Windows CRT function being ahead of the game 🤪

@dra27
Copy link
Copy Markdown
Member

dra27 commented Jan 30, 2022

Also linking in #10505, as the POSIX 2001 part of that might therefore be relevant for 5.00.0

@edwintorok
Copy link
Copy Markdown
Contributor

edwintorok commented Jan 30, 2022

Thanks for tackling strerror.

  • We pepper around _GNU_SOURCE at various points, so it unnerves me that there's a reliance on the definition of strerror_r that we get in a header. That said, it looks like the whole of sync.h should be guarded with CAML_INTERNALS anyway, which would lessen that worry.

How about defining a caml_safe_strerror in a minimal C file where you have more tight control over headers and macros, and then using that?
Something like (might need some ifdef WIN32 here or in a header to just use strerror with the per-thread buffer):

#undef _GNU_SOURCE

#include <string.h>
#if _POSIX_C_SOURCE < 200112L
#error "string.h is too old"
#endif

#include <errno.h>
#include <stdio.h>

void caml_safe_strerror(int errnum, char* strerrbuf, size_t buflen)
{
    int rc;
    if (!strerrbuf) return;
    if ((rc = strerror_r(errnum, strerrbuf, buflen))) {
        int err = rc == -1 ? errno : rc;
        snprintf(strerrbuf, buflen, "errno (original: %d, strerror_r: %d)", errnum, err);
    }
}

Some advantages:

  • it is a void function that always succeeds (falling back to just printing errno in case strerror_r fails for whatever reason
  • it doesn't crash if strerrbuf is NULL (although this should never happen, one could sprinkle some GCC attributes in the header defining caml_strerror to warn if strerrbuf is NULL or buflen is not the right size
  • it avoids some header potentially defining _GNU_SOURCE and getting the wrong function here (or perhaps some defining _GNU_SOURCE in CFLAGS for some reason)

Disadvantage:

  • an extra function call when errors are encountered (though OCaml will likely end up raising an exception in this case anyway, so likely not on a performance critical path)
  • it doesn't handle failures from snprintf itself (should it just strncpy a static string in that case?)

It is very unlikely that strerror_r itself would fail in practice, so maybe the fallback code could just print errno %d?

@avsm
Copy link
Copy Markdown
Member Author

avsm commented Jan 30, 2022

Many thanks for the rapid comments, @dra27 @edwintorok. I do prefer the idea of a caml_safe_strerror function rather than peppering the source with ERANGE checks.

One thought I had to simplify the caml_safe_strerror function is that we could give it a __thread local buffer to copy the results into. This would allow us to use the GNU variant (which might return a pointer to some other buffer) and also the Windows strerror would just work. I think this is safe since the only thing the runtime does with the strerror result is to blit it into some OCaml value right after.

@edwintorok
Copy link
Copy Markdown
Contributor

I like the idea of a thread local buffer, but if you use the GNU variant we should have some fallback code for non-Linux and non-Windows OSes (I'm assuming the POSIX strerror_r is the only thing available there?)

Comment on lines 77 to 78
Caml_inline void sync_check_error(int retcode, char * msg)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: why is this (big) function inlined and declared in a .h while it's obviously not performance critical and could be static to sync.c ? Something needs to be done.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the check_sync_error is used in more places than just sync.c, so it looks better to sort this out in a followup PR.

otherlibs/systhreads/st_posix.h
388:  sync_check_error(retcode, "Thread.sigmask");
408:  sync_check_error(retcode, "Thread.wait_signal");

otherlibs/systhreads/st_stubs.c
587:    sync_check_error(err, "Thread.create");
592:    sync_check_error(err, "Thread.create");
638:    sync_check_error(err, "caml_register_c_thread");
729:  sync_check_error(rc, "Thread.join");
764:  sync_check_error(st_event_create(&ts), "Thread.create");

runtime/sync.c
100:  sync_check_error(sync_mutex_create(&mut), "Mutex.create");
120:  sync_check_error(retcode, "Mutex.lock");
130:  sync_check_error(retcode, "Mutex.unlock");
140:  sync_check_error(retcode, "Mutex.try_lock");
199:  sync_check_error(sync_condvar_create(&cond), "Condition.create");
218:  sync_check_error(retcode, "Condition.wait");
226:  sync_check_error(sync_condvar_signal(Condition_val(wrapper)),
233:  sync_check_error(sync_condvar_broadcast(Condition_val(wrapper)),

runtime/caml/sync.h
77:Caml_inline void sync_check_error(int retcode, char * msg)

@xavierleroy
Copy link
Copy Markdown
Contributor

It looks like strerror should continue to be used on Windows, as it returns a pointer to a thread-local buffer.

I'm pretty sure many C libraries implement strerror like that too, but nobody wants to say it in the docs.

@xavierleroy
Copy link
Copy Markdown
Contributor

Re caml_fatal_error_arg_2, #10966 already proposes to get rid of it.

@avsm avsm force-pushed the strerror_r branch 2 times, most recently from 8963b9b to 363648e Compare January 31, 2022 16:00
This uses the glibc or POSIX variants of strerror_r on *nix,
and strerror on Win32 (which is thread-safe).
avsm added a commit to avsm/ocaml that referenced this pull request Jan 31, 2022
@avsm
Copy link
Copy Markdown
Member Author

avsm commented Jan 31, 2022

This is ready for re-review now. It implements a caml_strerror which use a TLS buffer and doesn't do any unnecessary copying now (hopefully). Tested on Linux glibc, FreeBSD, macOS. Still need to do OpenBSD and Linux musl.

/* platform dependent best-effort thread naming */
extern void caml_thread_setname(const char* name);

extern char *caml_strerror(int errnum);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick: would const char* be better? That'll make it obvious it shouldn't be free()-d.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It don't think this can be const, since the pointer returned by the underlying strerror implementation is a char * (for historical reasons) and that might point to other storage than our TLS buffer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prototype of strerror itself is a char*, so lets keep it that way then.
Although I thought you could always return a char* as const char*, i.e. advertise the intention to not modify it, but not the other way around (if you were given something with the promise not to modify you can't go ahead and modify it), but its been a while since I've dabbled in C code.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but I don't like the use of thread-local variables here, esp. when declared using __thread, which is not standard C as far as I know. Can I please have some time to think about an alternate implementation? Thank you.

@avsm
Copy link
Copy Markdown
Member Author

avsm commented Jan 31, 2022

Yes, of course. It is unsatisfying to have to cart around 1K of extra TLS, especially when most of the time the underlying implementation returns a pointer to a statically declared string (but might not).

Regarding __thread, we'll have to fix the use of that in a number of other places in the runtime. My understanding of the more standard thread_local in C11 is that it comes with a potential runtime performance penalty due to differing static initialisation semantics from __thread.

@avsm
Copy link
Copy Markdown
Member Author

avsm commented Feb 1, 2022

I've been peering inside the implementation of strerror functions, and the only real case where corruption might result seems to be if another thread changes the locale while an strerror() is running and so the returned pointers to error strings will mutate. I guess we can't control the case where we're embedded in a C library and that changes the locale, so I'm discounting this line of thinking unless someone else can spot a way.

@xavierleroy
Copy link
Copy Markdown
Contributor

Apologies for leaving you hanging. My trivial variation on this PR is here: #11010.

@avsm avsm closed this Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants