Skip to content

Don't clobber GetLastError in Win32 systhreads caml_leave_blocking_section_hook#10220

Merged
shindere merged 1 commit intoocaml:trunkfrom
dra27:fix-8857
Feb 24, 2021
Merged

Don't clobber GetLastError in Win32 systhreads caml_leave_blocking_section_hook#10220
shindere merged 1 commit intoocaml:trunkfrom
dra27:fix-8857

Conversation

@dra27
Copy link
Copy Markdown
Member

@dra27 dra27 commented Feb 13, 2021

There are a few choices on how to do this:

Fixes #8857


static void caml_thread_leave_blocking_section(void)
{
#ifdef _WIN32
Copy link
Copy Markdown
Contributor

@yawaramin yawaramin Feb 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no other uses of #ifdef _WIN32 in this file (other than the initial include on L51). Perhaps it would be better to follow the existing convention and factor out the caml_thread_leave_blocking_section function itself into the st_posix.h and st_win32.h files?

EDIT: oh, is that the 'introducing other hooks'? Sorry, I just noticed you might have alluded to that...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! My concern is that either it introduces two hooks, basically called do_windows_only_thing_before_lock_acquire and do_windows_only_thing_after_lock_acquire or the entire function logic is duplicated in both headers 🙂

@xavierleroy
Copy link
Copy Markdown
Contributor

In retrospect, I see no strong reason why caml_leave_blocking_section should preserve the last error code, whether errno or GetLastError(). A great many library functions clobber these error codes, and well-written code checks GetLastError() or errno just after the failing system call, not after calling library functions. So, I would rather fix bad code like #8857 to call GetLastError() earlier.

Now, I understand that the opposite decision (save errno in caml_leave_blocking_section) was taken in response to #5982, so I'm not against the present PR.

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 14, 2021

Two questions:

  • I don't understand why errno is saved in signals.c and LastError() in st_stubs.c. (David says that the reason is that "it is the hook that causes the problem", but I still don't understand why the two error codes are handled differently.)
  • The errno-saving logic is duplicated in the handle_signal functions as well (in signals_{byt,nat}.c). Is this duplication not necessary here, why are the two mechanisms different in this respect?

@xavierleroy
Copy link
Copy Markdown
Contributor

The errno-saving logic is duplicated in the handle_signal functions as well (in signals_{byt,nat}.c). Is this duplication not necessary here, why are the two mechanisms different in this respect?

I'm not sure asynchronous signals exist at all under Windows...

On the other hand, now that signal handling is clean and no longer runs arbitrary OCaml code, and assuming POSIX signal semantics, errno should not be clobbered by signal handling, and doesn't need saving and restoring.

@xavierleroy
Copy link
Copy Markdown
Contributor

xavierleroy commented Feb 14, 2021

I don't understand why errno is saved in signals.c and LastError() in st_stubs.c. (David says that the reason is that "it is the hook that causes the problem", but I still don't understand why the two error codes are handled differently.)

caml_leave_blocking_section in signals.c does two things:

  1. some work on pending signals that is independent of the OS and of the presence of threads;
  2. call a hook that is dependent on the OS and on the presence of threads.

I suspect that in the old implementation of signal handling, part 1 could run nearly arbitrary code and therefore clobber errno. Hence it made sense to save and restore errno in the caml_leave_blocking_section function instead of in the hook function.

With the new implementation of signals, part 1 could quite possibly preserves errno and GetLastError. So it could make sense to push the code that preserves errno / GetLastError in the hooks. But I agree that, in this case, the generic caml_leave_blocking_section should not save and restore errno.

To summarize: maybe we need to rework errno handling in runtime/signal*.c to account for the new implementation of signal handling, before or at the same time we add code to preserve GetLastError.

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Feb 14, 2021

In retrospect, I see no strong reason why caml_leave_blocking_section should preserve the last error code, whether errno or GetLastError().

Indeed, my first thought was to fix utimes and then check the others... except that almost every single call to caml_leave_blocking_section in the Unix library assumes that it preserves errno!

I'm not sure asynchronous signals exist at all under Windows...

As far as I'm aware, there's nothing as equivalently horrific! The Windows way of doing these things is to give a function pointer for your handler and then the caller (which may be the kernel) uses CreateRemoteThread to create a new thread in your process to run the handler. Sounds horrific, although when compared with signal handling........!

To summarize: maybe we need to rework errno handling in runtime/signal*.c to account for the new implementation of signal handling, before or at the same time we add code to preserve GetLastError.

However, we'd be removing the code which preserves errno because we know that errno is already preserved, so does that have to be done strictly before extending the preservation to GetLastError? Our own code - for better or worse - is clearly written with the assumption that caml_leave_blocking_section doesn't clobber errno... would it be better to document that (and preservation of GetLastError) as part of this, and view the removal of the code in signal*.c as a (mildly scary) performance improvement?

@stedolan
Copy link
Copy Markdown
Contributor

As an alternative, could we consider removing stl_tls_get/set and using static __thread / static __declspec(thread) variables for its two uses, thread_descriptor_key and last_channel_locked_key? That way there are no library calls at all.

I'm not aware of any platform that supports pthreads / Win32 threads but lacks static global TLS support. (Things are a little dicier if we want non-static variables, linkable across files, and much dicier if we want symbols linkable across shared libraries, but neither of those worries applies here since the variables are only used in st_stubs.c)

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Feb 15, 2021

That appears to work on mingw32, mingw64 and msvc64. The disassembly of caml_thread_leave_blocking_section for msvc is very obviously directly accessing TLS. The disassembly for gcc isn't as obvious (to me), but it's not generating calls to [FT]lsGetValue either.

@xavierleroy
Copy link
Copy Markdown
Contributor

As an alternative, could we consider removing stl_tls_get/set and using static __thread / static __declspec(thread) variables for its two uses, thread_descriptor_key and last_channel_locked_key? That way there are no library calls at all.

When this code was written, MSDN warned that declspec(thread) doesn't work within DLLs and the Tls API must be used instead. Remember that the systhreads library is built as a DLL so that it can be loaded dynamically in ocamlrun. I need to be convinced that MSVC, Mingw and Flexlink fully support thread-local storage in DLLs.

@stedolan
Copy link
Copy Markdown
Contributor

Hmm, I see. I think I found the relevant bit of MSDN:

Windows Server 2003 and Windows XP: The Visual C++ compiler supports a syntax that enables you to declare thread-local variables: _declspec(thread). If you use this syntax in a DLL, you will not be able to load the DLL explicitly using LoadLibrary on versions of Windows prior to Windows Vista. If your DLL will be loaded explicitly, you must use the thread local storage functions instead of _declspec(thread). For an example, see Using Thread Local Storage in a Dynamic Link Library.

(see also this writeup)

Are Server 2003 and XP still supported?

@xavierleroy
Copy link
Copy Markdown
Contributor

However, we'd be removing the code which preserves errno because we know that errno is already preserved, so does that have to be done strictly before extending the preservation to GetLastError?

It does not have to be done strictly before. It's just something that 1- we should consider, and 2- justifies your choice of preserving GetLastError at the level of the systhread hook, and not at the level of caml_leave_blocking_section in runtime/signals.c.

@xavierleroy
Copy link
Copy Markdown
Contributor

Are Server 2003 and XP still supported?

XP, apparently yes: The native ports require Windows XP or later.

I'm also curious about Flexlink support for thread-local storage.

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Feb 15, 2021

Are Server 2003 and XP still supported?

XP, apparently yes: The native ports require Windows XP or later.

I think this may already be out-of-date (I'm starting a list of OCaml 5.00 housekeeping, and minimum version of Windows and the compiler are already on it). It's not been easy to build OCaml on Windows XP for a while... I think programs compiled with it may still run (mostly), but I'm fairly sure we now have unguarded Vista-specific header use. Both are very much unsupported OSes.

(Multicore requires completely dropping XP "support", incidentally)

I'm also curious about Flexlink support for thread-local storage.

I don't think flexlink is relevant here, because we're strictly talking about static variables, so flexlink won't "see" them.

@yawaramin
Copy link
Copy Markdown
Contributor

Windows XP is end-of-lifed for a while now: https://support.microsoft.com/en-us/windows/windows-xp-support-has-ended-47b944b8-f4d3-82f2-9acc-21c79ee6ef5e

In fact everything below Windows 8 is out of even extended support period: https://www.google.ca/amp/s/www.zdnet.com/google-amp/article/when-will-microsoft-pull-the-plug-on-your-version-of-windows-or-office/

I think it would make sense for OCaml to follow suit.

@xavierleroy
Copy link
Copy Markdown
Contributor

It's not been easy to build OCaml on Windows XP for a while... I think programs compiled with it may still run (mostly), but I'm fairly sure we now have unguarded Vista-specific header use. Both are very much unsupported OSes.

I have no strong opinion on which versions of Windows OCaml should support. But let's make our choice clear and document it.

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as a quick fix for an irritating issue.

In an ideal world, we could access thread-local storage without clobbering LastError, and C stub code would be written in a way that captures LastError early. But this will do fine for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unix.utimes on a missing file results in EUKNOWNERR instead of ENOENT on Win32

6 participants