Skip to content

More systhreads fixes for 5.0#11403

Merged
gasche merged 2 commits intoocaml:trunkfrom
gadmm:systhread_simpl_and_fixes0
Jul 22, 2022
Merged

More systhreads fixes for 5.0#11403
gasche merged 2 commits intoocaml:trunkfrom
gadmm:systhread_simpl_and_fixes0

Conversation

@gadmm
Copy link
Copy Markdown
Contributor

@gadmm gadmm commented Jul 6, 2022

Through digging into the new systhreads implementation here are a few more fixes to look into for 5.0:

  1. Replace explicit tls with thread_local: an already-accepted change from a previous PR, done first for rebasing convenience.
  2. Guard against arbitrary code running before threads init: the domain threads initialization code runs from OCaml code. Thus there is an interval during which arbitrary OCaml code can run before initialization (e.g. GC, async callbacks and at_each_spawn callbacks).
  3. Quick fix for the multiplication of tick threads: fixes one problem reported at Simplifications and fixes to multicore systhreads implementation (2/3) #11385 (comment)
  4. Do not reinitialize mutex and cond var: removes needless initializations which cause UB.

Bugs fixed at 2. and 4. are likely sources of UB (deadlocks or crashes) in practice, which might explain some CI failures on other systhreads PRs, though it is unclear yet.

No change entry needed.

(cc @Engil, @kayceesrk, @gasche)

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 6, 2022

@gasche can you please run precheck on it?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 6, 2022

@abbysmal
Copy link
Copy Markdown
Contributor

abbysmal commented Jul 7, 2022

@gadmm thank you for this.
I will need to backlog your work and provide you with the adequate reviews, sorry for the very long processing delays on my end.
Is this PR the first one I should be reviewing or should I consider an earlier one first?

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 7, 2022

Thanks @gasche.

CI is failing on Windows with access violations, which is actually good news because the root of the Windows failure had not yet been identified (not for lack of trying). Now, it is unlikely that it is caused by the three last commits because they only change the behaviour in corner cases and were not there before. This makes the first one (using thread_local) a primary suspect. The commit looks trivial but keep in mind that the commit could be correct while suffering from side-effects of pre-existing bugs (I know for instance that Windows is picky about threads that outlive the main thread).

@gasche could you please run precheck on 084a512? (or do you need a named branch?) (sorry to ask again, everything would be easier if I managed to reproduce these bugs)

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 7, 2022

Is this PR the first one I should be reviewing or should I consider an earlier one first?

Thank you, this PR is the first one on the stack. I also pushed a small addition to #11272 to make it independent from all the systhreads PR, so this can also be reviewed now as well.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 7, 2022

(sorry to ask again, everything would be easier if I managed to reproduce these bugs)

I don't mind, but I think you should follow the process to get precheck access rights yourself, which is documented in HACKING.adoc: running INRIA's CI on a publicly-available git branch.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 7, 2022

(Yes, it seems that precheck needs a named branch.)

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 8, 2022

I created the branch systhread_use_thread_local consisting in the first commit.

According to the procedure in HACKING.adoc I should already have these rights, so it seems that I will have to report the problem to the anonymous e-mail address mentioned.

That being said I have already spent too much time on the systhread issues. At the start I simply wanted to have systhread stop accessing Caml_state without holding the lock. As a famous fictional character once said, “This wasn't my war!”. I think the issues here are serious and need to be fixed in 5.0 but I am looking for a volunteer to take over the systhreads PRs (@Engil perhaps?).

@gadmm gadmm force-pushed the systhread_simpl_and_fixes0 branch from aed6d4d to 7916f46 Compare July 9, 2022 19:53
@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 9, 2022

Thanks to @dra27's help I managed to reproduce the issue on Windows. It was indeed caused by the first commit (084a512ce). Perhaps Windows experts would know why (incompatibility with dynamic linking? does not like it when non-detached threads are not joined?).

In the meanwhile for 5.0 I rebased without the first commit.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Jul 10, 2022

Thanks to @dra27's help I managed to reproduce the issue on Windows.

Thanks for fighting through the frustrations - I'm glad the build behaved in the end!

It was indeed caused by the first commit (084a512ce). Perhaps Windows experts would know why (incompatibility with dynamic linking? does not like it when non-detached threads are not joined?).

_Thread_local will ultimately compile down to __declspec(thread) on Windows. In terms of history, that's how systhreads originally did this on Windows. It was removed and switched to explicit TLS when dynamic loading was added to OCaml in ddd99c7 (the posix layer has always used explicit TLS, I expect because at the time compiler support couldn't be relied on across all the various C compilers in use). That's certainly because back in 2001 __declspec(thread) didn't work with dynamically loaded DLLs on any Windows platform1. That was fixed (just about) in Windows Vista, but I expect that what we're seeing here is a flexlink issue with implicit TLS.

In the meanwhile for 5.0 I rebased without the first commit.

It would certainly be interesting to confirm that it's an implicit TLS issue and potentially fix flexlink, but given the size of the diff (i.e. the fact we don't use that much explicit TLS), I'm not sure it'd be worth devoting time to?

Footnotes

  1. As with so many of these obscure Windows things, Raymond Chen to the rescue in The Old New Thing

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 10, 2022

In the meanwhile for 5.0 I rebased without the first commit.

New precheck run: https://ci.inria.fr/ocaml/job/precheck/739/

Copy link
Copy Markdown
Contributor

@abbysmal abbysmal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty nice and lean PR, with more than welcome fixes.
Made one comment, I'd like to understand this specific point, but otherwise I think it is good for merging.

st_tls_set(caml_thread_key, new_thread);

Active_thread = new_thread;
Tick_thread_running = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice catch, I wonder how it got in there

/* Restore the runtime state from the curr_thread descriptor */
caml_thread_restore_runtime_state();
} else {
caml_leave_blocking_section_default();
Copy link
Copy Markdown
Contributor

@abbysmal abbysmal Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition is that caml_thread_leave_blocking_section (and its enter counterpart) would not be called if initialization isn't done for Thread.
(as these are set as callback only if caml_thread_initialize is called)
Is this extra handling required?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, maybe there's a weird interaction in case we are talking about another domain's initialization? (not domain 0)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for domain ≠ 0, there is a gap between spawning and threads init during which arbitrary OCaml code can run. The solution is not really nice.

I do not really like this solution (look at the removal of [@noalloc] on Thread.self). It would be simpler to avoid this by calling the domain threads init with a hook on the C side. Would you or someone else would be willing to look into such an alternative solution?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be quite frank I think the solution here works and we should merge the branch as is.

I agree this is not quite elegant, and I've never been a huge fan of the initialisation process for threads to be called from OCaml.

Maybe we should just open an issue and address this later?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should merge the branch as is

it needs an approval first :-)

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 11, 2022

Thanks for the infos @dra27.

It would certainly be interesting to confirm that it's an implicit TLS issue and potentially fix flexlink, but given the size of the diff (i.e. the fact we don't use that much explicit TLS), I'm not sure it'd be worth devoting time to?

For systhreads, no. For flexdll users, it is probably worth investigating/documenting the issue given that _Thread_local is now an official part of the language.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 12, 2022

@gadmm this is good to go if you confirm that you are happy to merge now

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 12, 2022

I think the API change (Thread.self no longer being noalloc) needs an additional opinion.

@abbysmal
Copy link
Copy Markdown
Contributor

Right, the more I think about it, the less I'm comfortable with it. (Thread.self not being noalloc anymore)
Maybe moving over to an initialization step outside of OCaml is fine, but the problem here is that there's no reasonable way of doing so at the moment, as systhreads is an independent library.
I can't come up at the moment with a decent design, do you have any idea?

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 12, 2022

I don't think there should be a problem of having a C-side hook caml_domain_at_each_spawn (similarly to other hooks).

To further lift limitations on init, I even wonder if the initialization of systhreads could not be made fully per-domain (using thread-local hooks). (But that's beyond the scope.)

@abbysmal
Copy link
Copy Markdown
Contributor

abbysmal commented Jul 12, 2022

Don't you think there could be a timing issue there:
To register the hook, one would need to go through caml_thread_initialize, which relies on the Thread topevel entrypoint. (it is being called from OCaml).
It would fix the problem for other domains, but the situation remains murky for the time span between the program start and the call of caml_thread_initialize in Thread's top level.

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 12, 2022

I don't think there is a timing issue for systhreads initialization on domain 0. The systhreads init is done at the same time as the domain 0 threads init.

Either way I agree that this could go in and one can open an issue to ask that the Thread.self situation is fixed, to let someone have a look independently from me. Then depending on how severe this is deemed, the issue can be targeted to 5.0 or later. (One motivation to fix it for 5.0 is that I am not sure that Domain.at_each_spawn is a good API and so it could be removed.)

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 12, 2022

Thread.self was marked noalloc in 2015 by myself ( e91bb93 ), merging a change proposal by Clark Gaebel for performance reasons (the Async scheduler calls Thread.self at each tick): #196. As far as I can tell, this patch was incorrect because caml_thread_self could in fact raise an exception:

CAMLprim value caml_thread_self(value unit)         /* ML */
{
  if (curr_thread == NULL) invalid_argument("Thread.self: not initialized");
  return curr_thread->descr;
}

(In the current trunk, the function does not raise an exception but, if I understand correctly, it reads an uninitialized field instead, which is not much nicer.)

Marking the function noalloc was incorrect in 4.x and I think it would be fine to not do it in 5.0.

Looking for code that uses Thread.self on github, most of the use-cases are Thread.id (Thread.self ()). We could make this faster by provoding a self_id function (still not noalloc, it fails when called incorrectly), and we could offer unsafe_self_id that returns -1 when the thread is not initialized, and is noalloc.

You can decide to change the runtime implementation in significant ways to make the function truly noalloc, but i think that it could/should go to a separate PR.

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 12, 2022

This confirms that removing noalloc is a perf regression. I am not sure that this was incorrect though. If you access this function via the Thread module, then thread init has been done and it cannot raise any exception (right?). The check is for other potential callers, but those are not affected by the noalloc annotation.
This is different from here where it might indeed be called from OCaml code via Thread before threads are initialized on this domain.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 12, 2022

Could we define let self () = ... as a wrapper in thread.ml that checks for initialization on the OCaml side before calling a noalloc function?

(Then I guess caml_thread_self would return Val_unit in that case and other functions accepting a Thread.t need to guard against this value.)

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 12, 2022

(The suggestion above might be a low-effort approach before a more invasive solution to thread initialization is proposed. I agree that it's worth pulling the thread (no pun intended) of what API Thread needs from the runtime to be correct.)

@xavierleroy
Copy link
Copy Markdown
Contributor

I agree Thread.self and Thread.id are performance-critical (e.g. for implementing thread-local storage) and should be declared "noalloc".

The simplest solution is to do a fatal error instead of raising an Invalid_argument exception.

Justification: the only case where Thread.self can fail is if the program 1) uses Thread.self but none of the functions declared val in the Thread interface, and 2) forgot to link with threads.cm{,x}a. This is a bug, and it's unlikely, so there's no obligation to report it by a nice exception, a nasty fatal error will do.

@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 12, 2022

Justification: the only case where Thread.self can fail is if the program 1) uses Thread.self but none of the functions declared val in the Thread interface, and 2) forgot to link with threads.cm{,x}a. This is a bug, and it's unlikely, so there's no obligation to report it by a nice exception, a nasty fatal error will do.

Not in multicore. There is also the case where domain-local thread init has not yet been done on the current domain. This is concrete, for instance memprof-limits calls Thread.self from asynchronous callbacks, which can be called in this interval. (So I realize now that even raising is a bad solution, since the code will have to be adapted to be ready for this in multicore.)

To focus on this PR, I see two solutions:

  • Merge it as-is, accept to remove noalloc for the moment (since I am not convinced by the low-effort alternatives proposed), and open an issue to reintroduce noalloc it via the C-side hook approach (for 5.0 or later).
  • Let me remove the controversial commit from this PR, and open an issue for someone to fix it with the C-side hook approach (for 5.0).

@xavierleroy
Copy link
Copy Markdown
Contributor

So I realize now that even raising is a bad solution, since the code will have to be adapted to be ready for this in multicore

Right. And very little code is ready to handle an Invalid_argument exception... which is what Thread.self raises before initialization. So, yes, a different approach to domain-local thread initialization seems required.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 12, 2022

Even if we remove the "offending commit", we have the problem that the return value of Thread.self is garbage unless the thread has been initialized, right? So we are not fixing the issue, just turning a performance bug into a correctness bug. If this understanding is correct, I would rather be in favor of merging the correctness fix.

@xavierleroy
Copy link
Copy Markdown
Contributor

Raising Invalid_argument from Thread.self even when the Threads library was properly linked with the program is already a correctness bug. So I'd like to see it addressed before anything from this PR is merged.

@gadmm gadmm force-pushed the systhread_simpl_and_fixes0 branch from 7916f46 to 756e4c9 Compare July 13, 2022 10:24
@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 13, 2022

The message was unclear so I took the second solution of removing the commit from this PR. I will open an issue instead.

Even if we remove the "offending commit", we have the problem that the return value of Thread.self is garbage unless the thread has been initialized, right?

This does a NULL pointer dereferencing (or a use-after-free in the case of a recycled domain). Idem if caml_enter_blocking_section is called.

There was a risk for caml_thread_yield, but it (accidentally) works even when the threads have not yet been initialized on the domain. Otherwise it is possible that the bug would have been noticed already via spurious failures.

@abbysmal
Copy link
Copy Markdown
Contributor

I think leaving this out of this PR and merging the fixes already in there is a good idea, should we just move on and merge this and move the discussion over to #11434 ?
I am definitely fine with the current changes in the PR.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 21, 2022

@gadmm this conflicts with your Caml_state=NULL changes, would you mind rebasing before merge?

gadmm added 2 commits July 21, 2022 22:48
This is undefined behaviour; instead they are intended to be reused.
@gadmm gadmm force-pushed the systhread_simpl_and_fixes0 branch from 756e4c9 to fe5e5aa Compare July 21, 2022 21:00
@gadmm
Copy link
Copy Markdown
Contributor Author

gadmm commented Jul 21, 2022

Rebased.

@gasche gasche merged commit 05114fb into ocaml:trunk Jul 22, 2022
gasche added a commit that referenced this pull request Jul 22, 2022
More systhreads fixes for 5.0

(cherry picked from commit 05114fb)
@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 22, 2022

Cherry-picked in 5.0 as 9bf9c52.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants