Skip to content

Fix stack-overflow in scheduler#3

Merged
avsm merged 1 commit intoocaml-multicore:mainfrom
talex5:stack-overflow
Mar 11, 2021
Merged

Fix stack-overflow in scheduler#3
avsm merged 1 commit intoocaml-multicore:mainfrom
talex5:stack-overflow

Conversation

@talex5
Copy link
Copy Markdown
Collaborator

@talex5 talex5 commented Mar 11, 2021

There's no need to call the scheduler again manually after continue. The coroutine will return to the scheduler anyway when done.

There's no need to call the scheduler again manually after `continue`.
The coroutine will return to the scheduler anyway when done.
@talex5 talex5 requested a review from avsm March 11, 2021 16:48
@talex5
Copy link
Copy Markdown
Collaborator Author

talex5 commented Mar 11, 2021

@avsm : I'm not entirely sure how this is supposed to work. Does this look right to you?

@avsm
Copy link
Copy Markdown
Contributor

avsm commented Mar 11, 2021

This looks right; I'd planned to make complete_rw_req and the schedule function mutually recursive anyway, since there needs to be a direct control flow line back to the scheduler in this case. We could do with a test case that spawns a few billion fibres to test the stack safety...

@avsm avsm merged commit a40ab62 into ocaml-multicore:main Mar 11, 2021
@talex5 talex5 deleted the stack-overflow branch March 11, 2021 18:29
talex5 added a commit to talex5/eio that referenced this pull request Mar 12, 2021
Before, the continuations all returned `unit`. This makes it easy to get
confused about what it means (as in ocaml-multicore#1 and ocaml-multicore#3). Using a distinct type
for this should prevent such errors in future.

The error in ocaml-multicore#1 now results in:

    Error: This expression has type effect but an expression was expected of type
             [ `Exit_scheduler ]

The error in ocaml-multicore#3 now results in:

    Error: This expression has type [ `Exit_scheduler ]
           but an expression was expected of type unit
           because it is in the left-hand side of a sequence
talex5 added a commit to talex5/eio that referenced this pull request Nov 11, 2021
Before, the continuations all returned `unit`. This makes it easy to get
confused about what it means (as in ocaml-multicore#1 and ocaml-multicore#3). Using a distinct type
for this should prevent such errors in future.

The error in ocaml-multicore#1 now results in:

    Error: This expression has type effect but an expression was expected of type
             [ `Exit_scheduler ]

The error in ocaml-multicore#3 now results in:

    Error: This expression has type [ `Exit_scheduler ]
           but an expression was expected of type unit
           because it is in the left-hand side of a sequence
talex5 added a commit to talex5/eio that referenced this pull request Jul 27, 2023
The `execve` action allocated the arrays in the forked child process.
However, in a multi-threaded program we might have forked while another
thread had the malloc lock. In that case, the child would wait forever
because it inherited the locked mutex but not the thread that would
unlock it. e.g.

    #0  futex_wait (private=0, expected=2, futex_word=0xffff9509cb10 <main_arena>) at ../sysdeps/nptl/futex-internal.h:146
    ocaml-multicore#1  __GI___lll_lock_wait_private (futex=futex@entry=0xffff9509cb10 <main_arena>) at ./nptl/lowlevellock.c:34
    ocaml-multicore#2  0x0000ffff94f8e780 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at ./malloc/malloc.c:3650
    ocaml-multicore#3  0x0000aaaac67cfa68 in make_string_array (errors=errors@entry=37, v_array=281472912006504) at fork_action.c:47
    ocaml-multicore#4  0x0000aaaac67cfaf4 in action_execve (errors=37, v_config=281472912003024) at fork_action.c:61
    ocaml-multicore#5  0x0000aaaac67cf93c in eio_unix_run_fork_actions (errors=errors@entry=37, v_actions=281472912002960) at fork_action.c:19
talex5 added a commit to talex5/eio that referenced this pull request Jul 28, 2023
The `execve` action allocated the arrays in the forked child process.
However, in a multi-threaded program we might have forked while another
thread had the malloc lock. In that case, the child would wait forever
because it inherited the locked mutex but not the thread that would
unlock it. e.g.

    #0  futex_wait (private=0, expected=2, futex_word=0xffff9509cb10 <main_arena>) at ../sysdeps/nptl/futex-internal.h:146
    ocaml-multicore#1  __GI___lll_lock_wait_private (futex=futex@entry=0xffff9509cb10 <main_arena>) at ./nptl/lowlevellock.c:34
    ocaml-multicore#2  0x0000ffff94f8e780 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at ./malloc/malloc.c:3650
    ocaml-multicore#3  0x0000aaaac67cfa68 in make_string_array (errors=errors@entry=37, v_array=281472912006504) at fork_action.c:47
    ocaml-multicore#4  0x0000aaaac67cfaf4 in action_execve (errors=37, v_config=281472912003024) at fork_action.c:61
    ocaml-multicore#5  0x0000aaaac67cf93c in eio_unix_run_fork_actions (errors=errors@entry=37, v_actions=281472912002960) at fork_action.c:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants