Skip to content

Avoid a race condition when the process started to read a configuration file crashes/is not found#1378

Merged
voodoos merged 2 commits intoocaml:masterfrom
antalsz:avoid-configuration-file-process-race
Nov 17, 2021
Merged

Avoid a race condition when the process started to read a configuration file crashes/is not found#1378
voodoos merged 2 commits intoocaml:masterfrom
antalsz:avoid-configuration-file-process-race

Conversation

@antalsz
Copy link
Copy Markdown
Contributor

@antalsz antalsz commented Aug 6, 2021

In Mconfig_dot, there's a race condition around waitpid that can cause ocamlmerlin to crash if the process started to read a configuration file (either dot-merlin-reader or dune) crashes immediately or is not found. This PR fixes that race condition; it also protects the PIDs behind a module abstraction boundary to prevent it from happening again.

The existing code could error in the following case (seven-point framing due to @stedolan):

  1. We call Mconfig_dot.get_config.

  2. On (the former) line 236, get_config calls Mconfig_dot.Configurator.get_process; this function spawns a process and stores its information in the running_processes hashtable. (It doesn't matter which code path it goes down; let's suppose it starts it for the first time. i.e. the with Not_found path.)

  3. The spawned process immediately exits in the background.

  4. On (the former) line 237, get_config calls Unix.waitpid on the PID from get_process, returning the PID (nonzero) because the spawned process crashed. Importantly, this means the PID can never be waited on again.

  5. This raises Process_exited, which get_config catches; get_config then proceeds to exit after reporting an error.

  6. We call Mconfig_dot.get_config again.

  7. As in step 2, we call get_process on (the former) line 236; this looks up the old stale PID in running_processes, finds it, and calls Unix.waitpid on it. Because that PID was already waited on after completion in step 4 above, this crashes with an ECHILD error, terminating messily.

We have actually seen this crash in practice.

To fix this and prevent it from happening again, I made two changes:

  1. I provided a version of get_process that runs the second Unix.waitpid itself, returning a Process.t option. The accuracy of this wait depends on exactly how fast the spawned process fails, but as this is mostly trying to catch dot-merlin-reader not being installed or not being on the PATH, that's fine; it only needs to be best-effort besides that.

  2. I provide a module signature for the local Configurator module, abstracting over the running_processes hashtable so that we can preserve the invariant that it only contains waitable PIDs. To make this abstraction robust, I removed the PID from the Process.t record, and have running_processes store a Process.With_pid.t, which is just a pair of a PID and a process; I never expose this type (or any other way to access the PID) from Configurator's interface.

antalsz added 2 commits August 6, 2021 13:59
Indented some comments and removed some stray trailing spaces
The existing code could error in the following case (seven-point
framing due to @stedolan):

1. We call `Mconfig_dot.get_config`.

2. On (the former) line 236, `get_config` calls
   `Mconfig_dot.Configurator.get_process`; this function spawns a
   process and stores its information in the `running_processes`
   hashtable.  (It doesn't matter which code path it goes down; let's
   suppose it starts it for the first time. i.e. the `with Not_found`
   path.)

3. The spawned process immediately exits in the background.

4. On (the former) line 237, `get_config` calls `Unix.waitpid` on the
   PID from `get_process`, returning the PID (nonzero) because the
   spawned process crashed.  Importantly, this means the PID can never
   be waited on again.

5. This raises `Process_exited`, which `get_config` catches;
   `get_config` then proceeds to exit after reporting an error.

6. We call `Mconfig_dot.get_config` again.

7. As in step 2, we call `get_process` on (the former) line 236; this
   looks up the old stale PID in `running_processes`, finds it, and
   calls `Unix.waitpid` on it.  Because that PID was already waited on
   after completion in step 4 above, this crashes with an `ECHILD`
   error, terminating messily.

We have actually seen this crash in practice.

To prevent this from happening again, I made two changes:

A. I provided a version of `get_process` that runs the second
   `Unix.waitpid` itself, returning a `Process.t option`.  The
   accuracy of this wait depends on exactly how fast the spawned
   process fails, but as this is mostly trying to catch
   `dot-merlin-reader` not being installed or not being on the PATH,
   that's fine; it only needs to be best-effort besides that.

B. I provide a module signature for the local `Configurator` module,
   abstracting over the `running_processes` hashtable so that we can
   preserve the invariant that it only contains waitable PIDs.  To
   make this abstraction robust, I removed the PID from the
   `Process.t` record, and have `running_processes` store a
   `Process.With_pid.t`, which is just a pair of a PID and a process;
   I never expose this type (or any other way to access the PID) from
   `Configurator`'s interface.
Copy link
Copy Markdown
Collaborator

@voodoos voodoos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes looks good to me. If I understand well the main issue was that we forgot to remove the pid from the hash-table when raising Process_exited.

It is a good idea to hide this invariant from users of the Configurator module.

@trefis, do you want to have a look too ?

@voodoos
Copy link
Copy Markdown
Collaborator

voodoos commented Nov 17, 2021

@trefis I am going to merge this one if you have no objections

@voodoos voodoos merged commit d685b8b into ocaml:master Nov 17, 2021
voodoos added a commit that referenced this pull request Nov 22, 2021
…s-race

Avoid a race condition when the process started to read a configuration file crashes/is not found
voodoos added a commit that referenced this pull request Nov 22, 2021
…s-race

Avoid a race condition when the process started to read a configuration file crashes/is not found
voodoos added a commit that referenced this pull request Nov 22, 2021
…s-race

Avoid a race condition when the process started to read a configuration file crashes/is not found
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

  + merlin binary
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

Mon Jul 26 11:12:21 PM CET 2021

  + merlin binary
    - Mbrowse.select_leaf: correctly ignore merlin.hide (ocaml/merlin#1376)
    - enable `occurences` to work when looking for locally abstract types
      (ocaml/merlin#1382)
    - handle `-alert` compiler flag (ocaml/merlin#1401)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
    - log the backtrace even when the exception is a Failure (ocaml/merlin#1377, @antalsz)
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - fix handling of record field expressions (ocaml/merlin#1375)
    - allow -pp to return an AST (ocaml/merlin#1394)
    - fix merlin crashing due to short-paths (ocaml/merlin#1334, fixes ocaml/merlin#1322)
  + editor modes
    - update quick setup instructions for emacs (ocaml/merlin#1380, @ScriptDevil)
  + test suite
    - improve record field destruction testing (ocaml/merlin#1375)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

Mon Jul 26 11:12:21 PM CET 2021

  + ocaml support
    - add support for 4.13
    - stopped actively supporting version older than 4.12
  + merlin binary
    - Mbrowse.select_leaf: correctly ignore merlin.hide (ocaml/merlin#1376)
    - enable `occurences` to work when looking for locally abstract types
      (ocaml/merlin#1382)
    - handle `-alert` compiler flag (ocaml/merlin#1401)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
    - log the backtrace even when the exception is a Failure (ocaml/merlin#1377, @antalsz)
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - fix handling of record field expressions (ocaml/merlin#1375)
    - allow -pp to return an AST (ocaml/merlin#1394)
    - fix merlin crashing due to short-paths (ocaml/merlin#1334, fixes ocaml/merlin#1322)
  + editor modes
    - update quick setup instructions for emacs (ocaml/merlin#1380, @ScriptDevil)
  + test suite
    - improve record field destruction testing (ocaml/merlin#1375)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

Mon Jul 26 11:12:21 PM CET 2021

  + merlin binary
    - Mbrowse.select_leaf: correctly ignore merlin.hide (ocaml/merlin#1376)
    - enable `occurences` to work when looking for locally abstract types
      (ocaml/merlin#1382)
    - handle `-alert` compiler flag (ocaml/merlin#1401)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
    - log the backtrace even when the exception is a Failure (ocaml/merlin#1377, @antalsz)
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - fix handling of record field expressions (ocaml/merlin#1375)
    - allow -pp to return an AST (ocaml/merlin#1394)
    - fix merlin crashing due to short-paths (ocaml/merlin#1334, fixes ocaml/merlin#1322)
  + editor modes
    - update quick setup instructions for emacs (ocaml/merlin#1380, @ScriptDevil)
  + test suite
    - improve record field destruction testing (ocaml/merlin#1375)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

Tue Nov 23 11:45:21 PM CET 2021

  + merlin binary
    - Mbrowse.select_leaf: correctly ignore merlin.hide (ocaml/merlin#1376)
    - make `occurences` work when looking for locally abstract types (ocaml/merlin#1382)
    - handle `-alert` compiler flag
    - improve destruct calls on record fields (ocaml/merlin#1375)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
    - log the backtrace even when the exception is a Failure (ocaml/merlin#1377, @antalsz)
    - allow -pp to return an AST (ocaml/merlin#1394)
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - fix handling of record field expressions (ocaml/merlin#1375)
  + test suite
    - improve record field destruction testing (ocaml/merlin#1375)
voodoos added a commit to voodoos/opam-repository that referenced this pull request Nov 23, 2021
CHANGES:

  + merlin binary
    - ignore `-error-style` compiler flag (ocaml/merlin#1402, @nojb)
    - avoid a race condition when the process started to read a configuration
      file crashes/is not found (ocaml/merlin#1378, @antalsz)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants