Skip to content

Convert Windows stubs to Unicode#58

Merged
rgrinberg merged 9 commits intojanestreet:masterfrom
dra27:win32-unicode-spawn
Mar 9, 2024
Merged

Convert Windows stubs to Unicode#58
rgrinberg merged 9 commits intojanestreet:masterfrom
dra27:win32-unicode-spawn

Conversation

@dra27
Copy link
Copy Markdown
Contributor

@dra27 dra27 commented Mar 5, 2024

Subsumes #55. Fixes #57. Fixes ocaml/dune#10180 which then fixes ocaml/opam#5861.

Windows OCaml uses UTF-8 internally for environment variables. The spawn_windows stub was using the ANSI version of CreateProcess (CreateProcessA) which causes the environment to become corrupted if the environment contained any non-ASCII characters (i.e. any UTF-8 sequences).

Support for Windows 98 ended in July 2006, so I think we can risk using the Unicode version of CreateProcess 😉 The code mirrors that for Unix.create_process.

emillon added a commit to emillon/dune that referenced this pull request Mar 5, 2024
This updates our spawn vendored version to include janestreet/spawn#58.

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this pull request Mar 5, 2024
This updates our spawn vendored version to include janestreet/spawn#58.

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this pull request Mar 5, 2024
This updates our spawn vendored version to include janestreet/spawn#58.

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
src/dune Outdated
(run ocaml %{dep:flags.ml})))

(rule
(with-stdout-to flags.ml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this might add the Flags module to the library.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, yes indeed! Fix pushed for the original approach, but actually enabled_if is arguably "better", given that it's then a semi-pure Dune solution.

/* Must come before any other caml/ headers are included */
#define CAML_INTERNALS

#ifdef _WIN32
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing the dune file:

Suggested change
#ifdef _WIN32
#ifdef _WIN32
#define UNICODE
#define _UNICODE

Based on a comment by @emillon in ocaml/dune#10212 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that UNICODE and _UNICODE must either be defined everywhere or nowhere (OCaml headers can pull in windows.h or parts of it as well, ending up with a similar problem to that which we have already had with CAML_INTERNALS).

The other benefit of putting it in flags is that any additional C files correctly inherit it, but that's really never likely to apply here. Given that we already have the _GNU_SOURCE definition (which is where it should go - i.e. right at the top).

I don't mind, though - how strong opinions against doing it "philosophically" correctly, if more complicatedly, in the dune file?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is better to do it in the dune file.

@rgrinberg
Copy link
Copy Markdown
Collaborator

@nojb could you have a look?

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo the tweaks below)

src/spawn.ml Outdated
Buffer.contents buf
if env = [] then
"\000\000"
else
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say a word about this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The environment block is supposed to be terminated with two NUL characters - if there are any bindings, that obviously in the other branch (and did in the ANSI version too). When there are no bindings, the ANSI version was passing a pointer to the OCaml string "\000" but that will have had an additional NUL terminator so it quietly worked. With the wide version, the issue was instantly revealed by one of the tests where CreateProcessW will have received two NUL bytes (i.e. a NUL wchar_t) but will have then "segfaulted" reading past the end of that buffer (CreateProcessW actually translated that to an invalid argument, which worryingly suggests that it carried on reading random memory and instead got upset that there was no = character anywhere or some such😱)

I'm not sure if it's worth a comment in the code - CreateProcess does indicate that the block should be a series of nul-terminated strings themselves terminated by another nul, it's just hazy on the case of an empty environment (not surprisingly, really).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had that same shock thought! 🙂 Just awaiting a train, and then I’ll flex some environments…

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, OCaml's fine because createprocess.c speculatively adds a nul character. I was speculatively adding that character but not zeroing it. I've fixed it by not adding it, as I think it's nicer that the function in spawn.ml is entirely responsible for producing the UTF-8 version and then the C stub transcodes exactly that string.

/* Must come before any other caml/ headers are included */
#define CAML_INTERNALS

#ifdef _WIN32
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is better to do it in the dune file.

win32_maperr(GetLastError());
caml_win32_maperr(GetLastError());
close_std_handles(&si);
uerror("DuplicateHandle", Nothing);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we error out here, we will leak the strings allocated above. Maybe invert the order?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh blimey, good catch. (Another) revised version incoming...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@emillon
Copy link
Copy Markdown
Contributor

emillon commented Mar 8, 2024

ok, thanks. my only concern with the dune file was that it requires changing a bit our bootstrap system in dune but that's fine with me.

dra27 added 7 commits March 8, 2024 16:45
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
Signed-off-by: David Allsopp <david.allsopp@metastack.com>
If the duplication of the handles failed, then the memory allocated for
the strings wasn't freed. Duplicate the handles first - the only failure
for the strings is OOM, and all bets are then off.

Signed-off-by: David Allsopp <david.allsopp@metastack.com>
@dra27 dra27 force-pushed the win32-unicode-spawn branch from 2ff553a to 5958e51 Compare March 8, 2024 16:53
@dra27
Copy link
Copy Markdown
Contributor Author

dra27 commented Mar 8, 2024

While looking at it again, for some reason the check for nul characters in environment values was only being done in Env_unix so I shifted the function out of there and used it in Env_win32 as well.

Hopefully this is the last version 🙂

Copy link
Copy Markdown
Contributor

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (after the latest changes). Thanks!

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
_
Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
@rgrinberg rgrinberg merged commit fa620a8 into janestreet:master Mar 9, 2024
emillon added a commit to emillon/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to ocaml/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes #10180

Signed-off-by: Etienne Millon <me@emillon.org>
Leonidas-from-XIV pushed a commit to Leonidas-from-XIV/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes ocaml#10180

Signed-off-by: Etienne Millon <me@emillon.org>
Leonidas-from-XIV added a commit to ocaml/dune that referenced this pull request Mar 11, 2024
This does several things:
- update our spawn vendored version to include janestreet/spawn#58
- pick the dune file
- add new build flags for bootstrapping

Fixes #10180

Signed-off-by: Etienne Millon <me@emillon.org>
Co-authored-by: Etienne Millon <me@emillon.org>
dkalinichenko-js pushed a commit to dkalinichenko-js/opam-repository that referenced this pull request Nov 12, 2024
CHANGES:

- Support older GCC like 4.8.5 (janestreet/spawn#59)

- Fix spawning processes on Windows when environment contains non-ascii
  characters (janestreet/spawn#58)

- Skip calls to pthread_cancelstate on android, as its not available (janestreet/spawn#52)

- Fix compatibility with systems that do not define `PIPE_BUF`. Use
  `_POSIX_PIPE_BUF` as a fallback. (janestreet/spawn#49)

- [haiku] Fix compilation on Haiku OS. The header sys/syscalls.h isn't
  available, neither is pipe2()

- Allow setting the sigprocmask for spawned processes (janestreet/spawn#32)
dkalinichenko-js pushed a commit to dkalinichenko-js/opam-repository that referenced this pull request Nov 12, 2024
CHANGES:

- Support older GCC like 4.8.5 (janestreet/spawn#59)

- Fix spawning processes on Windows when environment contains non-ascii
  characters (janestreet/spawn#58)

- Skip calls to pthread_cancelstate on android, as its not available (janestreet/spawn#52)

- Fix compatibility with systems that do not define `PIPE_BUF`. Use
  `_POSIX_PIPE_BUF` as a fallback. (janestreet/spawn#49)

- [haiku] Fix compilation on Haiku OS. The header sys/syscalls.h isn't
  available, neither is pipe2()

- Allow setting the sigprocmask for spawned processes (janestreet/spawn#32)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants