Skip to content

[RFC] More informative errors when Ccomp.command exits with status 127#1727

Merged
trefis merged 1 commit intoocaml:trunkfrom
trefis:ccomp-sys_error
May 5, 2018
Merged

[RFC] More informative errors when Ccomp.command exits with status 127#1727
trefis merged 1 commit intoocaml:trunkfrom
trefis:ccomp-sys_error

Conversation

@trefis
Copy link
Copy Markdown
Contributor

@trefis trefis commented Apr 16, 2018

I spent a non-zero amount of time recently trying to understand the following output (reproduced in full) from ocamlopt (which exited with -1):

File "caml_startup", line 1:
Error: Error during linking

As it turns out, the command line used by ocamlopt to invoke the linker was bigger than ARG_MAX.
In such cases, Sys.command won't print anything to stderr or stdout, and return 127 on Linux (windows apparently returns -1).

I think it would be a nice to print something when that happens (even though we can't be sure whether our call silently failed, or the command itself exited with 127; I don't think we should be worrying too much about the added noise however).
I initially wanted Ccomp.command to print something along the lines of:

Error: command exited with status 127
Can be due to argument size limit, run with -verbose to get the actual command line.

which seems like a nice hint.
But I noticed that caml_sys_system_command raises Sys_error when -1 is returned and figured that I could do that too in this case as well and that it would probably be enough.

For the record: the invocation of ocamlopt was hidden by a build system, so Sys_error with the faulty cmdline would most likely have made the issue obvious.

Any opinion?

@stedolan
Copy link
Copy Markdown
Contributor

This looks like a good change, although it's a bit annoying that it can't be more specific with the error message. Unfortunately, this isn't fixable without reimplementing system: sh returns 127 when the command isn't found, and if exec fails, then system behaves as though the command returned 127, so there's no way to distinguish an E2BIG error from exec from a shell exiting with "command not found".

I suppose the best option would be to reimplement system and do better error reporting, but failing that this patch is definitely an improvement over the current state of affairs.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented May 4, 2018

Thanks for looking at this @stedolan !

Unless someone has another opinion on the matter I will merge this in a few days.

@xavierleroy
Copy link
Copy Markdown
Contributor

I'm curious: in the end, what error message is shown to the user?

At any rate, I'm in favor of merging this, since it cannot be worse than the error message currently displayed...

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented May 4, 2018

In my case, something like this:

Error: I/O error: gcc -shared -o '/tmp/camlTOP5b12000.so'   '-L/path'  ... '-L/pathN
   '-Lstdlib' '-Lotherlibs/unix' '-Lotherlibs/str' '-Lotherlibs/dynlink'
  '-Lotherlibs/bigarray' '-Lotherlibs/raw_spacetime_lib' '-Lotherlibs/graph'
  '/tmp/camlTOP5b12000.o'

(where "..." is, obviously, fairly long)

You can test it out with

$ make natruntop
# #directory "/some/genuinly/long/path";;
# ...
# #directory "/some/genuinly/long/path";;
# 3 + 5;;

@trefis trefis force-pushed the ccomp-sys_error branch from 4d84b0c to 9525423 Compare May 4, 2018 17:44
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented May 4, 2018

To confirm, the strace output when this happened was:

[pid 10550] execve("/bin/sh", ["sh", "-c", "gcc -shared -o '/tmp/camlTOP5b12"...], [/* 44 vars */]) = -1 E2BIG (Argument list too long)

@xavierleroy
Copy link
Copy Markdown
Contributor

That's a scary error message... but better than nothing. Please go ahead with the merge.

@trefis trefis merged commit 6d4b966 into ocaml:trunk May 5, 2018
@trefis trefis deleted the ccomp-sys_error branch May 7, 2018 14:14
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented May 7, 2018

Any opinion on whether this should be backported to 4.07?

@gasche
Copy link
Copy Markdown
Member

gasche commented May 7, 2018

Meh.

If there was clearly no risk of regression for anyone's workflow, I would say "why not", but the fact that I'm not knowledgeable enough about the horrible things that people do with return codes is giving me pause.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented May 7, 2018

Ack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants