Skip to content

Fix different binaries produced by ocamlopt.byte and ocamlopt.opt with --enable-force-safe-string#8853

Closed
dra27 wants to merge 1 commit intoocaml:trunkfrom
dra27:force-safe-string-sharing
Closed

Fix different binaries produced by ocamlopt.byte and ocamlopt.opt with --enable-force-safe-string#8853
dra27 wants to merge 1 commit intoocaml:trunkfrom
dra27:force-safe-string-sharing

Conversation

@dra27
Copy link
Copy Markdown
Member

@dra27 dra27 commented Aug 2, 2019

#1859 has revealed that three tests fail the "ocamlopt.opt and ocamlopt.byte should produce the same binary" test:

  • lib-bigarray/change_layout.ml
  • lib-scanf/tscanf.ml
  • lib-arg/testarg.ml

The difference is caused by this part of closure (note that flambda is not affected - as witnessed in the Travis run for #1859):

str ~shared:Config.safe_string (Uconst_string s)

Diffing the resulting binaries shows differences in caml_globals_map. A fix is simply to marshal the .cmx file without sharing, but this increases the .cmx files in the tree from 3.3M to 4.5M and a 36% increase a size for a better feature seems too harsh. Furthermore, for each of the tests the implementation digest of the .cmx is different.

In this PR, I've included a new flag to Marshal No_sharing_strings which works by not colouring string blocks blue (is that even sound?!). This at least seems to work, and reduces the .cmx files to 3.5M which is at least only a 6% increase in size.

The better solution would seem to be do the equivalent sharing in the bytecode compiler, but I'm not sure how to go about that, and the few bits of poking I tried didn't work.

Testing this out just requires configuring the tree with --enable-force-safe-string (and the implicit --disable-flambda)

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Aug 2, 2019

(not bothering with CI, since it doesn't test --enable-force-safe-string)

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Aug 2, 2019

Or you could just add compare_programs = false to the tests, like a number of them have already.
I don't think it is a bug for ocamlopt.byte and ocamlopt.opt to produce different files because of sharing, so I would be in favour of not treating differences in binaries as an error.
For ocamlopt at least, comparing the output of the .o files should be more reliable, since as far as I remember they are not dependent on crcs.

@xavierleroy
Copy link
Copy Markdown
Contributor

I agree with @lthis that it is not a bug that ocamlopt.byte and ocamlopt.opt produce different but equivalent binaries, just a fact that we need to be aware of and be able to explain, like @dra27 did.

Hence I don't think the proposed modification to the marshaler is warranted. The marshaler is already complicated enough, and critical enough; let's not add complexity to it.

I would be more interested in re-sharing structured constants in the bytecode compiler just like we do in the Closure pass of the native-code compiler. Why not keep that as a medium-term goal rather than rush a workaround?

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Aug 6, 2019

@lthls - I didn't know about that option, so that allows #1859 to have a route to CI passing.

@xavierleroy - I wasn't proposing this would be merged as-is, it was a cry for help having failed to get the bytecode compiler to do what I wanted 🙂

Is it definitely OK that we just "know" that the ocamlopt.opt and ocamlopt.byte can produce different output? Doesn't that mean that you can potentially end up with checksum mismatches on the same machine, given that it affects the implementation checksum recorded in the .cmx?

It's of course OK that the bytecode and native code runtime can behave differently, but surely we should be mitigating those effects in ocamlopt or insisting that ocamlopt must be a native code program (which obviously we could, at the cost of having to build the standard libary twice)? I can't imagine the reproducible builds people can particularly love that fact, either?

That said, for this particular case, I'd very happily alter the bytecode compiler - I just need a pointer or two, please!

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Aug 6, 2019

Keep in mind that the only differences we've seen so far between binaries outputted by ocamlopt.byte and ocamlopt.opt are in the section that lists all the compilation units that were linked along with their crcs (.cmi and .cmx). We haven't yet seen different code been generated, which would be much more worrying. This is why I suggested comparing the .o files instead, there would be much less false positives. Maybe @shindere (who introduced the test when migrating to ocamltest) can comment on this.

Checksum mismatches are unlikely, because both ocamlopt.byte and ocamlopt.opt can read files outputted by the other just fine (the checksums are used as unique identifiers of the corresponding files, they're never re-computed). You only risk mismatches if you're compiling the same file twice, once with ocamlopt.byte and once later with ocamlopt.opt (or the reverse). You can then get inconsistent assumption errors on compiling further files, just like if you had modified your original file (or your compiler) before the second compilation.

Reproducible builds people should not be affected, because I assume they will not allow the choice between ocamlopt.byte and ocamlopt.opt to be unspecified.

For now, I'd suggest to fix #1859 by adding compare_programs = false to the relevant ocamltest headers. If you're not sure how to proceed, I can propose the patch as a pull request to the pull request. This one can probably be closed, unless @xavierleroy wants to keep a reminder to improve the bytecode compiler.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Aug 27, 2019 via email

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Sep 4, 2019

@shindere Could you comment on whether you would like to keep the binary test or not ? I'm in favour of dropping it because it's very unlikely to catch a bug not caught earlier, but has resulted in a small number of false positives that require non-trivial amounts of work to diagnose.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 4, 2019 via email

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Sep 4, 2019

I wasn't aware that it had actually caught a bug. In this case, the best course might be to simply improve the reporting so that contributors triggering this failure know what to do about it.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 4, 2019 via email

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Sep 4, 2019

If you can insert a sentence to explain the failure, I would opt for something like "This test failing does not always point to a bug. If you do not have reason to suspect a bug, you can disable it using 'compare_programs = false' in the test header."

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 4, 2019 via email

@dra27
Copy link
Copy Markdown
Member Author

dra27 commented Sep 4, 2019

I remembered from an early talk on ocamltest which @shindere gave that the reason for the comparison test was because a bug had surfaced - I didn't know that compare_programs had subsequently been added.

My own 2c is that either we should use it always or nowhere - so either ocamlopt.byte and ocamlopt.opt should always produce the same binary output on the same machine or we should eliminate ocamlopt.byte completely.

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Sep 10, 2019

I've seen some comments on #1859 about the difference between ocamlc and ocamlopt string sharing being fixable. I'd like to clarify that I believe this is not true; string sharing for ocamlc can be improved, but ocamlopt can benefit from information about other compilation units through cmx files that ocamlc cannot have.
Furthermore, at least with flambda, ocamlopt is expected to do extra work to find optimisation opportunities, and these can include constant sharing or re-use of (immutable) previous allocations, both of which can ultimately change the output of Marshal.output_value in the program being compiled.

If a fix is needed, I think it would be better to look at using a different algorithm for computing crcs that would return the same crc for structurally equivalent data. Not exactly a simple solution, but much more robust than anything involving Marshal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants