Skip to content

make string/bytes distinguishable in bytecode#1617

Closed
hhugo wants to merge 10 commits intoocaml:trunkfrom
hhugo:safe-string2
Closed

make string/bytes distinguishable in bytecode#1617
hhugo wants to merge 10 commits intoocaml:trunkfrom
hhugo:safe-string2

Conversation

@hhugo
Copy link
Copy Markdown
Contributor

@hhugo hhugo commented Feb 17, 2018

Follow up to #596.
This PR move the distinction between bytes and string down to the bytecode.
This should allow the js_of_ocaml compiler to choose different runtime representation for bytes and string (when safe-string is on).

The most important changes are:

  • In bytecode, Bytes.unsafe_to_string & Bytes.unsafe_of_string are no longer no-op and now call into the C runtime. The native compilation is unchanged.
  • Add a new bytecode instruction for %string_unsafe_get, different from the instruction used for %bytes_unsafe_get

@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Feb 17, 2018

For people reading emails only, I've amended the feature description:

In bytecode, Bytes.unsafe_to_string & Bytes.unsafe_of_string are no longer no-op and now call into the C runtime. The native compilation is unchanged.

"%caml_string_set32u", Pstring_set_32(true);
"%caml_string_set64", Pstring_set_64(false);
"%caml_string_set64u", Pstring_set_64(true);
"%caml_string_set16", Pbytes_set_16(false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to keep the old names %caml_string_set*? I agree that backwards compatibility is generally good, but am just wondering.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @xclerc pointed out, a library such as ocp-indent will benefit from this backwards compatible change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several packages were broken by a previous version of a similar change by Hongbo that turned %string_safe_set into %bytes_safe_set -- see #596 (comment) . I would be wary of changing the external-referenceable name of a primitive.

byterun/str.c Outdated
/**
* [caml_string_set16] is deprecated,
* use [caml_bytes_set16] instead
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for keeping these caml_string_set* C primitives? AFAICS they are not exposed to user code and not used by the bytecode interpreter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is related to your previous comment, but someone might
reference them through external, e.g. ocp-endian.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they would reference the %caml_string_set* name, not the C primitive, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you meant "make the old OCaml primitive name point to a new C symbol"?

I think there could still be a problem, with an .o file generated by a previous version
of the compiler and then linked with the new runtime; however, if magic numbers are
bumped it should not be a problem (cannot remember whether magic numbers are
automatically bumped nowadays).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, bumping the magic number would do the trick; I don't think supporting "old-with-new" linking is a good idea anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the change.

@hhugo hhugo force-pushed the safe-string2 branch 2 times, most recently from 7ff7b7b to ef36776 Compare February 20, 2018 19:59
@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Feb 20, 2018

I expect the tests to pass now.
Bootstrapping was required in order to remove primitives from the runtime.

@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Feb 20, 2018

File "/home/travis/build/ocaml/ocaml/toplevel/opttopmain.ml", line 213, characters 24-29:
Error: Unbound value unset
Hint: Did you mean set?
make: *** [toplevel/opttopmain.cmx] Error 2

I'm not sure what this is about..

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Feb 20, 2018 via email

@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Feb 21, 2018

Marshal.to_string and Marshal.to_bytes were calling into the same c function. It is now fixed.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 27, 2018

This patch looks correct to me, and it should be backwards compatible thanks to leaving the name of old externals caml_string_set* untouched.

However, I think the cmo magic number needs to be bumped due to the bytecode change.

@xavierleroy
Copy link
Copy Markdown
Contributor

Warning: this PR contains a bootstrap. (Several, actually.) According to house regulations it will have to be merged manually by a core developer.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Mar 6, 2018

However, I think the cmo magic number needs to be bumped due to the bytecode change.

Indeed, but I think it was agreed recently that we'd change all the magic numbers between each major release anyway.
For example, we've recently merged some changes to types.ml without changing the magic number of cmis just yet.

@damiendoligez
Copy link
Copy Markdown
Member

@hhugo could you reorder the commits (moving up the marshall commit) so we only have to deal with two bootstraps instead of three?

Also, rebase Changes (but not the bootstrap compilers, they are a nice reminder that we need to merge manually).

@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Mar 7, 2018

Not sure what you meant by

Also, rebase Changes (but not the bootstrap compilers, they are a nice reminder that we need to merge manually).

This PR now only include 2 bootstraps

@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Mar 13, 2018

I guess this PR needs a formal approval and a core-dev to manual merge & redo the bootstraps.
@nojb, would you be able to do this ?

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 13, 2018

@hhugo Yes, will do.

nojb added a commit that referenced this pull request Mar 15, 2018
Rebase of #1617 (make string/bytes distinguishable in bytecode)
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Mar 15, 2018

Merged as #1659, thanks!

@nojb nojb closed this Mar 15, 2018
@hhugo
Copy link
Copy Markdown
Contributor Author

hhugo commented Mar 15, 2018

Thanks for all the boostraps

@hhugo hhugo deleted the safe-string2 branch March 15, 2018 19:47
@gasche gasche mentioned this pull request Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants