make string/bytes distinguishable in the underlying compiler implementation by bobzhang · Pull Request #596 · ocaml/ocaml

bobzhang · 2016-05-24T14:03:35Z

This is one part of the patch required for bucklescript compiler, but I think it would be generally helpful to other backends.

The patch will be composed of such components:

All the lambda primtives (Pstring_, Pbytes_)
Runtime c function (for example, caml_blit_[bytes|string]
Adapt the stdlib to pick the right primitives in string/bytes module
Propagate the primitives in native backend, for this revision, string|bytes will be treated in the same way in native backend

It is quite a large patch, but the patch itself is not complex

bobzhang · 2016-05-24T14:30:52Z

@damiendoligez are you happy with this direction?
@chambart would you have a look at the propagation in middle_end/asmcomp? Thanks

bobzhang · 2016-05-25T14:54:44Z

ready for review.

two minor issues:

external caml_create_string : int -> bytes
external caml_fill_string : bytes -> int -> int -> char -> unit

The name is a bit misleading, however, if we rename it into caml_create_bytes, it might break backward compatiblity, @alainfrisch what do you think?

hhugo · 2016-05-25T15:07:28Z

bytecomp/bytegen.ml

        comp_init env sz decl_size
      end
-  | Lprim((Pidentity | Popaque), [arg]) ->
+  | Lprim((Pidentity | Popaque | Pbytes_to_string | Pbytes_of_string), [arg]) ->


Why do you treat Pbytes_to_string & Pbytes_of_string specially ?

this revision does do any change to the code gen, for both native and bytecode gen, but it is a step to propagate it into bytecode or native backend, you can send another PR after this is merged

Ok. Then I don't understand why you have some changes in byterun/io.c and byterun/str.c

the goal is propagating the information while not impacting the semantics of code generator for this revision(it already touched 25 files and needs bootstrapping), once we change the code generation, it might introduce some subtle bugs if it is not complete.
As I said, it is a good step to propagate it into bytecode/native backend once you have confidence that it is complete, but I won't do it in this revision

Ok. Then I don't understand why you have some changes in byterun/io.c and byterun/str.c

I understand that you don't want to change the code generation, but in that case you probably don't want to change the runtime either. that would make this revision touch 23 files only

those two primitives are used in Pervasives.unsafe_output, Pervasives.unsafe_output_string, String.unsafe_blit, Bytes.unsafe_blit
The changes are minimal

I can see there are used.
It just feels inconsistent to only partially add primitives for bytes.

btw, what about external equal : t -> t -> bool = "caml_string_equal" used in bytes.ml ?

good point, I missed it, will add it -- this is the reason I don't want to change the codegen for this revision, in case I missed something

Don't modify the runtime in this feature, and you won't miss anything.
Anyway, it does not really matter.

damiendoligez · 2016-05-26T12:45:42Z

I'm OK with this idea, but I'd like to hear the opinions of back-end experts (@hhugo, @mshinwell @chambart, anyone else).

One remark: I don't understand why you keep Pstringsets and Pstringsetu : the whole point of the bytes/string distinction is to get rid of them, and indeed they are not used in the standard library and not reachable by user code.

bobzhang · 2016-05-26T13:08:50Z

@damiendoligez I will remove Pstringsets and Pstringsetu

bobzhang · 2016-05-26T13:17:38Z

@damiendoligez removing Pstringsets will break some backward compatibility, there might be some code using %string_safe_set, are you okay with that?

damiendoligez · 2016-06-02T13:22:11Z

@damiendoligez removing Pstringsets will break some backward compatibility, there might be some code using %string_safe_set, are you okay with that?

Since it's provided as Bytes.set (and String.set) there's no excuse for using %string_safe_set directly, so I'm OK with breaking such code.

Or you might want to translate it into Pbytessets instead. In any case, a back-end that separates immutable string from mutables bytes will have to fail on Pstringsets (or map it to Pbytessets instead).

bobzhang · 2016-06-02T17:06:35Z

@damiendoligez removed Pstringset(s|u) per suggestions

bobzhang · 2016-06-16T11:47:02Z

ping @damiendoligez , is it good for merge?

hhugo · 2016-06-16T12:06:13Z

stdlib/bytes.ml

-external set : bytes -> int -> char -> unit = "%string_safe_set"
+external get : bytes -> int -> char = "%bytes_safe_get"
+external set : bytes -> int -> char -> unit = "%bytes_safe_set"
 external create : int -> bytes = "caml_create_string"


external create : int -> bytes = "caml_create_bytes" ?

see my comments above:

external caml_create_string : int -> bytes external caml_fill_string : bytes -> int -> int -> char -> unit

indeed, the name is a bit misleading, either we stick to the name for better backward compatibility or we change the c primitive names or we add a duplication, I am happy with either, @hhugo @damiendoligez what do you think

you've started partially duplicating string primitives : caml_ml_output_bytes caml_blit_bytes caml_caml_bytes_equal.
I would duplicate all or none.

bobzhang · 2016-06-17T16:47:50Z

@hhugo replaced caml_fill_string, caml_create_string per suggestions

bobzhang · 2016-06-30T13:26:32Z

ping @damiendoligez @gasche ? I would say this is a safe internal change, let me know anything I can do to help it get merged, thanks!

alainfrisch · 2016-06-30T14:56:54Z

byterun/str.c

+{
+  mlsize_t size = Long_val(len);
+  if (size > Bsize_wsize (Max_wosize) - 1){
+    caml_invalid_argument("String.create");


"Bytes.create"?

bobzhang · 2016-08-03T15:02:19Z

Hi all, I addressed Alain's comments and applied Hugo's patch. (I am not very confident to apply other patches in bytegen, but it is a good step in this direction)
Please review again, if everyone is happy, should I do the rebase against 4.04 or trunk?

damiendoligez · 2016-08-08T15:06:41Z

Reviewed, slightly modified, merged, and cherry-picked to 4.04.

mshinwell · 2016-08-08T15:22:08Z

I thought this was going to be deferred until 4.05?

damiendoligez · 2016-08-09T14:08:41Z

It is quite a low-risk change, useful in the short term for some users and in the long term for js_of_ocaml.

gasche · 2016-08-12T12:11:42Z

This PR breaks the compilation of Flowcaml, that uses

external set : ('a, 'b) t -> 'b int -> 'a char -{'b ||}-> unit 
               with 'b < 'a
= "%string_safe_set"

in a module (the weird types are FlowCaml types, the point is the external name) and include String as the implementation.

opam-builder report: http://opam.ocamlpro.com/builder/html/flowcaml/flowcaml.1.07/c9f316bcefc4331699e41ea036cce35c

One may want to debate whether this is a problematic breakage or not (possibly not). I do wonder whether keeping the legacy primitive names for the String functions, and using the new names for the Bytes function would be acceptable for the backends.

Note that this PR does not seem to have a proper Changes entry (it should be marked as compatibility-breaking), the only change entry in the PR is

+* GPR#442, caml_fill_string, and caml_create_string is deprecated 
+  and will be removed in the future, please use caml_fill_bytes 
+  and caml_create_bytes for migration 
+  (Hongbo Zhang, review by Damien Doligez, Alain Frisch, and Hugo Heuzard)

which does not list this PR number and does not mention the other primitive name changes.

bobzhang · 2016-08-12T13:09:17Z

hi, @gasche I did discuss this issue with @damiendoligez before, shall I create a new PR to enhance Changes entry?
Btw, I thought * means breaking change, did I miss anything?

gasche · 2016-08-17T16:46:49Z

Another package that is broken by this change is rml.

opam-builder report:
http://opam.ocamlpro.com/builder/html/rml/rml.1.09.04/c52b6f5e8a4b821200e5ae80b2e54cc7

gasche · 2016-08-17T21:54:08Z

wget is also broken by this change

opam-builder report: http://opam.ocamlpro.com/builder/html/wget/wget.0.1.0/b51a913249637bfbe31a754953ba2135

gasche · 2016-08-17T22:14:08Z

@damiendoligez, @bobzhang I'm getting concerned with the number of unrelated packages hit by this issue (Batteries was also hit, of course, but that is more to be expected).

Is there anything wrong with keeping the old primitive names caml_string_set and others for mutable string operations, and rewriting them into the corresponding Bytes operation in the compiler code? If I understand correctly this would still be reasonable for alternative backend (there is no confusion on which types these operations should have in a -safe-string world), and it would remove this source of breakage.

gasche · 2016-08-17T22:15:36Z

If you were to both agree that this is a reasonable compromise, I could send a PR against the ocaml codebase reverting this part of the change (or @bobzhang might want to implement it directly), to be integrated in a second beta version for 4.04.

bobzhang · 2016-08-18T00:41:30Z

@gasche If we are moving in the direction of -safe-string, we are going to break the existing code any way, I would see people to fix their code sooner than later.
see the signature https://github.com/samoht/ocaml-wget/blob/master/src/stringext.mli#L19

 external set : string -> int -> char -> unit = "%string_safe_set"

This signature does not make sense under -safe-string, right? How is it different that fix the code in 4.04 vs 4.05?

gasche · 2016-08-18T05:39:33Z

What you say is not unreasonable. I still think it's a pity that software that has been working for years is broken by a change that is not, in itself, necessary, and doesn't come with the advantages of making -safe-string the default, but I guess that at this point in release time in may be too late to change back anyway so maybe we should go with it.

bobzhang · 2016-08-18T13:25:53Z

@gasche That's my argument but I don't have strong opinions here, I can send a PR which treats %string_safe_set the same as %bytes_safe_set so it will not break other people's code. My concerns are that this actually will make the transition from -unsafe-string to -safe-string slower. I think the consensus is that in the long term that we should only support -safe-string . @damiendoligez what do you think?

bobot · 2016-08-18T14:48:27Z

I think the problem is that a compiler primitive is used outside the standard library. It gives a cent to @mshinwell idea in #724 :

However I am interested in pursuing a little this idea of having a flag [(set by default)] to disallow % primitives, and then just having a normal interface. Are there any opinions on that?

I would prefer if we could not break working code, but have a transition plan ("safe-string" and "no-primitive-outside-stdlib") with the release where the defaults will be changed known in advance.

gasche · 2016-08-18T15:41:41Z

One of the reasons why I'm ready to concede this specific point is that the packages that are broken by this change appear to be maintained at a very low frequency (FlowCaml hasn't changed in eight years), so it's not clear that a well-defined, well-announced migration path would actually do a big difference compared to the current breakage.

(Well-handled transitions have the property that maintainers fix the issue before it starts breaking code. Low-frequency maintainers only fix issues when they break code, so transitions don't make much of a difference.)

damiendoligez · 2016-08-18T17:35:32Z

@bobzhang Is is possible to simply make %string_safe_set an alias for %bytes_safe_set, so that the compilers would generate exactly the same code for both? This way, the compatibility with old code is maintained, but we can use the new name in the stdlib.

This means the legacy code won't be compatible with bucklescript or future versions of js_of_ocaml, but in a world where the back-ends need to know the difference between string and bytes, it isn't possible to mutate a string, so both primitives should operate on bytes anyway.

bobzhang · 2016-08-18T19:02:17Z

Is is possible to simply make %string_safe_set an alias for %bytes_safe_set, so that the compilers would generate exactly the same code for both?

@damiendoligez Yes, I will send a pull request

systhreads: do st_thread_id after initializing the thread descriptor

* Export Cfg_with_layout.print_dot * ocamlformat

bobzhang changed the title ~~[WIP] make string/bytes distinguishable in the underlying compiler implementation~~ make string/bytes distinguishable in the underlying compiler implementation May 25, 2016

hhugo reviewed May 25, 2016
View reviewed changes

gasche added the caml-weekly-news label Jun 15, 2016

hhugo reviewed Jun 16, 2016
View reviewed changes

alainfrisch reviewed Jun 30, 2016
View reviewed changes

bobzhang mentioned this pull request Jul 21, 2016

Optimize some constant string operations #703

Merged

bobzhang added 4 commits August 7, 2016 10:51

Prepare primitives required for safe string

2807297

add needed lambda IR for safe string

a36cd4c

bootstrap with new lambda IR

5329853

apply changes to stdlib and test suite

82d2375

bobzhang force-pushed the safe_string branch from ea0e899 to a0e6662 Compare August 7, 2016 15:28

bobzhang added 3 commits August 7, 2016 11:41

bootstrap with new stdlib version

4a7c64e

remove string_(un)safe_set

eb0c655

bootstrap after removing string_set

6766f7c

bobzhang force-pushed the safe_string branch from a0e6662 to 6766f7c Compare August 7, 2016 15:43

fix change logs

be65331

damiendoligez merged commit be65331 into ocaml:trunk Aug 8, 2016

bobzhang mentioned this pull request Aug 19, 2016

Provides better compatibility for primitive %string_safe_set #772

Closed

hannesm mentioned this pull request Sep 26, 2016

OPAM does not compile with recent version of OCaml ocaml/opam#2680

Closed

bobzhang deleted the safe_string branch October 12, 2016 01:36

gasche mentioned this pull request Nov 29, 2016

Synchronized Pervasives implementation between stdlib and threads #943

Merged

dra27 mentioned this pull request Dec 13, 2016

Rebase 4.04.0 #962

Merged

hhugo mentioned this pull request Feb 17, 2018

make string/bytes distinguishable in bytecode #1617

Closed

yallop mentioned this pull request Jun 14, 2021

Use OCaml 4.03-specific features yallop/ocaml-ctypes#674

Open

13 tasks

EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Jul 23, 2021

Merge pull request ocaml#596 from Engil/fix_c_thread_register

4449587

systhreads: do st_thread_id after initializing the thread descriptor

stedolan pushed a commit to stedolan/ocaml that referenced this pull request May 24, 2022

Export Cfg_with_layout.print_dot (ocaml#596)

6a33212

* Export Cfg_with_layout.print_dot * ocamlformat

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Fix link to diskuv-ocaml (ocaml#596)

8dc0202

Conversation

bobzhang commented May 24, 2016

Uh oh!

bobzhang commented May 24, 2016

Uh oh!

bobzhang commented May 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

damiendoligez commented May 26, 2016

Uh oh!

bobzhang commented May 26, 2016

Uh oh!

bobzhang commented May 26, 2016

Uh oh!

damiendoligez commented Jun 2, 2016

Uh oh!

bobzhang commented Jun 2, 2016

Uh oh!

bobzhang commented Jun 16, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bobzhang commented Jun 17, 2016

Uh oh!

bobzhang commented Jun 30, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bobzhang commented Aug 3, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

damiendoligez commented Aug 8, 2016

Uh oh!

mshinwell commented Aug 8, 2016

Uh oh!

damiendoligez commented Aug 9, 2016

Uh oh!

gasche commented Aug 12, 2016

Uh oh!

bobzhang commented Aug 12, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gasche commented Aug 17, 2016

Uh oh!

gasche commented Aug 17, 2016

Uh oh!

gasche commented Aug 17, 2016

Uh oh!

gasche commented Aug 17, 2016

Uh oh!

bobzhang commented Aug 18, 2016

Uh oh!

gasche commented Aug 18, 2016

Uh oh!

bobzhang commented Aug 18, 2016

Uh oh!

bobzhang commented May 25, 2016 •

edited

Loading

bobzhang commented Aug 3, 2016 •

edited

Loading

bobzhang commented Aug 12, 2016 •

edited

Loading