make string/bytes distinguishable in the underlying compiler implementation#596
make string/bytes distinguishable in the underlying compiler implementation#596damiendoligez merged 8 commits intoocaml:trunkfrom
Conversation
|
@damiendoligez are you happy with this direction? |
|
ready for review. two minor issues: external caml_create_string : int -> bytes
external caml_fill_string : bytes -> int -> int -> char -> unitThe name is a bit misleading, however, if we rename it into |
bytecomp/bytegen.ml
Outdated
| comp_init env sz decl_size | ||
| end | ||
| | Lprim((Pidentity | Popaque), [arg]) -> | ||
| | Lprim((Pidentity | Popaque | Pbytes_to_string | Pbytes_of_string), [arg]) -> |
There was a problem hiding this comment.
Why do you treat Pbytes_to_string & Pbytes_of_string specially ?
There was a problem hiding this comment.
this revision does do any change to the code gen, for both native and bytecode gen, but it is a step to propagate it into bytecode or native backend, you can send another PR after this is merged
There was a problem hiding this comment.
Ok. Then I don't understand why you have some changes in byterun/io.c and byterun/str.c
There was a problem hiding this comment.
the goal is propagating the information while not impacting the semantics of code generator for this revision(it already touched 25 files and needs bootstrapping), once we change the code generation, it might introduce some subtle bugs if it is not complete.
As I said, it is a good step to propagate it into bytecode/native backend once you have confidence that it is complete, but I won't do it in this revision
There was a problem hiding this comment.
Ok. Then I don't understand why you have some changes in byterun/io.c and byterun/str.c
I understand that you don't want to change the code generation, but in that case you probably don't want to change the runtime either. that would make this revision touch 23 files only
There was a problem hiding this comment.
those two primitives are used in Pervasives.unsafe_output, Pervasives.unsafe_output_string, String.unsafe_blit, Bytes.unsafe_blit
The changes are minimal
There was a problem hiding this comment.
I can see there are used.
It just feels inconsistent to only partially add primitives for bytes.
There was a problem hiding this comment.
btw, what about external equal : t -> t -> bool = "caml_string_equal" used in bytes.ml ?
There was a problem hiding this comment.
good point, I missed it, will add it -- this is the reason I don't want to change the codegen for this revision, in case I missed something
There was a problem hiding this comment.
Don't modify the runtime in this feature, and you won't miss anything.
Anyway, it does not really matter.
|
I'm OK with this idea, but I'd like to hear the opinions of back-end experts (@hhugo, @mshinwell @chambart, anyone else). One remark: I don't understand why you keep |
|
@damiendoligez I will remove |
|
@damiendoligez removing |
Since it's provided as Or you might want to translate it into |
|
@damiendoligez removed Pstringset(s|u) per suggestions |
|
ping @damiendoligez , is it good for merge? |
stdlib/bytes.ml
Outdated
| external set : bytes -> int -> char -> unit = "%string_safe_set" | ||
| external get : bytes -> int -> char = "%bytes_safe_get" | ||
| external set : bytes -> int -> char -> unit = "%bytes_safe_set" | ||
| external create : int -> bytes = "caml_create_string" |
There was a problem hiding this comment.
external create : int -> bytes = "caml_create_bytes" ?
There was a problem hiding this comment.
see my comments above:
external caml_create_string : int -> bytes
external caml_fill_string : bytes -> int -> int -> char -> unitindeed, the name is a bit misleading, either we stick to the name for better backward compatibility or we change the c primitive names or we add a duplication, I am happy with either, @hhugo @damiendoligez what do you think
There was a problem hiding this comment.
you've started partially duplicating string primitives : caml_ml_output_bytes caml_blit_bytes caml_caml_bytes_equal.
I would duplicate all or none.
|
@hhugo replaced caml_fill_string, caml_create_string per suggestions |
|
ping @damiendoligez @gasche ? I would say this is a safe internal change, let me know anything I can do to help it get merged, thanks! |
byterun/str.c
Outdated
| { | ||
| mlsize_t size = Long_val(len); | ||
| if (size > Bsize_wsize (Max_wosize) - 1){ | ||
| caml_invalid_argument("String.create"); |
|
Hi all, I addressed Alain's comments and applied Hugo's patch. (I am not very confident to apply other patches in bytegen, but it is a good step in this direction) |
|
Reviewed, slightly modified, merged, and cherry-picked to 4.04. |
|
I thought this was going to be deferred until 4.05? |
|
It is quite a low-risk change, useful in the short term for some users and in the long term for js_of_ocaml. |
|
This PR breaks the compilation of in a module (the weird types are FlowCaml types, the point is the external name) and opam-builder report: http://opam.ocamlpro.com/builder/html/flowcaml/flowcaml.1.07/c9f316bcefc4331699e41ea036cce35c One may want to debate whether this is a problematic breakage or not (possibly not). I do wonder whether keeping the legacy primitive names for the Note that this PR does not seem to have a proper Changes entry (it should be marked as compatibility-breaking), the only change entry in the PR is which does not list this PR number and does not mention the other primitive name changes. |
|
hi, @gasche I did discuss this issue with @damiendoligez before, shall I create a new PR to enhance Changes entry? |
|
Another package that is broken by this change is opam-builder report: |
|
opam-builder report: http://opam.ocamlpro.com/builder/html/wget/wget.0.1.0/b51a913249637bfbe31a754953ba2135 |
|
@damiendoligez, @bobzhang I'm getting concerned with the number of unrelated packages hit by this issue (Batteries was also hit, of course, but that is more to be expected). Is there anything wrong with keeping the old primitive names |
|
If you were to both agree that this is a reasonable compromise, I could send a PR against the ocaml codebase reverting this part of the change (or @bobzhang might want to implement it directly), to be integrated in a second beta version for 4.04. |
|
@gasche If we are moving in the direction of external set : string -> int -> char -> unit = "%string_safe_set"This signature does not make sense under |
|
What you say is not unreasonable. I still think it's a pity that software that has been working for years is broken by a change that is not, in itself, necessary, and doesn't come with the advantages of making |
|
@gasche That's my argument but I don't have strong opinions here, I can send a PR which treats |
|
I think the problem is that a compiler primitive is used outside the standard library. It gives a cent to @mshinwell idea in #724 :
I would prefer if we could not break working code, but have a transition plan ("safe-string" and "no-primitive-outside-stdlib") with the release where the defaults will be changed known in advance. |
|
One of the reasons why I'm ready to concede this specific point is that the packages that are broken by this change appear to be maintained at a very low frequency (FlowCaml hasn't changed in eight years), so it's not clear that a well-defined, well-announced migration path would actually do a big difference compared to the current breakage. (Well-handled transitions have the property that maintainers fix the issue before it starts breaking code. Low-frequency maintainers only fix issues when they break code, so transitions don't make much of a difference.) |
|
@bobzhang Is is possible to simply make %string_safe_set an alias for %bytes_safe_set, so that the compilers would generate exactly the same code for both? This way, the compatibility with old code is maintained, but we can use the new name in the stdlib. This means the legacy code won't be compatible with bucklescript or future versions of js_of_ocaml, but in a world where the back-ends need to know the difference between |
@damiendoligez Yes, I will send a pull request |
systhreads: do st_thread_id after initializing the thread descriptor
* Export Cfg_with_layout.print_dot * ocamlformat
This is one part of the patch required for bucklescript compiler, but I think it would be generally helpful to other backends.
The patch will be composed of such components:
It is quite a large patch, but the patch itself is not complex