Skip to content

switch to safe-string by default (related to MPR#7113)#1252

Merged
damiendoligez merged 3 commits intoocaml:trunkfrom
damiendoligez:safe-string-by-default
Sep 15, 2017
Merged

switch to safe-string by default (related to MPR#7113)#1252
damiendoligez merged 3 commits intoocaml:trunkfrom
damiendoligez:safe-string-by-default

Conversation

@damiendoligez
Copy link
Copy Markdown
Member

@damiendoligez damiendoligez commented Jul 19, 2017

I think it's time to switch to safe-string mode by default, the last step before eliminating unsafe strings.

The default can still be overridden at compile time with the -unsafe-string flag.

We should leave ample time for discussion (and changing OPAM packages) before we merge this.

@damiendoligez damiendoligez changed the title switch to safe-string by default switch to safe-string by default (related to MPR#7113) Jul 19, 2017
@alainfrisch
Copy link
Copy Markdown
Contributor

If I remember correctly, if you configure with -safe-string, you are not allowed to compile with -unsafe-string anymore. This is to allow the compiler perform optimizations whose correctness depend on immutable strings.

It would be useful to get an idea of the impact of the change on OPAM. I assume that this PR should make it easy.

@damiendoligez damiendoligez force-pushed the safe-string-by-default branch from 5df3f15 to b28a918 Compare July 19, 2017 14:11
@damiendoligez
Copy link
Copy Markdown
Member Author

@alainfrisch I just realized that and changed my patch to change only the command-line default when configured without -safe-string (i.e. the default). I think it's too early to force safe-string without providing a workaround.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 19, 2017

As the maintainer of a library that is, as of yet, not fully -safe-string, I have come to believe that the changes will happen when maintainers cannot do without it, that is when -safe-string becomes the default, instead of in preparation of this moment. So I support switching to -safe-string now, and I make the easy prediction that essentially all of OPAM will break. This is fine. We will try to fix the ecosystem before the release¹, succeed partially, and the rest maintainers will deal with after the first release with the new default.

¹: this supports the idea of making the change early in the release cycle to give us more time. But we also need the tools to look at the opam ecosystem, the only tool I know how to use is opam-builder and it is not running currently (I'd like to work on that but not before mid-September).

@damiendoligez
Copy link
Copy Markdown
Member Author

As far as tools are concerned, I'm working on a new version of opamcheck, specifically designed to compare versions of OCaml wrt OPAM compilation.

Preliminary version available at https://github.com/janestreet/opamcheck in src directory, but there is no documentation.

@Octachron
Copy link
Copy Markdown
Member

Should we advertise this change starting now to give people more time to react ?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 19, 2017

Sure, as long as we have a story about how people would test their software to check whether they are fine.

Maybe one thing we could do is to backport the change in an OCaml branch branched off 4.05.0 (or 4.04.3 for that matter), so that people can evaluate the impact on their software without having to also port their code to the trunk.

@Octachron
Copy link
Copy Markdown
Member

I was thinking that the "4.05.0+safe-string" opam switch could cover most testing need. Am I missing something?

@damiendoligez
Copy link
Copy Markdown
Member Author

Maybe one thing we could do is to backport the change in an OCaml branch branched off 4.05.0 (or 4.04.3 for that matter), so that people can evaluate the impact on their software without having to also port their code to the trunk.

No need to do any of that, just tell them to

export OCAMLPARAM=safe-string=1,_

before compiling their code. It even does the right thing if they have the -unsafe-string option in some of their command lines.

@damiendoligez
Copy link
Copy Markdown
Member Author

damiendoligez commented Jul 19, 2017

@Octachron you're missing the case where the fix consists in adding -unsafe-string on the compiler command line.

@dbuenzli
Copy link
Copy Markdown
Contributor

As the maintainer of a library that is, as of yet, not fully -safe-string, I have come to believe that the changes will happen when maintainers cannot do without it, that is when -safe-string becomes the default, instead of in preparation of this moment.

I was fortunate enough to be in the dependency cone of @jpdeplaix who was gently insistant and helpful about this. I believe I personally made the transition in my packages. He may have a qualitative assement of the situation to share.

@kit-ty-kate
Copy link
Copy Markdown
Member

I'm super excited by this change ! :)
Here is some things I noticed so far during the migrations I did:

  • The biggest problem for me was the Pervasives.output function which now have the type out_channel -> bytes -> int -> int -> unit. I found rather difficult to port old code that uses this function because when it is applied to a string (it seems to be often the case), we either need to require OCaml >= 4.02 and use output_substring or, if we want to be backward compatible, use a combination of String.sub and output_string which is of course more costly (this is the concern for PPrint for instance). For libraries with similar functions, there is even a choice to make: continue with the old type string -> int -> int -> unit or change to use the Bytes.t type. Depending on what function the user it inclined to give (in the case of a higher-order function) or what type users are likely to use, there is a design choice to make here to limit changes for the users of the library.
  • A "pattern" that I noticed a few time is the following:
    let buf = String.create 3 in
    buf.[0] <- f some_computation;
    buf.[1] <- g some_other;
    buf.[2] <- h another_one;
    buf
    The straightforward translation would be to change all String functions by their Bytes counterpart and replace the final buf by Bytes.to_string buf, but I feel like it could be greatly improved if we had a String.of_list function of some sort. Luckily for us, most of the "external std libraries" have that one.

I found the rest pretty straightforward, and except for some packages, there is not much to change on average (2 or 3 lines here and there). Now, lots of packages are still not safe-string compatible but I think this PR will make a difference. It would be a great help if someone could relaunch the "opam builder" project (or do something like that specifically for this) to see which packages are to be fixed.

@alainfrisch
Copy link
Copy Markdown
Contributor

A "pattern" that I noticed a few time is the following:

One possibility would be:

String.init 3
  (function
     | 0 ->  f some_computation
     | ...
  )

but in practice this is probably slower than going through Bytes (but perhaps faster than going though a list).

@damiendoligez
Copy link
Copy Markdown
Member Author

@jpdeplaix Do you have suggestions of functions that could be added to the stdlib to make the migration easier?

@kit-ty-kate
Copy link
Copy Markdown
Member

@alainfrisch yeah I though of this one. But we can't ensure that all cases are filled like an of_list function would do and we have to make an unused case | _ -> assert false but moreover it's only available since 4.02.0.

@damiendoligez well, not really because it would break backward compatibility. String.of_list would still be nice to have but it doesn't really help here and I just talked about it just because I think it was worth mentioning this "pattern".
However for the first point about output: in my own opinion, changing its type was a mistake as I think it is probably more used with strings than with bytes (and it makes more sense). But maybe I'm wrong about this and it's the contrary. This cannot be changed of course but I think it was still worth mentioning.

Aside from that, the migration is somewhat easy if you don't care about the "performances" (is some precise cases), you just have to be careful with backward compatibility, if the use of the bytes compatibility package is possible (if not you'd have to use Buffer instead), maybe think about the size of the string. It sometimes gets tricky if you have a string filled with '\0' and you are trying to emulate this with Buffer. And maybe other things I didn't think of.

@xavierleroy
Copy link
Copy Markdown
Contributor

How much breakage does this change cause in OPAM packages? Could we get a report from OCamlpro's OPAM builder tool?

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 10, 2017

Could we get a report from OCamlpro's OPAM builder tool?

(cc @damiendoligez @lefessan)

@kit-ty-kate
Copy link
Copy Markdown
Member

kit-ty-kate commented Sep 11, 2017

I just tried to install the more packages I could and here is the result:
307 installed, 212 failed, 679 aborted
Here is the list of packages with errors related to safe-string (94 packages): https://paste.isomorphis.me/0gq
And here is the details: https://paste.isomorphis.me/tS0

I tested this with OCaml 4.05. A lot of packages couldn't be tested because they are not yet compatible with 4.05 and a lot of packages could also not be tested because of missing system dependencies or missing dependencies in the opam package (astring and cppo_ocamlbuild seem to be fairly common in this regard)

@damiendoligez
Copy link
Copy Markdown
Member Author

I tried it on trunk. The patch breaks 367 (versions of) packages. All of them can be "fixed" by building under export OCAMLPARAM=safe-string=0,_.

So if we merge this now, we'll have to change at least 367 package description files before the 4.06.0 release.

@dbuenzli
Copy link
Copy Markdown
Contributor

All of them can be "fixed" by building under export OCAMLPARAM=safe-string=0,_.

Another fix is to simply constrain the OCaml version (which might be actually easy to do in bulk with opam-ed). This may be a better social "fix" since it will push users of these packages to complain upstream for a safe-string version.

Also I'm not sure counting package versions is a very good metric. E.g. you have 4 versions of xmlm in the opam repo and only the last one supports safe-string. Since xmlm itself always worked under the assumption that clients would not mutate the returned strings (which can now be enforced, yay!) the last safe-string version was not a breaking release and there's no reason to use an earlier version (which IIRC are all non-breaking at the API level). I'm sure this also applies to quite a few other of my packages.

I think a more interesting number would be the number of packages whose last version breaks with this patch.

@xavierleroy
Copy link
Copy Markdown
Contributor

@doligez, if you could share your list of 367 packages that failed, we could collectively analyze it some more. I agree with @dbuenzli that old versions of packages are unlikely to work with safe-strings just because they were created when safe-strings didn't exist!

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 13, 2017

@damiendoligez - the Windows Makefiles need amending. For your convenience, a rebased version is at https://github.com/dra27/ocaml/tree/safe-string-by-default which includes a commit making that change too.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 15, 2017

Oops - yes, indeed! My fervour to see the end of unsafe strings went too far.

@hannesm
Copy link
Copy Markdown
Member

hannesm commented Sep 15, 2017

FWIW I'd be in strong favour to have -safe-string enabled by default!

I went through the 133 unique packages mentioned by @damiendoligez above, and 56 unique packages from that list already have newer versions in opam than mentioned (and thus should nicely work with -safe-string). This leaves us with 77 which actually need some code changes (lots of them are barely maintained (there's only a single release in opam)).

@damiendoligez
Copy link
Copy Markdown
Member Author

Are these just the package that fail on their own, or also those for
which some transitive dependency would fail?

Only the ones that fail on their own. Let me see if I can compute the transitive closure.

@damiendoligez
Copy link
Copy Markdown
Member Author

120 dep-failures in 62 unique packages. An unknown number of them will become hard failures when their deps get fixed.

acme  0.1
aifad  2.0.8  2.0.7
alcotest  0.4.2  0.4.1  0.4.0
alt-ergo  0.99.1  0.95.2
altgr-ergo  1.30  1.01  0.99.1  0.95.2
apron  20160125  20160108  20151015  20150930  20150820
atdgen  1.10.0  1.9.1
bddapron  2.3.1  2.3.0  2.2.4  2.2.3
bitcoin  2.0
cmark  0.2.0
cpdf  2.2.1  2.1.1
elf2json  1.0.0
ezjsonm  0.2.0  0.1.0
ezxmlm  1.0.1  1.0.0
flac  0.1.2  0.1.1
frama-c-base  20170501  20161101
frama-c-e-acsl  0.5
fury-puyo  0.5
gasoline  0.5.0  0.4.0  0.3.0
gdal  0.9.0  0.6.1  0.6.0  0.5.0  0.4.0  0.3.0
genspio  0.0.0
graphicspdf  2.2.1  1.1
imap  1.1.1  1.0  0.20.0
jsonm  0.9.1
lastfm  0.3.1  0.3.0
libudev  0.2  0.1.1  0.1
lilis  0.2.1
lipsum  0.2
llvmgraph  0.1
mirari  0.9.2  0.9.1  0.9.0
mtl  1.0.0
netlink  0.2.1
ocamlrss  2.0
ocephes  0.8.1
ocp-index  1.1.5
ocp-ocamlres  0.3  0.1
opencc  0.4.3-0.1.1
oplay  1.0.0
opus  0.1.2  0.1.1  0.1.0
otfm  0.2.0  0.1.0
polyglot  1.0.0
quest  0.1
redis  0.3.3  0.3.2  0.3.1
satML-plugin  0.99.1
schroedinger  0.1.1  0.1.0
solvuu-build  0.3.0  0.2.0  0.1.0
speex  0.2.1
tgls  0.8.3
theora  0.3.1
topology  0.1.0
tptp  0.3.2  0.3.1  0.3.0
tsdl  0.9.0  0.8.2  0.8.1
tyxml  3.6.0  3.5.0  3.4.0  3.3.0
unix-sys-resource  0.1.2
uucd  3.0.0  2.0.0  1.0.0  0.9.0
uunf  0.9.0
uuseg  0.9.0  0.8.0
vorbis  0.7.0
why3-base  0.86.2  0.86
xmldiff  0.5.0  0.2  0.1
xmlplaylist  0.1.4

@damiendoligez
Copy link
Copy Markdown
Member Author

I'm going with @avsm's advice and merging this in 4.06. We'll see how the OPAM work goes...

@damiendoligez damiendoligez merged commit e8d0abb into ocaml:trunk Sep 15, 2017
@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 30, 2017

It is a bit frustrating that there exists no configure-level switch to revert the effect of this patch. I would like an -unsafe-string configure-time option to exist, that would revert the compile-time default to not enabling safe-string.

Right now it seems to me that the number of broken opam package is fairly large. Users, or maintainers of a given package, may be able to update their own build scripts to sprinkle -unsafe-string into all compiler invocations, but they won't know how to fix their dependencies in the same way. Having a configure-time-level escape hatch allows us to create an alternative (and less well-considered) world in which they can stay while the OPAM universe updates itself.

(Incidentally, this would also help triaging before the release, by making it easier to ignore the safe-string issue to find the other 4.06-related regressions. Right now, -safe-string apparently breaks sexplib, which means that half the ecosystem is down.)

@kit-ty-kate
Copy link
Copy Markdown
Member

@gasche -unsafe-string is not available into compiler invocations as configure-time safe-string doesn't allow it. However, the export OCAMLPARAM=safe-string=0,_ escape hatch proposed by @damiendoligez works. Unfortunately it seems quite hard to use it for every broken packages, however you could do your own opam switch using this method: https://github.com/ocaml/opam-repository/blob/master/compilers/4.01.0/4.01.0%2Bprofile/4.01.0%2Bprofile.comp#L18

sexplib/base maintainers are aware of that but maybe a little ping won't harm (cc @trefis)

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 30, 2017

-unsafe-string is not available into compiler invocations as configure-time safe-string doesn't allow it

No, -unsafe-string is available in 4.06 today and will be for the release, see the discussion above.

the export OCAMLPARAM=safe-string=0,_ escape hatch proposed by @damiendoligez works

I believe it is wrong to use OCAMLPARAM for this purpose. I'm preparing a proper configure-time solution, sending a PR later.

@kit-ty-kate
Copy link
Copy Markdown
Member

No, -unsafe-string is available in 4.06 today and will be for the release, see the discussion above.

Oh, right. Sorry.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 30, 2017

No problem; the choices here are a bit tricky, and it's good to discuss them more.

@damiendoligez damiendoligez deleted the safe-string-by-default branch October 2, 2017 11:34
tmcgilchrist pushed a commit to tmcgilchrist/bondi that referenced this pull request May 20, 2018
…ing.

This breaks the code that uses the string type as mutable strings. See ocaml/ocaml#1252
tmcgilchrist pushed a commit to tmcgilchrist/podge that referenced this pull request May 21, 2018
shindere added a commit to shindere/ocaml that referenced this pull request Sep 11, 2022
…uite

PR ocaml#1252 has made the safe-string mode the default so passing -safe-string
explicitly has become useless since this PR has been merged, even more
useless since support for mutable strings was removed in OCaml 5.0.
shindere added a commit to shindere/ocaml that referenced this pull request Sep 11, 2022
…uite

PR ocaml#1252 has made the safe-string mode the default so passing -safe-string
explicitly has become useless since this PR has been merged, even more
useless since support for mutable strings was removed in OCaml 5.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.