Skip to content

Enable force-safe-string by default#1859

Merged
shindere merged 6 commits intoocaml:trunkfrom
kit-ty-kate:force-safe-string
Sep 10, 2019
Merged

Enable force-safe-string by default#1859
shindere merged 6 commits intoocaml:trunkfrom
kit-ty-kate:force-safe-string

Conversation

@kit-ty-kate
Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate commented Jun 24, 2018

This PR forces safe strings globally, effectively ending the possibility for projects or modules to use unsafe strings (previously enabled by -unsafe-string or setting OCAMLPARAM=safe-string=0,_).

Rationales are:

  • Safe strings have been available since 4.02 (4 years ago)
  • Safe strings have been the default since 4.06 (7 months ago, see switch to safe-string by default (related to MPR#7113) #1252)
  • Very few opam packages still use unsafe strings: I'm only aware of haxe.3.4.7 and camlpdf.2.2.1, maybe some others still use it under the table but very few of them remain.
  • Deleting code is always good, it simplifies maintenance.

I think it would be good to have it in 4.08 but if not, at least this PR will remain open to be merged for a later release.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 24, 2018

There is an intermediary knob that we could turn before the one you propose, which is to enable -force-safe-string by default. I think it is an important intermediate step to make. -force-safe-string (the mode where linking with unsafe-string code is impossible) changes some runtime internals as it can now assume that all runtime strings are immutable, and as of today I don't believe that those runtime changes has been widely tested, because they are not enabled by default.

Having -force-safe-string by default but -no-force-safe-string still available is thus a good compromise, where we have the option to turn back if a problem arises -- by changing the default flags in opam and in distributions, for example.

If we release a version of OCaml with your more radical patch, there is no way to revert back to -no-force-safe-string without patching the compiler codebase. That sounds like a step to take later, once -force-safe-string has seen widespread usage for at least a release cycle.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 24, 2018

P.S.: If some people want more context about the configure-time flags for string mutability, the ./configure comment should provide it:

# There are two configure-time string safety options,
# -(no-)force-safe-string and -default-(un)safe-string that
# interact with a compile-time (un)safe-string option.
#
# If -force-safe-string is set at configure time, then the compiler
# will always enforce that string and bytes are distinct: the
# compile-time -unsafe-string option is disabled. This lets us
# assume pervasive string immutability, for code optimizations and
# in the C layer.
#
# If -no-force-safe-string is set at configure-time, the compiler
# will use the compile-time (un)safe-string option to decide whether
# string and bytes are compatible on a per-file basis. The
# configure-time options default-(un)safe-string decide which
# setting will be chosen by default, if no compile-time option is
# explicitly passed.
#
# The configure-time behavior of OCaml 4.05 and older was equivalent
# to -no-force-safe-string -default-unsafe-string. OCaml 4.06
# uses -no-force-safe-string -default-safe-string. We
# expect -force-safe-string to become the default in the future.

@kit-ty-kate
Copy link
Copy Markdown
Member Author

@gasche I don't really understand you "runtime change" arguments. The changes here are really minimal (less than 20 lines there were already there and used anyway) and I don't see how it could go wrong considering the amount of people already using 4.06 (and soon 4.07) already.
If there really is a problem with the runtime changes, then if it's caught after the release, switching back to -no-force-safe-string isn't really helping compared to making a new minor release which fixes it because users would have to reinstall/upgrade their switch anyway.

But I guess I can make another PR for what you are suggesting if you really want to, but apart from a few days delay to fix an hypothetical problem I don't really see the point, personally (but maybe I'm missing something in the picture)

@kit-ty-kate
Copy link
Copy Markdown
Member Author

Apart from that, does anybody have any idea why the CIs are failing ?

@alainfrisch
Copy link
Copy Markdown
Contributor

I encourage the efforts to move towards the removal of the legacy mode. But since the current default still allows for enabling unsafe-string at compile-time, there was no real strong pressure for projects to switch to it, it was just a matter of locally tweaking the build system of projects not ready for safe-strings. A version where the default configuration prevents building projects not prepared for safe-strings will apply a much greater pressure to authors to upgrade their code, while still avoiding the situation where people cannot upgrade to a more recent OCaml because of some dependencies not ready for safe-strings.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 25, 2018

@kit-ty-kate Using -force-safe-string changes the runtime code compared to just using -default-safe-string. These changes are not tested by most users of 4.06 and 4.07, because -force-safe-string is not the default, -no-force-safe-string -default-safe-string is the default.

@alainfrisch
Copy link
Copy Markdown
Contributor

Very few opam packages still use unsafe strings: I'm only aware of haxe.3.4.7 and camlpdf.2.2.1

These are important packages, though. I've submitted a PR for camlpdf. cc: @ncannasse regarding haxe.

@skaller
Copy link
Copy Markdown

skaller commented Jun 25, 2018

my system used OCS Scheme and Dypgen neither of which works with safe strings. I have cut back OCS Scheme to use only safe strings and removed a lot of Scheme due to removal of big nums. I have modified Dypgen's run time library but have been unable to fix the parser generator executable itself.

I do not use opam and will not do so. It doesn't work on Windows and it introduces an unnecessary dependency since my system only builds a single Ocaml executable (which is a compiler which generates C++, the resulting code having no dependency on Ocaml).

The original authors of the two packages I have used have failed to upgrade them or take notice of my changes. Breaking legacy code without an automated upgrade path is a symptom of poor design and a failure to support client base which does not suggest reliability or stability of the Ocaml system.

@damiendoligez
Copy link
Copy Markdown
Member

But since the current default still allows for enabling unsafe-string at compile-time, there was no real strong pressure for projects to switch to it

The pressure comes from the OPAM repo maintainers, who systematically push back against patches that just change the compilation options of a package (in the opam file) and request that the upstream source be fixed instead.

@alainfrisch
Copy link
Copy Markdown
Contributor

and request that the upstream source be fixed instead.

Upstream authors still had the possibility to add -safe-string in their build system; as long as the default configure mode did not prevent that, the pressure to actually modify the code was still gentle. Switching the default configure mode to break such package is probably necessary at some point, but already a bit harsh (users of those package will have to using a special compiler switch, but it's fine with OPAM); then wait for the next release to drop completely support for unsafe strings.

@alainfrisch
Copy link
Copy Markdown
Contributor

Also, @skaller reports that more packages are apparently not ready for safe strings, including some of them part of OPAM. One would need to be more confident in the impact assessment before breaking the default mode, I'd say.

@ncannasse
Copy link
Copy Markdown

I think the next step would be to turn warnings into errors. Regarding haxe we already moved to safe strings in our 4.x version which is currently in development.

@damiendoligez
Copy link
Copy Markdown
Member

@alainfrisch I'm basically in agreement with you, and let's not forget orphaned/unmaintained packages too.

@avsm
Copy link
Copy Markdown
Member

avsm commented Jun 27, 2018

opam ci status -f variants:ss shows a significant number of packages that are still broken in opam due to the move to safe-string. Going for the intermediate option seems essential before we entirely remove the functionality.

@pmetzger
Copy link
Copy Markdown
Member

pmetzger commented Jun 27, 2018

and let's not forget orphaned/unmaintained packages too.

I keep thinking it would be good to have a repository for orphaned packages to add patches for such things as safe strings.

@skaller
Copy link
Copy Markdown

skaller commented Jun 29, 2018

Please lets not forget not everyone uses OPAM.

First it doesn't work on Windows. I mean the real Windows using MSVC tools.

Second, requiring OPAM in a system which has a mix of tools is a major extra burden for non-Ocaml people. My own system is an example: it's a programming language with a compiler written in Ocaml that generates C++, uses a run time library written in C++, and is designed to interface with C++. I expect clients to install a C++ compiler and several C/C++ libraries but they shouldn't need to know about Ocaml at all. As it is, weak to non-existent support on the most common desktop platform, Windows, makes installing Ocaml a nightmare already.

The fact is the attempt to fix the design fault in the Ocaml compiler causing breakages, together with the removal of bignum support, is enough of a pain to seriously consider using Haskell or even C++ or bootstrapping my system to avoid such issues in the future.

In my view unless an automatic upgrade tool can be written, the change has to be deemed ill considered.

And while on the issue Lexing should be fixed so it can take a value of type bytes (not just string).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 29, 2018

Dear @skaller, we are happy to support non-OPAM workflows. It just comes in the discussion as the most commonly-used package manager for community-contributed OCaml libraries, and thus an easy place for us to monitor to understand the impact of language changes. It also happens that patches to OCaml software get applied at the OPAM level in some occasions, but the author/owner/maintainer of the relevant code is notified, and the patch is publicly available for everyone to apply on their own copy of the software. (In fact OPAM maintainers frown upon those opam-only changes and are trying to stop doing that completely.)

I agree with you that an automated code-migration tool would be nice, but so far none exists for OCaml programs to my knowledge. If you know someone willing to invest the effort in building one, please let us know!

Regarding Lexing, are you asking for a Lexing.from_bytes function, or maybe Lexing.lexeme_bytes? Would you care to submit a patchset (hopefully as a github pull request) to implement it?

@skaller
Copy link
Copy Markdown

skaller commented Jun 29, 2018

Hi @gasche, yes I understand OPAM is popular and serves well as a monitoring tool, and is a pretty good package manager for unix like systems. A bit of a pity its not written in Ocaml using Ocaml's platform independent facilities, but then that would be a lot of work to support mixed Ocaml/C packages.

My comment on migration comes at least from a lot of experience with those kind of issues on the C++ committee regarding both retaining C compatibility as well as compatibility with older C++ versions. Of course the installed base is very much larger there and over much weight given to compatibility can inhibit advances requiring restructuring. However generally if you have very large code bases (millions of lines of code using large numbers of third party packages), an upgrade that leaves such a large client in the lurch is not a good way to encourage growth of the client base. The general rule is to leave old stuff alone and add new stuff, and that rule was broken here.

I've seen some pretty sophisticated analysis tools for Ocaml so my feeling is that until someone produces an automatic upgrade tool, enforcing an upgrade simply isn't viable. I don't have the resources to produces such a tool. I also wouldn't have fixed the compiler bug by forcing users to change perfectly good code.

Regarding Lexing, Lexing.from_bytes is what might help when parser input is constructed non-functionally: Dypgen at least does that, adding #line directives after purely functional text construction, seemingly this was a hack added after the original code was written. Anyhow it seems surprising that the primary use of Lexing.from_string is going to be from constructed text or, possibly text loaded into a buffer from a file, suggesting Lexing.from_bytes would be useful.

Lexemes that require modification are rare and lexemes are usually fairly short so manually converting a string to bytes when required is OK in my opinion (i.e. no Lexing.lexeme_bytes seems required).

@damiendoligez
Copy link
Copy Markdown
Member

In OPAM I count 80 packages whose latest version is broken by safe-string, and whose package description is not yet updated (i.e. they fail to compile but are not yet marked incompatible with 4.06+). And 9 other packages that depend on some of these 80.

And that's not even counting the packages that are marked incompatible but don't have a newer version.

Or any software that's not on OPAM.

I think it's too soon to force safe strings on the OCaml world at this time, so I'll mark this PR as work-in-progress.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jul 4, 2018

In OPAM I count 80 packages whose latest version is broken by safe-string, and whose package description is not yet updated (i.e. they fail to compile but are not yet marked incompatible with 4.06+). And 9 other packages that depend on some of these 80.

Is this list available anywhere?

@damiendoligez
Copy link
Copy Markdown
Member

https://damiendoligez.github.io/opamcheck-results/2018-06-29/index.html

Look for a red uppercase X in the first column.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jul 4, 2018

Look for a red uppercase X in the first column.

Thanks!

Many failures due to setup.ml (generated by oasis) not being safe-string compatible...

@pmetzger
Copy link
Copy Markdown
Member

pmetzger commented Jul 6, 2018

Many failures due to setup.ml (generated by oasis) not being safe-string compatible...

So it should be possible to fix many failures at once just by fixing what oasis generates?

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jul 6, 2018

So it should be possible to fix many failures at once just by fixing what oasis generates?

In theory, yes (at least if the setup script is not committed). I am looking into it.

@pmetzger
Copy link
Copy Markdown
Member

pmetzger commented Jul 6, 2018

Even if the setup script is committed, it should still allow an easier update for the maintainer, yes?

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Aug 28, 2019 via email

@kit-ty-kate
Copy link
Copy Markdown
Member Author

See tests/ppx-contexts/test.ml and look for "unsafe-string" there. Note however that if the -unsafe-string command-line flag is removed and the test.compilers.reference file modified accordingly, then the test works again but there may be a deeper modification to do here.

It was already fixed by ef1ae7c, I'm guessing you must have looked at a previous CI run.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Aug 28, 2019 via email

…n ocamlopt.opt and ocamlopt.byte after switching to force-safe-string by default
@kit-ty-kate
Copy link
Copy Markdown
Member Author

I added a commit disabling the binary diff test for the few tests that are actually different with force-safe-string enabled. If you prefer to have this commit as a standalone PR, tell me and I'll do it. If someone is not convinced and would prefer one of the other solutions to the problem as discussed in #8853, I can remove this commit and we can discuss it further.

Copy link
Copy Markdown
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move for this to be in 4.10 (we can look at making --disable-force-safe-string an error in 4.11/4.12)

It's running through precheck

@shindere
Copy link
Copy Markdown
Contributor

@kit-ty-kate, regarding your commit
51e2074:

So, are we sure this can't be fixed? It would be good to document in
each test that requires the comparison to be disables why this is
so, and not just by saying "because the programs are different"
but rather by giving hints about why they are different.

FWIW, it's possible to embed OCaml comments in ocamltest blocks.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 10, 2019

@shindere - it should be fixable by altering ocamlc to share string constants in the same way as clambda does (it's noted in #8853), but I don't think it's necessary to hold up this PR waiting for that to be resolved. @mshinwell and I checked carefully that those 3 failing tests differ only because a marshalled data structure has a slightly different physical representation when force-safe-string is enabled.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 10, 2019 via email

@kit-ty-kate
Copy link
Copy Markdown
Member Author

Then I'd say the comment could (should) refer either to #8853, or to the comment which explains this.

Done. I've added a comment referring to #8853

@shindere shindere merged commit c01dbc1 into ocaml:trunk Sep 10, 2019
@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 10, 2019 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 10, 2019

I'm still worried that the move to use -force-safe-string now is too early. I would have preferred to wait more, for example after 4.06 hits Debian stable -- so that everyone lives in the safe-string world.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Sep 10, 2019 via email

@kit-ty-kate
Copy link
Copy Markdown
Member Author

I'm still worried that the move to use -force-safe-string now is too early. I would have preferred to wait more, for example after 4.06 hits Debian stable -- so that everyone lives in the safe-string world.

Unsafe-strings have been deprecated for the past 5 years. Debian hasn't even upgraded ocaml above 4.05.0 in testing or even unstable. I don't think sticking with debian on this one is going to be any good if we want to go forward, the same way of thinking could apply to any of OCaml's breaking features in any versions since 4.05. Furthermore, we're not disabling unsafe-string completely, it would be still possible to have force-safe-string disabled for people wishing to have a bit more time to make the necessary adjustments.

@dra27
Copy link
Copy Markdown
Member

dra27 commented Sep 10, 2019

By the time 4.10 lands, Debian unstable should be at, or nearly at, 4.08 so they’ll be in the safe string world - and are unlikely to package 4.10 itself for a while (based on current packaging)

dra27 pushed a commit that referenced this pull request Nov 14, 2019
…tring is on (#9117)

Error in #1859 corrected with two follow-on effects:

- lib-unix cloexec test needed updating for force-safe-string
- Windows had clearly never been tested with force-safe-string!
dra27 added a commit that referenced this pull request Nov 17, 2019
Fix error in patch to configure.ac from #1859, with follow-on corrections
@tassig
Copy link
Copy Markdown

tassig commented Jul 29, 2020

For your information, we do not plan on moving our (large) code bases to safe strings at the moment. In my opinion the removal of the compiler option -unsafe-string is premature and a bad decision.

Let me comment on the rationales given for this PR:

Safe strings have been available since 4.02 (4 years ago)
Safe strings have been the default since 4.06 (7 months ago, see #1252)
=> orphaned code will not be fixed, regardless of time span, so this is not relevant

Very few opam packages still use unsafe strings: I'm only aware of haxe.3.4.7 and camlpdf.2.2.1, maybe some others still use it under the table but very few of them remain.
=> opam is a small thing in the ocaml world, "important" software can also be proprietary code in an enterprise context, for example

Deleting code is always good, it simplifies maintenance.
=> this is generally wrong in the case of a legacy or un-maintained feature, by definition. Going a bit off-topic here, this is also wrong in case of bug-free code. For instance, when is the last time you had to "maintain" the List ocaml module? This is bug free code, and has been since caml light and will never need to be maintained. Does the removal of -unsafe-string add a maintenance burden? Please provide a rationale for that.

As far as i can see, the rationales given for this PR are not valid at this stage, and this PR should not be merged.

@XVilka
Copy link
Copy Markdown
Contributor

XVilka commented Jul 29, 2020

@tassig it was already merged and in the latest OCaml already.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Jul 29, 2020

For your information, we do not plan on moving our (large) code bases to safe strings at the moment. In my opinion the removal of the compiler option -unsafe-string is premature and a bad decision.

Note that if you need unsafe strings, you can configure with --disable-force-safe-string. This is not enough for your needs?

As far as i can see, the rationales given for this PR are not valid at this stage, and this PR should not be merged.

My own personal understanding is that the migration towards safe strings is actively pushed for two main reasons:

  • mutable strings are considered to have been a mistake from a design perspective
  • having immutable strings opens the door to some interesting possibilities (different runtime representations, ropes, hash consing, etc)

@dra27
Copy link
Copy Markdown
Member

dra27 commented Jul 29, 2020

having immutable strings opens the door to some interesting possibilities (different runtime representations, ropes, hash consing, etc)

flambda already takes (a small) advantage of immutability in force-safe-string mode

@avsm
Copy link
Copy Markdown
Member

avsm commented Jul 29, 2020

@tassig wrote:

For your information, we do not plan on moving our (large) code bases to safe strings at the moment

You are taking information from the open source project, but not giving much back. What is your timescale for moving to safe strings? A year, a decade, or no firm plans? Are you missing some tooling to help you migrate, or is there some other blocker? Do you remain on older versions of the compiler (in which case you can stick with safe strings for as long as you want), or do you want to keep up with newer OCaml while using older features?

The more information you provide about your constraints, the more practical it is for us to try and help you. As far as I can tell, @nojb's suggestion about using --disable-force-safe-string should solve your immediate concern.

@gadmm
Copy link
Copy Markdown
Contributor

gadmm commented Aug 10, 2020

@tassig I am a researcher interested in backwards-compatibility concerns in programming language evolution, and I would be like to hear more about your problem. You are not the first to try to make such points in the OCaml community, but these feedback from professionals have yet to be summarised as lessons that the hobbyist/academic developer can lean on. Can you please contact me at guillaume.munch-maccagnoni@inria.fr ? I could not find your contact information on your profile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.