Skip to content

A document type for error messages#13169

Merged
gasche merged 7 commits intoocaml:trunkfrom
Octachron:format_doc_for_error_messages
May 30, 2024
Merged

A document type for error messages#13169
gasche merged 7 commits intoocaml:trunkfrom
Octachron:format_doc_for_error_messages

Conversation

@Octachron
Copy link
Copy Markdown
Member

Currently when the compiler needs to build a fragment of an error message while playing nice with Format, the best option is to construct a Format.formatter -> unit closure. This construction is used in many part of the compilers.
However, it has proven to be quite brittle due to the simultaneous uses of many global states when printing error messages, see #13099 for a recent example (which was triggered by a printing closure run in a different environment that the one it was built in).

This PR proposes to remove this brittleness by a introducing a document type which represents a list of formatting instructions

type element =
  | Text of string
  | Open_box of { kind: box_type ; indent:int }
  | Close_box
  | Open_tag of Format.stag
  | Close_tag
  |  ...

to be sent to an actual formatting engine at a later point. Since this document type contains only data and no closures (except for an escape hatch), we are guaranteed that printing this type does not create side-effects without losing any of the formatting abilities of Format. In particular, this document type can be serialized while preserving all the break hints, formatting boxes and tags (which semantics could then be reimplemented by developer tools).

In term of implementation, this PR adds a new Format_doc module which provides:

  • format string interpreters that output this new document type
  • printing functions that mirrors the printing functions in the Format module
  • a conversion function from Format_doc printer to Format printer
  • a "deprecated" conversion function in the other direction which uses a temporary escape hatches
  • A submodule for building document in an immutable way

As a consequence the Format_doc module is source-compatible with the Format module, and for most of the compiler modules replacing open Format by an open Format_doc is enough to switch to the new kinds of printers.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice and overall quite reasonable. I asked a few questions. I don't understand the very fine details of the last commit, which I think is the most subtle one. (The rest I believe are correct.)

The fact that the testsuite passes mostly unchanged is a very good test for the PR, so I am confident in its overall correctness.

| Break of { fits : string * int * string as 'a; breaks : 'a }
| Flush of { newline:bool }
| Newline
| If_newline
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised by If_newline which is an odd construct (very imperative). @Octachron tells me that it is not actually used in the compiler so we could get rid of it if we wanted -- same with With_size which is also weird. I don't know if we should rather follow the oddities of Format to the letter, or remove the stuff we don't like and don't need.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC If_newline predates the form of Break here, and I wouldn't be surprised if If_newline is not really needed.

If you ever want to include static unicode text/symbols in diagnostics, then I expect With_size could be useful. If the unicode you want to emit is known ahead of time, the complicated question of computing its width can be worked around by specifying the size. Format itself works well in this way.


end

(** {1 Compatibility API} *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have the functions that are not so nice and mostly for compatibility in a Compat submodule. But I am okay with the current approach as well, the functions in fact tend to make sense already.

The universal variable "'a" would escape its scope
The method "f" has type "'a -> int", but the expected method type was
"'a0. 'a0 -> int"
The universal variable "'a0" would escape its scope
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: I liked seeing 'a and ´b better than 'a and 'a0 (personally I am fine with any order of occurrence, but I like the variables to be easy to distinguish). But maybe in some other examples reusing the same radical helps readability?

@Octachron Octachron force-pushed the format_doc_for_error_messages branch from 750d606 to 2032886 Compare May 21, 2024 13:42

(** {1 Compatibility API} *)

(** The functions and types below provides source compatibility with format
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opening this module adds the document types, fold, etc.. into the environment. What do you think of moving the compatibility API into a Compat submodule ?
Also, highlighting that it's at the same level as the Core API.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One my design goal was to make the replacement of Format by Format_doc as invisible as possible.
Typically, the Core submodule is only exposed because I though it would be a shame to have a simple immutable API available and hide it away. I am not sure I would encourage people to switch to this API.

From that perspective, I think it is better for now to keep the most used features (the compatibility functions) at the toplevel of the module. But I could hide the core definitions inside the Core submodule?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the type definitions and core functions inside the Core submodule, and renamed it to Doc. Like this, an open Format_doc only opens the compatibility API and the Doc submodule.

| Flush {newline=false} -> Format.pp_print_flush ppf ()
| Newline -> Format.pp_force_newline ppf ()
| If_newline -> Format.pp_print_if_newline ppf ()
| With_size _ -> ()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions like pp_print_substring_as, with_size and if_newline are not used. Is offering an alternative API to Format a goal ?
Otherwise, why not keep it smaller to help maintenance ?

You mentioned localization in #12182 (comment). Could it be built on top of the this API ? Making it as small as needed could help adapting it later.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first goal was full source compatibility with Format, since this means compiler contributors don't need to know two APIs.

(And if we were to move away from Format compatibility, I would rather adopt Fmt naming convention)

For basic localization, a simple way to add support for compiler variant would be to add hooks for the format string interpreters since we only want to localize static texts.

@Octachron Octachron force-pushed the format_doc_for_error_messages branch 3 times, most recently from c23a81b to b9b1b82 Compare May 24, 2024 15:25
Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some code duplication between utils/format_doc.ml and stdlib/format.ml, for example the printf-interpretation functions (output_acc and friends), and some higher-level printers such as pp_print_text, which decompose into calls into more basic pp_print_* functions.

I wonder if it would be reasonable to reduce this duplication by introducing a functor that builds this higher-level structure over the pp_print_* functions. One instance would be in stdlib/format.ml, the other in utils/format_doc.ml. One pain point is that the functor should be in the stdlib itself, which is slightly annoying -- camlinternalFormatter.ml?

@Octachron
Copy link
Copy Markdown
Member Author

There certainly some part that could be factorized using a Format-like functor builder, but it doesn't seem that straightforward to find a home for this functor inside the standard library. Moreover, there are also some interesting design choices to discuss in term of functor complexity versus flexibility.

Thus, I would rather report the standard library bikeshedding to another PR.

@Octachron Octachron force-pushed the format_doc_for_error_messages branch from 3fc2f25 to 7a1e00b Compare May 27, 2024 14:15
Octachron added 7 commits May 29, 2024 14:39
This ensures that only the compatibility API is available after an
`open Format_doc` used to replace an `open Format`.
and let the compatibility function at the top of the Location modules
to improve backward compatibility for compilerlibs users.

* add a `Doc.quoted_filename` function too.
@Octachron Octachron force-pushed the format_doc_for_error_messages branch from 7a1e00b to 07128d4 Compare May 29, 2024 14:56
@Octachron
Copy link
Copy Markdown
Member Author

I have updated the PR after some noise clean-up, I plan to merge it tomorrow.

@gasche gasche merged commit 1b09b92 into ocaml:trunk May 30, 2024
punchagan added a commit to ocaml-bench/base that referenced this pull request Jun 4, 2024
ocaml/ocaml#13169 added a document type for error messages and this
change makes base compatible with OCaml trunk.
punchagan added a commit to punchagan/sandmark that referenced this pull request Jun 5, 2024
Due to a [recent change] on `ocaml/ocaml` `trunk`, `base.v0.16.3` fails
to [install]. `base` is a build dependency of `owl`, which only has
sequential benchmarks (no parallel ones). So, the failure happens only
for sequential benchmarks. Parallel ones run alright.

Closes ocaml-bench#470

[recent change]: ocaml/ocaml#13169
[install]: ocaml-bench/sandmark-nightly@197c9bd#diff-c635461b488c1e7d7b7feaf5fcb01945d54f84c512cf4db7e2dc390a2a23c5e8R456-R476
punchagan added a commit to ocaml-bench/sandmark that referenced this pull request Jun 6, 2024
Due to a [recent change] on `ocaml/ocaml` `trunk`, `base.v0.16.3` fails
to [install]. `base` is a build dependency of `owl`, which only has
sequential benchmarks (no parallel ones). So, the failure happens only
for sequential benchmarks. Parallel ones run alright.

Closes #470

[recent change]: ocaml/ocaml#13169
[install]: ocaml-bench/sandmark-nightly@197c9bd#diff-c635461b488c1e7d7b7feaf5fcb01945d54f84c512cf4db7e2dc390a2a23c5e8R456-R476
@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 16, 2024

This PR breaks the build of Coq, which contains the following code:

https://github.com/coq/coq/blob/729af0da7a829d08223f54f6249235c888275c80/topbin/coqtop_byte_bin.ml#L11-L17

The problem is that Symtable.report_error has changed type from formatter -> error -> unit to an incompatible type t Format_doc.printer.

@Octachron, do you have an idea of how the user code could be fixed, preferably in a backward-compatible way?

@Octachron
Copy link
Copy Markdown
Member Author

Replacing Format.asprintf by Format_doc.asprintf should be enough. A backward compatible fix would be to select either Format or Format_doc in function of the OCaml compiler library version.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 16, 2024

Okay, I went along and wrote a commit that adds conditional-compilation support to Coq for this one function, to make it work correctly both before and after 5.3 (I haven't submitted it as a PR for reasons that are clarified below):

rocq-prover/rocq@master...gasche:coq:compat-symtable-error-handler

And then I realized that there are probably more projects affected by this sort of breakage.

For example if I run a github search with language:OCaml "report_error", on the first page out of 5 I see two distinct projects that use report_error functions coming from the compiler codebase:

https://github.com/BinaryAnalysisPlatform/bap/blob/95e81738c440fbc928a627e4b5ab3cccfded66e2/src/baptop.ml#L7-L10

https://github.com/rescript-lang/rescript-compiler/blob/7cdb66d5ad09f4ea49af6f3a6a8aa0c08adefab7/jscomp/ml/env.ml#L753-L756

I haven't gone through the rest of the list, I'm sure there are other affected projects.

I am in favor of document types for error messages, but this is too must breakage. Could we rework this to keep the report_error functions with their previous type for backwards-compatibility, and introduces error_doc variants with the new incompatible type?

@Octachron
Copy link
Copy Markdown
Member Author

Rescript is a divergent fork of OCaml 4.06 which is not affected by the issue.
The BAP case could be fixed by using Location.errorf to be backward compatible with all OCaml version.
The only other example in the 5 page github search seems to be ocamleditor.

Thus we have two and half breakages? We could keep the deprecated version, and I would probably submit a PR do so this week; however it is unclear to me if it would not be faster to fix the handful of broken packages.

@jmid
Copy link
Copy Markdown
Member

jmid commented Aug 20, 2025

FTR, I encountered refl.4.1.0 breaking on 5.3.0 due to this change to Location.raise_errorf: ocaml/opam-repository#28364

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants