Conversation
gasche
left a comment
There was a problem hiding this comment.
I believe this is a useful feature, thanks for the contribution. I made minor code comments but I think the general shape is good.
On the other hand, I wonder about the interface: there is a flag -json-modules that does the same thing as the -modules option, but with a JSON rather than textual output. If tomorrow people want to get a JSON output corresponding to the non-modules output, presumably they would add a different flag (-json-file-dependencies?). In general the idea is to have (if implemented) a -json-... flag for each different mode of printing. This is the per-output approach.
(Or is the idea that the only JSON-structured output we could possibly want is the -modules one, because the other one is purely make-specific and adds no useful information? I am not sure about this.)
Another approach that I would naively have expected is a -json flag that would just say: whatever you were planning to do textually, do it in JSON. (So -json -modules is equivalent to your -json-modules, and just -json is equivalent to my -json-file-dependencies above or something). This is the global approach.
From a user perspective I find the global approach simpler. But then we have to emit an error at runtime if the JSON output is not supported for a particular mode; in particular, there is no easy way looking at available options to guess which output modes support the JSON output, while with the per-output approach this is clear.
driver/makedepend.ml
Outdated
| open Parsetree | ||
| module String = Misc.Stdlib.String | ||
|
|
||
| module Json = struct |
There was a problem hiding this comment.
I would move this module to utils/misc.ml, because the logic is not generic to makedepend and could be useful in other places eventually.
There was a problem hiding this comment.
@gasche could you please help me with this,
I am always getting the warnings like
Error (warning 60): unused module Json.
Error (warning 32): unused value print. ...
There was a problem hiding this comment.
@muskangarg21 you probably need to export the interface of the Json module in misc.mli. If you don't do this, then the module is not accessible from the outside and the code cannot be used, which would explain the warnings you see.
Changes
Outdated
| Use KEEP_TEST_DIR_ON_SUCCESS=1 to keep all artifacts. | ||
| (Gabriel Scherer, review by Sébastien Hinderer) | ||
|
|
||
| - #7560: Arguement -json-modules added to ocamldep, using this the output |
driver/makedepend.ml
Outdated
| | '\b' -> | ||
| Format.fprintf ppf "\\%c" 'b' | ||
| | '\x00' .. '\x1F' | '\x7F' as c -> | ||
| Format.fprintf ppf "\\u%04X" (Char.code c) |
There was a problem hiding this comment.
We would need someone knowledgeable with JSON escaping rules to review this part.
There was a problem hiding this comment.
Strictly speaking U+007F does not need to be escaped (ref), but it doesn't seems a bad idea to do it. The function looks fine.
driver/makedepend.ml
Outdated
| | `List of t list | ||
| ] | ||
|
|
||
| let comma ppf () = Format.fprintf ppf ",@ " |
There was a problem hiding this comment.
This could be a local function of print, because it is not really useful outside (we wouldn't access Json.comma from the outside). You can use
let rec print ppf =
let comma ... in
function
| `String s -> ...
| ...
driver/makedepend.ml
Outdated
| Clflags.parse_arguments file_dependencies usage; | ||
| Compenv.readenv ppf Before_link; | ||
| if !sort_files then sort_files_by_dependencies !files | ||
| else if !print_json then begin |
There was a problem hiding this comment.
I would prefer if the print_json conditional was close to the place in the code where we decide (or not) to use the -modules output. I believe this is within the print_file_dependencies function.
There was a problem hiding this comment.
@gasche
would it be possible to achieve it without changing the current code structure of else block ?
List.iter print_file_dependencies (List.sort compare !files);
the List.iter is turning out to be a problem for Json printing, so does the code I pushed most recently look good ?
approval was by mistake, I only meant to comment for now
Thank you for your valuable review, I have a few points about why we only require -json-modules flag rather than a -json add-on (in ocamldep).
Yes, the idea is to have a json output only for
I feel that given a user wants to have a json output for a flag they will have to see the man page anyway so per-output approach seems better from programming perspective as |
|
After thinking about this more, I am not so convinced by the local-vs-global argument. I think it relies on some decisions on what are the "good" vs. "less good" parts of the interface, and I think the two discussions could be separate. In addition, I have the impression that when you start working on the compiler, we may rather go for a "global" With the current state of my understanding, my recommendation would therefore be as follows: use a "global" |
|
@gasche |
a4ab640 to
25c89c1
Compare
|
@gasche @Octachron please review |
gasche
left a comment
There was a problem hiding this comment.
Sorry for the delay; I did another round of review. The new state looks good, but I still have some changes to propose.
Changes
Outdated
| (Xavier Leroy, report by Jérémie Dimino, review by David Allsopp) | ||
|
|
||
| - #9552: restore ocamloptp build and installation | ||
| (Florian Angeletti, review by David Allsopp and Xavier Leroy) |
There was a problem hiding this comment.
This changelog entry probably does not belong to the patchset.
driver/makedepend.ml
Outdated
| end else | ||
| Location.error "this output is not supported in json format" | ||
| |> Location.print_report Format.err_formatter; | ||
| Error_occurred.set (); |
There was a problem hiding this comment.
Most errors in this file use a worse approach which is to just use Format.fprintf Format.err_formatter. In theory your approach is better (it prints things in a way that is more consistent with the rest of the compiler tools), but:
- the main advantage would be to print a properly located error, but in this case there is no location to give (other than the filename)
- now the file mixes two different ways to report errors, for no excellent reason
My preference would go for having, first, a small refactoring commit that defines a report_errorf helper function (this is slightly tricky to write if we want to allow format-string arguments, so let me suggest some code below), and uses it for all errors in the existing code. Then you could use this function here, and be consistent with the rest fo the code. I don't have strong opinions on whether this function should use fprintf directly or Location.error -- probably Location.error is better.
let report_errorf fmt =
Error_occurred.set ();
Location.errorf fmt
driver/makedepend.ml
Outdated
| " Print module dependencies in raw form (not suitable for make)"; | ||
| "-json", Arg.Set print_json, | ||
| " Print module dependencies in JSON format when used with \ | ||
| -modules flag."; |
There was a problem hiding this comment.
Rather than "when used with -modules flag" I would write "only supported with -modules for now".
utils/misc.ml
Outdated
| Format.fprintf ppf "\\u%04X" (Char.code c) | ||
| | c -> Format.fprintf ppf "%c" c | ||
|
|
||
| let escape_string ppf s = String.iter (escape_char ppf) s |
There was a problem hiding this comment.
I suspect that an approach that escapes a whole string at once (using a Buffer.t to build the intermediate result), and then formats it as a string, would be much more efficient (I would guess at least 50x). I am not sure whether this matters or not. Probably not, as long as the JSON payloads remain rather small, but what if we later wanted to produce a JSON version of the -dtypedtree output for example, which can get relatively large? I'm a bit nervous about this, and would suggest trying to write a more efficient version now.
There was a problem hiding this comment.
I wrote the following benchmark code:
module FormatVersion = struct
let format_char ppf chr =
match chr with
| ('\"' | '\\') as c ->
Format.fprintf ppf "\\%c" c
| '\n' ->
Format.fprintf ppf "\\%c" 'n'
| '\t' ->
Format.fprintf ppf "\\%c" 't'
| '\r' ->
Format.fprintf ppf "\\%c" 'r'
| '\b' ->
Format.fprintf ppf "\\%c" 'b'
| '\x00' .. '\x1F' | '\x7F' as c ->
Format.fprintf ppf "\\u%04X" (Char.code c)
| c -> Format.fprintf ppf "%c" c
let format_string ppf s = String.iter (format_char ppf) s
end
module BufferVersion = struct
let escape_string str =
let buf = Buffer.create (String.length str * 5 / 4) in
for i = 0 to String.length str - 1 do
match str.[i] with
| '\\' -> Buffer.add_string buf {|\\|}
| '\"' -> Buffer.add_string buf {|\"|}
| '\n' -> Buffer.add_string buf {|\n|}
| '\t' -> Buffer.add_string buf {|\t|}
| '\r' -> Buffer.add_string buf {|\r|}
| '\b' -> Buffer.add_string buf {|\b|}
| '\x00' .. '\x1F' | '\x7F' as c ->
Printf.bprintf buf "\\u%04X" (Char.code c)
| c -> Buffer.add_char buf c
done;
Buffer.contents buf
let format_string ppf s =
Format.fprintf ppf "%s" (escape_string s)
end
module BufferVersionOpt = struct
let escape_string str =
let buf = Buffer.create (String.length str * 5 / 4) in
let first_unescaped = ref 0 in
let no_escape i =
if i < !first_unescaped then
first_unescaped := i
in
let escape i =
let first = !first_unescaped in
if i < first then ()
else begin
Buffer.add_substring buf str first (i - first); (* i is excluded *)
first_unescaped := max_int;
end
in
for i = 0 to String.length str - 1 do
match str.[i] with
| '\\' -> escape i; Buffer.add_string buf {|\\|}
| '\"' -> escape i; Buffer.add_string buf {|\"|}
| '\n' -> escape i; Buffer.add_string buf {|\n|}
| '\t' -> escape i; Buffer.add_string buf {|\t|}
| '\r' -> escape i; Buffer.add_string buf {|\r|}
| '\b' -> escape i; Buffer.add_string buf {|\b|}
| '\x00' .. '\x1F' | '\x7F' as c ->
escape i;
Printf.bprintf buf "\\u%04X" (Char.code c)
| c -> no_escape i
done;
escape (String.length str);
Buffer.contents buf
let format_string ppf s =
Format.fprintf ppf "%s" (escape_string s)
end
module FormatVersionOpt = struct
let format_string ppf str =
let first_unescaped = ref 0 in
let no_escape i =
if i < !first_unescaped then
first_unescaped := i
in
let escape i =
let first = !first_unescaped in
if i < first then ()
else begin
Format.pp_print_string ppf
(String.sub str first (i - first)); (* i is excluded *)
first_unescaped := max_int;
end
in
for i = 0 to String.length str - 1 do
match str.[i] with
| '\\' -> escape i; Format.pp_print_string ppf {|\\|}
| '\"' -> escape i; Format.pp_print_string ppf {|\"|}
| '\n' -> escape i; Format.pp_print_string ppf {|\n|}
| '\t' -> escape i; Format.pp_print_string ppf {|\t|}
| '\r' -> escape i; Format.pp_print_string ppf {|\r|}
| '\b' -> escape i; Format.pp_print_string ppf {|\b|}
| '\x00' .. '\x1F' | '\x7F' as c ->
escape i;
Format.fprintf ppf "\\u%04X" (Char.code c)
| c -> no_escape i
done;
escape (String.length str)
end
let test_ok format_string =
let ppf = Format.str_formatter in
let input, expected_output =
"01'2\\34\"\n5\t6\r\b7\x1A8 9",
{|01'2\\34\"\n5\t6\r\b7\u001A8 9|} in
format_string ppf input;
let result = Format.flush_str_formatter () in
result = expected_output
|| (Printf.eprintf "\"%s\"%!" result; false)
let small_input = {|
let format_char ppf chr =
match chr with
| ('\"' | '\\') as c ->
Format.fprintf ppf "\\%c" c
| '\n' ->
Format.fprintf ppf "\\%c" 'n'
| '\t' ->
Format.fprintf ppf "\\%c" 't'
| '\r' ->
Format.fprintf ppf "\\%c" 'r'
| '\b' ->
Format.fprintf ppf "\\%c" 'b'
| '\x00' .. '\x1F' | '\x7F' as c ->
Format.fprintf ppf "\\u%04X" (Char.code c)
| c -> Format.fprintf ppf "%c" c
let format_string ppf s = String.iter (format_char ppf) s
|}
let large_input = String.concat "\n\n" (List.init 100 (fun _ -> small_input))
let test_perf format_string =
let null = open_out Filename.null in
Fun.protect ~finally:(fun () -> close_out null) @@ fun () ->
let ppf = Format.formatter_of_out_channel null in
for i = 1 to 1_000 do
format_string ppf large_input
done
let versions = [
"Format", FormatVersion.format_string;
"Buffer", BufferVersion.format_string;
"BufferOpt", BufferVersionOpt.format_string;
"FormatOpt", FormatVersionOpt.format_string;
]
let () =
versions |> List.iter @@ fun (name, f) ->
Printf.printf "testing %s: %!" name;
Printf.printf "%s\n%!"
(if test_ok f then "ok" else "fail")
let () =
let usage_error () =
prerr_endline
"command-line argument expected: 'Format' or 'Buffer' or 'BufferOpt'.";
exit 2 in
match Sys.argv.(1) with
| exception _ -> usage_error ()
| v ->
match List.assoc v versions with
| exception Not_found -> usage_error ()
| f ->
test_perf fOn my machine, performance results are as follows:
$ ocamlopt -o escape_json escape_json.ml && time ./escape_json Format
[...] real 0m5.775s
$ ocamlopt -o escape_json escape_json.ml && time ./escape_json Buffer
[...] real 0m0.360s
$ ocamlopt -o escape_json escape_json.ml && time ./escape_json BufferOpt
[...] real 0m0.338s
$ ocamlopt -o escape_json escape_json.ml && time ./escape_json FormatOpt
[...] real 0m0.900s
Taking the Buffer version as a sweet spot (more readable than BufferOpt and almost as efficient), it gives a 16x performance improvement over the char-by-char Format version.
There was a problem hiding this comment.
Probably not worth it here, but having a fast path for strings not needing any escaping (the vast majority of cases) would also give a nice speedup.
There was a problem hiding this comment.
On my machine, the speed-up in the fast-path case is 3x (for the Buffer implementation). That said, I expect that many of the large strings the compiler prints contain \n, so the fast path would be rare there.
There was a problem hiding this comment.
More precisely: on a large fast-path input, Buffer is at 0.30s, BufferOpt at 0.20s, and BufferFastPath at 0.10s. So maybe BufferOpt makes more sense than I thought. I would still use the simple and readable Buffer version.
ae4e729 to
6c5f70e
Compare
|
@gasche |
|
ping @gasche |
Closes #7560
an example of the output format is:
[{"source": "stdlib/arg.ml", "depends_on": ["Array", "Buffer", "List", "Printf", "String", "Sys"] }, {"source": "stdlib/array.ml", "depends_on": ["Seq"]}, {"source": "stdlib/format.ml", "depends_on": ["Buffer", "CamlinternalFormat", "CamlinternalFormatBasics", "Int", "List", "Queue", "Stack", "Stdlib", "String" ] }, {"source": "stdlib/fun.mli", "depends_on": []}, {"source": "stdlib/printf.mli", "depends_on": ["Buffer"]} ]