Skip to content

Added JSON output to ocamldep#9538

Open
muskangarg21 wants to merge 3 commits intoocaml:trunkfrom
muskangarg21:JSON
Open

Added JSON output to ocamldep#9538
muskangarg21 wants to merge 3 commits intoocaml:trunkfrom
muskangarg21:JSON

Conversation

@muskangarg21
Copy link
Copy Markdown
Contributor

Closes #7560
an example of the output format is:

[{"source": "stdlib/arg.ml",
  "depends_on": ["Array", "Buffer", "List", "Printf", "String", "Sys"]
 },
 {"source": "stdlib/array.ml", "depends_on": ["Seq"]},
 {"source": "stdlib/format.ml",
  "depends_on":
     ["Buffer",
      "CamlinternalFormat",
      "CamlinternalFormatBasics",
      "Int",
      "List",
      "Queue",
      "Stack",
      "Stdlib",
      "String"
     ]
 },
 {"source": "stdlib/fun.mli", "depends_on": []},
 {"source": "stdlib/printf.mli", "depends_on": ["Buffer"]}
]

gasche
gasche previously approved these changes May 6, 2020
Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a useful feature, thanks for the contribution. I made minor code comments but I think the general shape is good.

On the other hand, I wonder about the interface: there is a flag -json-modules that does the same thing as the -modules option, but with a JSON rather than textual output. If tomorrow people want to get a JSON output corresponding to the non-modules output, presumably they would add a different flag (-json-file-dependencies?). In general the idea is to have (if implemented) a -json-... flag for each different mode of printing. This is the per-output approach.
(Or is the idea that the only JSON-structured output we could possibly want is the -modules one, because the other one is purely make-specific and adds no useful information? I am not sure about this.)

Another approach that I would naively have expected is a -json flag that would just say: whatever you were planning to do textually, do it in JSON. (So -json -modules is equivalent to your -json-modules, and just -json is equivalent to my -json-file-dependencies above or something). This is the global approach.

From a user perspective I find the global approach simpler. But then we have to emit an error at runtime if the JSON output is not supported for a particular mode; in particular, there is no easy way looking at available options to guess which output modes support the JSON output, while with the per-output approach this is clear.

open Parsetree
module String = Misc.Stdlib.String

module Json = struct
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this module to utils/misc.ml, because the logic is not generic to makedepend and could be useful in other places eventually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche could you please help me with this,
I am always getting the warnings like
Error (warning 60): unused module Json.
Error (warning 32): unused value print. ...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muskangarg21 you probably need to export the interface of the Json module in misc.mli. If you don't do this, then the module is not accessible from the outside and the code cannot be used, which would explain the warnings you see.

Changes Outdated
Use KEEP_TEST_DIR_ON_SUCCESS=1 to keep all artifacts.
(Gabriel Scherer, review by Sébastien Hinderer)

- #7560: Arguement -json-modules added to ocamldep, using this the output
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Arguement => Argument

| '\b' ->
Format.fprintf ppf "\\%c" 'b'
| '\x00' .. '\x1F' | '\x7F' as c ->
Format.fprintf ppf "\\u%04X" (Char.code c)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need someone knowledgeable with JSON escaping rules to review this part.

Copy link
Copy Markdown
Contributor

@dbuenzli dbuenzli May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking U+007F does not need to be escaped (ref), but it doesn't seems a bad idea to do it. The function looks fine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

| `List of t list
]

let comma ppf () = Format.fprintf ppf ",@ "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a local function of print, because it is not really useful outside (we wouldn't access Json.comma from the outside). You can use

let rec print ppf =
  let comma ... in 
  function
  | `String s -> ...
  | ...

Clflags.parse_arguments file_dependencies usage;
Compenv.readenv ppf Before_link;
if !sort_files then sort_files_by_dependencies !files
else if !print_json then begin
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the print_json conditional was close to the place in the code where we decide (or not) to use the -modules output. I believe this is within the print_file_dependencies function.

Copy link
Copy Markdown
Contributor Author

@muskangarg21 muskangarg21 May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasche
would it be possible to achieve it without changing the current code structure of else block ?
List.iter print_file_dependencies (List.sort compare !files);
the List.iter is turning out to be a problem for Json printing, so does the code I pushed most recently look good ?

@gasche gasche dismissed their stale review May 6, 2020 20:51

approval was by mistake, I only meant to comment for now

@muskangarg21 muskangarg21 changed the title Added JSON output to ocamldep [WIP]:Added JSON output to ocamldep May 6, 2020
@muskangarg21
Copy link
Copy Markdown
Contributor Author

I believe this is a useful feature, thanks for the contribution. I made minor code comments but I think the general shape is good.

Thank you for your valuable review, I have a few points about why we only require -json-modules flag rather than a -json add-on (in ocamldep).

(Or is the idea that the only JSON-structured output we could possibly want is the -modules one, because the other one is purely make-specific and adds no useful information? I am not sure about this.)

Yes, the idea is to have a json output only for -modules as the remaining flags such as :
-bytecode, -native, -shared as you mentioned are all make specific so no point in printing a JSON output for them.
whereas the flags like -sort is not really of the form Key: Value itself (it is just a list of modules)
similarly out of all the other flags in ocamldep only -modules makes sense to have a JSON output.

Another approach that I would naively have expected is a -json flag that would just say: whatever you were planning to do textually, do it in JSON. (So -json -modules is equivalent to your -json-modules, and just -json is equivalent to my -json-file-dependencies above or something). This is the global approach.

From a user perspective I find the global approach simpler. But then we have to emit an error at runtime if the JSON output is not supported for a particular mode; in particular, there is no easy way looking at available options to guess which output modes support the JSON output, while with the per-output approach this is clear.

I feel that given a user wants to have a json output for a flag they will have to see the man page anyway so per-output approach seems better from programming perspective as per-output way would imply the new changes do not affect the old work and remain independent.
@gasche @Octachron

@gasche
Copy link
Copy Markdown
Member

gasche commented May 13, 2020

After thinking about this more, I am not so convinced by the local-vs-global argument. I think it relies on some decisions on what are the "good" vs. "less good" parts of the interface, and I think the two discussions could be separate.
Today the "makefile" output contains more information than the "module" output, it says where the modules are (when they are not in the current directory, but in one of the include paths (-I) (see the .depend file at the root of the compiler distribution for example), and it has (ad-hoc) logic to deal with file extensions.

In addition, I have the impression that when you start working on the compiler, we may rather go for a "global" -json setting than a per-output setting (-dparsetree-json, etc.). It's worth thinking about this just a bit (maybe my impression is wrong), but then if we know we will use a global approach for the compiler, maybe using the same approach for ocamldep would make sense.

With the current state of my understanding, my recommendation would therefore be as follows: use a "global" -json flag for ocamldep, but supporting only the output that you have already implemented. (In the other case, if someone asks for the -json output, you fail with an error message.)

@muskangarg21
Copy link
Copy Markdown
Contributor Author

muskangarg21 commented May 18, 2020

@gasche
I agree with this approach of globalisation of the -json flag.
What error message should we throw when -json is called with flags that are incompatible?
something like Json_Printing_Error ?

@muskangarg21 muskangarg21 force-pushed the JSON branch 2 times, most recently from a4ab640 to 25c89c1 Compare May 20, 2020 09:06
@muskangarg21
Copy link
Copy Markdown
Contributor Author

@gasche @Octachron please review

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay; I did another round of review. The new state looks good, but I still have some changes to propose.

Changes Outdated
(Xavier Leroy, report by Jérémie Dimino, review by David Allsopp)

- #9552: restore ocamloptp build and installation
(Florian Angeletti, review by David Allsopp and Xavier Leroy)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changelog entry probably does not belong to the patchset.

end else
Location.error "this output is not supported in json format"
|> Location.print_report Format.err_formatter;
Error_occurred.set ();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most errors in this file use a worse approach which is to just use Format.fprintf Format.err_formatter. In theory your approach is better (it prints things in a way that is more consistent with the rest of the compiler tools), but:

  • the main advantage would be to print a properly located error, but in this case there is no location to give (other than the filename)
  • now the file mixes two different ways to report errors, for no excellent reason

My preference would go for having, first, a small refactoring commit that defines a report_errorf helper function (this is slightly tricky to write if we want to allow format-string arguments, so let me suggest some code below), and uses it for all errors in the existing code. Then you could use this function here, and be consistent with the rest fo the code. I don't have strong opinions on whether this function should use fprintf directly or Location.error -- probably Location.error is better.

let report_errorf fmt =
  Error_occurred.set ();
  Location.errorf fmt

" Print module dependencies in raw form (not suitable for make)";
"-json", Arg.Set print_json,
" Print module dependencies in JSON format when used with \
-modules flag.";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than "when used with -modules flag" I would write "only supported with -modules for now".

utils/misc.ml Outdated
Format.fprintf ppf "\\u%04X" (Char.code c)
| c -> Format.fprintf ppf "%c" c

let escape_string ppf s = String.iter (escape_char ppf) s
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that an approach that escapes a whole string at once (using a Buffer.t to build the intermediate result), and then formats it as a string, would be much more efficient (I would guess at least 50x). I am not sure whether this matters or not. Probably not, as long as the JSON payloads remain rather small, but what if we later wanted to produce a JSON version of the -dtypedtree output for example, which can get relatively large? I'm a bit nervous about this, and would suggest trying to write a more efficient version now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the following benchmark code:

module FormatVersion = struct
  let format_char ppf chr =
    match chr with
    | ('\"' | '\\') as c ->
      Format.fprintf ppf "\\%c" c
    | '\n' ->
      Format.fprintf ppf "\\%c" 'n'
    | '\t' ->
      Format.fprintf ppf "\\%c" 't'
    | '\r' ->
      Format.fprintf ppf "\\%c" 'r'
    | '\b' ->
      Format.fprintf ppf "\\%c" 'b'
    | '\x00' .. '\x1F' | '\x7F' as c ->
      Format.fprintf ppf "\\u%04X" (Char.code c)
    | c -> Format.fprintf ppf "%c" c
             
  let format_string ppf s = String.iter (format_char ppf) s
end

module BufferVersion = struct
  let escape_string str =
    let buf = Buffer.create (String.length str * 5 / 4) in
    for i = 0 to String.length str - 1 do
      match str.[i] with
      | '\\' -> Buffer.add_string buf {|\\|}
      | '\"' -> Buffer.add_string buf {|\"|}
      | '\n' -> Buffer.add_string buf {|\n|}
      | '\t' -> Buffer.add_string buf {|\t|}
      | '\r' -> Buffer.add_string buf {|\r|}
      | '\b' -> Buffer.add_string buf {|\b|}
      | '\x00' .. '\x1F' | '\x7F' as c ->
        Printf.bprintf buf "\\u%04X" (Char.code c)
      | c -> Buffer.add_char buf c
    done;
    Buffer.contents buf

  let format_string ppf s =
    Format.fprintf ppf "%s" (escape_string s)
end

module BufferVersionOpt = struct
  let escape_string str =
    let buf = Buffer.create (String.length str * 5 / 4) in
    let first_unescaped = ref 0 in
    let no_escape i =
      if i < !first_unescaped then
        first_unescaped := i
    in
    let escape i =
      let first = !first_unescaped in
      if i < first then ()
      else begin
        Buffer.add_substring buf str first (i - first); (* i is excluded *)
        first_unescaped := max_int;
      end
    in
    for i = 0 to String.length str - 1 do
      match str.[i] with
      | '\\' -> escape i; Buffer.add_string buf {|\\|}
      | '\"' -> escape i; Buffer.add_string buf {|\"|}
      | '\n' -> escape i; Buffer.add_string buf {|\n|}
      | '\t' -> escape i; Buffer.add_string buf {|\t|}
      | '\r' -> escape i; Buffer.add_string buf {|\r|}
      | '\b' -> escape i; Buffer.add_string buf {|\b|}
      | '\x00' .. '\x1F' | '\x7F' as c ->
        escape i;
        Printf.bprintf buf "\\u%04X" (Char.code c)
      | c -> no_escape i
    done;
    escape (String.length str);
    Buffer.contents buf

  let format_string ppf s =
    Format.fprintf ppf "%s" (escape_string s)
end

module FormatVersionOpt = struct
  let format_string ppf str =
    let first_unescaped = ref 0 in
    let no_escape i =
      if i < !first_unescaped then
        first_unescaped := i
    in
    let escape i =
      let first = !first_unescaped in
      if i < first then ()
      else begin
        Format.pp_print_string ppf
          (String.sub str first (i - first)); (* i is excluded *)
        first_unescaped := max_int;
      end
    in
    for i = 0 to String.length str - 1 do
      match str.[i] with
      | '\\' -> escape i; Format.pp_print_string ppf {|\\|}
      | '\"' -> escape i; Format.pp_print_string ppf {|\"|}
      | '\n' -> escape i; Format.pp_print_string ppf {|\n|}
      | '\t' -> escape i; Format.pp_print_string ppf {|\t|}
      | '\r' -> escape i; Format.pp_print_string ppf {|\r|}
      | '\b' -> escape i; Format.pp_print_string ppf {|\b|}
      | '\x00' .. '\x1F' | '\x7F' as c ->
        escape i;
        Format.fprintf ppf "\\u%04X" (Char.code c)
      | c -> no_escape i
    done;
    escape (String.length str)
end

let test_ok format_string =
  let ppf = Format.str_formatter in
  let input, expected_output =
    "01'2\\34\"\n5\t6\r\b7\x1A8 9",
   {|01'2\\34\"\n5\t6\r\b7\u001A8 9|} in
  format_string ppf input;
  let result = Format.flush_str_formatter () in
  result = expected_output
  || (Printf.eprintf "\"%s\"%!" result; false)

let small_input = {|
  let format_char ppf chr =
    match chr with
    | ('\"' | '\\') as c ->
      Format.fprintf ppf "\\%c" c
    | '\n' ->
      Format.fprintf ppf "\\%c" 'n'
    | '\t' ->
      Format.fprintf ppf "\\%c" 't'
    | '\r' ->
      Format.fprintf ppf "\\%c" 'r'
    | '\b' ->
      Format.fprintf ppf "\\%c" 'b'
    | '\x00' .. '\x1F' | '\x7F' as c ->
      Format.fprintf ppf "\\u%04X" (Char.code c)
    | c -> Format.fprintf ppf "%c" c
             
  let format_string ppf s = String.iter (format_char ppf) s
|}

let large_input = String.concat "\n\n" (List.init 100 (fun _ -> small_input))

let test_perf format_string =
  let null = open_out Filename.null in
  Fun.protect ~finally:(fun () -> close_out null) @@ fun () ->
  let ppf = Format.formatter_of_out_channel null in
  for i = 1 to 1_000 do
    format_string ppf large_input
  done

let versions = [
  "Format", FormatVersion.format_string;
  "Buffer", BufferVersion.format_string;
  "BufferOpt", BufferVersionOpt.format_string;
  "FormatOpt", FormatVersionOpt.format_string;
]

let () =
  versions |> List.iter @@ fun (name, f) ->
  Printf.printf "testing %s: %!" name;
  Printf.printf "%s\n%!"
    (if test_ok f then "ok" else "fail")

let () =
  let usage_error () =
    prerr_endline
      "command-line argument expected: 'Format' or 'Buffer' or 'BufferOpt'.";
    exit 2 in
  match Sys.argv.(1) with
  | exception _ -> usage_error ()
  | v ->
  match List.assoc v versions with
  | exception Not_found -> usage_error ()
  | f ->
  test_perf f

On my machine, performance results are as follows:

$ ocamlopt -o escape_json escape_json.ml && time ./escape_json Format
[...] real	0m5.775s

$ ocamlopt -o escape_json escape_json.ml && time ./escape_json Buffer
[...] real	0m0.360s

$ ocamlopt -o escape_json escape_json.ml && time ./escape_json BufferOpt
[...] real	0m0.338s

$ ocamlopt -o escape_json escape_json.ml && time ./escape_json FormatOpt
[...] real	0m0.900s

Taking the Buffer version as a sweet spot (more readable than BufferOpt and almost as efficient), it gives a 16x performance improvement over the char-by-char Format version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth it here, but having a fast path for strings not needing any escaping (the vast majority of cases) would also give a nice speedup.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine, the speed-up in the fast-path case is 3x (for the Buffer implementation). That said, I expect that many of the large strings the compiler prints contain \n, so the fast path would be rare there.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely: on a large fast-path input, Buffer is at 0.30s, BufferOpt at 0.20s, and BufferFastPath at 0.10s. So maybe BufferOpt makes more sense than I thought. I would still use the simple and readable Buffer version.

@muskangarg21 muskangarg21 force-pushed the JSON branch 3 times, most recently from ae4e729 to 6c5f70e Compare June 24, 2020 21:06
@muskangarg21
Copy link
Copy Markdown
Contributor Author

@gasche
I have updated the code as per your suggestions. Please review.

@muskangarg21 muskangarg21 changed the title [WIP]:Added JSON output to ocamldep Added JSON output to ocamldep Jun 26, 2020
@damiendoligez
Copy link
Copy Markdown
Member

ping @gasche

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support export of software dependencies from OCaml modules in the format “JSON”

5 participants