Skip to content

hexdump_pp: Respect indentation of parent boxes#175

Merged
avsm merged 2 commits intomirage:masterfrom
cfcs:align_hexdump_pp
Sep 11, 2017
Merged

hexdump_pp: Respect indentation of parent boxes#175
avsm merged 2 commits intomirage:masterfrom
cfcs:align_hexdump_pp

Conversation

@cfcs
Copy link
Copy Markdown

@cfcs cfcs commented Sep 9, 2017

This wraps hexdump_pp output in a box, making it respect the parent box' indentation:

utop # Format.printf "yolo: %a" Cstruct.hexdump_pp (Cstruct.create 100) ;;
yolo: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                                           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                                           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                                           00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
      00 00 00 00 - : unit = ()

the old behavior was to output:

utop # Format.printf "yolo: %a" Cstruct.hexdump_pp (Cstruct.create 100) ;;
yolo: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 - : unit = ()

I experimented with getting rid of the if i mod 16 = ... block by using Format.pp_set_margin, but ran into this (bug?): https://gist.github.com/cfcs/b39f0fa96a96235e41ff382eca8c3b86

Edit: It also changes the behavior when the lines are too small to print 16 octets to this (I couldn't see an obvious way to determine how many chars are available in the current line to print the same amount of octets on all lines, but I feel this is more readable in any case):

utop # Format.printf "yolo: %a" Cstruct.hexdump_pp (Cstruct.create 100) ;;
yolo: 00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00
      00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00              
      00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 
      00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 
      00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 
      00 00 00 00 00 00 00 00 00 00
      00 00 00 00 00 00 
      00 00 00 00 - : unit = ()

The old behavior was:

utop # Format.printf "yolo: %a" Cstruct.hexdump_pp (Cstruct.create 100) ;;
yolo: 00 00 00 00 00 00 00 00 00 0
0 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 0
0 00 00 00
00 00 00 00 00 00 00 00 00 00 00 0
0 00 00 00
00 00 00 00 00 00 00 00 00 00 00 0
0 00 00 00
00 00 00 00 00 00 00 00 00 00 00 0
0 00 00 00
00 00 00 00 00 00 00 00 00 00 00 0
0 00 00 00
00 00 00 00 - : unit = ()

ping @avsm

@hannesm
Copy link
Copy Markdown
Member

hannesm commented Sep 9, 2017

I appreciate this enhancement of the pretty printer!

@cfcs
Copy link
Copy Markdown
Author

cfcs commented Sep 9, 2017

@hannesm suggested inserting whitespace after every 8 octets also, so a 16-octet line would become

00 01 02 03 04 05 06 07   08 09 0A 0B 0C 0D 0E 0F

thus making it easier to spot "byte number 9."

I'm happy to add that, but would like to hear the opinion of the maintainer(s) before introducing that in this PR. Thoughts?

@avsm
Copy link
Copy Markdown
Member

avsm commented Sep 10, 2017

The extra white space also sounds good to me. We could also have a second column that prints the ascii printable characters as hexdump does, to make protocol dumps easier to parse. That doesn't need to block this PR of course.

@samoht
Copy link
Copy Markdown
Member

samoht commented Sep 10, 2017

For more complete formating, ocaml-hex might be of interested well:

# Hex.(hexdump (of_cstruct (Cstruct.create 100)));;
00000000: 0000 0000 0000 0000 0000 0000 0000 0000
00000001: 0000 0000 0000 0000 0000 0000 0000 0000
00000002: 0000 0000 0000 0000 0000 0000 0000 0000
00000003: 0000 0000 0000 0000 0000 0000 0000 0000
00000004: 0000 0000 0000 0000 0000 0000 0000 0000
00000005: 0000 0000 0000 0000 0000 0000 0000 0000
00000006: 0000 0000

I have no strong opinion on where this function could go, but it would be nice to not duplicate it too much :-)

@cfcs
Copy link
Copy Markdown
Author

cfcs commented Sep 10, 2017

  • @avsm I'll add the whitespace in the middle later today.

  • As for the ASCII, I feel like that should either be ?(ascii=false) (to not change outputs too much around) or a hexdump_ascii_pp function.

  • @samoht I think it's a good idea to add an index column, but I think that the Hex.hexdump one is a bit confusing. It seems to print the line number (of however many lines it decided to break it into) rather than the offset into the Cstruct.t. Consider hexdump -C which makes it easy to find the byte at offset 49 - go down to 000..40, second octet after the middle break (66):

$ echo {A..z} | hexdump -C
00000000  41 20 42 20 43 20 44 20  45 20 46 20 47 20 48 20  |A B C D E F G H |
00000010  49 20 4a 20 4b 20 4c 20  4d 20 4e 20 4f 20 50 20  |I J K L M N O P |
00000020  51 20 52 20 53 20 54 20  55 20 56 20 57 20 58 20  |Q R S T U V W X |
00000030  59 20 5a 20 5b 20 20 5d  20 5e 20 5f 20 60 20 61  |Y Z [  ] ^ _ ` a|
00000040  20 62 20 63 20 64 20 65  20 66 20 67 20 68 20 69  | b c d e f g h i|
00000050  20 6a 20 6b 20 6c 20 6d  20 6e 20 6f 20 70 20 71  | j k l m n o p q|
00000060  20 72 20 73 20 74 20 75  20 76 20 77 20 78 20 79  | r s t u v w x y|
00000070  20 7a 0a                                          | z.|
00000073

I don't see an obvious way to do the index and have it look nice in the face of line-wrapping since the Format API is a bit limited. I think you can do it by updating a ref using a callback from ?print_if_linebreak? when a linebreak has been inserted, but maybe that's a bit dirty and should be a separate PR.

The functions currently look like this:

val Cstruct.hexdump_pp : Format.formatter -> t -> unit
val Cstruct.hexdump : t -> unit

I'd propose changing it to something like (happy to change the names and/or interface):

val Cstruct.hexdump_pp : ?print_offset:bool -> Format.formatter -> t -> unit
(** Pretty-print octets in hex, optionally prefixed by the hexadecimal offset for each line *)

val Cstruct.hexdump_ascii_pp : ?print_offset:bool -> Format.formatter -> t -> unit
(** Pretty-print both hex octets and printable characters, like hexdump -C *)

val Cstruct.hexdump : ?print_offset:bool -> ?print_ascii:bool -> t -> unit

Suggestions welcome!

@hannesm
Copy link
Copy Markdown
Member

hannesm commented Sep 10, 2017

@cfcs I like optional arguments more than a different function -- ?(print_ascii = false) and ?(print_index = false)

@samoht I also feel bad that there's code duplication between hex and cstruct -- but since hex already depends on cstruct, maybe it should use the pretty printer from here (according to http://docs.mirage.io/hex/Hex/index.html#val-hexdump there are two printers, one to stdout, one to a string -- but no pretty printer).

maybe hex should even be integrated into Cstruct? I commonly have (esp. for testcases) local definitions for of_hex : string -> Cstruct.t option.

@cfcs
Copy link
Copy Markdown
Author

cfcs commented Sep 10, 2017

@hannesm I'm fine with optional arguments, I'll write a proposal for the ASCII stuff and update this PR.

@hannesm @samoht : Re: Possible additions (slightly off-topic):

(* find char [c] in [b], starting at [offset] and giving up after [max_offset] *)
let index_opt b ?(max_offset) ?(offset=0) c : int option =

(* find substring [needle] in [b] *)
let find b ?(max_offset) ?(offset=0) needle =

let split_by_char c ?offset ?max_offset buf : (t*t, 'error) result =
  • more cstruct.something modules for serialization of common libs that do not depend on Cstruct, like my Ptime functions
  • exception-free result functions would also be a very nice addition, perhaps in a separate submodule?

@samoht
Copy link
Copy Markdown
Member

samoht commented Sep 11, 2017

It's clearly becoming out of scope of this PR, but yea, ocaml-hex should probably re-use Cstruct.hexdump_pp + optional arguments (e.g. I'm fine to put the main hex cstruct converter in ocaml-cstruct).

@cfcs
Copy link
Copy Markdown
Author

cfcs commented Sep 11, 2017

Ok, I think I bit of more than I can chew... After spending 16-20 hours on trying to understand format.ml, and running into lots of bugs and undocumented behavior, I don't feel very optimistic re: me being able to add these things with Format. Let's just get this merged and wait for someone to fix the Format library upstream.
ping @avsm

@hannesm
Copy link
Copy Markdown
Member

hannesm commented Sep 11, 2017

then let's merge this now, or is anyone against this change (I'm too shy to press merge buttons)!?

@avsm avsm merged commit c53f0e2 into mirage:master Sep 11, 2017
@avsm
Copy link
Copy Markdown
Member

avsm commented Sep 11, 2017

I feel your pain about understanding Format @cfcs :-) Thanks very much for this change. I've got some fuzzing changes queued up for a PR and can then cut a release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants