Add a chapter for the instrumented runtime by abbysmal · Pull Request #9541 · ocaml/ocaml

abbysmal · 2020-05-07T09:30:10Z

Hello,

Following #9082 , here is a draft for the instrumented runtime's manual page.
I think it contains all the essential information for using the instrumented runtime.

Some details maybe needs discussion:

The babeltrace section mentions and links to various external tooling, I'm unsure what is the preferred path for this kind of content. Since the library acts as the reference implementation for the format we are using, I believe mentioning it is useful. (and it bringing nice extra features, like an easy Jupyter integration.)
The eventlog-tools (https://github.com/ocaml-multicore/eventlog-tools) section is also an external tool, albeit one we have control over. The documentation on it is pretty scarce in the manual page, as I believe the tool should document itself. It is for now hosted in the ocaml-multicore organization, maybe there is a better location for it, @avsm seems to think it could be moved somewhere else.

After merging #9082 some discussions were brought up on the naming of a few options in the user facing API for the instrumented runtime. (@shindere @dra27)
I didn't ping back there, waiting for the manual page to be done, to make a better assessment of how understandable the API is, given proper documentation.
Maybe this can be better discussed as well now.

Thanks,

gasche

Thanks! I made a first round of review.

I think that it is perfectly fine to talk about third-party tools that are important to the chapter.

One thing that I found a bit lacking is a sort of "run-through example" where you show a (simple) complete session: compiling and running a program with the instrumented runtime, running the program, and then "using" (visualizing) the result. (The ASCII version is easy, but maybe it would also be possible to have a Screenshot of the Chrome viewer?)

gasche · 2020-05-17T20:38:32Z

manual/manual/cmds/instrumented-runtime.etex

+can generate {\em trace files} that can then be read
+and analyzed by users in order to understand specific runtime behaviors.
+
+{\em Common Trace Format} is the general purpose binary


I would write "The {\em Common Trace Format} (CTF)", and generally use the same grammar for the expanded version as for the common noun "format". On the other hand, the abbreviation CTF itself can be used as a proper noun ("the problem of CTF", etc.).

gasche · 2020-05-17T20:44:18Z

manual/manual/cmds/instrumented-runtime.etex

+  in the program being traced.
+\end{itemize}
+
+For more information on {\em Common Trace Format}, see the website at


the Common Trace Format

gasche · 2020-05-17T20:50:15Z

manual/manual/cmds/instrumented-runtime.etex

+
+This filename can also be specified using the
+{\tt OCAML\_EVENTLOG\_FILE} environment variable.
+The given path will be suffixed with {\tt .pid.eventlog}.


Do I correctly understand that by default I would get the file caml-<pid>-eventlog, but if I use OCAML_EVENTLOG_FILE I would get caml.<pid>.eventlog, with dots instead of dashes? This is strange.

Above you use {pid} instead of just pid to make it explicit that this a metavariable (for a concrete pid) in the examples. Maybe it would be nice to use it concistently, eg. here {\tt .\{pid\}.eventlog}?

Indeed the final filename by default is -eventlog, and not .eventlog, there is however no strict reason for this. (the tooling makes no assumptions on the filenames.)
If further adjustments are deemed necessary to the interface (following the _FILE/_PREFIX conversation), we can change this behavior as well.

gasche · 2020-05-17T20:50:52Z

manual/manual/cmds/instrumented-runtime.etex

+The given path will be suffixed with {\tt .pid.eventlog}.
+
+\begin{verbatim}
+        OCAML_EVENTLOG_FILE=/tmp/some/prefix OCAML_EVENTLOG_ENABLED=1 ./program


I think there already was a discussion of this but it is slightly weird to name _FILE a variable that only set the prefix of the file. I think that OCAML_EVENTLOG_NAME (or _PREFIX) would get less confusing; ideally it could also be useful to be able to set a file for sure, for example for scripting usage (when the file is a freshly generated temporary file, so no conflict is to be expected), maybe OCAML_EVENTLOG_FILE or OCAML_EVENTLOG_FILE_FULL? (Does your tooling make assumptions about the naming scheme?)

I agree that the name could be changed to _PREFIX, however my approach on allowing the user to explicitly set a full filename is a conservative one, considering in future designs (when executing on OCaml Multicore, with multiple domains), there likely won't be one, but many traces generated.
With this usecase in mind, I would rather avoid implementing a feature that may be obsoleted in the future. I considered the "pid" to provide a good enough indicator for scripting purpose.

I'm okay with _PREFIX and no way to set a specific file. Maybe it could be useful to document the reason for this scheme, so that in particular script authors know that they should look for $OCAML_EVENTLOG_FILE_PREFIX.*.eventlog, and not assume a single file, if they want to be robust to future changes.

Sure, how should we proceed with the _PREFIX change? Should I open another PR to trunk?
I will document as well the reason for not being able to produce a specific filename.
I will as well change the default naming scheme to caml-{pid}.eventlog to make everything more consistent.

gasche · 2020-05-17T20:55:27Z

manual/manual/cmds/instrumented-runtime.etex

+\end{verbatim}
+
+In this example, the trace will be available at path
+{\tt /tmp/some/prefix.pid.eventlog}.


gasche · 2020-05-17T21:03:20Z

manual/manual/cmds/instrumented-runtime.etex

+For a program with an extended running time where the collection of only a
+small sample of events is required, using the {\em eventlog_resume} and
+{\em eventlog_pause} primitives may help relieve some of the
+tracing induced performance impact.


So what would typically be the tracing you observed when eventlog is paused?

If eventlog is paused during a whole run, the tracefile would contain no traces, merely just the Common Trace Format header.

When you debug interactive/long running program it's not always evident to do so. I think it would have been nice to provide a standard way to control the collection from the outside, modulo a call to a single function, something like:

Gc.install_eventlog_control : ?signum:int -> unit -> unit

that would install a signal handler so that you can killall -SIGUSR1 myprogram collection.

Also there's no way to programmatically flush ?

Thank you for your input,

There's no way to programmatically flush, besides relying on the fact that Gc.eventlog_pause do force a flush. I think this could be useful indeed.
I think interfacing eventlog controls through signals could be interesting (and signals are indeed the preferred interface for such things), and we could consider it for the next iteration on the instrumented runtime.
(although I think the design may be a bit less straightforward with tracing domains in multicore, ie. is the sighandler installed for every domain, or on only the one executing the code? Can this interface be kept consistent between now and future releases?).

The instrumented runtime having been merged already, I am sorry to have enabled such discussions so late in the process by making the documentation after the fact.
I do not know if additions at this stage are possible (or maybe they could be considered minor tweaks.), however all of this is noted and can be addressed in future iterations of the instrumented runtime.
(as domains will introduce required changes to the instrumentation facility, and provide a complete baseline, feature wise, on which it will be easier to reason in terms of expose-able features that will not need to be revisited later.)

My understanding is that we don't really know yet how signals will work in Multicore. I would recommend never having only signals as a way to change things; there should be a programmatic control to use instead of signals for each feature. (One could also think of exporting a control socket, possibly decided through an environment variable, but maybe we can wait for the user need to arise before implementing this.)

The instrumented runtime having been merged already, I am sorry to have enabled such discussions so late in the process by making the documentation after the fact.

As a strong believer in documentation, I support this message :-)

I do not know if additions at this stage are possible (or maybe they could be considered minor tweaks.)

Yes, given that this instrumented runtime is completely new, there are no users whose workflow would break yet, so I think you should feel free to propose further changes for now, until the Release Candidate process has started (at this point we want to minimize changes to the release branch).

(cc our release gods, @Octachron @damiendoligez)

however all of this is noted and can be addressed in future iterations of the instrumented runtime.

Sure these comments were not meant to be addressed right now. API wise the only real thing I feel is missing is a way to programmatically flush (e.g. because you know appropriate points in your program where it's convenient to do that).

The signal idea (which may indeed not be a good one in the context of multicore) was just me trying to figure out a quick and dirty generic way to be able control instrumentation from the outside.

gasche · 2020-05-17T21:03:30Z

manual/manual/cmds/instrumented-runtime.etex

+{\em eventlog_pause} primitives may help relieve some of the
+tracing induced performance impact.
+
+\section{s:instr-runtime-read}{Read traces}


Reading traces?

gasche · 2020-05-17T21:07:33Z

manual/manual/cmds/instrumented-runtime.etex

+      {https://github.com/ocaml-multicore/eventlog-tools}.
+\else
+{\tt https://github.com/ocaml-multicore/eventlog-tools}
+\fi


In addition to pointing at the page for more documentation (which is fine), I would like to see two representative examples of usage of the tools; one to print an histogram (why not show the actual output on an example and say a word about what we see?), and also the command sequence to convert and visualize the trace in chromium. Basically I would expect people reading this manual page to be able to do basic tracing of their OCaml program by just reusing examples shown in the documentation.

This sounds good, I will provide such examples shortly.

I have one extra question regarding this particular request:
Should the inclusion of such examples be done in the manual page, or it could be done in the eventlog-tools repository itself ? (for the eventlog-tools related demonstrations.)
As for babeltrace, I think it is fine to include a demonstration as well as a small code snippet showing basic decoding of an OCaml trace, for processing. (as is our usecase in the multicore team, using Jupyter.)

Should the inclusion of such examples be done in the manual page, or it could be done in the eventlog-tools repository itself ?

I think it is fine if it is in the eventlog-tools repository, and arguably even better: if you change the tool (and remember to update the documentation), all manual pages that refer to the right repository will get the updated documentation.

I think the documentation review process is mostly useful to ensure that the documentation exists. We have more leeway on where precisely the documentation is, as long as you are actually forced to write it :-)

gasche · 2020-05-17T21:08:07Z

manual/manual/cmds/instrumented-runtime.etex

+copied in the same directory as the generated trace file.
+However, {\em babeltrace} expects the file to be named
+{\tt metadata} in order to process the trace.
+Thus, it will need to be renamed when copied to the trace's directory.


Again, showing a concrete example of that (the shell sequence that does a copy and then invokes babeltrace to do somthing) would be useful.

Noted for both, I will provide concrete examples, thanks.

gasche · 2020-05-17T21:10:57Z

manual/manual/cmds/instrumented-runtime.etex

+
+\begin{verbatim}
+       ocamlopt -runtime-variant i program.ml -o program
+\end{verbatim}


Is there a simple example somewhere of how to configure the runtime variant in a Dune project? (If so, could you link to it as well?)

If we use the bytecode compiler, is it possible to choose the runtime at ocamlrun invocation time rather than at compilation time? This could make it easier to experiment with the instrumented runtime without having to tweak the build system.

Configuring the executable to use the instrumented runtime with dune is as simple as using a (flag "-runtime-variant=i") parameter.
As for ocamlrun, there is now an ocamlruni executable available that is the instrumented runtime.
I can add mentions for both in this section

as a side note, for the dune support question, an issue was opened here: ocaml/dune#3500

Thanks! For Dune I believe it would be nice to document whatever way currently works. (Most prospective users of the feature are not calling ocamlfind directly, they want to know how to do it through Dune). If the tool gains new feature to make this even easier, we can edit the manual.

gasche · 2020-05-18T07:51:09Z

Note: some of the aspects I mentioned as missing (for example, screenshots) are in fact present in the first message of the Eventlog PR #9082. I think it would be nice to review this first message and ensure that all the information it contained (that is relevant for users) is present in the documentation.

gasche · 2020-05-26T13:07:58Z

@Engil thanks a lot for your further work on this PR. For now I haven't had a full look at the result again, please let me know when you think I should do it.

abbysmal · 2020-05-26T13:19:24Z

@gasche I need to improve a few things still before it is reviewable, I will report back here once I believe I addressed most of your suggestions and comments, thank you for your review.
I will do my best to make it ready soon.

abbysmal · 2020-05-29T11:46:38Z

@gasche
I think the PR is ready for further review, sorry about the delay.
Note that some of my recent changes include changing the option names to suggestions made earlier.
(namely, the default output path being caml-{pid}.eventlog, as well as changing OCAML_EVENTLOG_FILE with OCAML_EVENTLOG_PREFIX.)
This change is however still not upstreamed.
I have a branch making this change to the runtime, should I open a PR for it?
The branch is here: https://github.com/Engil/ocaml/tree/instrumented_runtime_consistent_options

I also included a simple (as simple as I could get it) decoder example using babeltrace.

Have a great day,

shindere · 2020-06-02T15:08:56Z

Engil (2020/05/29 04:46 -0700):

I think the PR is ready for further review, sorry about the delay. Note that some of my recent changes include changing the option names to suggestions made earlier. (namely, the default output path being `caml-{pid}.eventlog`, as well as changing `OCAML_EVENTLOG_FILE` with `OCAML_EVENTLOG_PREFIX`.) This change is however still not upstreamed. I have a branch making this change to the runtime, should I open a PR for it? The branch is here: https://github.com/Engil/ocaml/tree/instrumented_runtime_consistent_options

IMO, the code and documentation should be kept in sync as much as possible. It was not originally the case because there was no man page at the beginning, but I think if now the manpage documents something, then the bits that make the code behave as the document specifies should be kept as close as possible to hte documentation itself. In other words, I'd be in favour of including your code-updating branch in this PR.

dbuenzli

I look forward to use all that @Engil. Except after reading this nice piece of documentation I still don't get what kind of insights/statistics the system can give on my programs :-) You need to make that point clear in the introduction of the document.

dbuenzli · 2020-06-04T08:40:21Z

manual/manual/cmds/instrumented-runtime.etex

+%HEVEA\cutname{instrumented-runtime.html}
+
+This chapter describes the OCaml instrumented runtime, a runtime variant
+allowing the collection of events and metrics.


At that point I want to know what these events and metrics are to understand if that's what I'm looking for or should stop reading.

I concur that the introduction should state which "insights into a program's execution" are provided. Are we measuring garbage collection events? Access to out-of-heap pointers? Cache misses?... At the end of the paragraph, the reader should have a basic idea on how the instrumented runtime can help them.

Indeed, I just pushed a revision to address this issue.
The introductory paragraph now covers the kind of information you can find with the instrumented runtime

dbuenzli · 2020-06-04T08:44:13Z

manual/manual/cmds/instrumented-runtime.etex

+
+\end{verbatim}
+
+Compiling the program and linking it with the instrumented runtime,


I think it should be made more clear that the instrumented runtime is only a "link time entity". That is no special flags are needed during the compilation phase of the objects that comprises your final program.

dbuenzli · 2020-06-04T08:47:50Z

manual/manual/cmds/instrumented-runtime.etex

+\end{verbatim}
+
+The program will execute and generate a trace file in the
+current directory.


current working directory and I guess it's the current working directory of the program.

dbuenzli · 2020-06-04T08:51:13Z

manual/manual/cmds/instrumented-runtime.etex

+         (flags "-runtime-variant=i"))
+\end{verbatim}
+
+The instrumented runtime can as well be used with OCaml's bytecode interpreter.


as well -> also
with OCaml's -> with the OCaml

But IANANES.

dbuenzli · 2020-06-04T08:54:18Z

manual/manual/cmds/instrumented-runtime.etex

+\end{verbatim}
+
+The instrumented runtime can as well be used with OCaml's bytecode interpreter.
+This can be done by either compiling the program using the


This can be done by adding the flag {\tt -runtime-variant=i} when linking the program with {\tt ocamlc}

dbuenzli · 2020-06-04T09:14:09Z

manual/manual/cmds/instrumented-runtime.etex

+
+\begin{verbatim}
+        OCAML_EVENTLOG_ENABLED=p ./program
+\end{verbatim}


Ah @gasche also got confused, it's still not clear (see above).

dbuenzli · 2020-06-04T09:15:01Z

manual/manual/cmds/instrumented-runtime.etex

+        OCAML_EVENTLOG_ENABLED=p ./program
+\end{verbatim}
+
+The program will have to resume events collection explicitly.


It never started so resume is misleading. Maybe use unpause or even start here.

dbuenzli · 2020-06-04T09:16:18Z

manual/manual/cmds/instrumented-runtime.etex

+\end{verbatim}
+
+The program will have to resume events collection explicitly.
+This can be done via {\tt Gc.eventlog_resume} and


This can (only ?) be done from the program.

dbuenzli · 2020-06-04T09:21:34Z

manual/manual/cmds/instrumented-runtime.etex

+\end{verbatim}
+
+The resulting trace contains only one event payload, namely a {\em flush} event,
+indicating how much time was taken to flush the trace file to disk.


This is not a documentation issue but I find it strange that a paused event log that remains so does anything to the program. I would expect the file to be created but for it to be empty. (I can see however that you'd have to detect "was never unpaused in the program" condition that it may not be suitable to do so for perf reasons)

I think it is sane to have at least the CTF header being written, as a part of the file opening process.
The flush event however could be avoided, without trickery, its intended purpose is to time the execution of the eventlog flush function, however at least one eventlog flush happens in any instrumented program: atexit, which forces a buffer flush.
This buffer could very well be empty however (as is the case for a paused program), so this extra flushing information could be skipped, since the flush function will exactly flush nothing. (and not introduce any performance penalty whatsoever)

dbuenzli · 2020-06-04T09:34:00Z

manual/manual/cmds/instrumented-runtime.etex

+For a program with an extended running time where the collection of only a
+small sample of events is required, using the {\em eventlog_resume} and
+{\em eventlog_pause} primitives may help relieve some of the
+tracing induced performance impact.


When you debug interactive/long running program it's not always evident to do so. I think it would have been nice to provide a standard way to control the collection from the outside, modulo a call to a single function, something like:

Gc.install_eventlog_control : ?signum:int -> unit -> unit

that would install a signal handler so that you can killall -SIGUSR1 myprogram collection.

Also there's no way to programmatically flush ?

abbysmal · 2020-06-05T12:19:08Z

Thank you very much for the review.
I pushed a few changes to the manual page.
The diff may be a bit painful to go through, sorry about this, I had to move around big chunks of the manual (I think swapping the "Reading traces" section and "Controls+limitations" suggestion made sense and provide a better reading experience).
I tried to address some formulation issues as well and provide more details where required, I will re-read the thread and see if I missed any comment, please let me know if I forgot anything.

shindere · 2020-06-08T10:29:23Z

Here is what I can see:

"Collected metrics includes" -> include

"For more information on the {\em Common Trace Format}, see the website
at": I'd remove "the website at".

Note that the instrumented runtime, being an alternative runtime for
OCaml programs, is only referenced during the linking stage of the
executable. As such, there is not need to adjust the compilation steps
of intermediate compilation units or libraries.

I would remove the comma after runtime and add an "it" before "is only
referenced" in the first sentence. The second sentence could be
rephrased, perhaps somethink like: "This means that only the linking
stage needs to be adjusted, not the compile stage" (the notion of
intermediary compile units sounds a bit odd to me).

"This script expect to receive": expects

"printing each events in the process":event, singular

"For more information on {\em babeltrace}, see the website at": same
remark as above, "the website at" seems superfluous to me.

The default trace filename is {\tt caml-\{pid\}.eventlog}, where {\tt
\{pid\}} is the process identifier of the traced program.

Perhaps you could write "PID" (uppercase) to make it even more obvious
that it's justa place-holder?

This filename can also be specified using the {\tt
OCAML\_EVENTLOG\_PREFIX} environment variable. The given path will be
suffixed with {\tt \{.pid\}.eventlog}.

Just a thought. If you wanted to give users control over the name of the
generated files, you could have filenames that accept place-holders like
%p for the pid and then users could do what they want.

In this example, the trace will be available at path {\tt
/tmp/a_prefix.\{pid\}.eventlog}.

Here again it feels to me that an uppercase PID would improve
readability.

This restriction is in place to make room for future improvements to the
instrumented runtime, where the single trace file per session design may
replaced.

I think you should at least add a "be" -- "bay be replaced". That's
where the idea mentionned above may become useful: with place-holders
like %p you would be safe and users would keep their ability to have
full control on how files are named.

"Note as well that intermediate directories in the given path will not
be created": parent directories is the right terminology, I thinnk.

"The runtime assume the path is accessible for writing and creating
files": assumes. There may be something to clarify here but I'm not
sure. Basically, what you are saying is that by specifying a prefix,
users also specify a directory (the dirname of that perfix) where other
files may be created, right?

At the beginning of the "Pausing and resuming tracing", there are several
occurrences of "events collection", some where event is plural, others
where it's singular. You may want to change this so that all are the
same.

"indicating how much time was taken to flush the trace file to disk":
suggestion, how much time was spent flishing?

"The resulting trace will contain every events": either "every event" or
"all events", I think I'd go for the second one.

"The instrumented runtime currently put a strong emphasis": puts

Finally, I think when you explain how to build the program to be
instrumented you use -o program but then later you refer to the
produced executable as a.out.

abbysmal · 2020-06-09T13:46:45Z

Hello @shindere ,
Thank you for your time and your review and suggestions. I tried to rephrase for the best following your recommendations, and fixed the various typos.

Just a thought. If you wanted to give users control over the name of the
generated files, you could have filenames that accept place-holders like
%p for the pid and then users could do what they want.

I agree this may provide a nice alternative to the current situation, having substitutions in the prefix path would alleviate a few of the issues, however it still works only when assuming that there may be only one trace generated by the runtime. (which it may not be doing in OCaml Multicore.).

If I understand the idea, it is to leave the option to the user to leave out any modifier (%p in your example.) if desired.

I think having the option to leave out any modifier goes against the precautions taken to ensure that the design of the instrumented runtime is not limited by providing users with options that goes against potential design changes.

If a user leaves out the modifier and wants a specific path, but then we want to output multiple trace files (be it Multicore where we could support one trace file per domain, or a possible inclusion of following forked processes), what would be the correct course of action?

However if we were to allow such modifiers and actually require them, it would give user a bit more control, although the level of control gained in that specific situation does not seems worth the extra handling to me. (but I may be wrong and missing an obvious improvement over the current, static append)

I am very happy to discuss these suggestions, and I will keep them in mind when iterating on the multicore design of eventlog to find the sweet spot that enables a good user experience while keeping an adequate model for it.

shindere · 2020-06-09T15:47:12Z

Two preliminary remarks: 1. You will need to update the Changes entry. 2. I find the "manual page" in the title of the pull request a bit misleading because it makes me expect " "man page", which is of course not what the pull request contains. So perhaps the title of the PR could be edited to refer to, e.g., a chapter on the instrumented runtime added to the OCaml reference manual or something like that? Engil (2020/06/09 06:47 -0700):

Thank you for your time and your review and suggestions.

My pleasure.

I tried to rephrase for the best following your recommendations, and fixed the various typos.

Thanks!

> Just a thought. If you wanted to give users control over the name of the generated files, you could have filenames that accept place-holders like %p for the pid and then users could do what they want. I agree this may provide a nice alternative to the current situation, having substitutions in the prefix path would alleviate a few of the issues, however it still works only when assuming that there may be only one trace generated by the runtime.

Why? Would it make absolutely no sense to have the traces of several domains mixed in the same trace file? If it would make absolutely no sense then yes, you could either request that all formatters are present or, if they are not, make the files as specific as you want by adding numbers or whatever.

If I understand the idea, it is to leave the option to the user to leave out any modifier (`%p` in your example.) if desired.

Actually, what I am looking for is to provide as much flexibility to the user as possible. If you think the formatters are mandatory then fiine, at least with the formatters the user has a control on how the files are named.

I think having the option to leave out any modifier goes against the precautions taken to ensure that the design of the instrumented runtime is not limited by providing users with options that goes against potential design changes.

Well as I said, if you are sure it makes no sense to "mix" traces then yes you can make the modifiers mandatory. To give you an example, take redirection of stdout and stderr. Sometimes one wants to have them separate because what matters is to know what got printed where. Sometimes, on the countrary, it is important that they are _not_ separated, because what really matters is the order in which events occur. So in that case, since both approaches may have their interests, one definitely wants to support both of them. If in your case one approach (mixing traces, say) has absolutely no semantics then it feels right to me that it is not supported. But then, in my opinion, mandatory modifiers are better than no modifiers at all because, although they are mandatory, their availability provides a flexibility that does not exist without them.

If a user leaves out the modifier and wants a specific path, but then we want to output multiple trace files (be it Multicore where we could support one trace file per domain, or a possible inclusion of following forked processes), what would be the correct course of action?

Do the previouos remarks help, here?

However if we were to allow such modifiers and actually require them, it would give user a bit more control, although the level of control gained in that specific situation does not seems worth the extra handling to me. (but I may be wrong and missing an obvious improvement over the current, static append)

Well you can wait until a concrete need arises, of course.

gasche · 2020-06-09T16:02:59Z

It sounds very delicate (and complex) from an implementation perspective to efficiently write from several domains in parallel to a single trace file -- you have to deal with concurrent updates to the file. On the other hand, having each domain produce a trace in a separate file is easy and requires no particular locking mechanism. I suppose this is why each domain would write to a separate file -- I find this reason very convincing.

I think that the discussion on naming templates is somewhat overkill for now. We can have something that is simple and makes sense now (even if it gives up some control over file placement), and iterate later to extend the system with more precise/complex naming features.

Remark: I do like @shindere's proposal to use template variables in filenames. I think it would extend gracefully to the multicore scenario by adding a template parameter for the domain number. But again, this feature can come later, with only a prefix for now.

manual/manual/cmds/instrumented-runtime.etex

avsm · 2020-06-09T16:07:50Z

manual/manual/cmds/instrumented-runtime.etex

+
+The {\em Common Trace Format} is the general purpose binary
+format used to encode traces.
+A complete trace is comprised of:


Suggested change

A complete trace is comprised of:

A complete trace comprises of:

comprises of or
~~comprises~~ consists of

manual/manual/cmds/instrumented-runtime.etex

shindere · 2020-06-10T08:58:30Z

Gabriel Scherer (2020/06/09 09:03 -0700):

It sounds very delicate (and complex) from an implementation perspective to efficiently write from several domains in parallel to a single trace file -- you have to deal with concurrent updates to the file. On the other hand, having each domain produce a trace in a separate file is easy and requires no particular locking mechanism. I suppose this is why each domain would write to a separate file -- I find this reason very convincing. I think that the discussion on naming templates is somewhat overkill for now. We can have something that is simple and makes sense now (even if it gives up some control over file placement), and iterate later to extend the system with more precise/complex naming features. Remark: I do like @shindere's proposal to use template variables in filenames. I think it would extend gracefully to the multicore scenario by adding a template parameter for the domain number. But again, this feature can come later, with only a prefix for now.

Just a quick note that my suggestion for template variables (thanks for the name @gasche) was not necessarily meant to be implemneted now. I was just thinking aloud and putting ideas out of my brains, in a place which is more accessible to others. ;-)

gasche · 2020-06-10T09:13:35Z

Yes.

Still thinking aloud, I think that a forward-looking scheme for template variables would be to have a variable %(trace-id) for a possibly-structured "trace identifier", instead of specific variables for "process identifier" and "domain identifier". This way, if you use OCAML_EVENTLOG_TEMPLATE="myname-%(trace-id).log", today %(trace-id) is instantiated with just the process identifier, but tomorrow you may get <pid>#<domainid> there: the same setting graciously adapts to the runtime change. Script users have to be careful not too assume too much on the trace-id format (it's just a number today but this may change), but they can write robust scripts, yet have (slightly) more control than with OCAML_EVENTLOG_PREFIX.

(_TEMPLATE settings that do not contain "%(trace-id)" in them should be rejected, because doing otherwise would require storing several traces to the same file.)

The current state can be explained as saying that OCAML_EVENTLOG_TEMPLATE is defined as "$(OCAML_EVENTLOG_PREFIX).%(trace-id).eventlog".

The documentation is a bit more precise as it says ".%(pid).eventlog"; maybe we could rephrase slightly to encourage scripts that are robust to a future change of prefix-completion format. Maybe, instead of recommending script authors to match on the PID, we should encourage them to generate a random number and use it as part of the prefix to identify the run with certainty? Or document that the trace-id will always start with the pid as a prefix, followed by a non-number?

abbysmal · 2020-06-10T12:31:22Z

I updated the branch with the latest suggestions, thank you everyone.

I updated the Changes entry, however since there are now changes in the eventlog code itself (option are renamed.), I believe the Changes entry should be split, or changed to fit in the right category?

Would it make absolutely no sense to have the traces of several domains
mixed in the same trace file?

I don't think it technically makes no sense to have the traces of several domains in a single file, however there is a few drawbacks to this:

As @gasche mentions, it may not be the most sensible approach to handling tracing in multicore.
It is possible to have a single thread taking care of flushing duties, with valuable benefits to it, offloading flushing duty away from each individual domains, meaning the flushing overhead is now a satellite concern.
However it also means having a locking mechanism in place to ensure the buffer handling duties are not problematic accross domains and the flushing thread..

One more aspect to this, is that there may be value in having a semantic separation of trace information.
One stream (in CTF speech, what is currently a single file containing our payloads right now) could contain only a specific subsets of information. (think, having specific domain execution information in one trace, and other information in another.)

This become truer if user-defined events are allowed, where it would make sense (and would be the easiest path) to have separate streams (so, trace files) for each.
Specifically, the problem would be related to how user-defined events would be defined, likely meaning a dynamic metadata file generation, or the user providing one. A different metadata file would imply the requirement of separate trace files (as one single trace file can not contains payload for two different metadata files.)

Those are potential approaches to future design iterations that causes me to be cautious with validating too early assumptions about the tracing infrastructure.
Of course most of it is conjecture until domains become a part of the mainline OCaml distribution. (and user generated-events are even more so conjecture, as this problem may be investigated with other means.)
But I tend to strand on the cautious side of things for now.

Just a quick note that my suggestion for template variables (thanks for
the name @gasche) was not necessarily meant to be implemneted now. I was
just thinking aloud and putting ideas out of my brains, in a place which
is more accessible to others. ;-)

I am very happy with discussing improvements to the instrumented runtime and trying to figure out the best possible user experience, thank you for your suggestions again. :)

I think the variable template suggestion is a good idea (and the OCAML_EVENTLOG_TEMPLATE suggestion from @gasche)
Having a single, possibly structured identifier would provide enough room for use cases I can foresee (and mentioned earlier in this post).
Elaborating on the idea, I think a correct way to do it would be to have a UUID (generated during instrumentation bootstrap) of some sort to always prefix trace files. (as in, this would be a specific part of %(trace-id))
This would provide a consistent pattern (following whichever UUID convention) to know a file is a trace, and would provide with even more consistence across potential many trace files session in a potential multicore implementation. (allowing to programmatically regroup tracefiles per sessions in a script.)

gasche · 2020-06-10T12:38:48Z

Regarding the changelog, the two following approaches are completely fine:

Adding the number of the current PR to the entry of the "Eventlog implementation" PR if that works.
Having an entry for the current PR in the "Documentation" section, even if it includes some small code changes.

Please choose whatever you prefer.

shindere · 2020-06-10T15:31:41Z

Engil (2020/06/10 05:31 -0700):

I updated the Changes entry, however since there are now changes in the eventlog code itself (option are renamed.), I believe the Changes entry should be split, or changed to fit in the right category?

Did you consider simply merging the Changes entry of this PR with the one that already exists for the instrumented runtime itself?

> Would it make absolutely no sense to have the traces of several domains mixed in the same trace file? I don't think it technically makes no sense to have the traces of several domains in a single file, however there is a few drawbacks to this: As @gasche mentions, it may not be the most sensible approach to handling tracing in multicore. It is possible to have a single thread taking care of flushing duties, with valuable benefits to it, offloading flushing duty away from each individual domains, meaning the flushing overhead is now a satellite concern. However it also means having a locking mechanism in place to ensure the buffer handling duties are not problematic accross domains and the `flushing` thread.. One more aspect to this, is that there may be value in having a semantic separation of trace information.

Yes, I understand, and thanks a lot for all the explanations. I was jsut suggesting that, if both (separating and not separating) may be sueful then it could be nice to let the user the choice. But only if obth make sense, of course.

abbysmal · 2020-06-15T07:55:31Z

Did you consider simply merging the Changes entry of this PR with the
one that already exists for the instrumented runtime itself?

If that sounds like the best option I can do that, however since it's two different PR I think the current option is fine.

shindere · 2020-06-15T08:34:29Z

Engil (2020/06/15 00:55 -0700):

> Did you consider simply merging the Changes entry of this PR with the one that already exists for the instrumented runtime itself? If that sounds like the best option I can do that, however since it's two different PR I think the current option is fine.

No problem, it was just a suggestion.

abbysmal · 2020-06-29T12:49:15Z

Hello,

Is there anything to improve further in this PR?
On my side I think I addressed previous comments, tell me if I missed any. (and if I'm currently blocking merging this PR, sorry if it is the case!)

Octachron

Reading the new chapter, this seems ready for inclusion. Improvements can always be made later.
Nevertheless, you may want to fix a handful of typos.

I had some more nitpicking comments, but you can ignore those.

Octachron · 2020-07-03T09:30:22Z

manual/manual/cmds/Makefile

  ocamldep.tex profil.tex debugger.tex browser.tex ocamldoc.tex \
  warnings-help.tex ocamlbuild.tex flambda.tex spacetime-chapter.tex \
-  afl-fuzz.tex unified-options.tex
+  afl-fuzz.tex instrumented-runtime.tex unified-options.tex


I would propose to enable caml_example for the instrumented-runtime file (by adding it to the WITH_CAMLEXAMPLE variable below).

Octachron · 2020-07-03T09:35:16Z

manual/manual/cmds/instrumented-runtime.etex

+
+Collected metrics include time spent executing the {\em garbage collector},
+allowing the overall execution time of individual pauses to be measured
+down to the time spent in specific parts of the garbage collection algorithm.


The sentence feels somewhat unbalanced: the essential information is introduced by "allowing" (which is repeated from the line above). Maybe more

Collected metrics include time spent executing the various phases of garbage collection for each individual pause.

Or if we want to keep the two parts structure:

Collected metrics include time spent executing the {\em garbage collector}.
The overall execution time of individual pauses are measured
down to the time spent in specific parts of the garbage collection.

A more personal opinion, I tend to favour latex'\emph{}\ over {\em }`.

Octachron · 2020-07-03T09:36:14Z

manual/manual/cmds/instrumented-runtime.etex

+
+Insight is also given on memory allocation and motion, recording
+the size of allocated memory blocks, as well as value promotions from the
+{\em minor heap} to the {\em major heap}.


For me this seems to be a part of the paragraph above. since those are collected metrics.

Maybe the first comma could be replaced by by to avoid mixing two kind of commas

Insight is also given on memory allocation and motion by recording
the size of allocated memory blocks, as well as value promotions from the
{\em minor heap} to the {\em major heap}.

?

Octachron · 2020-07-03T09:37:10Z

manual/manual/cmds/instrumented-runtime.etex

+\ifouthtml
+\ahref{https://diamon.org/ctf/}{https://diamon.org/ctf/}.
+\else
+{\tt https://diamon.org/ctf/}


you can use \href{https://diamon.org/ctf/} in both backends.

Octachron · 2020-07-03T09:39:03Z

manual/manual/cmds/instrumented-runtime.etex

+         |> clear
+         |> ignore
+
+\end{verbatim}


As suggested above, the verbatim environment could be replaced with caml_example to enable highlighting and code verification

\begin{caml_example*}{verbatim} ... \end{caml_example*}

In this case, you probably want to remove the extra-indentation.

Octachron · 2020-07-03T09:49:03Z

manual/manual/cmds/instrumented-runtime.etex

+         |> clear
+         |> ignore
+
+\end{verbatim}


A caml_example environment should work there too.

Octachron · 2020-07-03T09:55:39Z

manual/manual/cmds/instrumented-runtime.etex

+and analyzed by users in order to understand specific runtime behaviors.
+
+The {\em Common Trace Format} is the general purpose binary
+format used to encode traces.


I have the impression than encode put too much accent on the encoding part rather than the storing part.
An alternative phrasing might be

The generated trace files are stored using the \emph{Common Trace Format), which is a general purpose binary format.

(You can do whatever you want with the above suggestion, including ignoring it).

Octachron · 2020-07-03T10:09:24Z

manual/manual/cmds/instrumented-runtime.etex

+
+\subsection{ss:instr-runtime-tools}{eventlog-tools}
+{\em eventlog-tools} is a set of libraries and tools written in OCaml
+that allow to decode and perform basic format conversions and analysis.


a set ... that allows except if each individual library or tool can perfom "basic format conversions and analysis" by itself.

Octachron · 2020-07-03T10:11:22Z

manual/manual/cmds/instrumented-runtime.etex

+\subsection{ss:instr-runtime-babeltrace}{babeltrace}
+
+{\em babeltrace} is a C library, as well as a Python binding and set of tools
+that serve as the reference implementation for the {\em Common Trace Format}.


Missing comma?

, as well as a Python binding and set of tools, that serves

Octachron · 2020-07-03T10:12:52Z

manual/manual/cmds/instrumented-runtime.etex

+
+The {\em eventlog_metadata} file can be found at this path and
+copied in the same directory as the generated trace file.
+However, {\em babeltrace} expect the file to be named


…_FILE to OCAML_EVENTLOG_PREFIX

abbysmal · 2020-07-08T01:33:58Z

Thank you @Octachron for your review, I addressed the issue your mentioned, I also rebase and squashed a few things as far as the history allows me to practically do it.
I however refrained from changing the various {\em} to emph{} but if this is a requirement I will do it. :)

Octachron · 2020-07-08T10:03:00Z

This looks good to merge for me, except for the is comprised of typo mentioned by @avsm and @dra27 which is still here.
My personal preference for \emph{} is fortunately not a requirement.

abbysmal · 2020-07-08T10:18:09Z

Good catch, I missed it. I just pushed the fix for this typo.

Octachron · 2020-07-08T10:24:32Z

manual/manual/cmds/instrumented-runtime.etex

+and analyzed by users in order to understand specific runtime behaviors.
+
+The generated trace files are stored using the {\em Common Trace Format}, which
+is general purpose binary tracing format.


missing article "a general purpose"

I pushed a fix, thanks.

Add a chapter for the instrumented runtime (cherry picked from commit f879c05)

Octachron · 2020-07-08T16:49:05Z

Merged, and cherry-picked to 4.11 to not separate the instrumented runtime from its documentation.

shindere · 2020-07-08T17:07:04Z

Florian Angeletti (2020/07/08 09:49 -0700):

Merged, and cherry-picked to 4.11 to not separate the instrumented runtime from its documentation.

Ah, good that you did remind about that! Thanks a lot, Florian!

…

gasche reviewed May 17, 2020

View reviewed changes

gasche mentioned this pull request May 18, 2020

[RFC] Eventlog tracing system #9082

Merged

dbuenzli reviewed Jun 4, 2020

View reviewed changes

avsm reviewed Jun 9, 2020

View reviewed changes

manual/manual/cmds/instrumented-runtime.etex Outdated Show resolved Hide resolved

avsm reviewed Jun 9, 2020

View reviewed changes

manual/manual/cmds/instrumented-runtime.etex Outdated Show resolved Hide resolved

abbysmal changed the title ~~Add manual page for the instrumented runtime~~ Add a chapter for the instrumented runtime Jun 10, 2020

Octachron added this to the 4.11 milestone Jun 29, 2020

Octachron approved these changes Jul 3, 2020

View reviewed changes

abbysmal force-pushed the instrumented_runtime_manual branch from 810ffd1 to d5632a3 Compare July 8, 2020 01:10

abbysmal added 2 commits July 8, 2020 03:13

add manual page for the instrumented runtime

f3ee9e3

instrumented runtime: change environment variable from OCAML_EVENTLOG…

f674c27

…_FILE to OCAML_EVENTLOG_PREFIX

abbysmal added 2 commits July 8, 2020 03:13

update Changes entry for the instrumented runtime manual page

cfb62e3

manual: adjustments to the instrumented runtime manual page

9ba979e

abbysmal force-pushed the instrumented_runtime_manual branch from d5632a3 to 9ba979e Compare July 8, 2020 01:23

manual: fix typo in instrumented runtime manual

a8b782b

Octachron reviewed Jul 8, 2020

View reviewed changes

manual: fix a typo in instrumented runtime manpage

4765f24

Octachron merged commit f879c05 into ocaml:trunk Jul 8, 2020

Octachron added a commit that referenced this pull request Jul 8, 2020

Merge pull request #9541 from Engil/instrumented_runtime_manual

936e168

Add a chapter for the instrumented runtime (cherry picked from commit f879c05)


		\end{verbatim}

		Compiling the program and linking it with the instrumented runtime,

	A complete trace is comprised of:
	A complete trace comprises of:

Conversation

abbysmal commented May 7, 2020

Uh oh!

gasche left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abbysmal May 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gasche commented May 18, 2020

Uh oh!

gasche commented May 26, 2020

Uh oh!

abbysmal commented May 26, 2020

Uh oh!

abbysmal commented May 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shindere commented Jun 2, 2020 via email

Uh oh!

dbuenzli left a comment

Choose a reason for hiding this comment

Uh oh!

abbysmal May 22, 2020 •

edited

Loading

abbysmal commented May 29, 2020 •

edited

Loading