Skip to content

Add a chapter for the instrumented runtime#9541

Merged
Octachron merged 6 commits intoocaml:trunkfrom
abbysmal:instrumented_runtime_manual
Jul 8, 2020
Merged

Add a chapter for the instrumented runtime#9541
Octachron merged 6 commits intoocaml:trunkfrom
abbysmal:instrumented_runtime_manual

Conversation

@abbysmal
Copy link
Copy Markdown
Contributor

@abbysmal abbysmal commented May 7, 2020

Hello,

Following #9082 , here is a draft for the instrumented runtime's manual page.
I think it contains all the essential information for using the instrumented runtime.

Some details maybe needs discussion:

  • The babeltrace section mentions and links to various external tooling, I'm unsure what is the preferred path for this kind of content. Since the library acts as the reference implementation for the format we are using, I believe mentioning it is useful. (and it bringing nice extra features, like an easy Jupyter integration.)
  • The eventlog-tools (https://github.com/ocaml-multicore/eventlog-tools) section is also an external tool, albeit one we have control over. The documentation on it is pretty scarce in the manual page, as I believe the tool should document itself. It is for now hosted in the ocaml-multicore organization, maybe there is a better location for it, @avsm seems to think it could be moved somewhere else.

After merging #9082 some discussions were brought up on the naming of a few options in the user facing API for the instrumented runtime. (@shindere @dra27)
I didn't ping back there, waiting for the manual page to be done, to make a better assessment of how understandable the API is, given proper documentation.
Maybe this can be better discussed as well now.

Thanks,

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I made a first round of review.

I think that it is perfectly fine to talk about third-party tools that are important to the chapter.

One thing that I found a bit lacking is a sort of "run-through example" where you show a (simple) complete session: compiling and running a program with the instrumented runtime, running the program, and then "using" (visualizing) the result. (The ASCII version is easy, but maybe it would also be possible to have a Screenshot of the Chrome viewer?)

can generate {\em trace files} that can then be read
and analyzed by users in order to understand specific runtime behaviors.

{\em Common Trace Format} is the general purpose binary
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write "The {\em Common Trace Format} (CTF)", and generally use the same grammar for the expanded version as for the common noun "format". On the other hand, the abbreviation CTF itself can be used as a proper noun ("the problem of CTF", etc.).

in the program being traced.
\end{itemize}

For more information on {\em Common Trace Format}, see the website at
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Common Trace Format


This filename can also be specified using the
{\tt OCAML\_EVENTLOG\_FILE} environment variable.
The given path will be suffixed with {\tt .pid.eventlog}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I correctly understand that by default I would get the file caml-<pid>-eventlog, but if I use OCAML_EVENTLOG_FILE I would get caml.<pid>.eventlog, with dots instead of dashes? This is strange.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above you use {pid} instead of just pid to make it explicit that this a metavariable (for a concrete pid) in the examples. Maybe it would be nice to use it concistently, eg. here {\tt .\{pid\}.eventlog}?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the final filename by default is -eventlog, and not .eventlog, there is however no strict reason for this. (the tooling makes no assumptions on the filenames.)
If further adjustments are deemed necessary to the interface (following the _FILE/_PREFIX conversation), we can change this behavior as well.

The given path will be suffixed with {\tt .pid.eventlog}.

\begin{verbatim}
OCAML_EVENTLOG_FILE=/tmp/some/prefix OCAML_EVENTLOG_ENABLED=1 ./program
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there already was a discussion of this but it is slightly weird to name _FILE a variable that only set the prefix of the file. I think that OCAML_EVENTLOG_NAME (or _PREFIX) would get less confusing; ideally it could also be useful to be able to set a file for sure, for example for scripting usage (when the file is a freshly generated temporary file, so no conflict is to be expected), maybe OCAML_EVENTLOG_FILE or OCAML_EVENTLOG_FILE_FULL? (Does your tooling make assumptions about the naming scheme?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the name could be changed to _PREFIX, however my approach on allowing the user to explicitly set a full filename is a conservative one, considering in future designs (when executing on OCaml Multicore, with multiple domains), there likely won't be one, but many traces generated.
With this usecase in mind, I would rather avoid implementing a feature that may be obsoleted in the future. I considered the "pid" to provide a good enough indicator for scripting purpose.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with _PREFIX and no way to set a specific file. Maybe it could be useful to document the reason for this scheme, so that in particular script authors know that they should look for $OCAML_EVENTLOG_FILE_PREFIX.*.eventlog, and not assume a single file, if they want to be robust to future changes.

Copy link
Copy Markdown
Contributor Author

@abbysmal abbysmal May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, how should we proceed with the _PREFIX change? Should I open another PR to trunk?
I will document as well the reason for not being able to produce a specific filename.
I will as well change the default naming scheme to caml-{pid}.eventlog to make everything more consistent.

\end{verbatim}

In this example, the trace will be available at path
{\tt /tmp/some/prefix.pid.eventlog}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{pid}?

For a program with an extended running time where the collection of only a
small sample of events is required, using the {\em eventlog_resume} and
{\em eventlog_pause} primitives may help relieve some of the
tracing induced performance impact.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what would typically be the tracing you observed when eventlog is paused?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If eventlog is paused during a whole run, the tracefile would contain no traces, merely just the Common Trace Format header.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you debug interactive/long running program it's not always evident to do so. I think it would have been nice to provide a standard way to control the collection from the outside, modulo a call to a single function, something like:

Gc.install_eventlog_control : ?signum:int -> unit -> unit

that would install a signal handler so that you can killall -SIGUSR1 myprogram collection.

Also there's no way to programmatically flush ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your input,

There's no way to programmatically flush, besides relying on the fact that Gc.eventlog_pause do force a flush. I think this could be useful indeed.
I think interfacing eventlog controls through signals could be interesting (and signals are indeed the preferred interface for such things), and we could consider it for the next iteration on the instrumented runtime.
(although I think the design may be a bit less straightforward with tracing domains in multicore, ie. is the sighandler installed for every domain, or on only the one executing the code? Can this interface be kept consistent between now and future releases?).

The instrumented runtime having been merged already, I am sorry to have enabled such discussions so late in the process by making the documentation after the fact.
I do not know if additions at this stage are possible (or maybe they could be considered minor tweaks.), however all of this is noted and can be addressed in future iterations of the instrumented runtime.
(as domains will introduce required changes to the instrumentation facility, and provide a complete baseline, feature wise, on which it will be easier to reason in terms of expose-able features that will not need to be revisited later.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we don't really know yet how signals will work in Multicore. I would recommend never having only signals as a way to change things; there should be a programmatic control to use instead of signals for each feature. (One could also think of exporting a control socket, possibly decided through an environment variable, but maybe we can wait for the user need to arise before implementing this.)

The instrumented runtime having been merged already, I am sorry to have enabled such discussions so late in the process by making the documentation after the fact.

As a strong believer in documentation, I support this message :-)

I do not know if additions at this stage are possible (or maybe they could be considered minor tweaks.)

Yes, given that this instrumented runtime is completely new, there are no users whose workflow would break yet, so I think you should feel free to propose further changes for now, until the Release Candidate process has started (at this point we want to minimize changes to the release branch).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cc our release gods, @Octachron @damiendoligez)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however all of this is noted and can be addressed in future iterations of the instrumented runtime.

Sure these comments were not meant to be addressed right now. API wise the only real thing I feel is missing is a way to programmatically flush (e.g. because you know appropriate points in your program where it's convenient to do that).

The signal idea (which may indeed not be a good one in the context of multicore) was just me trying to figure out a quick and dirty generic way to be able control instrumentation from the outside.

{\em eventlog_pause} primitives may help relieve some of the
tracing induced performance impact.

\section{s:instr-runtime-read}{Read traces}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading traces?

{https://github.com/ocaml-multicore/eventlog-tools}.
\else
{\tt https://github.com/ocaml-multicore/eventlog-tools}
\fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to pointing at the page for more documentation (which is fine), I would like to see two representative examples of usage of the tools; one to print an histogram (why not show the actual output on an example and say a word about what we see?), and also the command sequence to convert and visualize the trace in chromium. Basically I would expect people reading this manual page to be able to do basic tracing of their OCaml program by just reusing examples shown in the documentation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds good, I will provide such examples shortly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one extra question regarding this particular request:
Should the inclusion of such examples be done in the manual page, or it could be done in the eventlog-tools repository itself ? (for the eventlog-tools related demonstrations.)
As for babeltrace, I think it is fine to include a demonstration as well as a small code snippet showing basic decoding of an OCaml trace, for processing. (as is our usecase in the multicore team, using Jupyter.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the inclusion of such examples be done in the manual page, or it could be done in the eventlog-tools repository itself ?

I think it is fine if it is in the eventlog-tools repository, and arguably even better: if you change the tool (and remember to update the documentation), all manual pages that refer to the right repository will get the updated documentation.

I think the documentation review process is mostly useful to ensure that the documentation exists. We have more leeway on where precisely the documentation is, as long as you are actually forced to write it :-)

copied in the same directory as the generated trace file.
However, {\em babeltrace} expects the file to be named
{\tt metadata} in order to process the trace.
Thus, it will need to be renamed when copied to the trace's directory.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, showing a concrete example of that (the shell sequence that does a copy and then invokes babeltrace to do somthing) would be useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted for both, I will provide concrete examples, thanks.


\begin{verbatim}
ocamlopt -runtime-variant i program.ml -o program
\end{verbatim}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a simple example somewhere of how to configure the runtime variant in a Dune project? (If so, could you link to it as well?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the bytecode compiler, is it possible to choose the runtime at ocamlrun invocation time rather than at compilation time? This could make it easier to experiment with the instrumented runtime without having to tweak the build system.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuring the executable to use the instrumented runtime with dune is as simple as using a (flag "-runtime-variant=i") parameter.
As for ocamlrun, there is now an ocamlruni executable available that is the instrumented runtime.
I can add mentions for both in this section

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a side note, for the dune support question, an issue was opened here: ocaml/dune#3500

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For Dune I believe it would be nice to document whatever way currently works. (Most prospective users of the feature are not calling ocamlfind directly, they want to know how to do it through Dune). If the tool gains new feature to make this even easier, we can edit the manual.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 18, 2020

Note: some of the aspects I mentioned as missing (for example, screenshots) are in fact present in the first message of the Eventlog PR #9082. I think it would be nice to review this first message and ensure that all the information it contained (that is relevant for users) is present in the documentation.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 26, 2020

@Engil thanks a lot for your further work on this PR. For now I haven't had a full look at the result again, please let me know when you think I should do it.

@abbysmal
Copy link
Copy Markdown
Contributor Author

@gasche I need to improve a few things still before it is reviewable, I will report back here once I believe I addressed most of your suggestions and comments, thank you for your review.
I will do my best to make it ready soon.

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented May 29, 2020

@gasche
I think the PR is ready for further review, sorry about the delay.
Note that some of my recent changes include changing the option names to suggestions made earlier.
(namely, the default output path being caml-{pid}.eventlog, as well as changing OCAML_EVENTLOG_FILE with OCAML_EVENTLOG_PREFIX.)
This change is however still not upstreamed.
I have a branch making this change to the runtime, should I open a PR for it?
The branch is here: https://github.com/Engil/ocaml/tree/instrumented_runtime_consistent_options

I also included a simple (as simple as I could get it) decoder example using babeltrace.

Have a great day,

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 2, 2020 via email

Copy link
Copy Markdown
Contributor

@dbuenzli dbuenzli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look forward to use all that @Engil. Except after reading this nice piece of documentation I still don't get what kind of insights/statistics the system can give on my programs :-) You need to make that point clear in the introduction of the document.

%HEVEA\cutname{instrumented-runtime.html}

This chapter describes the OCaml instrumented runtime, a runtime variant
allowing the collection of events and metrics.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At that point I want to know what these events and metrics are to understand if that's what I'm looking for or should stop reading.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur that the introduction should state which "insights into a program's execution" are provided. Are we measuring garbage collection events? Access to out-of-heap pointers? Cache misses?... At the end of the paragraph, the reader should have a basic idea on how the instrumented runtime can help them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I just pushed a revision to address this issue.
The introductory paragraph now covers the kind of information you can find with the instrumented runtime


\end{verbatim}

Compiling the program and linking it with the instrumented runtime,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be made more clear that the instrumented runtime is only a "link time entity". That is no special flags are needed during the compilation phase of the objects that comprises your final program.

\end{verbatim}

The program will execute and generate a trace file in the
current directory.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current working directory and I guess it's the current working directory of the program.

(flags "-runtime-variant=i"))
\end{verbatim}

The instrumented runtime can as well be used with OCaml's bytecode interpreter.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as well -> also
with OCaml's -> with the OCaml

But IANANES.

\end{verbatim}

The instrumented runtime can as well be used with OCaml's bytecode interpreter.
This can be done by either compiling the program using the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done by adding the flag {\tt -runtime-variant=i} when linking the program with {\tt ocamlc}


\begin{verbatim}
OCAML_EVENTLOG_ENABLED=p ./program
\end{verbatim}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah @gasche also got confused, it's still not clear (see above).

OCAML_EVENTLOG_ENABLED=p ./program
\end{verbatim}

The program will have to resume events collection explicitly.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It never started so resume is misleading. Maybe use unpause or even start here.

\end{verbatim}

The program will have to resume events collection explicitly.
This can be done via {\tt Gc.eventlog_resume} and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can (only ?) be done from the program.

\end{verbatim}

The resulting trace contains only one event payload, namely a {\em flush} event,
indicating how much time was taken to flush the trace file to disk.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a documentation issue but I find it strange that a paused event log that remains so does anything to the program. I would expect the file to be created but for it to be empty. (I can see however that you'd have to detect "was never unpaused in the program" condition that it may not be suitable to do so for perf reasons)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is sane to have at least the CTF header being written, as a part of the file opening process.
The flush event however could be avoided, without trickery, its intended purpose is to time the execution of the eventlog flush function, however at least one eventlog flush happens in any instrumented program: atexit, which forces a buffer flush.
This buffer could very well be empty however (as is the case for a paused program), so this extra flushing information could be skipped, since the flush function will exactly flush nothing. (and not introduce any performance penalty whatsoever)

For a program with an extended running time where the collection of only a
small sample of events is required, using the {\em eventlog_resume} and
{\em eventlog_pause} primitives may help relieve some of the
tracing induced performance impact.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you debug interactive/long running program it's not always evident to do so. I think it would have been nice to provide a standard way to control the collection from the outside, modulo a call to a single function, something like:

Gc.install_eventlog_control : ?signum:int -> unit -> unit

that would install a signal handler so that you can killall -SIGUSR1 myprogram collection.

Also there's no way to programmatically flush ?

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Jun 5, 2020

Thank you very much for the review.
I pushed a few changes to the manual page.
The diff may be a bit painful to go through, sorry about this, I had to move around big chunks of the manual (I think swapping the "Reading traces" section and "Controls+limitations" suggestion made sense and provide a better reading experience).
I tried to address some formulation issues as well and provide more details where required, I will re-read the thread and see if I missed any comment, please let me know if I forgot anything.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 8, 2020

Here is what I can see:

"Collected metrics includes" -> include

"For more information on the {\em Common Trace Format}, see the website
at": I'd remove "the website at".

Note that the instrumented runtime, being an alternative runtime for
OCaml programs, is only referenced during the linking stage of the
executable. As such, there is not need to adjust the compilation steps
of intermediate compilation units or libraries.

I would remove the comma after runtime and add an "it" before "is only
referenced" in the first sentence. The second sentence could be
rephrased, perhaps somethink like: "This means that only the linking
stage needs to be adjusted, not the compile stage" (the notion of
intermediary compile units sounds a bit odd to me).

"This script expect to receive": expects

"printing each events in the process":event, singular

"For more information on {\em babeltrace}, see the website at": same
remark as above, "the website at" seems superfluous to me.

The default trace filename is {\tt caml-\{pid\}.eventlog}, where {\tt
\{pid\}} is the process identifier of the traced program.

Perhaps you could write "PID" (uppercase) to make it even more obvious
that it's justa place-holder?

This filename can also be specified using the {\tt
OCAML\_EVENTLOG\_PREFIX} environment variable. The given path will be
suffixed with {\tt \{.pid\}.eventlog}.

Just a thought. If you wanted to give users control over the name of the
generated files, you could have filenames that accept place-holders like
%p for the pid and then users could do what they want.

In this example, the trace will be available at path {\tt
/tmp/a_prefix.\{pid\}.eventlog}.

Here again it feels to me that an uppercase PID would improve
readability.

This restriction is in place to make room for future improvements to the
instrumented runtime, where the single trace file per session design may
replaced.

I think you should at least add a "be" -- "bay be replaced". That's
where the idea mentionned above may become useful: with place-holders
like %p you would be safe and users would keep their ability to have
full control on how files are named.

"Note as well that intermediate directories in the given path will not
be created": parent directories is the right terminology, I thinnk.

"The runtime assume the path is accessible for writing and creating
files": assumes. There may be something to clarify here but I'm not
sure. Basically, what you are saying is that by specifying a prefix,
users also specify a directory (the dirname of that perfix) where other
files may be created, right?

At the beginning of the "Pausing and resuming tracing", there are several
occurrences of "events collection", some where event is plural, others
where it's singular. You may want to change this so that all are the
same.

"indicating how much time was taken to flush the trace file to disk":
suggestion, how much time was spent flishing?

"The resulting trace will contain every events": either "every event" or
"all events", I think I'd go for the second one.

"The instrumented runtime currently put a strong emphasis": puts

Finally, I think when you explain how to build the program to be
instrumented you use -o program but then later you refer to the
produced executable as a.out.

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Jun 9, 2020

Hello @shindere ,
Thank you for your time and your review and suggestions. I tried to rephrase for the best following your recommendations, and fixed the various typos.

Just a thought. If you wanted to give users control over the name of the
generated files, you could have filenames that accept place-holders like
%p for the pid and then users could do what they want.

I agree this may provide a nice alternative to the current situation, having substitutions in the prefix path would alleviate a few of the issues, however it still works only when assuming that there may be only one trace generated by the runtime. (which it may not be doing in OCaml Multicore.).

If I understand the idea, it is to leave the option to the user to leave out any modifier (%p in your example.) if desired.

I think having the option to leave out any modifier goes against the precautions taken to ensure that the design of the instrumented runtime is not limited by providing users with options that goes against potential design changes.

If a user leaves out the modifier and wants a specific path, but then we want to output multiple trace files (be it Multicore where we could support one trace file per domain, or a possible inclusion of following forked processes), what would be the correct course of action?

However if we were to allow such modifiers and actually require them, it would give user a bit more control, although the level of control gained in that specific situation does not seems worth the extra handling to me. (but I may be wrong and missing an obvious improvement over the current, static append)

I am very happy to discuss these suggestions, and I will keep them in mind when iterating on the multicore design of eventlog to find the sweet spot that enables a good user experience while keeping an adequate model for it.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 9, 2020 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 9, 2020

It sounds very delicate (and complex) from an implementation perspective to efficiently write from several domains in parallel to a single trace file -- you have to deal with concurrent updates to the file. On the other hand, having each domain produce a trace in a separate file is easy and requires no particular locking mechanism. I suppose this is why each domain would write to a separate file -- I find this reason very convincing.

I think that the discussion on naming templates is somewhat overkill for now. We can have something that is simple and makes sense now (even if it gives up some control over file placement), and iterate later to extend the system with more precise/complex naming features.

Remark: I do like @shindere's proposal to use template variables in filenames. I think it would extend gracefully to the multicore scenario by adding a template parameter for the domain number. But again, this feature can come later, with only a prefix for now.


The {\em Common Trace Format} is the general purpose binary
format used to encode traces.
A complete trace is comprised of:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A complete trace is comprised of:
A complete trace comprises of:

Copy link
Copy Markdown
Member

@dra27 dra27 Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comprises of or
comprises consists of

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 10, 2020 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 10, 2020

Yes.

Still thinking aloud, I think that a forward-looking scheme for template variables would be to have a variable %(trace-id) for a possibly-structured "trace identifier", instead of specific variables for "process identifier" and "domain identifier". This way, if you use OCAML_EVENTLOG_TEMPLATE="myname-%(trace-id).log", today %(trace-id) is instantiated with just the process identifier, but tomorrow you may get <pid>#<domainid> there: the same setting graciously adapts to the runtime change. Script users have to be careful not too assume too much on the trace-id format (it's just a number today but this may change), but they can write robust scripts, yet have (slightly) more control than with OCAML_EVENTLOG_PREFIX.

(_TEMPLATE settings that do not contain "%(trace-id)" in them should be rejected, because doing otherwise would require storing several traces to the same file.)

The current state can be explained as saying that OCAML_EVENTLOG_TEMPLATE is defined as "$(OCAML_EVENTLOG_PREFIX).%(trace-id).eventlog".

The documentation is a bit more precise as it says ".%(pid).eventlog"; maybe we could rephrase slightly to encourage scripts that are robust to a future change of prefix-completion format. Maybe, instead of recommending script authors to match on the PID, we should encourage them to generate a random number and use it as part of the prefix to identify the run with certainty? Or document that the trace-id will always start with the pid as a prefix, followed by a non-number?

@abbysmal abbysmal changed the title Add manual page for the instrumented runtime Add a chapter for the instrumented runtime Jun 10, 2020
@abbysmal
Copy link
Copy Markdown
Contributor Author

I updated the branch with the latest suggestions, thank you everyone.

I updated the Changes entry, however since there are now changes in the eventlog code itself (option are renamed.), I believe the Changes entry should be split, or changed to fit in the right category?

Would it make absolutely no sense to have the traces of several domains
mixed in the same trace file?

I don't think it technically makes no sense to have the traces of several domains in a single file, however there is a few drawbacks to this:

As @gasche mentions, it may not be the most sensible approach to handling tracing in multicore.
It is possible to have a single thread taking care of flushing duties, with valuable benefits to it, offloading flushing duty away from each individual domains, meaning the flushing overhead is now a satellite concern.
However it also means having a locking mechanism in place to ensure the buffer handling duties are not problematic accross domains and the flushing thread..

One more aspect to this, is that there may be value in having a semantic separation of trace information.
One stream (in CTF speech, what is currently a single file containing our payloads right now) could contain only a specific subsets of information. (think, having specific domain execution information in one trace, and other information in another.)

This become truer if user-defined events are allowed, where it would make sense (and would be the easiest path) to have separate streams (so, trace files) for each.
Specifically, the problem would be related to how user-defined events would be defined, likely meaning a dynamic metadata file generation, or the user providing one. A different metadata file would imply the requirement of separate trace files (as one single trace file can not contains payload for two different metadata files.)

Those are potential approaches to future design iterations that causes me to be cautious with validating too early assumptions about the tracing infrastructure.
Of course most of it is conjecture until domains become a part of the mainline OCaml distribution. (and user generated-events are even more so conjecture, as this problem may be investigated with other means.)
But I tend to strand on the cautious side of things for now.

Just a quick note that my suggestion for template variables (thanks for
the name @gasche) was not necessarily meant to be implemneted now. I was
just thinking aloud and putting ideas out of my brains, in a place which
is more accessible to others. ;-)

I am very happy with discussing improvements to the instrumented runtime and trying to figure out the best possible user experience, thank you for your suggestions again. :)

I think the variable template suggestion is a good idea (and the OCAML_EVENTLOG_TEMPLATE suggestion from @gasche)
Having a single, possibly structured identifier would provide enough room for use cases I can foresee (and mentioned earlier in this post).
Elaborating on the idea, I think a correct way to do it would be to have a UUID (generated during instrumentation bootstrap) of some sort to always prefix trace files. (as in, this would be a specific part of %(trace-id))
This would provide a consistent pattern (following whichever UUID convention) to know a file is a trace, and would provide with even more consistence across potential many trace files session in a potential multicore implementation. (allowing to programmatically regroup tracefiles per sessions in a script.)

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 10, 2020

Regarding the changelog, the two following approaches are completely fine:

  • Adding the number of the current PR to the entry of the "Eventlog implementation" PR if that works.
  • Having an entry for the current PR in the "Documentation" section, even if it includes some small code changes.

Please choose whatever you prefer.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 10, 2020 via email

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Jun 15, 2020

Did you consider simply merging the Changes entry of this PR with the
one that already exists for the instrumented runtime itself?

If that sounds like the best option I can do that, however since it's two different PR I think the current option is fine.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jun 15, 2020 via email

@abbysmal
Copy link
Copy Markdown
Contributor Author

Hello,

Is there anything to improve further in this PR?
On my side I think I addressed previous comments, tell me if I missed any. (and if I'm currently blocking merging this PR, sorry if it is the case!)

@Octachron Octachron added this to the 4.11 milestone Jun 29, 2020
Copy link
Copy Markdown
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the new chapter, this seems ready for inclusion. Improvements can always be made later.
Nevertheless, you may want to fix a handful of typos.

I had some more nitpicking comments, but you can ignore those.

ocamldep.tex profil.tex debugger.tex browser.tex ocamldoc.tex \
warnings-help.tex ocamlbuild.tex flambda.tex spacetime-chapter.tex \
afl-fuzz.tex unified-options.tex
afl-fuzz.tex instrumented-runtime.tex unified-options.tex
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to enable caml_example for the instrumented-runtime file (by adding it to the WITH_CAMLEXAMPLE variable below).


Collected metrics include time spent executing the {\em garbage collector},
allowing the overall execution time of individual pauses to be measured
down to the time spent in specific parts of the garbage collection algorithm.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence feels somewhat unbalanced: the essential information is introduced by "allowing" (which is repeated from the line above). Maybe more

Collected metrics include time spent executing the various phases of garbage collection for each individual pause.

Or if we want to keep the two parts structure:

Collected metrics include time spent executing the {\em garbage collector}.
The overall execution time of individual pauses are measured
down to the time spent in specific parts of the garbage collection.

A more personal opinion, I tend to favour latex'\emph{}\ over {\em }`.


Insight is also given on memory allocation and motion, recording
the size of allocated memory blocks, as well as value promotions from the
{\em minor heap} to the {\em major heap}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this seems to be a part of the paragraph above. since those are collected metrics.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the first comma could be replaced by by to avoid mixing two kind of commas

Insight is also given on memory allocation and motion by recording
the size of allocated memory blocks, as well as value promotions from the
{\em minor heap} to the {\em major heap}.

?

\ifouthtml
\ahref{https://diamon.org/ctf/}{https://diamon.org/ctf/}.
\else
{\tt https://diamon.org/ctf/}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use \href{https://diamon.org/ctf/} in both backends.

|> clear
|> ignore

\end{verbatim}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested above, the verbatim environment could be replaced with caml_example to enable highlighting and code verification

\begin{caml_example*}{verbatim}
...
\end{caml_example*}

In this case, you probably want to remove the extra-indentation.

|> clear
|> ignore

\end{verbatim}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A caml_example environment should work there too.

and analyzed by users in order to understand specific runtime behaviors.

The {\em Common Trace Format} is the general purpose binary
format used to encode traces.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression than encode put too much accent on the encoding part rather than the storing part.
An alternative phrasing might be

The generated trace files are stored using the \emph{Common Trace Format), which is a general purpose binary format.

(You can do whatever you want with the above suggestion, including ignoring it).


\subsection{ss:instr-runtime-tools}{eventlog-tools}
{\em eventlog-tools} is a set of libraries and tools written in OCaml
that allow to decode and perform basic format conversions and analysis.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a set ... that allows except if each individual library or tool can perfom "basic format conversions and analysis" by itself.

\subsection{ss:instr-runtime-babeltrace}{babeltrace}

{\em babeltrace} is a C library, as well as a Python binding and set of tools
that serve as the reference implementation for the {\em Common Trace Format}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma?

, as well as a Python binding and set of tools, that serves


The {\em eventlog_metadata} file can be found at this path and
copied in the same directory as the generated trace file.
However, {\em babeltrace} expect the file to be named
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expects

@abbysmal abbysmal force-pushed the instrumented_runtime_manual branch from 810ffd1 to d5632a3 Compare July 8, 2020 01:10
@abbysmal abbysmal force-pushed the instrumented_runtime_manual branch from d5632a3 to 9ba979e Compare July 8, 2020 01:23
@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Jul 8, 2020

Thank you @Octachron for your review, I addressed the issue your mentioned, I also rebase and squashed a few things as far as the history allows me to practically do it.
I however refrained from changing the various {\em} to emph{} but if this is a requirement I will do it. :)

@Octachron
Copy link
Copy Markdown
Member

This looks good to merge for me, except for the is comprised of typo mentioned by @avsm and @dra27 which is still here.
My personal preference for \emph{} is fortunately not a requirement.

@abbysmal
Copy link
Copy Markdown
Contributor Author

abbysmal commented Jul 8, 2020

Good catch, I missed it. I just pushed the fix for this typo.

and analyzed by users in order to understand specific runtime behaviors.

The generated trace files are stored using the {\em Common Trace Format}, which
is general purpose binary tracing format.
Copy link
Copy Markdown
Member

@Octachron Octachron Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing article "a general purpose"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a fix, thanks.

@Octachron Octachron merged commit f879c05 into ocaml:trunk Jul 8, 2020
Octachron added a commit that referenced this pull request Jul 8, 2020
Add a chapter for the instrumented runtime

(cherry picked from commit f879c05)
@Octachron
Copy link
Copy Markdown
Member

Merged, and cherry-picked to 4.11 to not separate the instrumented runtime from its documentation.

@shindere
Copy link
Copy Markdown
Contributor

shindere commented Jul 8, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants