add current command-line to 'command_history' header entry#962
add current command-line to 'command_history' header entry#962jdtournier merged 3 commits intotag_0.3.16from
Conversation
as suggested in #956.
|
Makes sense that, if you were interested in one command, of all commands, that would indeed probably be the most recent one. Another solution, rather than inverting the order (which has the potential to be non-intuitive in other scenarios), would be to let |
I like this idea! You've probably noticed that my recent commit on The other thing to worry about if we reverse the order of the history is to ensure it's still consistent with the MGH format. I assume these entries appear in chronological order in the file, but that is admittedly only an assumption... |
😁 of course you already have... Sorry, should have re-read your post before jumping in. |
|
In MGH format the command line history is chronological, not reverse. See the example of |
|
OK, I've added the version of the executable to the command history, so we can track that through also. I'll merge this now, we can modify over the next few days if need be. |
Fixes for compiling against 3.0_RC1 branch. Moved contents of MGH format "other" field from header comment entries to header key/value pairs. Add contents of MGH_TAG_CMDLINE to header key/value entry "command_history" as in #962, and vice-versa.
|
|
||
| std::string cmd = App::argv[0]; | ||
| for (int n = 1; n < App::argc; ++n) | ||
| cmd += std::string(" \"") + App::argv[n] + "\""; |
There was a problem hiding this comment.
@jdtournier Is it necessary to add double-quotes to every entry in argv? Could it not be reserved only for entries that contain whitespace?
There was a problem hiding this comment.
Yes, it is a bit ugly. But white spaces aren't the only potential problem. We'd have to watch for at least these characters: []*?%#~&$(){}\t\\"';:<> - i.e. absolutely anything that the shell might conceivably interpret. If the command we received contained these characters, then that's what the user intended, and they must have supplied the argument in such as way as to avoid interpretation by the shell. So we should respect that - the arguments the executable received have already been interpreted by the shell, so we should ensure that a straight copy/paste of the command works without the shell re-interpreting anything incorrectly.
Also, I think we should be using single quotes ' to avoid further variable expansion, etc. - otherwise an argument like stuff$n.mif would be interpreted when it should be left alone (although I completely agree that users who actually names their files like that deserve everything they get...).
There was a problem hiding this comment.
Reminds me of here, where I wrap paths in quotation marks only when necessary for shlex.split(). Admittedly that maybe needs to check for more than just whitespace; but that's what triggered the implementation. Why not simply use a similar heuristic here, i.e. only wrap a string / path in quotation marks if it contains special characters (which will be the vast minority of the case)?
There was a problem hiding this comment.
Minor addition to this: Could probably use Path::basename(App::argv[0]). I'm not sure that keeping track of the location of the binary on the system for each command in the history is worth the bloat for the very rare instance where it might provide useful information.
|
Another follow-up on this: If a command takes multiple input images, and each contains its own |
This is an issue I'd raised a while back, and I'm not sure there's an obvious easy answer here... If we store a tree, it needs to be easily parsed by a human, I doubt we'll see these entries needing to be parsed by a computer any time soon (unless you guys have grand plans I'm not privy to). But you'd need to store a record of which input to the Nth command a particular history branch refers to, and ways of expressing this hierarchy. Maybe we need to prefix the command history entry with the filename of the file it refers to (as provided to the command storing the history), and some way of expressing the tree structure in a way that can easily be represented on a terminal screen. I'm not sure there's any worth in making the header structure any more complicated if it doesn't also make the display of this information any easier. From my point of view, if we can come up with a textual representation of the hierarchy, there is nothing stopping us from just storing this in the header using the current key: value format, and just dumping it out when needed, like we do the comments fields for instance. All it needs is for the formatting that carries the hierarchy information to be part of the text stored in these fields, for example: would need to store a structure like this: There's no reason why we couldn't store that as-is in the header, simply prefixing each line with a Then the history would become: which should be relatively easy to manage: the whole history for the arguments gets indented by 4 or so spaces, the leading line prefixed by A bigger issue though is how to handle this transparently. At the moment, the history update is handled by the |
|
By the way, I do think it would be worth removing the quotes when they're not required. We just need a semi-robust way of detecting characters that would require quoting (whitespaces probably aren't enough). Also, I'd rather keep the full name of the |
|
I'm not entirely convinced the benefits are worth the hassle, and the level of intrusion in commands (and potential performance). As @dchristiaens brought up in that original discussion, I too (any others that I know) use non-MRtrix steps at times. And then there's manual steps (e.g. ROI creation or editing) in many typical scenarios too... For me, if any benefit at all, there's mostly some value into storing the last command an image was generated from, as some sort of automated note-taking functionality. The current implementation already goes beyond that by keeping the (linear) history, but even there it creates such a big bulk of almost mess when outputting that, that I've actually never used it for any purpose. I'd rather look at the 'mrinfo' of the command/image before and look at what produced that, if I'd ever need to. Thinking about that though, this may be something: if an image just stores the last command that generated it (which has the source images file names in there too), and applies a convention as to what path these are relative to (or stores this (relative) path), then you could have an |
This doesn't just apply to
More generally, I think I'm in the camp of only storing the generating command, not a full command history. With the potential for non-MRtrix3 commands & use of non-MRtrix3 image formats in intermediate steps, potential non-linear paths, and no immediate need to be able to re-produce processing steps in an automated fashion using this functionality alone, it might be a rabbit hole best avoided. This would also be much more compatible with the Python scripts, since the "generating command" stored in the final output image header(s) could be over-ridden by the script library to represent the user's call to the script, rather than the internal |
Yes, agreed. That's probably the best (easiest) defined behaviour as well. As I mentioned above, this still doesn't limit users from recreating a pretty detailed history in a well-controlled environment. |
Just thought I'd point out that this won't handle piped commands... There'd be no permanent record of how the temporary files were generated. But generally, I don't think we necessarily need to vastly improve on what we already have. It's a nice bonus, and if someone wants to improve it, they'd be more than welcome. But this has to remain a very low priority issue...
As far as I can tell, the correct action here is by necessity command-specific, so I don't think we can generalise enormously. Helper functions may help for common cases, like the one you highlighted in But then it's almost impossible to guarantee correct behaviour in general, since simply swapping the inputs would result in no DW scheme anyway if it wasn't already there in the |
Not trivially anyway. Technically the temporary file's header entry should contain within it the segment of the piped command used to generate that temporary file, so theoretically it might be possible to correspondingly stitch the full piped command back together; just need to differentiate between input & output pipes. Even with non-linear pipe arrangements, you could detect the fact that the command-line argument is a temporary file rather than a -
Nope: only erases if both images contain a DW scheme, and they don't match. A mask image shouldn't contain a DW scheme (it should be in It does however mean that the DW scheme will be retained if the DWI is the first argument, whereas it'll be absent if the mask is the first argument. Hence the suggestion for more "advanced" handling of header "consistency / variance", that may be sufficiently complex to justify embedding in helper functions. |
Yes, that might be possible. Just to make sure I understand this right, the idea would be to store the full multi-stage pipeline command in the header as a single $ dwi2mask dwi.mif mask.mif
$ dwi2tensor dwi.mif - | tensor2metric - -mask mask.mif -fa fa.mifthen while And this would be achieved by reading the history from the temporary file produced by So that would do-able, with a few gotchas. As you mention, handling multiple temporary inputs to the same command is going to be tricky (but then it's a very rarely used feature - might be worth disabling altogether...?). Other minor issue is what to do if the different commands happen to have different version strings - maybe store them explicitly in order at the end, e.g. like this? And ultimately, this still suffers from the problem I raised before: if we want this to happen transparently, we need to gather the information about all the input headers before creating the output header. I reckon it would be sufficient to ensure all inputs are read before any output is created - maybe enforced through runtime checks just to make sure... |
Neat. Hadn't picked up on that. All good.
Yes, I can see what you mean. I agree that would be useful, particularly for multi-input single-output apps. |
Well, would probably remove the "
👍
Actually I don't think so. Let's use the example from the documentation:
The temporary image created by There's then a second temporary image created by the combination of For The shell is responsible for taking the standard output of the call encapsulated within
In |
Yes, I don't mean it's impossible, just messy. And given that I'm probably the only person to have ever used the feature, I think we could just decide to simplify and limit passing of temporary files to pipes only, which would then make this a lot simpler to manage... Unless this happens to be something you guys use all the time...?
That's not an issue - the code needs to know this to figure out whether to push the image name out to |
as suggested in #956.
All up for discussion. It's a simple addition, so why not. But it's not perfect, that history won't survive conversion to other formats - although it might (with a few modifications) work well with the MGH formats.