============================================================
Now that you have learned how to define a PipelineDog (Web) step (here), let's look at how to construct an analysis pipeline from several connected steps.
Let's assume that you have a list of files in gzip-compressed format, and you want to compare the size of each file when uncompressed, gzip compressed, and bzip2 compressed. A logical workflow could be:
- Uncompress the gzipped files, but keep the original gzipped versions intact;
- Compress the uncompressed files with bzip2, but also keep the uncompressed versions;
- Run du to find the file sizes, and store the sizes in a new text file for each of the three versions of every file.
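To make the goal of this workflow concrete, the following Python sketch compares the three sizes for a single in-memory payload using the standard-library gzip and bz2 modules. It is independent of PipelineDog: the helper name compare_sizes and the sample data are assumptions made purely for illustration.

```python
import bz2
import gzip


def compare_sizes(raw: bytes) -> dict:
    """Return the byte counts of raw, gzip-compressed, and bzip2-compressed data."""
    return {
        "uncompressed": len(raw),
        "gzip": len(gzip.compress(raw)),
        "bzip2": len(bz2.compress(raw)),
    }


if __name__ == "__main__":
    sample = b"ACGT" * 10000  # hypothetical, highly repetitive payload
    for label, size in compare_sizes(sample).items():
        print(f"{label}\t{size}")
```

Note that this sketch counts exact bytes, whereas du reports disk usage in blocks, so the numbers from the pipeline will generally differ from these counts.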
First, put the paths of the gzip-compressed files (these should be paths on your local system) in the 1.txt file (i.e., a List File). Once this is done, 1.txt contains the following lines (i.e., Line Entries):
/a/ABC.gz
/b/CDE.gz
/c/FGH.gz
This 1.txt should be uploaded to the PipelineDog Web App, or created within the Web App. After this is done, this List File 1.txt should be available on the left side panel under the header "List Files".
The entire workflow can then be specified easily as the following PipelineDog Project script:
{
1-1 : {
name : Gunzip while keep original,
in : $1.txt,
run : gunzip -c ~A > ~B,
~A : {},
~B : {mod : "S'.txt'"},
out : $~B
},
2-1 : {
name : Bzip2 while keep original,
in : $1-1.out,
run : bzip2 -k ~A,
~A : {},
out : {mod : "S'.bz2'"}
},
3-1 : {
name : Check file size and save to file,
in : [$1.txt, $1-1.out, $2-1.out],
run : du ~A > ~B,
~A : {},
~B : {mod : "S'.ds'"}
}
}
This script can be written as a single plain text file (e.g., test.txt) locally on your own system and then uploaded to the PipelineDog Web App (by clicking on "New Project" and then selecting test.txt to upload), or you can create these three steps one by one within the PipelineDog Web App.
Once parsed, the actual commands that will be run are the following:
gunzip -c /a/ABC.gz > /a/ABC.gz.txt
gunzip -c /b/CDE.gz > /b/CDE.gz.txt
gunzip -c /c/FGH.gz > /c/FGH.gz.txt
bzip2 -k /a/ABC.gz.txt
bzip2 -k /b/CDE.gz.txt
bzip2 -k /c/FGH.gz.txt
du ABC.gz > ABC.gz.ds
du CDE.gz > CDE.gz.ds
du FGH.gz > FGH.gz.ds
du ABC.gz.txt > ABC.gz.txt.ds
du CDE.gz.txt > CDE.gz.txt.ds
du FGH.gz.txt > FGH.gz.txt.ds
du ABC.gz.txt.bz2 > ABC.gz.txt.bz2.ds
du CDE.gz.txt.bz2 > CDE.gz.txt.bz2.ds
du FGH.gz.txt.bz2 > FGH.gz.txt.bz2.ds
Please note: this is an over-simplified example. In a real pipeline Bash script generated by PipelineDog, there could be many more lines of Bash code.
This script also demonstrates how to make a pipeline out of individual steps: by using one step's output as another step's input (e.g., Step 1-1's output is used as Step 2-1's input by specifying in : $1-1.out in Step 2-1). The example also shows how one step can access multiple input List Files (e.g., Step 3-1 uses the initial List File as well as the output objects of Steps 1-1 and 2-1 by specifying in : [$1.txt, $1-1.out, $2-1.out]). In addition, it includes a step that does not specify an output object (Step 3-1).
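The chaining idea can be sketched in a few lines of Python. This is not PipelineDog's parser: the function name expand is hypothetical, and it only mimics the suffix-appending behavior that the generated commands above suggest for mods such as S'.txt'. It prints full paths throughout, so the output has the same shape as the commands above rather than matching them character for character.

```python
def expand(template, inputs, suffix):
    """Fill ~A with each input path and ~B with that path plus a suffix.

    Returns the generated commands and the list of derived paths.
    """
    commands, derived = [], []
    for path in inputs:
        target = path + suffix
        commands.append(template.replace("~A", path).replace("~B", target))
        derived.append(target)
    return commands, derived


list_file = ["/a/ABC.gz", "/b/CDE.gz", "/c/FGH.gz"]  # Line Entries of 1.txt

# Step 1-1: the derived paths play the role of the step's output object.
cmds_1, out_1 = expand("gunzip -c ~A > ~B", list_file, ".txt")
# Step 2-1: consumes Step 1-1's output, like `in : $1-1.out`.
cmds_2, out_2 = expand("bzip2 -k ~A", out_1, ".bz2")
# Step 3-1: consumes all three lists, like `in : [$1.txt, $1-1.out, $2-1.out]`.
cmds_3, _ = expand("du ~A > ~B", list_file + out_1 + out_2, ".ds")

for command in cmds_1 + cmds_2 + cmds_3:
    print(command)
```

Because each call returns its derived paths, passing out_1 into the second call is the code-level counterpart of referencing $1-1.out in the script.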
The syntax of this example script should be understandable once you have read the PipelineDog (Web) Step Definition guide (here).
Additionally, you can organize your PipelineDog script by separating the parts that you need to modify from time to time from the parts that will largely remain the same. For example, the above script can be rewritten as follows, with the help of the $ Reserved Word in PipelineDog script:
{
gunzipC : {
myStepNum : 1-1,
myIn : 1.txt
},
bzip2K : {
myStepNum : 2-1,
myIn : 1-1.out
},
fs : {
myStepNum : 3-1,
myIn : [1.txt, 1-1.out, 2-1.out]
},
$gunzipC.myStepNum : {
name : Gunzip while keep original,
in : $gunzipC.myIn,
run : gunzip -c ~A > ~B,
~A : {},
~B : {mod : "S'.txt'"},
out : $~B
},
$bzip2K.myStepNum : {
name : Bzip2 while keep original,
in : $bzip2K.myIn,
run : bzip2 -k ~A,
~A : {},
out : {mod : "S'.bz2'"}
},
$fs.myStepNum : {
name : Check file size and save to file,
in : $fs.myIn,
run : du ~A > ~B,
~A : {},
~B : {mod : "S'.ds'"}
}
}
In the example above, the actual Step Definitions stay the same from run to run in the same pipeline, and if a Step Definition is reused in another pipeline, it can be copied and pasted directly without any change. On the other hand, the three objects (gunzipC, bzip2K, and fs) contain all the variables that are likely to change from run to run (e.g., gunzipC.myIn) or from pipeline to pipeline (e.g., gunzipC.myStepNum).
If you have any questions about how each step is defined, please refer to the PipelineDog step definition (here). If you have any questions about LEASH Expression and its usage, please refer to the LEASH expression definition (here).
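The separation described above can be illustrated with a small Python sketch that substitutes $name.key references with values taken from the parameter objects. The function resolve and the dictionary layout are assumptions made for illustration; how PipelineDog itself resolves the $ Reserved Word may differ in detail.

```python
def resolve(script):
    """Replace '$name.key' references with values from the parameter objects."""

    def lookup(value):
        # A list is resolved item by item; only known '$name.key' strings change.
        if isinstance(value, list):
            return [lookup(item) for item in value]
        if isinstance(value, str) and value.startswith("$") and "." in value:
            name, key = value[1:].split(".", 1)
            if name in script and key in script[name]:
                return script[name][key]
        return value

    steps = {}
    for key, body in script.items():
        if key.startswith("$"):  # a Step Definition keyed by a reference
            steps[lookup(key)] = {k: lookup(v) for k, v in body.items()}
    return steps


script = {
    # Parameters that change from run to run or from pipeline to pipeline.
    "gunzipC": {"myStepNum": "1-1", "myIn": "1.txt"},
    # Step Definition that can be copied between pipelines unchanged.
    "$gunzipC.myStepNum": {
        "name": "Gunzip while keep original",
        "in": "$gunzipC.myIn",
        "run": "gunzip -c ~A > ~B",
    },
}

print(resolve(script))
```

Only the gunzipC entry needs editing to point the step at a different List File or step number; the Step Definition itself is left untouched.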
============================================================
Anbo Zhou
Yeting Zhang
Yazhou Sun
Jinchuan Xing
May 2016
Aug 2016
Oct 2016