These are Cogent Shell (cosh) scripts for the Cogent Numbers DataBrowser that manage the process of running simulations on a remote cluster. This framework replaces grunt, incorporating some of its key features while eliminating the need for any Python code and the extra complexity of a server-side daemon.
A key feature of cosh is the ability to transparently run shell commands on a remote host, connected through ssh, including using the scp protocol to copy files, in addition to the standard capturing of command output to a variable. This eliminates the need for any code to be installed on the remote host: everything runs from the local client (laptop), greatly simplifying the overall programming logic.
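To make the idea of remote execution with captured output concrete, here is a minimal Go sketch that builds the equivalent ssh and scp invocations with the standard os/exec package. It is only an illustration and not how cosh is implemented; the helper names (sshArgs, scpArgs, capture) and the host name are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// sshArgs returns the argv that runs cmd on host through the ssh client.
func sshArgs(host, cmd string) []string {
	return []string{"ssh", host, cmd}
}

// scpArgs returns the argv that copies a remote file to a local path.
func scpArgs(host, remote, local string) []string {
	return []string{"scp", host + ":" + remote, local}
}

// capture runs argv locally and returns its standard output,
// with surrounding whitespace trimmed.
func capture(argv []string) (string, error) {
	out, err := exec.Command(argv[0], argv[1:]...).Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	host := "user@cluster.example.org" // hypothetical host
	// Print the command lines instead of running them, so that
	// the sketch works without network access.
	fmt.Println(strings.Join(sshArgs(host, "cat job.job"), " "))
	fmt.Println(strings.Join(scpArgs(host, "job.job", "job.job"), " "))
	// With ssh access configured, the remote output could be captured as:
	//   id, err := capture(sshArgs(host, "cat job.job"))
}
```

In cosh, the same effect is written directly as a shell line prefixed with a host label, with no helper code on either side.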
The databrowser.Browser GUI widget provides a tabbed file browser for editing files, viewing tabular data in table.Table spreadsheet-like tables, and interactively plotting data. In addition, custom tables can be created, as done in these scripts, that summarize directories of data (from different simulation runs) with various metadata displayed. By selecting rows in such tables and running scripts installed on the toolbar, the user can manage the process of submitting and comparing simulation runs.
The general workflow is as follows, assuming a standard install has been performed, with a simdata symbolic link in the main simulation code directory pointing to the simulation data with these scripts installed.
- Run numbers from the main sim code directory with a new browser:
> numbers -e databrowser.NewBrowserWindow("simdata")
- Jobs shows current jobs in a Jobs tab.
- Submit runs a new sim job.
- Status gets the status of any running jobs. Anything done running gets a status of Finalized and is no longer updated by Status. All job metadata is downloaded from the host, but not the Results output data, which is fetched separately with Fetch because it may be large and often needs to be consolidated in a particular way, because multiple runs of a job are executed in parallel.
- Fetch gets result .tsv files from the server, consolidating parallel runs into _allepc.tsv, _avgepc.tsv, and similar files. It can be run on running or Finalized jobs. When run on a Finalized job, the status is set to Fetched and the job is automatically skipped in any future Fetch actions.
- Results grabs specific result data files into a Results tab, from which further examination and plotting occurs. This step is necessary because there are typically multiple different types of results files, so you need to select which type you want to view.
- Plot plots combined data across any selected files in the Results tab, allowing you to compare them, using the JobID as a legend so each job has its own line color.
- Diff shows a diff browser for any two selected Jobs, or one selected job vs. the current sim working directory.
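The status transitions described for Status and Fetch can be sketched as a small state model in Go. This is an illustrative reading of the text above, not the scripts' actual code: the Job type, the function names, and the "Running" label are assumptions, while Finalized and Fetched are the statuses named above.

```go
package main

import "fmt"

// Job is a hypothetical, minimal record of one simulation job.
type Job struct {
	ID     string
	Status string
}

const (
	Running   = "Running"   // assumed label for a job that is still running
	Finalized = "Finalized" // done running; no longer updated by Status
	Fetched   = "Fetched"   // results downloaded; skipped by later Fetch actions
)

// ApplyStatus models the Status action. It reports whether the job was
// checked at all: Finalized and Fetched jobs are skipped.
func ApplyStatus(j *Job, doneRunning bool) bool {
	if j.Status == Finalized || j.Status == Fetched {
		return false
	}
	if doneRunning {
		j.Status = Finalized
	}
	return true
}

// ApplyFetch models the Fetch action. It reports whether results were
// fetched: running and Finalized jobs are fetched, Fetched jobs are skipped.
// Only a Finalized job moves on to Fetched.
func ApplyFetch(j *Job) bool {
	if j.Status == Fetched {
		return false
	}
	if j.Status == Finalized {
		j.Status = Fetched
	}
	return true
}

func main() {
	j := &Job{ID: "job1", Status: Running}
	fmt.Println(ApplyFetch(j), j.Status)        // fetch partial results while running
	fmt.Println(ApplyStatus(j, true), j.Status) // job has finished
	fmt.Println(ApplyFetch(j), j.Status)        // final fetch
	fmt.Println(ApplyFetch(j), j.Status)        // later fetches skip the job
}
```

Keeping Fetched as a terminal status means repeated Fetch actions over a large table of jobs only touch the jobs that can still produce new data.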
From this simscripts directory, run:
> ./install.cosh ~/full/path/to/sim
This creates a project directory in ~/simdata/projname/username that has a symbolic link in the sim directory, so it is accessible directly from there but is not actually located there, which avoids any issues with git ownership of these files.
You then need to create a defaults.cosh file that sets various parameters on the Config object specific to this project, including extra non-Go files and subdirectories that might need to be copied up to the server to run it. There are also job configuration parameters (e.g., max runtime). Here is an example, for a sim in axon/examples, that requires the extra go get:
// to run, in numbers:
// databrowser.NewBrowserWindow("simdata")
// primary remote server: avail as @1
cossh hpc2.engr.ucdavis.edu
func defaults() {
    cf := &Config
    cf.Defaults()
    cf.ServerName = "hpc2"
    cf.ExtraGoGet = "github.com/emer/axon/v2@main"
    cf.Job.Hours = 1
    cf.Job.Qos = "oreillylab"
    cf.ExcludeNodes = "agate-[0,17-19,28,41-45]"
    cf.ExtraFiles = []string{"config_job.toml"}
    cf.Update()
}
defaults()

The following annotated example code demonstrates the key features of the cosh language, from the Status script. cosh automatically detects shell exec lines vs. Go code in an intuitive way, based on various syntactic indicators. Within a Go context, exec code is explicitly indicated with backticks, and within exec, Go code is surrounded by { } braces.
@1 specifies the current remote host -- there can be any number of maintained host connections, with unique names -- and @0 is the local host:
sj := `@1 cat job.job` // get the job id, by running cat on remote host @1
if sstat != "Done" && !force { // standard Go control logic
    [@1 squeue -j {sj} -o %T >& job.squeue] // [ ] = don't stop on failure
    stat := `@1 cat job.squeue` // get results
    ...

The current working directory is maintained and updated on each host (local and remote). Here is another example of the combination of Go and shell exec, including the scp command to retrieve files from the remote host:
jfiles := `@1 /bin/ls -1 job.*` // get all job files
for _, jf := range cosh.SplitLines(jfiles) { // cosh package has helper functions
    rfn := "@1:" + jf // prefix filename with @1: for remote host, otherwise local
    scp {rfn} {jf} // { } indicates Go expressions within shell exec context
}
@0 // switch context back to local host for further processing
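
The string handling in the loop above can be tried out in plain Go. The sketch below is an approximation under stated assumptions: splitLines is a stand-in for cosh.SplitLines (whose exact handling of blank lines and whitespace may differ), remoteName is a hypothetical helper, and the listing is sample data rather than real output from a cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines is a stand-in for the cosh.SplitLines helper: it splits
// command output on newlines and drops empty lines.
func splitLines(s string) []string {
	var lines []string
	for _, ln := range strings.Split(s, "\n") {
		ln = strings.TrimSpace(ln)
		if ln != "" {
			lines = append(lines, ln)
		}
	}
	return lines
}

// remoteName prefixes a file name with a host label such as "@1",
// giving the "@1:file" form used as the scp source in the loop above.
func remoteName(host, file string) string {
	return host + ":" + file
}

func main() {
	// Sample listing, standing in for the output of `ls -1 job.*`.
	listing := "job.job\njob.squeue\n"
	for _, jf := range splitLines(listing) {
		// Print the scp line that the loop would execute for each file.
		fmt.Printf("scp %s %s\n", remoteName("@1", jf), jf)
	}
}
```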