A new runner for DRMAA (currently UNIVA) #7004
|
Because I'm new around here and not 100% sure what this implies, I'm going to ask some questions.
|
|
Hi @datakid
No. slurm-drmaa is for SLURM clusters (which use sacct, ... for querying jobs). univa-drmaa is for clusters running the UNIVA grid engine (which use qacct, ... for querying). I guess it also works for the SUN grid engine, but I cannot test this. Both SlurmJobRunner and UnivaJobRunner derive from DRMAAJobRunner, which cannot be used in the setting that submits jobs as the real user. This is because the (Python) drmaa library can only query jobs that were created in the same drmaa session, but in the real user setting jobs are started by an external script ( The solution in SlurmJobRunner and UnivaJobRunner is to use the corresponding command line tools to query the job state.
I do not understand this question.
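To illustrate the "query the job state with command line tools" idea from the reply above, here is a minimal sketch that parses the key/value report printed by `qacct -j <job id>`. The function names (`parse_qacct`, `query_finished_job`) and the sample text are made up for illustration and are not taken from the runner in this PR; the exact set of fields your grid engine prints may differ.

```python
import subprocess


def parse_qacct(text):
    """Turn `qacct -j <id>`-style output into a dict of field name -> value string."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "=====" separator that starts each record.
        if not line or line.startswith("="):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def query_finished_job(job_id, run=subprocess.run):
    """Ask qacct about a finished job; return None if qacct reports an error.

    `run` is injectable so the function can be exercised without a cluster.
    """
    proc = run(["qacct", "-j", str(job_id)], capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    return parse_qacct(proc.stdout)


# Illustrative sample only; real reports contain many more fields.
SAMPLE = """\
==============================================================
qname        all.q
jobnumber    4711
failed       0
exit_status  137
ru_wallclock 3605
maxvmem      2.100G
"""

info = parse_qacct(SAMPLE)
```

Because the query goes through the accounting tool rather than the drmaa session, it does not matter which process (or which user) originally submitted the job.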
Reimplementation of the DRMAA runner inspired by the SLURM runner. Currently tested only for the UNIVA grid engine (but I'm optimistic that it should work as well for other drmaa systems).

This solves the problem that the current DRMAAJobRunner does not work when jobs are submitted as the real user (because jobs that are started in a different drmaa session cannot be accessed from the session that is open in Galaxy):

- this is done by resorting to the command line tools qstat and qacct if the drmaa library cannot be used to check the job status and to get run time information.
- this has the additional advantage that if the drmaa library functions are not working (DRMAAJobRunner had implemented repeated checking to handle this problem) the runner can still use the command line tools.

Furthermore (in contrast to the original drmaa runner) the new one tests for run time and memory violations:

- memory violations are determined by comparing the used and the requested memory
- run time violations are determined by checking the signal that killed the job and by comparing the used and the requested run time

The used memory and time are determined with drmaa.wait() or qacct.

Open (or better: perspective):

- adaptations to other grid engines. The current implementation (the command line calls and result parsing) might be specific to the Univa grid engine. To include other GEs one could determine the GE (+ version) and make the calls and result parsing depend on this.

Implementation note:

The changes in drmaa.py do not change the functionality at all, but only reorganize the code. In particular, part of the function `check_watched_items` was put into a new function `check_watched_item` in order to make subclassing more convenient.

Replaces galaxyproject#6931 (which replaced galaxyproject#4275), since I did mess up with git again (there were some duplicated commits).
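The violation checks described in the commit message could look roughly like the sketch below. This is one possible reading, not the code in this PR: the helper names (`parse_mem`, `classify_failure`), the memory suffix handling, and the set of signals treated as "killed by the scheduler" are all assumptions made for illustration.

```python
# Binary size suffixes as they commonly appear in accounting output (assumed).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# Hypothetical choice of signals that indicate a limit kill; adjust per site.
LIMIT_SIGNALS = {"SIGKILL", "SIGXCPU"}


def parse_mem(value):
    """Convert a string such as '2.1G' or '1024' into a number of bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return float(value[:-1]) * _UNITS[suffix]
    return float(value)


def classify_failure(used_mem, req_mem, used_time, req_time, term_signal=None):
    """Return 'memory', 'walltime', or None for a finished job.

    used_*/req_* are bytes and seconds; a request of None means "no limit".
    """
    # Memory violation: the job used more than it asked for.
    if req_mem is not None and used_mem > req_mem:
        return "memory"
    # Run time violation: killed by a limit signal after reaching the requested time.
    if (req_time is not None and term_signal in LIMIT_SIGNALS
            and used_time >= req_time):
        return "walltime"
    return None
```

For example, `classify_failure(parse_mem("2.1G"), parse_mem("2G"), 100, 3600)` reports a memory violation, while a job killed by `SIGKILL` after 3605 of 3600 requested seconds is reported as a walltime violation.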
|
This looks like a good, isolated first start so I'm merging. Thanks so much for the work, and sorry for making you jump through hoops about the memory handling.
Reimplementation of the DRMAA runner inspired by the SLURM runner. Currently tested only for the UNIVA grid engine (but I'm optimistic that it should work as well for other drmaa systems).
This solves the problem that the current DRMAAJobRunner does not work when jobs are submitted as the real user (because jobs that are started in a different drmaa session cannot be accessed from the session that is open in Galaxy):
Furthermore (in contrast to the original drmaa runner) the new one tests for run time and memory violations:
TODO:
Open (or better perspective):
Implementation note:
The changes in drmaa.py do not change the functionality at all, but only reorganize the code. In particular, part of the function `check_watched_items` was put into a new function `check_watched_item` in order to make subclassing more convenient.

Replaces #4857 (which replaced #4275), since I did mess up with git again (there were some duplicated commits).
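The reorganization described in the implementation note follows a common pattern: the loop stays in the base class and the per-job check becomes an overridable method. The toy classes below only illustrate that pattern; the class names, job dictionaries, and state strings are hypothetical and do not reflect the actual signatures in drmaa.py.

```python
class BaseRunner:
    """Toy stand-in for a runner that polls jobs through one session."""

    def __init__(self):
        self.watched = []

    def check_watched_items(self):
        # The loop is shared; only the per-job check varies between runners.
        still_watched = []
        for job in self.watched:
            state = self.check_watched_item(job)
            if state not in ("done", "failed"):
                still_watched.append(job)
        self.watched = still_watched

    def check_watched_item(self, job):
        # Stand-in for a session query; "unknown" means the session cannot see the job.
        return job.get("session_state", "unknown")


class CommandLineRunner(BaseRunner):
    """Toy subclass that falls back to another source when the session fails."""

    def check_watched_item(self, job):
        state = super().check_watched_item(job)
        if state == "unknown":
            # Stand-in for asking a command line tool instead.
            state = job.get("cli_state", "running")
        return state


runner = CommandLineRunner()
runner.watched = [
    {"id": 1, "session_state": "running"},
    {"id": 2, "cli_state": "done"},
    {"id": 3},
]
runner.check_watched_items()
print([job["id"] for job in runner.watched])  # [1, 3]
```

With this split, a subclass overrides only the single-job check and inherits the bookkeeping of the watch list unchanged.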