add modules for sra-human-scrubber #2694
@rpetit3, it looks like the version naming structure for the database has changed:
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/sra-human-scrubber:2.0.0--hdfd78af_0':
        'quay.io/biocontainers/sra-human-scrubber:2.0.0--hdfd78af_0' }"

    DBVERSION=\$(curl "https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/current/version.txt")
    curl -f "https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/human_filter.db.\${DBVERSION}" -o "\${DBVERSION}.human_filter.db"
This could probably be done outside of a process? I mean, Channel.fromPath("https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter/human_filter.db.${DBVERSION}") would achieve much the same thing.
This process is currently looking up what the most recent version is, and downloading that.
I definitely think you can do a GET and make a channel from the result; that would achieve the same thing without requiring a full process.
Yeah, that is true. I'm just trying to get PRs to a point where they can be merged in (or potentially to be closed if they aren't necessary).
One option would be to just add the scrubber module in, and leave the database fetching to be external (potentially via a local module). Otherwise this is going to change the md5sum any time the external database is changed anyway.
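As a rough illustration of keeping the database fetch external to the scrubber module, the lookup and download could live in a small standalone script. This is only a sketch: the helper names `db_url`, `db_filename`, and `fetch_latest_db` are hypothetical, and the URL pattern is copied from the diff in this PR, so it may no longer match the server if the version naming has changed.

```shell
#!/usr/bin/env bash
# Sketch only: resolve and fetch the human_filter database outside the module.
set -euo pipefail

# Base URL copied from the diff in this PR.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/sra/dbs/human_filter"

# Hypothetical helper: remote URL for a given database version string.
# ASSUMPTION: the "human_filter.db.<version>" pattern from the diff still holds.
db_url() {
    printf '%s/human_filter.db.%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: local file name, matching the "-o" target in the diff.
db_filename() {
    printf '%s.human_filter.db\n' "$1"
}

# Hypothetical helper: look up the current version, download that database,
# and print the version so the caller can record it.
# Needs network access and fetches a large file, so it is defined but not called here.
fetch_latest_db() {
    local version
    version="$(curl -fsS "${BASE_URL}/current/version.txt")"
    curl -f "$(db_url "$version")" -o "$(db_filename "$version")"
    printf '%s\n' "$version"
}
```

Run once ahead of the pipeline, the downloaded file could then be handed to the module as a plain `path` input, which would keep the module's own test checksums independent of upstream database updates.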
    input:
    tuple val(meta), path(reads)
    path db
Suggested change:
    - path db
    + tuple val(meta2), path(db)
    def VERSION = '2.0.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    if (meta.single_end) {
        """
        zcat ${reads} | scrub.sh -d $db | gzip > ${prefix}.scrubbed.fastq.gz

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            sra-human-scrubber: $VERSION
        END_VERSIONS
        """
    } else {
        """
        zcat ${reads[0]} | scrub.sh -d $db | gzip > ${prefix}_R1.scrubbed.fastq.gz
        zcat ${reads[1]} | scrub.sh -d $db | gzip > ${prefix}_R2.scrubbed.fastq.gz

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            sra-human-scrubber: $VERSION
            sra-human-scrubber-db: \$DBVERSION
        END_VERSIONS
        """
    }
I prefer not to use meta.single_end, although I know I'm in the minority. Instead, I prefer to test the number of FASTQs explicitly.
Suggested change (replacing the `if (meta.single_end)` block):
    def VERSION = '2.0.0' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    def num_fastq = reads instanceof List ? reads.size() : 1
    if (num_fastq == 1) {
        """
        zcat ${reads} | scrub.sh -d $db | gzip > ${prefix}.scrubbed.fastq.gz
        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            sra-human-scrubber: $VERSION
        END_VERSIONS
        """
    } else {
        // Could handle it better here but you get the idea
        """
        zcat ${reads[0]} | scrub.sh -d $db | gzip > ${prefix}_R1.scrubbed.fastq.gz
        zcat ${reads[1]} | scrub.sh -d $db | gzip > ${prefix}_R2.scrubbed.fastq.gz
        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            sra-human-scrubber: $VERSION
            sra-human-scrubber-db: \$DBVERSION
        END_VERSIONS
        """
    }
I also find the single-end modules a bit odd; I have noticed that while trying to get these old PRs across the line.
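One way the "could handle it better" branch might look is a single loop on the shell side that derives each output name from the number of FASTQ files, so single-end and paired-end inputs share one code path. This is a sketch rather than the module's actual script: `scrubbed_name` and `scrub_reads` are hypothetical helpers, and only the `zcat | scrub.sh -d | gzip` pipeline and the output naming are taken from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: scrub any number of FASTQ files with one loop.
set -euo pipefail

# Hypothetical helper: output name for the i-th of n FASTQ files.
# One file keeps the plain prefix; several files get an _R<i> suffix.
scrubbed_name() {
    local prefix="$1" n="$2" i="$3"
    if [ "$n" -eq 1 ]; then
        printf '%s.scrubbed.fastq.gz\n' "$prefix"
    else
        printf '%s_R%s.scrubbed.fastq.gz\n' "$prefix" "$i"
    fi
}

# Hypothetical helper: scrub every FASTQ listed after the prefix and database.
# Usage: scrub_reads <prefix> <db> <reads.fastq.gz>...
scrub_reads() {
    local prefix="$1" db="$2"
    shift 2
    local n="$#" i=1 fq
    for fq in "$@"; do
        # Same pipeline as the module: decompress, scrub, recompress.
        zcat "$fq" | scrub.sh -d "$db" | gzip > "$(scrubbed_name "$prefix" "$n" "$i")"
        i=$((i + 1))
    done
}
```

With this shape the branch on read count disappears from the Groovy side, at the cost of moving the naming logic into the script block.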
@rpetit3 Do you still plan to finish this module?
Closing due to inactivity. Please reopen if you (or anyone else) have some time to work on this PR.

PR checklist
Closes #2693
- versions.yml file.
- label
- PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
- PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
- PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

SRA Human Scrubber uses a >1 GB database, so I went with stub runs for the tests.