[WIP] Update PE Illumina amplicon data variation workflow by wm75 · Pull Request #82 · galaxyproject/iwc

wm75 · 2022-01-20T11:46:11Z

Most importantly, this version corrects allele-frequency calculation for called variants.

@nekrut please read the changelog carefully.
This solution to the AF issue is different from what we discussed before. It should leave actual called variants unaltered and only fix call stats. The price to pay is increased computational demand again because we need to run lofreq yet another time on each sample.

Most importantly, this version corrects allele-frequency calculation for called variants.

wm75 · 2022-01-20T15:28:50Z

Hmm, any idea what the remaining issue with the WF test is @mvdbeek ?

mvdbeek · 2022-01-20T15:34:42Z

galaxy.jobs.handler INFO 2022-01-20 14:45:47,534 [pN:main,p:5505,tN:JobHandlerQueue.monitor_thread] (14) Job dispatched
galaxy.jobs.runners ERROR 2022-01-20 14:45:47,579 [pN:main,p:5505,tN:LocalRunner.work_thread-1] (14) Failure preparing job
Traceback (most recent call last):
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/jobs/runners/__init__.py", line 243, in prepare_job
    job_wrapper.prepare()
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/jobs/__init__.py", line 1153, in prepare
    tool_evaluator.set_compute_environment(compute_environment, get_special=get_special)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/evaluation.py", line 80, in set_compute_environment
    visit_input_values(self.tool.inputs, incoming, validate_inputs)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 170, in visit_input_values
    visit_input_values(input.cases[values['__current_case__']].inputs, values, callback, new_name_prefix, label_prefix, parent_prefix=name_prefix, **payload)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 176, in visit_input_values
    callback_helper(input, input_values, name_prefix, label_prefix, parent_prefix=parent_prefix, context=context)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 137, in callback_helper
    new_value = callback(**args)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/evaluation.py", line 78, in validate_inputs
    value = input.from_json(value, request_context, context)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/basic.py", line 1898, in from_json
    raise ParameterValueError("specify a dataset of the required format / build for parameter", self.name)
galaxy.tools.parameters.basic.ParameterValueError: parameter 'ref': specify a dataset of the required format / build for parameter

Not sure which tool this is, but it fails on the ref parameter

mvdbeek · 2022-01-20T15:35:51Z

Probably lofreq-viterbi

wm75 · 2022-01-20T15:46:11Z

Ah, an unconnected ref input to the new lofreq call step! Thanks for the pointer!

mvdbeek · 2022-01-20T15:47:57Z

That's a good test for @simonbray's new linters!

mvdbeek · 2022-01-20T15:53:10Z

The best practice panel sees it, but we do need a running Galaxy to detect this, so the linter doesn't help (in this case). Galaxy OTOH should fail the invocation since that can never work.

wm75 · 2022-01-20T15:54:00Z

Oh, should have checked myself :(

wm75 · 2022-01-20T15:55:52Z

Not sure the multi QC plots output ever had a label. At least I haven't touched the step at all.

mvdbeek · 2022-01-20T15:56:41Z

Yeah, don't think that's necessary

wm75 · 2022-01-20T15:59:40Z

Lets see if the currently running test succeeds, but can just as well add the label afterwards :)

mvdbeek · 2022-01-20T16:50:41Z

Nice! But looks like there's still a minor test diff:

Output with path /tmp/tmpp6eat3jj/Final (SnpEff-) annotated variants with strand-bias soft filter applied different than expected, difference (using diff):
( /home/runner/work/iwc/iwc/workflows/sars-cov-2-variant-calling/sars-cov-2-pe-illumina-artic-variant-calling/test-data/final_snpeff_annotated_variants.vcf v. /tmp/tmp35y077z0final_snpeff_annotated_variants.vcf )
--- local_file
+++ history_data
@@ -1,7 +1,7 @@
 ##fileformat=VCFv4.0
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##fileDate=20220120
-##source=lofreq call --verbose --ref reference.fa --call-indels --min-cov 5 --max-depth 1000000 --min-bq 30 --min-alt-bq 30 --min-mq 20 --max-mq 255 --min-jq 0 --min-alt-jq 0 --def-alt-jq 0 --sig 0.0005 --bonf dynamic --no-default-filter --no-default-filter -r NC_045512.2:1-7475 -o /data/jwd/main/040/057/40057641/working/pp-tmp/lofreq2_call_paralleljt0osmsr/0.vcf.gz reads.bam 
+##source=lofreq call --verbose --ref reference.fa --call-indels --min-cov 5 --max-depth 1000000 --min-bq 30 --min-alt-bq 30 --min-mq 20 --max-mq 255 --min-jq 0 --min-alt-jq 0 --def-alt-jq 0 --sig 0.0005 --bonf dynamic --no-default-filter --no-default-filter -r NC_045512.2:1-14951 -o /tmp/tmpd7i4kept/job_working_directory/000/13/working/pp-tmp/lofreq2_call_parallelshucx38v/0.vcf.gz reads.bam 
 ##reference=reference.fa
 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
 ##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
@@ -10,17 +10,17 @@
 ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
 ##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
 ##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
-##FILTER=<ID=min_snvqual_67,Description="Minimum SNV Quality (Phred) 67">
-##FILTER=<ID=min_indelqual_61,Description="Minimum Indel Quality (Phred) 61">
+##FILTER=<ID=min_snvqual_69,Description="Minimum SNV Quality (Phred) 69">
+##FILTER=<ID=min_indelqual_62,Description="Minimum Indel Quality (Phred) 62">
 ##FILTER=<ID=min_snvqual_73,Description="Minimum SNV Quality (Phred) 73">
 ##FILTER=<ID=min_indelqual_67,Description="Minimum Indel Quality (Phred) 67">
 ##contig=<ID=NC_045512.2>
 ##bcftools_annotateVersion=1.10.2+htslib-1.10.2
-##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 15:10:43 2022
+##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 16:13:37 2022
 ##INFO=<ID=AmpliconBias,Number=0,Type=Flag,Description="Indicates that the AF value of the variant could not be corrected for potential amplicon bias.">
-##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --mark-sites -AmpliconBias --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 15:17:46 2022
+##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --mark-sites -AmpliconBias --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 16:17:36 2022
 ##SnpEffVersion="4.5covid19 (build 2020-04-15 22:26), by Pablo Cingolani"
-##SnpEffCmd="SnpEff  -i vcf -o vcf -formatEff -classic -no-downstream -no-intergenic -no-upstream -stats /data/jwd/main/040/057/40057650/outputs/galaxy_dataset_036d8a69-858f-4d42-b06f-8735386f42c1.dat NC_045512.2 /data/dnb05/galaxy_db/files/1/e/f/dataset_1ef4bb48-42ef-4ea2-919e-59b030065daf.dat "
+##SnpEffCmd="SnpEff  -i vcf -o vcf -formatEff -classic -no-downstream -no-intergenic -no-upstream -stats /tmp/tmpd7i4kept/files/000/dataset_39.dat NC_045512.2 /tmp/tmpd7i4kept/files/000/dataset_37.dat "
 ##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank  | Genotype [ | ERRORS | WARNINGS ] )' ">
 ##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
 ##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">

Maybe increase the lines_diff ?

wm75 · 2022-01-20T17:23:41Z

I think I've seen that small difference in FDR adjusted QUAL score thresholds (the min_snvqual and min_indelqual INFO lines) between the workflow on .eu and in testing here previously. It's a bit worrying that this is not 100% reproducible. Given how small the difference is, however, it's probably ok to ignore it for now?

mvdbeek · 2022-01-24T13:47:05Z

From @wm75:

I think the AF fix for the Illumina variation workflow may need another round of discussing.

mvdbeek · 2022-02-04T14:37:42Z

Did we decide not to pursue this @wm75 ?

wm75 · 2022-02-14T11:40:58Z

The solution here got superseded by #94.

wm75 added 2 commits January 20, 2022 12:38

Update PE Illumina amplicon data variation workflow

d22afa1

Most importantly, this version corrects allele-frequency calculation for called variants.

Update WF test output

747793f

Fix missing ref to lofreq connection

4dc255b

Fix missing output label

0e5ecb4

Partially revert test data header, increase lines diff

14a34e6

mvdbeek approved these changes Jan 20, 2022

View reviewed changes

wm75 mentioned this pull request Jan 21, 2022

Updating workflows/sars-cov-2-variant-calling/sars-cov-2-pe-illumina-artic-variant-calling from 0.4.2 to 0.4.3 simonbray/iwc#37

Closed

mvdbeek changed the title ~~Update PE Illumina amplicon data variation workflow~~ [WIP] Update PE Illumina amplicon data variation workflow Jan 24, 2022

wm75 closed this Feb 14, 2022

Conversation

wm75 commented Jan 20, 2022

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 20, 2022

Uh oh!

wm75 commented Jan 20, 2022

Uh oh!

mvdbeek commented Jan 24, 2022

Uh oh!

mvdbeek commented Feb 4, 2022

Uh oh!

wm75 commented Feb 14, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mvdbeek commented Jan 20, 2022 •

edited

Loading