Skip to content

[WIP] Update PE Illumina amplicon data variation workflow#82

Closed
wm75 wants to merge 5 commits intogalaxyproject:mainfrom
wm75:allele-frequency-fix
Closed

[WIP] Update PE Illumina amplicon data variation workflow#82
wm75 wants to merge 5 commits intogalaxyproject:mainfrom
wm75:allele-frequency-fix

Conversation

@wm75
Copy link
Member

@wm75 wm75 commented Jan 20, 2022

Most importantly, this version corrects allele-frequency calculation for called variants.

@nekrut please read the changelog carefully.
This solution to the AF issue is different from what we discussed before. It should leave actual called variants unaltered and only fix call stats. The price to pay is increased computational demand again because we need to run lofreq yet another time on each sample.

wm75 added 2 commits January 20, 2022 12:38
Most importantly, this version corrects allele-frequency calculation for
called variants.
@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

Hmm, any idea what the remaining issue with the WF test is @mvdbeek ?

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

galaxy.jobs.handler INFO 2022-01-20 14:45:47,534 [pN:main,p:5505,tN:JobHandlerQueue.monitor_thread] (14) Job dispatched
galaxy.jobs.runners ERROR 2022-01-20 14:45:47,579 [pN:main,p:5505,tN:LocalRunner.work_thread-1] (14) Failure preparing job
Traceback (most recent call last):
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/jobs/runners/__init__.py", line 243, in prepare_job
    job_wrapper.prepare()
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/jobs/__init__.py", line 1153, in prepare
    tool_evaluator.set_compute_environment(compute_environment, get_special=get_special)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/evaluation.py", line 80, in set_compute_environment
    visit_input_values(self.tool.inputs, incoming, validate_inputs)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 170, in visit_input_values
    visit_input_values(input.cases[values['__current_case__']].inputs, values, callback, new_name_prefix, label_prefix, parent_prefix=name_prefix, **payload)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 176, in visit_input_values
    callback_helper(input, input_values, name_prefix, label_prefix, parent_prefix=parent_prefix, context=context)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/__init__.py", line 137, in callback_helper
    new_value = callback(**args)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/evaluation.py", line 78, in validate_inputs
    value = input.from_json(value, request_context, context)
  File "/tmp/tmpyv_uw16j/galaxy-dev/lib/galaxy/tools/parameters/basic.py", line 1898, in from_json
    raise ParameterValueError("specify a dataset of the required format / build for parameter", self.name)
galaxy.tools.parameters.basic.ParameterValueError: parameter 'ref': specify a dataset of the required format / build for parameter

Not sure which tool this is, but it fails on the ref parameter

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

Probably lofreq-viterbi

@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

Ah, an unconnected ref input to the new lofreq call step! Thanks for the pointer!

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

That's a good test for @simonbray's new linters!

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

The best practice panel sees it, but we do need a running Galaxy to detect this, so the linter doesn't help (in this case). Galaxy OTOH should fail the invocation since that can never work.
Screenshot 2022-01-20 at 16 49 06

@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

Oh, should have checked myself :(

@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

Not sure the multi QC plots output ever had a label. At least I haven't touched the step at all.

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

Yeah, don't think that's necessary

@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

Lets see if the currently running test succeeds, but can just as well add the label afterwards :)

@mvdbeek
Copy link
Member

mvdbeek commented Jan 20, 2022

Nice! But looks like there's still a minor test diff:

Output with path /tmp/tmpp6eat3jj/Final (SnpEff-) annotated variants with strand-bias soft filter applied different than expected, difference (using diff):
( /home/runner/work/iwc/iwc/workflows/sars-cov-2-variant-calling/sars-cov-2-pe-illumina-artic-variant-calling/test-data/final_snpeff_annotated_variants.vcf v. /tmp/tmp35y077z0final_snpeff_annotated_variants.vcf )
--- local_file
+++ history_data
@@ -1,7 +1,7 @@
 ##fileformat=VCFv4.0
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##fileDate=20220120
-##source=lofreq call --verbose --ref reference.fa --call-indels --min-cov 5 --max-depth 1000000 --min-bq 30 --min-alt-bq 30 --min-mq 20 --max-mq 255 --min-jq 0 --min-alt-jq 0 --def-alt-jq 0 --sig 0.0005 --bonf dynamic --no-default-filter --no-default-filter -r NC_045512.2:1-7475 -o /data/jwd/main/040/057/40057641/working/pp-tmp/lofreq2_call_paralleljt0osmsr/0.vcf.gz reads.bam 
+##source=lofreq call --verbose --ref reference.fa --call-indels --min-cov 5 --max-depth 1000000 --min-bq 30 --min-alt-bq 30 --min-mq 20 --max-mq 255 --min-jq 0 --min-alt-jq 0 --def-alt-jq 0 --sig 0.0005 --bonf dynamic --no-default-filter --no-default-filter -r NC_045512.2:1-14951 -o /tmp/tmpd7i4kept/job_working_directory/000/13/working/pp-tmp/lofreq2_call_parallelshucx38v/0.vcf.gz reads.bam 
 ##reference=reference.fa
 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
 ##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
@@ -10,17 +10,17 @@
 ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
 ##INFO=<ID=CONSVAR,Number=0,Type=Flag,Description="Indicates that the variant is a consensus variant (as opposed to a low frequency variant).">
 ##INFO=<ID=HRUN,Number=1,Type=Integer,Description="Homopolymer length to the right of report indel position">
-##FILTER=<ID=min_snvqual_67,Description="Minimum SNV Quality (Phred) 67">
-##FILTER=<ID=min_indelqual_61,Description="Minimum Indel Quality (Phred) 61">
+##FILTER=<ID=min_snvqual_69,Description="Minimum SNV Quality (Phred) 69">
+##FILTER=<ID=min_indelqual_62,Description="Minimum Indel Quality (Phred) 62">
 ##FILTER=<ID=min_snvqual_73,Description="Minimum SNV Quality (Phred) 73">
 ##FILTER=<ID=min_indelqual_67,Description="Minimum Indel Quality (Phred) 67">
 ##contig=<ID=NC_045512.2>
 ##bcftools_annotateVersion=1.10.2+htslib-1.10.2
-##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 15:10:43 2022
+##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 16:13:37 2022
 ##INFO=<ID=AmpliconBias,Number=0,Type=Flag,Description="Indicates that the AF value of the variant could not be corrected for potential amplicon bias.">
-##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --mark-sites -AmpliconBias --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 15:17:46 2022
+##bcftools_annotateCommand=annotate --columns QUAL,INFO --annotations annotations.vcf.gz --mark-sites -AmpliconBias --output-type v --threads 1 input.vcf.gz; Date=Thu Jan 20 16:17:36 2022
 ##SnpEffVersion="4.5covid19 (build 2020-04-15 22:26), by Pablo Cingolani"
-##SnpEffCmd="SnpEff  -i vcf -o vcf -formatEff -classic -no-downstream -no-intergenic -no-upstream -stats /data/jwd/main/040/057/40057650/outputs/galaxy_dataset_036d8a69-858f-4d42-b06f-8735386f42c1.dat NC_045512.2 /data/dnb05/galaxy_db/files/1/e/f/dataset_1ef4bb48-42ef-4ea2-919e-59b030065daf.dat "
+##SnpEffCmd="SnpEff  -i vcf -o vcf -formatEff -classic -no-downstream -no-intergenic -no-upstream -stats /tmp/tmpd7i4kept/files/000/dataset_39.dat NC_045512.2 /tmp/tmpd7i4kept/files/000/dataset_37.dat "
 ##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank  | Genotype [ | ERRORS | WARNINGS ] )' ">
 ##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
 ##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">

Maybe increase the lines_diff ?

@wm75
Copy link
Member Author

wm75 commented Jan 20, 2022

I think I've seen that small difference in FDR adjusted QUAL score thresholds (the min_snvqual and min_indelqual INFO lines) between the workflow on .eu and in testing here previously. It's a bit worrying that this is not 100% reproducible. Given how small the difference is, however, it's probably ok to ignore it for now?

@mvdbeek mvdbeek changed the title Update PE Illumina amplicon data variation workflow [WIP] Update PE Illumina amplicon data variation workflow Jan 24, 2022
@mvdbeek
Copy link
Member

mvdbeek commented Jan 24, 2022

From @wm75:

I think the AF fix for the Illumina variation workflow may need another round of discussing.

@mvdbeek
Copy link
Member

mvdbeek commented Feb 4, 2022

Did we decide not to pursue this @wm75 ?

@wm75
Copy link
Member Author

wm75 commented Feb 14, 2022

The solution here got superseded by #94.

@wm75 wm75 closed this Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants