Error when running GRIDSS #474

@GiterKR

Hi,

Please help me resolve the error I get when running GRIDSS.

Here is the command I used:

../../gridss.sh -r ../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa \
				-o VCF/BH004_grids.vcf.gz \
				-a BAM/BH004_grids.bam \
				--labels BH004 \
				../BAM/BH004_ReSorted.bam 

Here is the output displayed:

Fri Mar 19 03:15:27 EDT 2021: Full log file is: ./gridss.full.20210319_031527.node155.hpc.local.301999.log
which: no time in (/usr/bin)
Fri Mar 19 03:15:27 EDT 2021: Not found /usr/bin/time
Fri Mar 19 03:15:27 EDT 2021: Using GRIDSS jar /home/rajk/gridss-2.11.0-gridss-jar-with-dependencies.jar
Fri Mar 19 03:15:27 EDT 2021: Using reference genome "../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa"
Fri Mar 19 03:15:27 EDT 2021: Using output VCF VCF/BH004_grids.vcf.gz
Fri Mar 19 03:15:27 EDT 2021: Using assembly bam  BAM/BH004_grids.bam
Fri Mar 19 03:15:27 EDT 2021: Using 8 worker threads.
Fri Mar 19 03:15:27 EDT 2021: Using no blacklist bed. The encode DAC blacklist is recommended for hg19.
Fri Mar 19 03:15:27 EDT 2021: Using JVM maximum heap size of 30g for assembly and variant calling.
Fri Mar 19 03:15:27 EDT 2021: Using input file ../BAM/BH004_ReSorted.bam
Fri Mar 19 03:15:27 EDT 2021: label is BH004
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/R/4.0.2/bin/Rscript
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/samtools/1.11/bin/samtools
Fri Mar 19 03:15:27 EDT 2021: Found /etc/alternatives/java_sdk_1.8.0/bin/java
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/bwa/0.7.10/bin/bwa
Fri Mar 19 03:15:27 EDT 2021: samtools version: 1.11+htslib-1.11
Fri Mar 19 03:15:27 EDT 2021: R version: R scripting front-end version 4.0.2 (2020-06-22)
Fri Mar 19 03:15:27 EDT 2021: bwa Version: 0.7.10-r789
which: no time in (/usr/bin)
Fri Mar 19 03:15:27 EDT 2021: bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
Fri Mar 19 03:15:28 EDT 2021: java version: openjdk version "1.8.0_242" OpenJDK Runtime Environment (build 1.8.0_242-b08)       OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
Fri Mar 19 03:15:28 EDT 2021: Max file handles: 4096
Fri Mar 19 03:15:28 EDT 2021: Running GRIDSS steps: setupreference, preprocess, assemble, call,
Fri Mar 19 03:15:28 EDT 2021: Start pre-processing      ../BAM/BH004_ReSorted.bam
Fri Mar 19 03:15:28 EDT 2021: Running   CollectGridssMetrics    ../BAM/BH004_ReSorted.bam       first 10000000 records
"$timecmd java -Xmx$otherjvmheap $jvm_args -cp $gridss_jar gridss.analysis.CollectGridssMetrics REFERENCE_SEQUENCE=$reference TMP_DIR=$dir ASSUME_SORTED=true I=$f O=$tmp_prefix THRESHOLD_COVERAGE=$maxcoverage FILE_EXTENSION=null GRIDSS_PROGRAM=null GRIDSS_PROGRAM=CollectIdsvMetrics PROGRAM=null PROGRAM=CollectInsertSizeMetrics STOP_AFTER=$metricsrecords $picardoptions" command completed with exit code 1.
*****
The underlying error message can be found in ./gridss.full.20210319_031527.node155.hpc.local.301999.log.
*****

Here is the text in the log file:

Fri Mar 19 03:15:27 EDT 2021: Full log file is: ./gridss.full.20210319_031527.node155.hpc.local.301999.log
Fri Mar 19 03:15:27 EDT 2021: Not found /usr/bin/time
Fri Mar 19 03:15:27 EDT 2021: Using GRIDSS jar /home/rajk/gridss-2.11.0-gridss-jar-with-dependencies.jar
Fri Mar 19 03:15:27 EDT 2021: Using reference genome "../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa"
Fri Mar 19 03:15:27 EDT 2021: Using output VCF VCF/BH004_grids.vcf.gz
Fri Mar 19 03:15:27 EDT 2021: Using assembly bam  BAM/BH004_grids.bam
Fri Mar 19 03:15:27 EDT 2021: Using 8 worker threads.
Fri Mar 19 03:15:27 EDT 2021: Using no blacklist bed. The encode DAC blacklist is recommended for hg19.
Fri Mar 19 03:15:27 EDT 2021: Using JVM maximum heap size of 30g for assembly and variant calling.
Fri Mar 19 03:15:27 EDT 2021: Using input file ../BAM/BH004_ReSorted.bam
Fri Mar 19 03:15:27 EDT 2021: label is BH004
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/R/4.0.2/bin/Rscript
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/samtools/1.11/bin/samtools
Fri Mar 19 03:15:27 EDT 2021: Found /etc/alternatives/java_sdk_1.8.0/bin/java
Fri Mar 19 03:15:27 EDT 2021: Found /opt/software/bwa/0.7.10/bin/bwa
Fri Mar 19 03:15:27 EDT 2021: samtools version: 1.11+htslib-1.11
Fri Mar 19 03:15:27 EDT 2021: R version: R scripting front-end version 4.0.2 (2020-06-22)
Fri Mar 19 03:15:27 EDT 2021: bwa Version: 0.7.10-r789
Fri Mar 19 03:15:27 EDT 2021: bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
Fri Mar 19 03:15:28 EDT 2021: java version: openjdk version "1.8.0_242" OpenJDK Runtime Environment (build 1.8.0_242-b08)       OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode) 
Fri Mar 19 03:15:28 EDT 2021: Max file handles: 4096
Fri Mar 19 03:15:28 EDT 2021: Running GRIDSS steps: setupreference, preprocess, assemble, call,
Fri Mar 19 03:15:28 EDT 2021: Start pre-processing      ../BAM/BH004_ReSorted.bam
Fri Mar 19 03:15:28 EDT 2021: Running   CollectGridssMetrics    ../BAM/BH004_ReSorted.bam       first 10000000 records
INFO    2021-03-19 03:15:28     Defaults        Found file for property samjdk.reference_fasta: /home/rajk/SV_MAC_BH/GRIDSS/../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa 
INFO    2021-03-19 03:15:28     CollectGridssMetrics    

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CollectGridssMetrics -REFERENCE_SEQUENCE ../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa -TMP_DIR ./BH004_ReSorted.bam.gridss.working -ASSUME_SORTED true -I ../BAM/BH004_ReSorted.bam -O ./BH004_ReSorted.bam.gridss.working/tmp.BH004_ReSorted.bam -THRESHOLD_COVERAGE 50000 -FILE_EXTENSION null -GRIDSS_PROGRAM null -GRIDSS_PROGRAM CollectIdsvMetrics -PROGRAM null -PROGRAM CollectInsertSizeMetrics -STOP_AFTER 10000000
**********


03:15:28.683 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/rajk/gridss-2.11.0-gridss-jar-with-dependencies.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Mar 19 03:15:28 EDT 2021] CollectGridssMetrics GRIDSS_PROGRAM=[CollectIdsvMetrics] THRESHOLD_COVERAGE=50000 INPUT=../BAM/BH004_ReSorted.bam ASSUME_SORTED=true STOP_AFTER=10000000 OUTPUT=./BH004_ReSorted.bam.gridss.working/tmp.BH004_ReSorted.bam FILE_EXTENSION=null PROGRAM=[CollectInsertSizeMetrics] TMP_DIR=[./BH004_ReSorted.bam.gridss.working] REFERENCE_SEQUENCE=../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa    METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_UNPAIRED=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Mar 19 03:15:28 EDT 2021] Executing as rajk@node155 on Linux 3.10.0-1127.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-b08; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.11.0-gridss
ERROR   2021-03-19 03:15:29     ReferenceCommandLineProgram     Reference genome used by ../BAM/BH004_ReSorted.bam does not match reference genome ../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa. The reference supplied must match the reference used for every input.
[Fri Mar 19 03:15:29 EDT 2021] gridss.analysis.CollectGridssMetrics done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2075918336
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: In files /home/rajk/SV_MAC_BH/GRIDSS/../BAM/BH004_ReSorted.bam and /home/rajk/SV_MAC_BH/GRIDSS/../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:345)
        at gridss.cmdline.ReferenceCommandLineProgram.ensureDictionaryMatches(ReferenceCommandLineProgram.java:117)
        at gridss.analysis.CollectGridssMetrics.doWork(CollectGridssMetrics.java:75)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:196)
        at gridss.analysis.CollectGridssMetrics.main(CollectGridssMetrics.java:57)
Caused by: htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 1 don't match: 1/69331447/10 1/85426708/2/M5=526c549b204117f61cd292042a7127d2/UR=file:/home/rajk/SV_MAC_BH/GRIDSS/../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa
        at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
        at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:343)
        ... 5 more

I appreciate any help.
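
The exception above says the sequence dictionaries differ at index 1. Assuming htsjdk prints each entry as index/length/name, the BAM header has contig "10" (69331447 bp) at the position where the reference has contig "2" (85426708 bp), which would point to a different contig order rather than a different assembly. A minimal sketch for checking this by hand follows; the function names `sq_contigs` and `first_contig_mismatch` and the `.tsv` file names are made up for illustration and are not part of GRIDSS or samtools.

```shell
# Read a SAM header on stdin and print "name<TAB>length" for every @SQ line.
sq_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Compare two "name<TAB>length" lists line by line and report the first
# 0-based position where the name or the length differs.
first_contig_mismatch() {
    paste "$1" "$2" | awk -F'\t' '
        $1 != $3 || $2 != $4 {
            printf "index %d: bam=%s/%s ref=%s/%s\n", NR - 1, $1, $2, $3, $4
            found = 1
            exit
        }
        END { if (!found) print "contig lists match" }'
}

# Possible usage with the files from the post (hypothetical output file names):
#   samtools view -H ../BAM/BH004_ReSorted.bam | sq_contigs > bam_contigs.tsv
#   samtools faidx ../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa
#   cut -f1,2 ../Reference/Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa.fai > ref_contigs.tsv
#   first_contig_mismatch bam_contigs.tsv ref_contigs.tsv
```

If the two lists contain the same contigs in a different order, the BAM was probably aligned or re-sorted against a differently ordered copy of the reference. This is only a diagnostic; it does not change either file.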
