Bogart failure in Canu 2.1 - error with AS_BAT_MarkRepeatReads #1806

@dportik

Hello,
I am performing a metagenome assembly with PacBio HiFi data and have run into a reproducible error. I am using Canu 2.1 with:
canu -d STD -p Zymo6331-STD -pacbio-hifi Zymo6331-STD.fastq genomeSize=100m maxInputCoverage=1000 batMemory=200

I am running on SGE with the same configuration I have used to run other assemblies successfully (gridEngineResourceOption="-pe smp THREADS -l mem_free=MEMORY" and gridOptions="-V -S /bin/bash -q bigmem").

I am working with three samples, each run with the same arguments as above (only the -d, -p, and fastq names differ). Two finished without any issues. The third (Zymo6331-ULI in the log below) eventually fails. The error message suggested restarting, so I did, and the run failed again with the same error. The log points to Bogart as the failing step:

Found perl:
   /bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /mnt/software/j/jre/1.8.0_144/bin/java
   java version "1.8.0_144"

Found canu:
   /mnt/software/c/canu/2.1/bin/canu
   canu 2.1

-- canu 2.1
--
-- CITATIONS
--
-- For assemblies of PacBio HiFi reads:
--   Nurk S, Walenz BP, Rhiea A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S.
--   HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
--   biorXiv. 2020.
--   https://doi.org/10.1101/2020.03.14.992248
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_144' (from '/mnt/software/j/jre/1.8.0_144/bin/java') with -d64 support.
-- Detected gnuplot version '4.6 patchlevel 2   ' (from 'gnuplot') and image format 'png'.
-- Detected 72 CPUs and 504 gigabytes of memory.
-- Detected Sun Grid Engine in '/usr/share/gridengine/default'.
-- User supplied Parallel Environment 'smp'.
-- User supplied Memory Resource      'mem_free'.
-- 
-- Found  17 hosts with  80 cores and  754 GB memory under Sun Grid Engine control.
-- Found   1 host  with  48 cores and  440 GB memory under Sun Grid Engine control.
-- Found   4 hosts with  24 cores and   47 GB memory under Sun Grid Engine control.
-- Found  18 hosts with  48 cores and  251 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  16 cores and   46 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  80 cores and  755 GB memory under Sun Grid Engine control.
-- Found   1 host  with  24 cores and   39 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  56 cores and  503 GB memory under Sun Grid Engine control.
-- Found   2 hosts with  24 cores and   31 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  72 cores and  503 GB memory under Sun Grid Engine control.
-- Found  13 hosts with  24 cores and   94 GB memory under Sun Grid Engine control.
-- Found  26 hosts with  48 cores and  503 GB memory under Sun Grid Engine control.
-- Found   4 hosts with 256 cores and  503 GB memory under Sun Grid Engine control.
-- Found   2 hosts with  16 cores and   94 GB memory under Sun Grid Engine control.
-- Found   1 host  with  48 cores and  188 GB memory under Sun Grid Engine control.
-- Found   1 host  with  24 cores and  141 GB memory under Sun Grid Engine control.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     13.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       10.000 GB    8 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   10.000 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  cor       16.000 GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8.000 GB    4 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      200.000 GB    8 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
--
-- In 'Zymo6331-ULI.seqStore', found PacBio HiFi reads:
--   PacBio HiFi:              1
--
--   Corrected:                1
--   Corrected and Trimmed:    1
--
-- Generating assembly 'Zymo6331-ULI' in '/dept/appslab/projects/old/2020/dp_Zymo/HiCanu_Assembly/ULI':
--    - assemble HiFi reads.
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0000 (  0.00%)
--    obtOvlErrorRate 0.0250 (  2.50%)
--    utgOvlErrorRate 0.0100 (  1.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0000 (  0.00%)
--    obtErrorRate    0.0250 (  2.50%)
--    utgErrorRate    0.0100 (  1.00%)
--    cnsErrorRate    0.0500 (  5.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: canu 2.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  91264.777 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   mergeOrphans()-- ignored       0               tigs with 0 reads; failed to place
ABORT:   mergeOrphans()--
ABORT:   
ABORT:   ==> MARK SIMPLE BUBBLES.
ABORT:       using 0.010000 user-specified threshold
ABORT:   
ABORT:   
ABORT:   findPotentialOrphans()-- working on 4954 tigs.
ABORT:   findPotentialOrphans()-- found 3866 potential orphans.
ABORT:   mergeOrphans()-- flagged    3691        bubble tigs with 550991 reads
ABORT:   mergeOrphans()-- placed        0 unique orphan tigs with 0 reads
ABORT:   mergeOrphans()-- shattered     0 repeat orphan tigs with 0 reads
ABORT:   mergeOrphans()-- ignored       1               tigs with 38 reads; failed to place
ABORT:   mergeOrphans()--
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- singleton
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
ABORT:   classifyAsUnassembled()--     58 tigs      687914 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
ABORT:   classifyAsUnassembled()--   4758 tigs   111950257 bases -- acceptable contigs
ABORT:   
ABORT:   
ABORT:   ==> GENERATING ASSEMBLY GRAPH.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 4954 tigs, with 8 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   AssemblyGraph()-- allocating vectors for placements, 117.061MB
ABORT:   AssemblyGraph()-- finding edges for 885032 reads (821084 contained), ignoring 1672196 unplaced reads, with 8 threads.
ABORT:   AssemblyGraph()-- building reverse edges.
ABORT:   AssemblyGraph()-- build complete.
ABORT:   
ABORT:   ==> BREAK REPEATS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 4954 tigs, with 8 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   bogart: bogart/AS_BAT_MarkRepeatReads.C:887: std::vector<breakReadEnd> buildBreakPoints(TigVector&, Unitig*, intervalList<int>&, intervalList<int>&, std::vector<confusedEdge>&): Assertion `isRepeat == true' failed.
ABORT:   
ABORT:   Failed with 'Aborted'; backtrace (libbacktrace):
ABORT:   utility/src/utility/system-stackTrace.C::83 in _Z17AS_UTL_catchCrashiP7siginfoPv()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   bogart/AS_BAT_MarkRepeatReads.C::887 in _Z16buildBreakPointsR9TigVectorP6UnitigR12intervalListIiES5_RSt6vectorI12confusedEdgeSaIS7_EE()
ABORT:   bogart/AS_BAT_MarkRepeatReads.C::1051 in _Z15markRepeatReadsP13AssemblyGraphR9TigVectordjdRSt6vectorI12confusedEdgeSaIS4_EE()
ABORT:   bogart/bogart.C::656 in main()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:

Any suggestions as to what might be going wrong here? I have attached the unitigger.err file here too (unitigger.err.txt), in case that helps.
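
In case it helps anyone triaging a similar log, here is a minimal Python sketch for pulling the assertion failure out of unitigger.err. It assumes the failure is printed in the same `program: file:line: function: Assertion ... failed.` layout as the line above; the function name `find_assertions` and the returned field names are hypothetical, not part of Canu.

```python
import re

# Matches assert output laid out like the bogart line in the log above:
#   "prog: path/file.C:887: <function signature>: Assertion `cond' failed."
ASSERT_RE = re.compile(
    r"(?P<prog>\S+): (?P<file>\S+?):(?P<line>\d+): (?P<func>.*): "
    r"Assertion `(?P<cond>.*)' failed\."
)


def find_assertions(log_text):
    """Return one dict per assertion failure found in the log text."""
    hits = []
    for raw in log_text.splitlines():
        # Canu's abort summary prefixes each log line with "ABORT:";
        # strip it so the same code works on the raw unitigger.err too.
        line = re.sub(r"^ABORT:\s*", "", raw).strip()
        match = ASSERT_RE.search(line)
        if match:
            hits.append({
                "program": match.group("prog"),
                "file": match.group("file"),
                "line": int(match.group("line")),
                "function": match.group("func"),
                "condition": match.group("cond"),
            })
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_assertions.py unitigger.err
    with open(sys.argv[1], errors="replace") as handle:
        for hit in find_assertions(handle.read()):
            print(f"{hit['file']}:{hit['line']} assertion failed: {hit['condition']}")
```

Run against the attached file, this should report the source file, line number, and failed condition, which are the details most useful to quote when searching existing issues.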

Thanks!
Dan
