
Unlabeled synthetic HBV sequences in GenBank can impact VIRUSBreakend output #508

@toddajohnson

I followed the instructions in virusbreakend-build to produce the virusbreakenddb, and I have been examining VIRUSBreakend (VBE; GRIDSS 2.12.0) output for 11 ICGC PCAWG HCC samples. Four of these samples have known HBV integrations, supported by previous short-read results and a recent long-read sequencing analysis. For one sample in particular (HCC RK147), VBE did not report about half of the integration sites. Some of those misses could be due to the cutoffs for frequency and fragment support, but when I examined the gridss.assembly.bam and viral.bam files in IGV, I noticed that the regions at both ends of the viral reference consisted mostly of reads with MAPQ 0, which is what an aligner typically assigns when a read aligns equally well to more than one location. Since the read extraction and breakend assembly steps discard reads with low mapping quality (the log shows GridssConfiguration minMapq=20), I suspect that integration sites overlapping these regions are not being assembled and therefore not called by VBE.

Here is the IGV figure:

[Image: RK147_adjusted_AB819617 1_igv_snapshot]
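
To estimate how much of the coverage along the viral contig would be dropped by the minMapq=20 filter, a per-window tally of low-MAPQ reads can help. The sketch below is a hypothetical helper (not part of GRIDSS or VIRUSBreakend). It takes SAM text records, for example the output of `samtools view viral.bam`, and reports the fraction of mapped reads with MAPQ below 20 in fixed-size windows along one contig. The contig name must match the one in the BAM header, and the 200 bp window size is an arbitrary choice.

```python
from collections import defaultdict

MIN_MAPQ = 20  # matches "GridssConfiguration minMapq=20" from the log


def low_mapq_fraction(sam_lines, contig, window=200, min_mapq=MIN_MAPQ):
    """Return {window_start: fraction of mapped reads with MAPQ < min_mapq}.

    sam_lines: iterable of SAM text records (e.g. `samtools view` output).
    Each read is assigned to the window containing its leftmost position.
    """
    total = defaultdict(int)
    low = defaultdict(int)
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or fields[2] != contig:
            continue  # skip unmapped reads and reads on other contigs
        pos = int(fields[3])  # 1-based leftmost mapping position
        mapq = int(fields[4])
        start = (pos - 1) // window * window  # 0-based window start
        total[start] += 1
        if mapq < min_mapq:
            low[start] += 1
    return {start: low[start] / total[start] for start in sorted(total)}
```

Windows at both ends of the viral reference with a fraction close to 1.0 would be consistent with what the IGV snapshot shows. Because reads are binned by their leftmost position only, a read spanning a window boundary is counted once.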

The two HBV GenBank entries that VBE chose as the best viral sequence for these Japanese HCC samples (AB819617.1 and AB206816.2) were both submitted by Japanese researchers, but neither is a patient isolate or a curated viral genome sequence; both were genetically engineered. AB819617.1 has two versions of the HBV X gene (one at each end), and the GenBank entry for AB206816.2 describes it as a "1.3 x complete genome" (designed to better infect mouse models).
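
One way to flag constructs like these automatically would be to look for long exact repeats within a single record. A linear GenBank entry for one copy of the circular HBV genome is expected to contain few or no repeated 31-mers, whereas a 1.3-mer, or a construct with a gene segment at both ends, will. The function below is a hypothetical sketch of that idea: the name and k = 31 are my own choices, and it only checks forward-strand repeats, so an inverted duplication would be missed.

```python
from collections import Counter


def duplicated_kmer_fraction(seq, k=31):
    """Fraction of k-mer positions in seq whose k-mer occurs more than once.

    A single linear copy of a viral genome should score near 0; a record
    carrying an extra copy of part of the genome scores higher.
    """
    seq = seq.upper()
    if len(seq) < k:
        return 0.0  # too short to contain any k-mer
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    repeated = sum(1 for kmer in kmers if counts[kmer] > 1)
    return repeated / len(kmers)
```

For a 1.3-mer, roughly 0.3 genome lengths appear twice, so about 0.6 of 1.3 genome lengths are covered by repeated k-mers (a score of roughly 0.45). The threshold for excluding a record would need to be tuned on real database entries.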

Unless this is already addressed by the recent modifications made for issue #502, it seems that the database build process may need additional filters to exclude engineered sequences like these. I am currently downloading the pre-built virusbreakenddb to check the results.
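
Another simple filter for the build step could be a length check: within one species, complete genomes should cluster around a similar length, so a record much longer than the median for its species (such as a 1.3 x genome) is suspicious. The sketch below is hypothetical and not taken from virusbreakend-build. It parses FASTA text and returns the headers of records longer than 1.1 times the median record length, where the 1.1 cutoff is an arbitrary assumption and the input is assumed to contain records of a single species.

```python
from statistics import median


def read_fasta(lines):
    """Yield (header, sequence) tuples from FASTA-formatted text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def flag_oversized(records, max_ratio=1.1):
    """Return headers of records longer than max_ratio x the median length.

    Intended for the records of a single viral species (e.g. all HBV
    entries), where most sequences should be close to one genome length.
    """
    records = list(records)
    if not records:
        return []
    typical = median(len(seq) for _, seq in records)
    return [header for header, seq in records if len(seq) > max_ratio * typical]
```

A length check alone could miss a construct in which only a short segment is duplicated, so it would likely work best combined with a check for internal repeats.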
