Thanks for VirusBreakend, it's a really nice tool!
TL;DR: Is it possible to make virusbreakend work on BAMs aligned to reference genomes containing decoy sequences, such that the output is identical to what would be obtained if those decoys were not included?
The human reference genomes used in many of our pipelines include viral decoy sequences. One common example is hs37d5 (1000 Genomes), which includes an EBV (NC_007605) decoy sequence that seems to interfere with virusbreakend's ability to correctly identify EBV-positive samples (presumably because the viral reads end up mapped to the decoy instead of being left unmapped?).
In my case, it would be nice to treat reads that map to the sequences NC_007605 or hs37d5 as potentially viral (and thus include them in the kraken run). Further, when it comes to breakpoint calling, the breakpoints of interest are those that occur in the non-decoy sequences.
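To make the behaviour I have in mind concrete, here is a minimal sketch in plain Python. It is only an illustration of the request and not how virusbreakend works internally: the function names `is_candidate_viral` and `keep_breakpoint` and the `DECOY_CONTIGS` set are hypothetical, and the input is assumed to be tab-separated SAM text records.

```python
# Hypothetical sketch of decoy-aware read selection and breakpoint filtering.
# None of these names come from virusbreakend itself.

# Assumption: decoy contig names as they appear in the hs37d5 reference.
DECOY_CONTIGS = {"NC_007605", "hs37d5"}


def is_candidate_viral(sam_line, decoys=DECOY_CONTIGS):
    """Return True if a SAM record is unmapped or aligned to a decoy contig."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # SAM column 2: bitwise FLAG
    rname = fields[2]       # SAM column 3: reference sequence name
    unmapped = bool(flag & 0x4) or rname == "*"
    return unmapped or rname in decoys


def keep_breakpoint(host_contig, decoys=DECOY_CONTIGS):
    """Return True if the host side of a breakpoint is on a non-decoy contig."""
    return host_contig not in decoys
```

With this logic, a read aligned to NC_007605 would be passed on for viral classification just like an unmapped read, while any integration breakpoint whose host side lands on a decoy contig would be dropped from the output.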
Would it be possible to add an option that makes virusbreakend aware of decoy sequences? This would allow end-users to easily integrate virusbreakend into existing workflows irrespective of the version of the human reference genome they use.
Thanks again for the tool!
Kind regards,
Sam