Are you using the latest version of samtools and HTSlib? If not, please specify.
(run samtools --version)
I am using samtools 1.11 (htslib 1.11)
Please describe your environment.
- OS (run
uname -sr on Linux/Mac OS or wmic os get Caption, Version on Windows)
- machine architecture (run
uname -m on Linux/Mac OS or wmic os get OSArchitecture on Windows)
- compiler (run
gcc --version or clang --version)
CentOS Linux 7 x86_64 with gcc 4.8.5
Please specify the steps taken to generate the issue, the command you are running and the relevant output.
I have encountered an issue with samtools markdup running past 8 days without actually completing on a few BAMs. I don’t get any relevant output since it just doesn’t finish. Digging deeper, I found out that a ~400 bp region in chromosome 2, with more than 8 million reads, is causing this issue. It looks like markdup just isn't optimized to deal with abnormally high depth regions, unless, I missed some parameter that would help. I have created micro BAMs of that region (with an additional 500bp on both ends) and tried running markdup on them but I still get the same results. For both types of BAMs, I used the command (BAM names were replaced with "micro"):
samtools markdup \
-d “2500” \
-S \
-f “micro.samtools.markdup.txt” \
--threads 4 \
--write-index \
-T “micro” \
“micro.bam” \
“micro.md.bam”
Also, I ran GATK MarkDuplicates (v4.1.8.0) on these micro BAMs and they finished under 35 minutes. The GATK command I used was the following:
gatk MarkDuplicates \
--java-options “-Xmx28G” \
--TMP_DIR “temp” \
--ASSUME_SORT_ORDER coordinate \
--CLEAR_DT false \
--ADD_PG_TAG_TO_READS false \
--OPTICAL_DUPLICATE_PIXEL_DISTANCE “2500” \
--INPUT "micro.bam" \
--METRICS_FILE “micro.gatk.mdmetrics.txt” \
--OUTPUT “micro.md.bam”
If needed, I could provide a micro BAM that is causing this issue. Thank you.
Are you using the latest version of samtools and HTSlib? If not, please specify.
(run
samtools --version)I am using samtools 1.11 (htslib 1.11)
Please describe your environment.
uname -sron Linux/Mac OS orwmic os get Caption, Versionon Windows)uname -mon Linux/Mac OS orwmic os get OSArchitectureon Windows)gcc --versionorclang --version)CentOS Linux 7 x86_64 with gcc 4.8.5
Please specify the steps taken to generate the issue, the command you are running and the relevant output.
I have encountered an issue with samtools markdup running past 8 days without actually completing on a few BAMs. I don’t get any relevant output since it just doesn’t finish. Digging deeper, I found out that a ~400 bp region in chromosome 2, with more than 8 million reads, is causing this issue. It looks like markdup just isn't optimized to deal with abnormally high depth regions, unless, I missed some parameter that would help. I have created micro BAMs of that region (with an additional 500bp on both ends) and tried running markdup on them but I still get the same results. For both types of BAMs, I used the command (BAM names were replaced with "micro"):
Also, I ran GATK MarkDuplicates (v4.1.8.0) on these micro BAMs and they finished under 35 minutes. The GATK command I used was the following:
If needed, I could provide a micro BAM that is causing this issue. Thank you.