[view] Added -N to filter by file containing read names #1324
daviesrob merged 1 commit into samtools:develop
|
Looks good, thanks, especially removing the code duplication. I've taken the liberty of pushing a commit to add the option to the man page; hopefully that looks alright. If it does, I'll squash everything together and merge it. Do let us know if you think you have anything else that might be useful. There are no hard-and-fast rules about what gets included. I would guess the criterion is how much work there is in porting it into the samtools framework, plus ongoing maintenance, compared to how many people are likely to want to use it. |
0d76c05 to 7028dd4
|
Thanks for the prompt response. It's now rebased into a single commit. Let me know if there's anything else needed for this PR. |
|
Nothing else needed, and now merged. Thanks. |
|
Could this be extended to exclude readnames too? I use both sets of logic in different contexts. Perhaps adding a |
|
You can invert the filter at the moment, albeit not completely efficiently, using the You can also use |
|
Ha! Never would have thought to try that. Thanks. |
|
Hi, I would like to ask how fast this function is. Does it use random access, or does it traverse the whole input file? |
|
In-memory hash table for read name filtering then standard samtools view
for the reads. If you filter to a region it'll use the index
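As a rough illustration of the approach described above (not the samtools C implementation, which uses khash string sets), the following Python sketch loads read names into an in-memory hash set and then streams the records once, keeping those whose QNAME is in the set. The function names `load_read_names` and `filter_sam` are hypothetical, the input is assumed to be SAM text, and header lines are passed through unconditionally as a simplification.

```python
import io


def load_read_names(handle):
    """Read one name per line into a set (hash-based, O(1) average lookup)."""
    return {line.strip() for line in handle if line.strip()}


def filter_sam(sam_lines, names):
    """Yield header lines plus every alignment whose QNAME is in `names`.

    The input is traversed once from start to end; only the name lookup
    is a hash-table probe.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: kept as-is (simplification for this sketch).
            yield line
            continue
        qname = line.split("\t", 1)[0]  # QNAME is the first SAM column
        if qname in names:
            yield line


# Hypothetical toy data standing in for a names file and a SAM stream.
names = load_read_names(io.StringIO("read1\nread3\n"))
sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read3\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(filter_sam(sam, names))
```

Memory use in this sketch grows with the number of read names, while run time grows with the number of records scanned, which is why restricting the scan to an indexed region reduces the work.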
|
Also refactored the string khash sets to remove code duplication.
This feature has come up multiple times on biostars and I personally have had to use (the much slower) picard.FilterSamReads multiple times purely because this option is missing from samtools.
More generally, what is the threshold for inclusion into samtools? I'm slowly building up a collection of useful but not-quite-as-universally-useful-as-samtools utilities. Should I make PRs for them, or keep them in a specialised utility? For example, my utility that converts just the unmapped bases (i.e., unmapped reads, soft-clipped bases, and the bases that don't map to any chimeric alignment) to fastq could be a PR to samtools fastq/fasta.