
[view] Added -N to filter by file containing read names #1324

Merged
daviesrob merged 1 commit into samtools:develop from d-cameron:view_readnames on Oct 14, 2020


@d-cameron

Also refactored the string khash sets to remove code duplication.

This feature has come up multiple times on biostars and I personally have had to use (the much slower) picard.FilterSamReads multiple times purely because this option is missing from samtools.
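At its core, a read-name filter like this is a set-membership test: load the names once into a hash set, then look up each alignment's read name as it streams past. The C sketch below illustrates that idea under stated assumptions. It is not the samtools code (which, per this PR, uses khash string sets); the `name_set` type and all its functions are hypothetical, and it assumes the names file holds one read name per line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical open-addressing hash set of read names (linear probing).
   Not the samtools/khash implementation; a self-contained illustration. */
typedef struct {
    char **slots; /* NULL marks an empty slot */
    size_t cap;   /* always a power of two */
    size_t n;     /* number of stored names */
} name_set;

/* 32-bit FNV-1a string hash. */
static unsigned long fnv1a(const char *s) {
    unsigned long h = 2166136261UL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return h;
}

/* cap must be a power of two. Returns 0 on success, -1 on allocation failure. */
static int name_set_init(name_set *s, size_t cap) {
    s->slots = calloc(cap, sizeof *s->slots);
    s->cap = cap;
    s->n = 0;
    return s->slots ? 0 : -1;
}

/* Slot holding `name`, or the empty slot where it would be inserted. */
static char **name_set_slot(const name_set *s, const char *name) {
    size_t i = fnv1a(name) & (s->cap - 1);
    while (s->slots[i] && strcmp(s->slots[i], name) != 0)
        i = (i + 1) & (s->cap - 1);
    return &s->slots[i];
}

/* The per-record test: is this read name in the set? */
static int name_set_has(const name_set *s, const char *name) {
    return *name_set_slot(s, name) != NULL;
}

/* Double the table, moving the existing (owned) strings across. */
static int name_set_grow(name_set *s) {
    name_set bigger;
    if (name_set_init(&bigger, s->cap * 2) < 0) return -1;
    for (size_t i = 0; i < s->cap; i++) {
        if (s->slots[i]) {
            *name_set_slot(&bigger, s->slots[i]) = s->slots[i];
            bigger.n++;
        }
    }
    free(s->slots);
    *s = bigger;
    return 0;
}

/* Insert a copy of `name`; duplicates are ignored. Keeps load factor <= 1/2,
   so probing always finds an empty slot. */
static int name_set_add(name_set *s, const char *name) {
    if ((s->n + 1) * 2 > s->cap && name_set_grow(s) < 0) return -1;
    char **slot = name_set_slot(s, name);
    if (*slot) return 0;
    size_t len = strlen(name) + 1;
    if (!(*slot = malloc(len))) return -1;
    memcpy(*slot, name, len);
    s->n++;
    return 0;
}

/* Read one name per line (assumed format); strips CR/LF, skips blank lines. */
static int name_set_load(name_set *s, FILE *fp) {
    char line[1024];
    while (fgets(line, sizeof line, fp)) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] && name_set_add(s, line) < 0) return -1;
    }
    return ferror(fp) ? -1 : 0;
}

static void name_set_free(name_set *s) {
    for (size_t i = 0; i < s->cap; i++) free(s->slots[i]);
    free(s->slots);
    s->slots = NULL;
    s->cap = s->n = 0;
}
```

With a set like this, each record costs one hash lookup no matter how many names are listed. SAM read names are at most 254 characters, so the fixed 1024-byte line buffer is ample for valid names.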

More generally, what is the threshold for inclusion in samtools? I'm slowly building up a collection of useful but not-quite-as-universally-useful-as-samtools utilities. Should I make PRs for them, or keep them in a specialised utility? For example, my utility that converts just the unmapped bases (i.e., unmapped reads, soft-clipped bases, and the bases that don't map to any chimeric alignment) to FASTQ could be a PR to samtools fastq/fasta.

@d-cameron d-cameron changed the title Added view -N to filter by file containing read names [view] Added -N to filter by file containing read names Oct 13, 2020
@daviesrob
Member

Looks good, thanks, especially removing the code duplication. I've taken the liberty of pushing a commit to add the option to the man page; hopefully that looks alright. If it does, I'll squash everything together and merge it.

Do let us know if you think you have anything else that might be useful. There are no hard-and-fast rules about what gets included. I would guess the criterion is how much work it takes to port something into the samtools framework, plus ongoing maintenance, compared with how many people are likely to want to use it.

@d-cameron
Author

d-cameron commented Oct 14, 2020

Thanks for the prompt response. It's now rebased into a single commit. Let me know if there's anything else needed for this PR.

@daviesrob daviesrob merged commit 7028dd4 into samtools:develop Oct 14, 2020
@daviesrob
Member

Nothing else needed, and now merged.

Thanks.

@mcshane
Contributor

mcshane commented Oct 15, 2020

Could this be extended to exclude read names too? I use both kinds of logic in different contexts. Perhaps a ^ prefix on FILE could indicate exclusion, as in bcftools.

@daviesrob
Member

You can invert the filter at the moment, albeit not completely efficiently, using the -U option ("output reads not selected by filters to FILE"). So just use -U to output the data you want and -o /dev/null to throw away the rest. E.g.:

samtools view -N names -b -o /dev/null -U keep.bam in.bam

You can also use -U - to send the data to stdout.
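The trick works because every record ends up in exactly one of the two outputs: the main output if it passes the filters, the -U file otherwise, so discarding the main output leaves exactly the complement. A small C sketch of that routing follows; `split_reads` and `name_listed` are hypothetical helpers that operate on plain read-name strings rather than real BAM records, and the linear-scan lookup is only for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the -N test: is qname in the names list?
   (Linear scan for brevity; a real filter would use a hash set.) */
static int name_listed(const char *qname, const char *const *names, size_t n_names) {
    for (size_t i = 0; i < n_names; i++)
        if (strcmp(qname, names[i]) == 0) return 1;
    return 0;
}

/* Conceptual model of `-N names -o OUT -U UNSEL`: listed reads go to `out`,
   every other read goes to `unsel`. Each read lands in exactly one output,
   so throwing `out` away (e.g. /dev/null) leaves only the unlisted reads. */
static void split_reads(const char *const *qnames, size_t n,
                        const char *const *names, size_t n_names,
                        const char **out, size_t *n_out,
                        const char **unsel, size_t *n_unsel) {
    *n_out = *n_unsel = 0;
    for (size_t i = 0; i < n; i++) {
        if (name_listed(qnames[i], names, n_names))
            out[(*n_out)++] = qnames[i];
        else
            unsel[(*n_unsel)++] = qnames[i];
    }
}
```

Because the selected reads are still processed and written (only to /dev/null), this route presumably costs more than a native exclusion option would, which may be what "not completely efficiently" refers to.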

@mcshane
Contributor

mcshane commented Oct 15, 2020

Ha! Never would have thought to try that. Thanks.

@fengcong3

Hi, how fast is this option? Does it use random access, or does it traverse the whole input file?

@d-cameron
Author

d-cameron commented Aug 25, 2023 via email

