samtools view filtering and documentation#1346
samtools view filtering and documentation#1346jkbonfield wants to merge 2 commits intosamtools:developfrom
Conversation
sam_view.c
Outdated
| case 1: no_pg = 1; break; | ||
| case 'z': | ||
| if (!(settings.filter = hts_filter_init(optarg))) { | ||
| fprintf(stderr, "main_samview: Failed to parse filter\n"); |
There was a problem hiding this comment.
Heh, I just copied the local style from about 10 lines above (-x), but can change it.
There was a problem hiding this comment.
16 /fprintf.*main_samview/ vs 29 /print_error/ in sam_view.c — gotta keep that ratio improving one line at a time 😄
|
What's the reason for choosing |
|
To be honest no real reason! I initially started with -z because so many others were in use that I just went through the alphabet (backwards), but it kind of stuck. I'm not planning on adding this option elsewhere. View is different as it's the most commonly used tool and as you pointed out it needs different logic to cope with the -U option anyway. Indeed -e is a better choice. |
2295408 to
cb58fdd
Compare
| char** remove_aux; | ||
| int multi_region; | ||
| char* tag; | ||
| hts_filter_t *filter; |
There was a problem hiding this comment.
#include "htslib/hts_expr.h" for this.
sam_view.c
Outdated
| } | ||
| } | ||
|
|
||
| if (settings->filter && !sam_passes_filter(h, b, settings->filter)) |
There was a problem hiding this comment.
Since sam_passes_filter returns -1 on error, this check should take this result into account. Thus,
if (settings->filter && sam_passes_filter(h, b, settings->filter) < 1)
return 1;might be more appropriate.
sam_view.c
Outdated
| case 1: no_pg = 1; break; | ||
| case 'e': | ||
| if (!(settings.filter = hts_filter_init(optarg))) { | ||
| print_error("main_samview", "Failed to parse filter"); |
There was a problem hiding this comment.
It might be useful to also print the filter.
There was a problem hiding this comment.
Actually it's wrong anyway as it doesn't parse it here. I'll reword.
doc/samtools.1
Outdated
| decoding. For BAM this will be encoding only. In CRAM the threads | ||
| are dynamically shared between encoder and decoder. | ||
| .TP | ||
| .BI filter STRING |
doc/samtools.1
Outdated
| Math ops: *, /, % Multiply, division and (integer) modulo | ||
| Math ops: +, - Addition / subtractin | ||
| Bit-wise: & Integer AND | ||
| Bit-wise | Integer OR |
There was a problem hiding this comment.
XOR (^) has higher precedence than OR (|).
This is a validation of the sam_passes_filter function, meaning it can
be used in conjunction with -U to output the reads that do not match
the filter.
Infact both can be used together. For example
samtools view \
--input-fmt-option "filter=rlen-qlen >= 10" \
-e 'ncigar <= 10' in.bam -o out.sam -U outU.sam
will pre-filter to only reads where the length on the reference is at
least 10 bases longer than the length of the sequence. Those that
pass arr then written to either out.sam or outU.sam depending on
whether they have <= 10 cigar operations.
|
Thanks Valeriu. Updated. Will take a look at htslib one again. |
Add filter expressions to samtools.1 man page
Also add -e FILTER to samtools view.
This is a validation of the new
sam_passes_filterfunction, meaning it can be used in conjunction with -U to output the reads that do not match the filter.Infact both can be used together. For example
will pre-filter to only reads where the length on the reference is at least 10 bases longer than the length of the sequence. Those that pass are then written to either out.sam or outU.sam depending on whether they have <= 10 cigar operations.