Skip to content

samtools view filtering and documentation#1346

Closed
jkbonfield wants to merge 2 commits intosamtools:developfrom
jkbonfield:filter_doc
Closed

samtools view filtering and documentation#1346
jkbonfield wants to merge 2 commits intosamtools:developfrom
jkbonfield:filter_doc

Conversation

@jkbonfield
Copy link
Copy Markdown
Contributor

@jkbonfield jkbonfield commented Nov 25, 2020

Add filter expressions to samtools.1 man page

Also add -e FILTER to samtools view.

This is a validation of the new sam_passes_filter function, meaning it can be used in conjunction with -U to output the reads that do not match the filter.

Infact both can be used together. For example

samtools view \
    --input-fmt-option "filter=rlen-qlen >= 10" \
    -e 'ncigar <= 10' in.bam -o out.sam -U outU.sam

will pre-filter to only reads where the length on the reference is at least 10 bases longer than the length of the sequence. Those that pass are then written to either out.sam or outU.sam depending on whether they have <= 10 cigar operations.

sam_view.c Outdated
case 1: no_pg = 1; break;
case 'z':
if (!(settings.filter = hts_filter_init(optarg))) {
fprintf(stderr, "main_samview: Failed to parse filter\n");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print_error()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I just copied the local style from about 10 lines above (-x), but can change it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 /fprintf.*main_samview/ vs 29 /print_error/ in sam_view.c — gotta keep that ratio improving one line at a time 😄

@jmarshall
Copy link
Copy Markdown
Member

What's the reason for choosing -z? e.g. samtools view -e is available (though -e is used in calmd, is a no-longer-documented legacy option for mpileup, and an undocumented option for phase — so perhaps it's that you were aiming for something universally available in samtools?).

@jkbonfield
Copy link
Copy Markdown
Contributor Author

To be honest no real reason! I initially started with -z because so many others were in use that I just went through the alphabet (backwards), but it kind of stuck. I'm not planning on adding this option elsewhere. View is different as it's the most commonly used tool and as you pointed out it needs different logic to cope with the -U option anyway.

Indeed -e is a better choice.

@jkbonfield jkbonfield force-pushed the filter_doc branch 2 times, most recently from 2295408 to cb58fdd Compare December 1, 2020 10:10
@daviesrob daviesrob changed the title Filter doc samtools view filtering and documentation Dec 1, 2020
char** remove_aux;
int multi_region;
char* tag;
hts_filter_t *filter;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include "htslib/hts_expr.h" for this.

sam_view.c Outdated
}
}

if (settings->filter && !sam_passes_filter(h, b, settings->filter))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sam_passes_filter returns -1 on error, this check should take this result into account. Thus,

if (settings->filter && sam_passes_filter(h, b, settings->filter) < 1)
    return 1;

might be more appropriate.

sam_view.c Outdated
case 1: no_pg = 1; break;
case 'e':
if (!(settings.filter = hts_filter_init(optarg))) {
print_error("main_samview", "Failed to parse filter");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to also print the filter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's wrong anyway as it doesn't parse it here. I'll reword.

doc/samtools.1 Outdated
decoding. For BAM this will be encoding only. In CRAM the threads
are dynamically shared between encoder and decoder.
.TP
.BI filter STRING
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.BI filter= STRING

doc/samtools.1 Outdated
Math ops: *, /, % Multiply, division and (integer) modulo
Math ops: +, - Addition / subtractin
Bit-wise: & Integer AND
Bit-wise | Integer OR
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XOR (^) has higher precedence than OR (|).

This is a validation of the sam_passes_filter function, meaning it can
be used in conjunction with -U to output the reads that do not match
the filter.

Infact both can be used together.  For example

    samtools view \
        --input-fmt-option "filter=rlen-qlen >= 10" \
        -e 'ncigar <= 10' in.bam -o out.sam -U outU.sam

will pre-filter to only reads where the length on the reference is at
least 10 bases longer than the length of the sequence.  Those that
pass arr then written to either out.sam or outU.sam depending on
whether they have <= 10 cigar operations.
@jkbonfield
Copy link
Copy Markdown
Contributor Author

Thanks Valeriu.

Updated. Will take a look at htslib one again.

@valeriuo
Copy link
Copy Markdown
Contributor

valeriuo commented Jan 7, 2021

Rebased and merged as 1e3f85d and 9abe06b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants