Add a check for multi-ref BED files in ampliconclip & ampliconstats.#1398
Add a check for multi-ref BED files in ampliconclip & ampliconstats.#1398daviesrob merged 2 commits intosamtools:developfrom
Conversation
Long term we wish to make these cope, but for now we take a more defensive approach by recognising the case where we would give incorrect results. This extends bed_pair_list_t with a fixed length buffer, which is a temporary fudge. It'll need something like the pooled string allocator and a ref name pointer within bed_pair_t to properly deal with multi-ref BEDs. Ampliconstats now also has code to filter out any reads that are for another chromosome, so if we have a multi-ref BAM and a single-ref BED then it'll still work correctly.
11ed59c to
d9bd540
Compare
| char strand; | ||
|
|
||
| if (sscanf(line.s, "%*s %"SCNd64" %"SCNd64" %*s %*s %c", &left, &right, &strand) != 3) { | ||
| if (sscanf(line.s, "%255s %"SCNd64" %"SCNd64" %*s %*s %c", |
There was a problem hiding this comment.
So, there are two sscanf statements depending on whether the strand is being read or not. Should both be changed to handle the ref name.
There was a problem hiding this comment.
Also, in my copyright police role, the years need changing.
There was a problem hiding this comment.
Probably yes - I didn't notice the second sscanf. I'll do both.
|
As well as the copyright date, could you add something to the NEWS file for this as well, please? |
50c3a45 to
ec2c3da
Compare
|
Hmm, feeding |
ec2c3da to
62685d9
Compare
62685d9 to
8e2b2e0
Compare
|
Ok I've done that. What it really needs is create and destroy functions so that the memory is handled by function calls rather than the callers doing it all manually. However it was done in a bit of a rush at the time of writing, and we're in a rush now too as we don't want to hold up the release. When we fix it to handle multiple refs we can refactor it more, but I'm hesitant to rush out a bunch of complex changes at the last minute. |
|
I think it's fine for now, thanks. |
(See #1396. This isn't a fix, but it detects the problem and responds more gracefully than a crash.)
Long term we wish to make these cope, but for now we take a more
defensive approach by recognising the case where we would give
incorrect results.
This extends bed_pair_list_t with a fixed length buffer, which is a
temporary fudge. It'll need something like the pooled string
allocator and a ref name pointer within bed_pair_t to properly deal
with multi-ref BEDs.
Ampliconstats now also has code to filter out any reads that are for
another chromosome, so if we have a multi-ref BAM and a single-ref BED
then it'll still work correctly.