Skip to content

Add a check for multi-ref BED files in ampliconclip & ampliconstats.#1398

Merged
daviesrob merged 2 commits intosamtools:developfrom
jkbonfield:amplicon-multi-ref-bed
Mar 16, 2021
Merged

Add a check for multi-ref BED files in ampliconclip & ampliconstats.#1398
daviesrob merged 2 commits intosamtools:developfrom
jkbonfield:amplicon-multi-ref-bed

Conversation

@jkbonfield
Copy link
Copy Markdown
Contributor

(See #1396. This isn't a fix, but it detects the problem and responds more gracefully than a crash.)

Long term we wish to make these cope, but for now we take a more
defensive approach by recognising the case where we would give
incorrect results.

This extends bed_pair_list_t with a fixed length buffer, which is a
temporary fudge. It'll need something like the pooled string
allocator and a ref name pointer within bed_pair_t to properly deal
with multi-ref BEDs.

Ampliconstats now also has code to filter out any reads that are for
another chromosome, so if we have a multi-ref BAM and a single-ref BED
then it'll still work correctly.

Long term we wish to make these cope, but for now we take a more
defensive approach by recognising the case where we would give
incorrect results.

This extends bed_pair_list_t with a fixed length buffer, which is a
temporary fudge.  It'll need something like the pooled string
allocator and a ref name pointer within bed_pair_t to properly deal
with multi-ref BEDs.

Ampliconstats now also has code to filter out any reads that are for
another chromosome, so if we have a multi-ref BAM and a single-ref BED
then it'll still work correctly.
@jkbonfield jkbonfield force-pushed the amplicon-multi-ref-bed branch from 11ed59c to d9bd540 Compare March 16, 2021 10:40
char strand;

if (sscanf(line.s, "%*s %"SCNd64" %"SCNd64" %*s %*s %c", &left, &right, &strand) != 3) {
if (sscanf(line.s, "%255s %"SCNd64" %"SCNd64" %*s %*s %c",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there are two sscanf statements depending on whether the strand is being read or not. Should both be changed to handle the ref name.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in my copyright police role, the years need changing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes - I didn't notice the second sscanf. I'll do both.

@daviesrob
Copy link
Copy Markdown
Member

As well as the copyright date, could you add something to the NEWS file for this as well, please?

@jkbonfield jkbonfield force-pushed the amplicon-multi-ref-bed branch from 50c3a45 to ec2c3da Compare March 16, 2021 11:25
@daviesrob
Copy link
Copy Markdown
Member

Hmm, feeding ampliconclip a BED file with multiple references triggers a double free - in load_bed_file_pairs() then in bam_clip(). I think this is actually a pre-existing bug in ampliconclip, but now easier to trigger. If the free in load_bed_file_pairs() is needed to avoid a memory leak in ampliconstats, setting it to NULL after would be the best fix.

@jkbonfield jkbonfield force-pushed the amplicon-multi-ref-bed branch from ec2c3da to 62685d9 Compare March 16, 2021 12:18
@jkbonfield jkbonfield force-pushed the amplicon-multi-ref-bed branch from 62685d9 to 8e2b2e0 Compare March 16, 2021 12:19
@jkbonfield
Copy link
Copy Markdown
Contributor Author

jkbonfield commented Mar 16, 2021

Ok I've done that.

What it really needs is create and destroy functions so that the memory is handled by function calls rather than the callers doing it all manually. However it was done in a bit of a rush at the time of writing, and we're in a rush now too as we don't want to hold up the release.

When we fix it to handle multiple refs we can refactor it more, but I'm hesitant to rush out a bunch of complex changes at the last minute.

@daviesrob
Copy link
Copy Markdown
Member

I think it's fine for now, thanks.

@daviesrob daviesrob merged commit 4ad1b5e into samtools:develop Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants