
Intercept badly formatted FASTA query #691

@yannickwurm

Description


If a user submits a one-sequence query in which the identifier and the sequence are not separated by a newline, the Seqserv search fails.

For example, users sometimes incorrectly store what should be the following FASTA query:

>identifier123
AGAGCTAGCTAGCTACGATCGATCGATGCAAGTAGTACtAGCTCGA

They might store it (and paste it into Seqserv's query field) like so:

>identifier123 AGAGCTAGCTAGCTACGATCGATCGATGCAAGTAGTACtAGCTCGA

This gets past our internal checks: Seqserv's in-browser JavaScript identifies it as nucleotide, and because it starts with '>', it is identified as FASTA. But it is not valid FASTA: BLAST reads the line as an identifier followed by a long description, with no sequence.
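As a rough illustration of how lightweight a client-side check for this case could be, the TypeScript sketch below flags any record that has a header line but no sequence lines, and reports whether the header ends in a long run of residue-like characters (the one-line pattern above). The function name `findHeaderOnlyRecords`, the `FastaIssue` type and the 30-character threshold are hypothetical choices for this sketch, not part of Seqserv's existing in-browser checks.

```typescript
// Hypothetical result type: one entry per suspicious FASTA record.
interface FastaIssue {
  header: string; // identifier (first token after '>')
  reason: string; // short explanation suitable for showing to the user
}

// Assumption: 30 or more letters, '*' or '-' in a row looks like a sequence
// rather than a description word.
const SEQUENCE_LIKE = /^[A-Za-z*-]{30,}$/;

// Flags records that have a header line but no sequence lines.
function findHeaderOnlyRecords(query: string): FastaIssue[] {
  const issues: FastaIssue[] = [];
  const lines = query.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line.startsWith(">")) continue;

    // Scan forward to the next header, noting whether any non-empty
    // sequence line appears in between.
    let j = i + 1;
    let hasSequenceLines = false;
    while (j < lines.length && !lines[j].trim().startsWith(">")) {
      if (lines[j].trim() !== "") hasSequenceLines = true;
      j++;
    }
    if (hasSequenceLines) continue;

    // No sequence lines: check whether the sequence ended up on the header.
    const tokens = line.slice(1).trim().split(/\s+/);
    const last = tokens[tokens.length - 1] ?? "";
    if (tokens.length > 1 && SEQUENCE_LIKE.test(last)) {
      issues.push({ header: tokens[0], reason: "sequence appears on the header line" });
    } else {
      issues.push({ header: tokens[0], reason: "record has no sequence" });
    }
  }
  return issues;
}
```

Every record without sequence lines is flagged; the threshold only decides which message the user sees, so a short sequence on the header line (for example `>seq1 ACGT`) gets the generic "no sequence" reason.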

BLAST returns this type of error:
[Screenshot of the BLAST error, taken 2023-10-10]

We should catch this gracefully and report it to the user, rather than just failing (we currently fail with "there is a problem - likely with BLAST binaries").

I don't think this occurs very frequently. There are two options:

  • extra FASTA validation on the client side (this would need to be lightweight)
  • better error catching when BLAST fails, reporting BLAST's server-side error to the client.

My intuition is that the 2nd is better.
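For the second option, here is a minimal sketch of the decision step, assuming the server already captures the BLAST process's exit code and standard error. It forwards BLAST's own message to the user instead of the generic "likely with BLAST binaries" text. It also assumes, based on the exit-code table in the BLAST+ user manual, that exit code 1 means an error in the query sequence(s) or BLAST options, which is the case worth presenting as something the user can fix; check this against the BLAST+ version in use. `BlastRun`, `UserFacingError` and `describeBlastFailure` are hypothetical names, and TypeScript is used only for illustration.

```typescript
// Hypothetical record of one BLAST+ invocation, as captured by the server.
interface BlastRun {
  exitCode: number; // exit status of the BLAST+ process
  stderr: string;   // everything BLAST wrote to standard error
}

// Hypothetical payload sent back to the browser.
interface UserFacingError {
  userFixable: boolean; // true if the user should correct their query
  message: string;
}

// Assumption: BLAST+ exits with 1 for "error in query sequence(s) or BLAST
// options"; verify against the BLAST+ user manual for your version.
const QUERY_OR_OPTIONS_ERROR = 1;

// Returns null on success, otherwise an error that includes BLAST's own text.
function describeBlastFailure(run: BlastRun): UserFacingError | null {
  if (run.exitCode === 0) return null;
  const detail = run.stderr.trim();

  if (run.exitCode === QUERY_OR_OPTIONS_ERROR) {
    // Likely a problem with the submitted query: show BLAST's explanation.
    return {
      userFixable: true,
      message: detail !== ""
        ? `BLAST could not process your query: ${detail}`
        : "BLAST could not process your query. Please check that it is valid FASTA.",
    };
  }

  // Anything else is treated as a server-side problem, still with details.
  return {
    userFixable: false,
    message: detail !== ""
      ? `BLAST failed (exit code ${run.exitCode}): ${detail}`
      : `BLAST failed with exit code ${run.exitCode}.`,
  };
}
```

Keeping the mapping in one small function makes it easy to unit-test without running BLAST, and reserves the generic server-side message for failures that really are on the server side.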
