DNA Scanner

Because the submission form has only one slot for a video, both links are listed here. BackEnd Video -> https://youtu.be/9mNGSLErebc

FrontEnd Video -> https://youtu.be/P3kQIlXXLmo

Inspiration

With our collective lack of web and app development experience, our group wanted to focus on an area of coding that we were confident in. Thus, we landed on this notion of creating an algorithm. Through ideation, we stumbled upon DNA sequencing as our topic of interest, allowing us to go down this avenue of pattern matching. Although you do not often have a string of your DNA chromosome sequences lying around, we thought this would be a fun project to work toward an easily accessible mode of gene mutation detection, which could find uses in healthcare spaces.

What it does

Backend - Scanner takes three inputs: the chromosome being tested, the path to the reference FASTA file, and the path to the "patient" FASTA file. It searches through the patient's DNA and compares it with the reference to determine whether any mutation listed in the CSV file is present in the patient. It looks for three mutation types: insertion-deletion, deletion, and single nucleotide variant. When a match is found, it is added to a list of findings, which is output once the whole file has been scanned. These findings help identify genetic mutations that can lead to disorders or diseases. Other files include MutationSimulator, which constructs a mutated chromosome for testing, and databaseMaker, which creates the database of pathogenic variants that we hope to detect.
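The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual Scanner: the `Variant` record, its field names, the 0-based coordinates, and the `flank` parameter are all assumptions made for the example, and the real CSV layout may differ.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record shape; the real CSV columns may differ.
    name: str
    position: int  # assumed 0-based offset into the chromosome sequence
    ref: str       # bases expected in the reference
    alt: str       # bases expected in a mutated patient ("" for a pure deletion)


def classify(variant):
    """Label a variant from the lengths of its ref and alt alleles."""
    if len(variant.ref) == 1 and len(variant.alt) == 1:
        return "single nucleotide variant"
    if variant.alt == "":
        return "deletion"
    return "insertion-deletion"


def scan(reference, patient, variants, flank=5):
    """Return (name, type) for each known variant that appears in the patient.

    Simplification: coordinates are assumed to line up between the two
    sequences, which stops being true after an earlier length-changing
    mutation in the same patient.
    """
    findings = []
    for v in variants:
        start = v.position
        end = start + len(v.ref)
        # Skip entries whose ref allele does not match the reference file.
        if reference[start:end] != v.ref:
            continue
        # What the patient should look like here if the mutation is present:
        # the alt allele followed by a few reference bases after the ref allele.
        expected = v.alt + reference[end:end + flank]
        observed = patient[start:start + len(expected)]
        unchanged = reference[start:start + len(expected)]
        if observed == expected and expected != unchanged:
            findings.append((v.name, classify(v)))
    return findings
```

The short flank after the alternate allele is what lets a pure deletion be detected at all, since a deletion leaves no new bases to look for, only the bases that now follow the gap.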

The front end takes a rudimentary version of the backend and implements the DNA Sequence Scanner to display a series of mutations and related information. Users can input the sequence they want analyzed, and the results are presented in a clear, modern design with easy-to-use functional elements.

How we built it

Backend - We wrote code to build the database from a variant_summary.txt.gz file found on the National Library of Medicine website. We also took reference chromosomes, including chr1 (the largest chromosome, with over 248 million base pairs). We then coded a MutationSimulator, which takes an input reference chromosome and adds chosen mutations to it so that we could test our proof of concept.
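As a rough illustration of the simulator idea, the sketch below applies chosen mutations to a reference string. The function names, the `(position, ref, alt)` tuple format, and the 0-based coordinates are assumptions for the example rather than the project's actual interface.

```python
def apply_mutation(sequence, position, ref, alt):
    """Return a copy of sequence with ref at position replaced by alt."""
    end = position + len(ref)
    found = sequence[position:end]
    # Refuse to mutate if the reference allele is not where we expect it.
    if found != ref:
        raise ValueError(f"expected {ref!r} at {position}, found {found!r}")
    return sequence[:position] + alt + sequence[end:]


def simulate(sequence, mutations):
    """Apply several (position, ref, alt) mutations given in reference coordinates."""
    # Work from the highest position down so that an insertion or deletion
    # never shifts the coordinates of a mutation that is still to be applied.
    for position, ref, alt in sorted(mutations, reverse=True):
        sequence = apply_mutation(sequence, position, ref, alt)
    return sequence
```

For example, `simulate("ACGTACGTTTGACCA", [(2, "G", "T"), (10, "GA", "")])` swaps one base and deletes two. Each call copies the string, which is fine for a handful of mutations but would be slow if many edits were applied to a chromosome-sized sequence.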

FrontEnd - For the UI, I used HTML, CSS, and JavaScript, with Chart.js to display visual results like mutation scores. I also added user login functionality with Flask-Login, so users can save and view their previous DNA scans. Although I planned to use AI for more insights, I used a mock AI message because I ran out of API quota. The focus was on making the app easy to use, with features like dark mode, theme switching, file uploads, and a clean layout. The backend handles DNA pattern matching, and the frontend helps show the results clearly. I hosted everything locally, but it could be deployed online with services like Heroku or Vercel.

Challenges we ran into

Becoming comfortable with languages we weren't familiar with. Coming up with ideas and methods for comparing FASTA files. Building a mutated chromosome with code. Writing appropriate algorithms to recognize mutations. Debugging across multiple files and functions: narrowing down the problem, isolating it, and fixing it. Coming up with usable tests was also a struggle. We reached a point where we could no longer write testable sequences by hand, because we were required to use the coordinates stated in the CSV sheet. That prompted the need for a MutationSimulator, so tests had to be delayed, which put debugging on crunch time and forced compromises.

Specifically for the front end, one of the biggest challenges was integrating AI insights with limited API access. I thought it would be interesting to use OpenAI’s API for real-time DNA analysis suggestions, but due to quota limits I had to pivot and implement a mock AI response system to maintain functionality. Styling the user interface was another challenge, as I spent extra time refining gradients, hover effects, and animations to enhance the user experience. Handling DNA file uploads also required careful parsing so that sequences were read correctly from the FASTA format. Managing user authentication and securely storing user history in a database was technically challenging as well, and I had to learn and apply Flask-Login.
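To give a sense of what reading FASTA correctly involves, here is a minimal parser sketch. It is not the project's upload handler; the `read_fasta` name and the choices to uppercase bases and to key records by the first word of the header are assumptions for illustration.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {record id: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            # Sequence data can wrap across many lines; collect the pieces.
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = io.StringIO(">chr1 test record\nACGT\nacgt\n\n>chr2\nNNNN\n")
parsed = read_fasta(example)
```

Collecting the wrapped lines in a list and joining once at the end avoids repeatedly rebuilding a very long string, which matters for chromosome-sized records.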

Accomplishments that we're proud of

I'm proud of making a mutation simulator, because I had expected it to be much easier to find something online that would do this for me. I might be the first to build one. I don't know that for a fact, but while researching how to get a mutated chromosome, nothing helpful ever came up. I'm also proud that my code can run through hundreds of millions of characters in a couple of seconds.

I’m proud of being able to create a fully functional and user-friendly DNA analysis tool. It was exciting to see the DNA pattern matching work correctly and display results visually with highlights and charts. I’m also proud of learning how to add features like user login, dark mode, and file uploads, which made the app more complete and professional. Even though I couldn’t fully use AI, I adapted and still provided helpful insights for users.

What we learned

I learned some of the complexities of genetics and what FASTA files are. I learned a lot about GitHub. I learned how to build CSV files from a text file and how to iterate through a CSV file. I learned how to split and parse strings obtained from a CSV file into multiple groups with import re and store them in separate variables. I learned how to simulate a mutated chromosome. I learned where to find a whole bunch of reference chromosomes. I learned the KMP algorithm, although we did not end up needing to integrate it.
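As an illustration of splitting a string into groups with re, the sketch below pulls a position and two alleles out of a substitution-style name. The pattern, the `parse_substitution` helper, and the sample strings are hypothetical and simplified; the names actually stored in the project's CSV may follow a different format.

```python
import re

# Hypothetical, simplified pattern for a substitution such as "c.80G>A".
SUBSTITUTION = re.compile(r"c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])")


def parse_substitution(name):
    """Return (position, ref, alt) if name contains a substitution, else None."""
    match = SUBSTITUTION.search(name)
    if match is None:
        return None
    # Named groups keep each captured piece in its own clearly labeled variable.
    return int(match["position"]), match["ref"], match["alt"]


# Made-up example string, used only to exercise the pattern.
result = parse_substitution("NM_000000.0(GENE):c.80G>A (p.Arg27His)")
```

Named groups make it harder to mix up the reference and alternate alleles than positional indexes would.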

What's next for DNA Scanner

If there's ever a database of patient chromosomes, then testing could be more efficient. Eventually, it could be used in medical fields to detect genetic mutations leading to disorders and diseases.

Successfully integrating the front end and back end.

File needed for the database (command for use in Ubuntu only): wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz