Predictive model for DNA sequencing and explaining bias

Inspiration

Our project is motivated by the problem of bias in DNA sequence prediction. Prediction algorithms and models must not be biased toward certain organisms or populations, because such bias can lead to inaccurate or incomplete results.

What it does

Takes a DNA sequence as input and predicts the genome it belongs to.

How we built it

We used Python, elements of NLP, and supervised machine learning.
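A common way to apply NLP ideas to DNA is to split each sequence into overlapping k-mers, treat every k-mer as a "word", and train a supervised classifier on the resulting word counts. The page does not say which classifier the team used, so the following is only a minimal sketch assuming a multinomial Naive Bayes written with the standard library. The helper names (`kmers`, `train_nb`, `predict_nb`), the choice k = 4, and the toy sequences and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def kmers(seq: str, k: int = 4) -> list[str]:
    """Split a DNA sequence into overlapping k-mers ("words")."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_nb(sequences, labels, k=4):
    """Fit a multinomial Naive Bayes model on k-mer counts (hypothetical helper)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of k-mers seen for that label
    vocab = set()
    for seq, label in zip(sequences, labels):
        words = kmers(seq, k)
        word_counts[label].update(words)
        vocab.update(words)
    return {"k": k, "class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "n": len(labels)}


def predict_nb(model, seq):
    """Return the label with the highest log-posterior, using Laplace (add-one) smoothing."""
    v = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(count / model["n"])  # log prior P(label)
        for w in kmers(seq, model["k"]):
            score += math.log((counts[w] + 1) / (total + v))  # smoothed log P(word | label)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: two hypothetical "organisms" with different base composition.
train_seqs = ["ATATATATATAT", "TATATATATAAT", "GCGCGCGCGCGC", "CGCGCGCGGCGC"]
train_labels = ["org_A", "org_A", "org_B", "org_B"]
model = train_nb(train_seqs, train_labels)
print(predict_nb(model, "ATATATATAT"))  # prints "org_A"
```

In practice the same idea is often implemented with a bag-of-words vectorizer and an off-the-shelf classifier; the hand-written version here just keeps the example dependency-free.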

Challenges we ran into

DNA sequencing is an incredibly complex and data-intensive process, and building predictive models that can accurately analyze this data requires sophisticated algorithms and advanced computational techniques. We had to learn many of these concepts from scratch, and then come up with a new way to describe the bias and show how it may be affecting the model.

Accomplishments that we're proud of

Building a robust model, and finding a very subtle bias that would require new algorithms to detect.

What we learned

Given how complex DNA sequence prediction is, we learned how even minute differences can change the entire genome of an organism, and how difficult these differences are to detect with current systems. We also found that k-mer analysis, the industry standard, isn't necessarily the best option, and that this is where the inherent bias in current technology lies.
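To illustrate how a tiny change shows up in k-mer analysis: a single-base substitution alters every k-mer that overlaps it, so at most k occurrences are removed and at most k are added, giving an L1 distance between the two k-mer count profiles of at most 2k. The sketch below compares two made-up sequences that differ at one position; the sequences and the values of k are illustrative assumptions, not data from the project.

```python
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_difference(a: str, b: str, k: int) -> int:
    """L1 distance between the k-mer count profiles of two sequences."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return sum(abs(pa[w] - pb[w]) for w in set(pa) | set(pb))


# Hypothetical sequences differing by a single substitution at index 6 (C -> A).
reference = "ACGTTGCAAGTC"
variant = "ACGTTGAAAGTC"

for k in (2, 3, 5):
    print(k, profile_difference(reference, variant, k))
# 2 4
# 3 6
# 5 10
```

Here the distance reaches the 2k upper bound for each k because none of the changed k-mers happen to appear elsewhere in the sequence; with repetitive sequences some changes can cancel out.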

What's next for Predictive model for DNA sequencing and explaining bias

Our ultimate goal is to remove k-mer analysis from genome identification entirely and develop a new form of analysis that circumvents the current biases.

Built With

GenBank
Python

Updates

Neelanjan Mitra started this project — Feb 26, 2023 10:28 PM EST
