Inspiration
Our motivation is detecting bias in DNA sequence prediction. Prediction algorithms and models that are biased towards certain organisms or populations can produce inaccurate or incomplete results, so that bias needs to be found and explained.
What it does
Predicts the genome to which an input DNA sequence belongs.
How we built it
Using Python, elements of NLP, and supervised machine learning.
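As a rough illustration of how these pieces can fit together, here is a minimal sketch, not our actual model. It assumes the NLP element is splitting a sequence into overlapping k-mer "words" and the supervised part is a simple nearest-profile classifier using cosine similarity. The function names, the labels, and the toy sequences are all made up for the example.

```python
from collections import Counter
import math


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers ("words") in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labelled_sequences, k=3):
    """Sum k-mer counts per label to build one profile per class."""
    profiles = {}
    for sequence, label in labelled_sequences:
        profiles.setdefault(label, Counter()).update(kmer_counts(sequence, k))
    return profiles


def predict(profiles, sequence, k=3):
    """Return the label whose profile is most similar to the query."""
    query = kmer_counts(sequence, k)
    return max(profiles, key=lambda label: cosine_similarity(query, profiles[label]))


# Toy, made-up training data (hypothetical labels, not real genomes).
training = [
    ("ATATATATATAT", "organism_A"),
    ("ATATTATATAAT", "organism_A"),
    ("GCGCGCGCGCGC", "organism_B"),
    ("GCGGCGCCGCGC", "organism_B"),
]
profiles = train(training)
print(predict(profiles, "TATATATA"))
print(predict(profiles, "CGCGCGCG"))
```

A classifier like this can only be as balanced as its training profiles: organisms that are over-represented in the labelled data dominate the k-mer vocabulary, which is one way bias can enter.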
Challenges we ran into
DNA sequencing is a complex, data-intensive process, and building predictive models that analyze this data accurately requires sophisticated algorithms and advanced computational techniques. The challenge for us was learning material that was new to the team and finding an innovative way to describe the bias and how it may be affecting the model.
Accomplishments that we're proud of
Building a robust model and finding a very intricate bias that would need newer algorithms to detect.
What we learned
Given how complex DNA sequence prediction is, we learnt that minute differences can change the entire genome of an organism, and how difficult it is to detect these differences with current systems. We also found that k-mer analysis, which is the industry standard, isn't necessarily the best option, and that this is where the inherent bias in current technology lies.
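To make the effect of a minute difference concrete, here is a small sketch (hypothetical helper names, made-up sequences) that compares the k-mer windows of two equal-length sequences. A single-base substitution changes every window that overlaps it, so up to k k-mers differ at once.

```python
def kmers(sequence, k):
    """List the overlapping k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def changed_kmer_positions(original, mutated, k):
    """Start indices of k-mer windows that differ between two equal-length sequences."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have equal length")
    pairs = zip(kmers(original, k), kmers(mutated, k))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


original = "ACGTACGTAC"
mutated = "ACGTAGGTAC"  # single substitution at index 5 (C -> G)

changed = changed_kmer_positions(original, mutated, k=4)
total = len(kmers(original, 4))
print(f"{len(changed)} of {total} k-mers changed at window starts {changed}")
```

In this toy case one changed base alters 4 of the 7 windows, which shows how sensitive a k-mer representation is to small edits.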
What's next for Predictive model for DNA sequencing and explaining bias
The ultimate goal is to completely remove k-mers as a technique in genome identification and to develop a new form of analysis that circumvents the current biases.
Built With
- genbank
- python