Inspiration
In today's world, data is among the most valuable assets a company holds. Every company, online or in-person, collects large amounts of data daily. However, few know exactly how much of the data they collect and store contains sensitive information. Because of this, companies are unable to share datasets with collaborators in industry, government, and academia. Since GDPR came into force, many companies have chosen to hide their data and stop sharing it rather than try to understand it. This is a big problem for every field where computational science can make a great impact.
What it does
Our project tackles this problem in three steps: an algorithm first identifies the type of each file in a dataset, converts it to a suitable format, and then parses it with an efficient algorithm that detects whether the file contains sensitive information.
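To make the first step concrete, here is a minimal sketch of local file-type identification. It is not the project's actual code: the `guess_file_type` function and the `MAGIC_SIGNATURES` table are hypothetical names, and the sketch uses only the Python standard library instead of the packages the team relied on.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table (an assumption for this
# sketch); a real tool would recognise far more formats.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # also the container for .docx/.xlsx
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def guess_file_type(path):
    """Return a best-effort MIME type for a file using only local checks."""
    with open(path, "rb") as fh:
        head = fh.read(16)

    # 1. Trust the file's leading bytes over its name.
    for signature, mime in MAGIC_SIGNATURES.items():
        if head.startswith(signature):
            return mime

    # 2. Fall back to the file extension.
    mime, _encoding = mimetypes.guess_type(str(path))
    if mime:
        return mime

    # 3. Last resort: a NUL byte suggests binary content, otherwise text.
    return "application/octet-stream" if b"\x00" in head else "text/plain"
```

Checking the leading bytes before the extension means a PDF renamed to `report.bin` is still recognised as a PDF, which matters when files in a shared dataset are mislabeled.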
How we built it
We use state-of-the-art packages and libraries to identify the correct type of each file as precisely as possible. We then built parsers for the supported file types and combined the most promising Named Entity Recognition models with an algorithm of our own design to detect, efficiently and accurately, whether a file contains sensitive information. To top it all off, everything the algorithm does runs locally, without contacting any online services!
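As a rough illustration of fully local detection, the sketch below flags text that contains email addresses or phone-like numbers using regular expressions. This is a stand-in, not our actual algorithm: the `PATTERNS` table and the `find_sensitive` and `is_sensitive` functions are hypothetical, the patterns are intentionally simple, and a Named Entity Recognition model would run alongside rules like these to catch names, addresses, and similar entities.

```python
import re

# Hypothetical rule set (an assumption for this sketch); real rules would be
# broader and tuned against false positives.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_sensitive(text):
    """Return a dict mapping each rule label to the matches found in text."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def is_sensitive(text):
    """Return True if any rule matches; no network access is involved."""
    return bool(find_sensitive(text))


if __name__ == "__main__":
    sample = "Contact alice@example.com or call +44 20 7946 0958."
    print(find_sensitive(sample))
```

Returning the matches per label, rather than only a yes/no answer, makes it easier to see why a file was flagged and to refine the rules for edge cases.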
Challenges we ran into
The dataset provided to us for training our algorithm was limited and did not cover all test cases. We had to think through the edge cases that could make a file sensitive in the real world. Because the data is sensitive, it cannot be sent to APIs or other online services for processing, so developing an algorithm that does all of this locally was a huge challenge for a team of developers who had met only a few hours earlier!
Accomplishments that we're proud of
We are proud that we came together as a team, having met only a few hours earlier, and developed a working algorithm that can classify, entirely locally, whether a data file contains sensitive information!
What we learned
Apart from brainstorming, problem-solving, and making decisions as a team, we all gained a lot of technical knowledge and a better understanding of what kinds of data can make a file sensitive.
What's next for GDPRGuard
The team sees GDPRGuard as a solution to the problem of classifying sensitive data files: it helps industry become more aware of the data it owns, and it helps companies collaborate and share data more easily while complying with GDPR!
Built With
- ner
- python
- scapy

