Inspiration

Almost anywhere you look on the web, you will find no shortage of advertisements. While many are ordinary ads, it is not uncommon to find malware or scams among them. Even when you aren't exploring the frontier of the Internet, links to these dangerous websites can land in your email inbox. As the number of hacks and breaches increases, cybersecurity has become a much greater concern for each and every one of us. Motivated to counter one of these attack vectors, phishing URLs, we created a web tool that uses machine learning to estimate whether a given URL is likely to lead to a phishing website.

What it does

Our tool lets the user enter a URL and displays how likely that URL is to lead to a phishing website.

How we built it

We first converted a dataset from .arff format into .json and .txt, which could be parsed by ml5.js. We also built a series of functions that check the legitimacy of a website against thirty rules, ranging from domain expiry dates to URLs that use URL-shortening services. The output of these functions could then be run through the trained model.
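To make the conversion step concrete, here is a minimal sketch of turning ARFF text into JSON records. It is not our actual converter. It assumes a simple ARFF subset (unquoted attribute names, '%' comments, dense numeric rows), and the attribute names in the sample (has_ip_address, uses_shortener, result) are hypothetical stand-ins rather than the dataset's real column names.

```javascript
// Minimal ARFF-to-JSON sketch. Handles only an assumed subset of ARFF:
// unquoted attribute names, '%' comments, and dense comma-separated numeric rows.
function arffToRecords(arffText) {
  const attributes = [];
  const records = [];
  let inData = false;

  for (const rawLine of arffText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("%")) continue; // skip blanks and comments

    if (!inData) {
      const lower = line.toLowerCase();
      if (lower.startsWith("@attribute")) {
        // The second whitespace-separated token is the attribute name.
        attributes.push(line.split(/\s+/)[1]);
      } else if (lower.startsWith("@data")) {
        inData = true; // every following line is a data row
      }
      continue;
    }

    const values = line.split(",").map((v) => Number(v.trim()));
    if (values.length !== attributes.length) {
      throw new Error(
        `Expected ${attributes.length} values, got ${values.length}: ${line}`
      );
    }
    const record = {};
    attributes.forEach((name, i) => {
      record[name] = values[i];
    });
    records.push(record);
  }
  return records;
}

// Hypothetical sample input; names and values are for illustration only.
const sample = `
% tiny illustrative sample
@relation phishing
@attribute has_ip_address { -1,1 }
@attribute uses_shortener { -1,1 }
@attribute result { -1,1 }
@data
-1,1,-1
1,1,1
`;

const records = arffToRecords(sample);
console.log(JSON.stringify(records, null, 2));
```

Each data row becomes one object keyed by attribute name, which is a shape that is easy to save as a .json file and feed to a training script.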

Challenges we ran into

Understanding the machine learning pipeline and using ml5.js to carry out our task were both challenging. We found no tool that automatically parses information from URLs, website HTML, or certificate authorities, so we had to do all of that ourselves. We also tried emailing the people who originally created our training dataset, but received no response.
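As an illustration of the hand-written URL parsing described above, here is a rough sketch of three rule checks built on the standard URL class. It is not our actual rule set. The shortener list is a small, non-exhaustive assumption, the function names are hypothetical, and the encoding (-1 for suspicious, 1 for looks legitimate) is assumed for the example.

```javascript
// Illustrative, non-exhaustive list of URL-shortener hosts (an assumption).
const SHORTENER_HOSTS = new Set(["bit.ly", "tinyurl.com", "t.co", "goo.gl"]);

// Assumed encoding for every rule: -1 = suspicious, 1 = looks legitimate.

// Rule: the URL points at a known shortening service.
function usesShortener(urlString) {
  const host = new URL(urlString).hostname.toLowerCase().replace(/^www\./, "");
  return SHORTENER_HOSTS.has(host) ? -1 : 1;
}

// Rule: the host is a dotted-decimal IPv4 address instead of a domain name.
function hasIpAddressHost(urlString) {
  const host = new URL(urlString).hostname;
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(host) ? -1 : 1;
}

// Rule: the URL contains '@', which can hide the real destination host.
function hasAtSymbol(urlString) {
  return urlString.includes("@") ? -1 : 1;
}

// Collect the rule outputs into one feature vector for a classifier.
function extractUrlFeatures(urlString) {
  return [
    usesShortener(urlString),
    hasIpAddressHost(urlString),
    hasAtSymbol(urlString),
  ];
}

console.log(extractUrlFeatures("https://bit.ly/abc")); // values: -1, 1, 1
console.log(extractUrlFeatures("http://192.168.0.1/login")); // values: 1, -1, 1
```

Note that new URL(...) throws a TypeError on input it cannot parse, so a real tool would need to validate or catch that before running the rules. Rules such as domain expiry dates cannot be computed from the URL string alone and would need an external lookup.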

Accomplishments that we're proud of

Creating an MVP and understanding the ML process.

What we learned

p5.js, HTML, ml5.js, and how to train a machine learning model.

What's next for clueless Phishing Website Detector

Providing fully automated detection of each of the 30 parameters from just the input URL.
