The Problem and our Purpose

Driven by the ever-present popularity of social media apps and recent innovations in generative AI, phishing emails have become increasingly sophisticated and difficult to detect. According to CNBC, malicious phishing emails increased by 1,265% between the fourth quarter of 2022 and November 2023. To combat this, our team developed PhishNet-Combatant, a tool that helps determine whether an email is a phishing attempt by combining PhishNet's built-in machine learning classification model, link analysis, and email domain analysis.

Using our Project

Interfacing with PhishNet-Combatant is simple. Just forward any email you suspect of being a phishing attempt to phishnetcombatant@gmail.com and wait a few seconds for a reply.

Methodologies

PhishNet’s custom AI classification model. To create PhishNet’s custom AI model, we applied transfer learning by fine-tuning DistilBERT, an older (2019), smaller model (roughly 66 million parameters in its base version) that was distilled from the original BERT model to be lighter and faster. The dataset we used consisted of 18,600 emails, 39% of which were phishing emails. The model was trained in Google Colab on a T4 GPU, which took 1 hour. We relied heavily on a Kaggle notebook released by DIMA806, a senior data scientist in Denmark. The Colab notebook can be found in our repository, with DIMA806 cited at the top.
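To put the dataset figures in context, the short sketch below works out the class counts and the accuracy a trivial "always predict legitimate" classifier would reach, which is the floor any fine-tuned model should beat. The function name `class_balance` is hypothetical, and the 39% share is treated as an approximate, rounded figure; this is not taken from the project's notebook.

```python
def class_balance(total: int, phishing_share: float) -> dict:
    """Derive class counts and the majority-class baseline accuracy.

    `phishing_share` is the (approximate) fraction of phishing emails.
    """
    phishing = round(total * phishing_share)
    legitimate = total - phishing
    # Accuracy obtained by always predicting the larger class.
    majority_baseline = max(phishing, legitimate) / total
    return {
        "phishing": phishing,
        "legitimate": legitimate,
        "majority_baseline": majority_baseline,
    }


# Figures quoted in the write-up: 18,600 emails, 39% phishing.
stats = class_balance(18_600, 0.39)
print(stats)
```

With these numbers, a model that labels everything as legitimate would already be right about 61% of the time, so accuracy alone is a weak signal; precision and recall on the phishing class are more informative.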

Link Analysis. To find links in each forwarded email, we use regular expressions along with the URLExtract library. Once the links are extracted, we send them to IPQualityScore’s API to assess their legitimacy. The API returns JSON describing many attributes of each link, which we trim down and organize into a digestible format.
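The two steps described above, pulling links out of the email body and trimming the API's JSON down to a few fields, might look roughly like the sketch below. It uses only the standard library rather than URLExtract, makes no network call, and the names `extract_links`, `summarize_report`, and the field list in `KEEP_FIELDS` are assumptions for illustration; the exact fields returned by the API should be checked against its documentation.

```python
import re

# Deliberately simple pattern: http(s) URLs up to whitespace or a closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

# Hypothetical subset of report fields worth showing to the user.
KEEP_FIELDS = ("unsafe", "risk_score", "phishing", "malware", "suspicious")


def extract_links(text: str) -> list[str]:
    """Return unique URLs found in `text`, in order of first appearance."""
    found = (match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text))
    return list(dict.fromkeys(found))


def summarize_report(report: dict, keep=KEEP_FIELDS) -> dict:
    """Keep only the selected keys from a parsed JSON report."""
    return {key: report[key] for key in keep if key in report}


body = "Click http://example.com/login?x=1, or visit https://bad.example.org."
links = extract_links(body)

# Stand-in for a parsed API response; the values are made up for the example.
sample_report = {"unsafe": True, "risk_score": 85, "phishing": True, "server": "nginx"}
summary = summarize_report(sample_report)
print(links, summary)
```

Stripping trailing punctuation keeps a sentence-ending period or comma from being sent to the API as part of the URL.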

Email Domain Analysis. To process domains, we run a function that flags suspicious email addresses based on domain length and abnormal characters. We then use IPQualityScore’s API to determine whether the address is disposable and whether it has appeared in recent leaks.
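A minimal sketch of the kind of domain check described above is shown here. The function name `domain_flags`, the 30-character threshold, and the allowed character set are all assumptions chosen for illustration, not the project's actual rules.

```python
import re

# Characters expected in an ordinary ASCII domain name (assumed rule).
ALLOWED_DOMAIN = re.compile(r"[a-z0-9.-]+")


def domain_flags(address: str, max_len: int = 30) -> list[str]:
    """Return the reasons, if any, that the sender's domain looks suspicious."""
    _, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not domain:
        return ["missing domain"]
    flags = []
    if len(domain) > max_len:
        flags.append("long domain")
    # Catches look-alike Unicode letters, underscores, spaces, and similar.
    if not ALLOWED_DOMAIN.fullmatch(domain):
        flags.append("abnormal characters")
    return flags


print(domain_flags("alice@example.com"))
print(domain_flags("support@paypa1-secure-login-verification-center.com"))
print(domain_flags("bob@ex\u0430mple.com"))  # Cyrillic "a" in place of Latin "a"
```

Returning a list of reasons rather than a single boolean makes it easy to show the user why an address was flagged.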

What's next for PhishNet-Combatant

We are committed to a substantial overhaul of the metrics used to gauge the threat level of an email. Because advances in artificial intelligence and technology continually fuel new attempts to steal information from users, it is imperative that PhishNet keeps improving. Our machine learning model is currently a fine-tuned DistilBERT, a smaller, older language model. Moving to newer models should improve the accuracy and consistency of PhishNet and help keep it reliable in the future. We also hope to add the ability to scan attached files for potential malware, improving our defenses against evolving cyber threats. Moreover, we believe a Chrome extension capable of automatically analyzing emails (without forwarding) could be a convenient and useful safeguard against phishing attacks that are not obvious.
