Inspiration

I'm learning cybersecurity and enjoying it. Since I'm also interested in machine learning and data mining, I wanted to combine these disciplines and see how machine learning can be useful for cybersecurity.

What it does

The main IPython Notebook analyzes a Wireshark capture file (.pcapng) after it has been exported to a .csv file. Using pandas, the Notebook reads this file and produces charts showing how many TCP packets are sent during a TCP flood attack, a kind of DDoS attack. The capture also contains a substantial number of ARP packets, but the Notebook ignores them.
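A minimal sketch of that read-and-count step, assuming Wireshark's default CSV columns ("No.", "Time", "Source", "Destination", "Protocol", "Length", "Info") and a hypothetical file name `capture.csv`. The inline sample rows are invented so the snippet runs on its own; this is an illustration, not the Notebook's actual code.

```python
import io

import pandas as pd

# Tiny stand-in for a Wireshark "Export Packet Dissections > As CSV" file.
# Column names follow Wireshark's default column set; the rows are made up.
SAMPLE_CSV = """\
"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.5","10.0.0.9","TCP","174","1234 > 80 Len=120"
"2","0.000210","10.0.0.6","10.0.0.9","TCP","174","1235 > 80 Len=120"
"3","0.500000","PcsCompu_aa:bb:cc","Broadcast","ARP","42","Who has 10.0.0.1? Tell 10.0.0.9"
"4","1.100000","10.0.0.7","10.0.0.9","TCP","174","1236 > 80 Len=120"
"""


def tcp_packets_per_second(csv_source) -> pd.Series:
    """Count TCP packets in each whole second of the capture."""
    df = pd.read_csv(csv_source)
    # Keeping only TCP rows also discards the ARP packets.
    tcp = df[df["Protocol"] == "TCP"]
    # Bucket packets by the integer part of their relative timestamp.
    seconds = tcp["Time"].astype(float).floordiv(1).astype(int)
    return seconds.value_counts().sort_index()


# With a real export this would be: tcp_packets_per_second("capture.csv")
counts = tcp_packets_per_second(io.StringIO(SAMPLE_CSV))
print(counts)
```

The resulting series (second → packet count) is the kind of table a bar or line chart of the flood can be drawn from.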

TCP attack traffic is classified using a support vector classifier (SVC).
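A sketch of how an SVC could separate flood traffic from normal traffic, using scikit-learn. It assumes two hypothetical per-second features, the TCP packet count and the number of distinct source addresses (which `--rand-source` should inflate during a flood); the feature choice, numbers, and labels are illustrative assumptions, not the Notebook's data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical per-second feature rows: [tcp_packets, distinct_source_ips].
# Label 0 = normal traffic, 1 = TCP flood. Values are invented for illustration.
X = np.array([
    [12, 3], [8, 2], [15, 4], [10, 3],               # normal seconds
    [820, 790], [950, 930], [700, 690], [880, 860],  # flood seconds
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Scale features first, since SVMs are sensitive to feature magnitudes.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X, y)

predictions = model.predict(np.array([[11, 3], [900, 880]]))
print(predictions)  # → [0 1]
```

A linear kernel is used here only because the toy classes are trivially separable; an RBF kernel (scikit-learn's default) may suit real traffic features better.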

On running the project

To reproduce the attack simulation, first install VirtualBox and set up a virtual machine running Kali Linux. Next, install Wireshark on the virtual machine. Then install hping3 with this terminal command:

sudo apt-get install hping3

Once done, run this command:

hping3 -c 1300 -d 120 -p 80 --flood --rand-source <IP address>

This command tells hping3 to send 1300 TCP packets (-c 1300), each carrying 120 bytes of data (-d 120), to port 80 (-p 80) as fast as possible (the --flood flag), from random source IP addresses (--rand-source), to the IP address of your choice (be ethical about this). Make sure Wireshark is capturing these packets.

Stop the Wireshark capture after about 40-45 seconds, then stop the hping3 command in the terminal (Ctrl+C). Save the capture as a .pcapng file and upload it to an online repository such as Google Drive or GitHub. Convert the capture to .csv in Wireshark with the "Export Packet Dissections" option (File > Export Packet Dissections > As CSV), and place the .csv file in the same directory as the Notebook. Finally, run the Notebook.

How I built it

I organized the Notebook into cells that each serve one purpose: whenever one or more lines of code accomplish a specific goal, they get their own cell, and the cell's last line displays the result. For example, one cell contains only the line "ddos_df.isnull().sum()", which returns the number of null values in each column of the ddos_df DataFrame, so it gets a cell of its own.
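For reference, `isnull().sum()` counts missing values per column rather than for the whole DataFrame. The sketch below uses a hypothetical `ddos_df` built inline; chaining a second `.sum()` gives the overall total, and `dropna()` is shown as one possible way to discard incomplete rows (an assumption, since the write-up does not say how nulls are handled).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Notebook's ddos_df.
ddos_df = pd.DataFrame({
    "Source": ["10.0.0.5", "10.0.0.6", None],
    "Protocol": ["TCP", "TCP", "ARP"],
    "Length": [174, np.nan, 42],
})

# One cell, one purpose: per-column null counts (the last line shows the result).
null_counts = ddos_df.isnull().sum()
print(null_counts)

# Total number of nulls across the whole DataFrame.
total_nulls = int(ddos_df.isnull().sum().sum())

# Drop every row that has at least one missing value.
clean_df = ddos_df.dropna()
```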

The Notebook uses the pandas and Seaborn packages.
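A sketch of the kind of chart Seaborn can draw from per-second TCP counts. The numbers below are invented placeholders, and `sns.barplot` draws one bar per second of capture; inside Jupyter the figure displays automatically, so the `Agg` backend line is only needed when running as a script.

```python
import matplotlib

matplotlib.use("Agg")  # Off-screen rendering for scripts; not needed in Jupyter.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical TCP packet counts for each second of a capture.
per_second = pd.DataFrame({
    "second": [0, 1, 2, 3, 4],
    "tcp_packets": [9, 14, 870, 910, 880],
})

fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(data=per_second, x="second", y="tcp_packets", color="steelblue", ax=ax)
ax.set_xlabel("Capture time (s)")
ax.set_ylabel("TCP packets")
ax.set_title("TCP packets per second")
fig.tight_layout()
```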

Challenges I ran into

One challenge I've run into was VirtualBox's performance. VirtualBox can sometimes be slow to respond, especially when running Wireshark or uploading files to Google Drive.

Another challenge was compiling the right information. I had to select the right attributes (columns) to present the attack detection accurately, and beyond attribute selection, I also had to aggregate the data for the charts.

A further challenge was compiling the classes for the SVC and PCA calculations.
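One way the classes could be compiled, sketched under stated assumptions: each per-second window is labeled by whether it falls inside a known flood interval (the interval, features, and numbers are all hypothetical), and the features are reduced with PCA before the SVC. This illustrates the technique in scikit-learn; it is not the Notebook's implementation.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical per-second features extracted from a capture.
windows = pd.DataFrame({
    "second": [0, 1, 2, 3, 4, 5, 6, 7],
    "tcp_packets": [10, 12, 850, 900, 870, 880, 11, 9],
    "distinct_sources": [3, 2, 840, 890, 860, 875, 3, 2],
    "mean_length": [66.0, 70.0, 174.0, 174.0, 174.0, 174.0, 64.0, 68.0],
})

# Assumed flood interval: hping3 ran from second 2 up to (not including) second 6.
FLOOD_START, FLOOD_END = 2, 6
# Series.between is inclusive on both ends, hence FLOOD_END - 1.
windows["label"] = windows["second"].between(FLOOD_START, FLOOD_END - 1).astype(int)

features = windows[["tcp_packets", "distinct_sources", "mean_length"]]
# Standardize, project onto 2 principal components, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="linear"))
model.fit(features, windows["label"])

print(windows["label"].tolist())  # → [0, 0, 1, 1, 1, 1, 0, 0]
explained = model.named_steps["pca"].explained_variance_ratio_
print(explained)
```

Labeling by time interval only works if the flood's start and end are known, which is the case in a controlled simulation like the one described above.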

What's next for the project

In the future, I hope the Notebook will be able to capture live packets using pyshark, so that DDoS attacks can be detected in real time.
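A rough sketch of the real-time idea, assuming pyshark's `LiveCapture` and `sniff_continuously`, plus a hypothetical interface name `eth0` and a hypothetical threshold of 500 TCP packets per second. The sliding-window counter is plain Python so it can be tested without capturing anything; pyshark is imported only when `run_live` is called, because it requires tshark on the system and capture privileges.

```python
from collections import deque


class TcpFloodDetector:
    """Flag a possible TCP flood when too many TCP packets arrive within one window."""

    def __init__(self, threshold: int = 500, window_seconds: float = 1.0):
        self.threshold = threshold  # hypothetical alert threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp: float, is_tcp: bool) -> bool:
        """Record one packet; return True if the TCP count in the window exceeds the threshold."""
        if is_tcp:
            self._timestamps.append(timestamp)
        # Forget TCP packets that have slid out of the time window.
        while self._timestamps and timestamp - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


def run_live(interface: str = "eth0", packet_count: int = 10_000) -> None:
    """Sniff packets with pyshark and print an alert when a flood is suspected."""
    import pyshark  # imported lazily; needs tshark installed

    detector = TcpFloodDetector()
    capture = pyshark.LiveCapture(interface=interface)
    for packet in capture.sniff_continuously(packet_count=packet_count):
        timestamp = float(packet.sniff_timestamp)
        # "tcp" in packet checks whether the packet has a TCP layer.
        if detector.observe(timestamp, "tcp" in packet):
            print(f"Possible TCP flood at {timestamp:.3f}")
```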

Alongside this, the Notebook should be able to detect other types of DDoS attack, such as an SNMP flood, and eventually extend to detecting different types of malware.
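Detecting more than TCP floods could start with per-protocol rate thresholds over the same CSV export. In this sketch the threshold values, the helper name `flag_floods`, and the sample rows are hypothetical; Wireshark labels SNMP traffic as "SNMP" in its Protocol column.

```python
import pandas as pd

# Hypothetical packets-per-second alert thresholds for each watched protocol.
THRESHOLDS = {"TCP": 500, "SNMP": 200}


def flag_floods(df: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Return (second, Protocol, packets) rows whose rate exceeds that protocol's threshold."""
    watched = df[df["Protocol"].isin(list(thresholds))].copy()
    watched["second"] = watched["Time"].astype(float).floordiv(1).astype(int)
    # Count packets per (second, protocol) pair.
    rates = (
        watched.groupby(["second", "Protocol"])
        .size()
        .reset_index(name="packets")
    )
    limits = rates["Protocol"].map(thresholds)
    return rates[rates["packets"] > limits].reset_index(drop=True)


# Invented capture: second 0 is quiet, second 1 carries an SNMP burst.
sample = pd.DataFrame({
    "Time": [0.1, 0.2, 0.3] + [1.0 + i / 1000 for i in range(250)],
    "Protocol": ["TCP", "SNMP", "ARP"] + ["SNMP"] * 250,
})
alerts = flag_floods(sample, THRESHOLDS)
print(alerts)
```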

Finally, I hope that SVC and PCA will be implemented as intended.
