Adversarial Robustness Toolbox (ART)
~~ Evasion Attacks and Defense on CNN classification model

Impact

Evasion attacks aim to drastically reduce a model's accuracy by manipulating its input data. They work by adding small perturbations to the input that lead to incorrect predictions and misclassification. This poses real threats to many AI applications; for example, an image-based security system could wrongly grant access if an attacker manipulates an image so that it is classified the way the attacker wants. Understanding and replicating these attacks is essential for developing secure and robust AI systems. This project demonstrates how an evasion attack is carried out and what effect it has.

To begin, we need a model to attack, so we build a Convolutional Neural Network (CNN) that classifies handwritten digits. First, the dataset is loaded, preprocessed, and split into training and test sets. Next, we build the CNN and train it for 5 epochs.
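The loading and preprocessing code is not shown here. As a minimal sketch, assuming MNIST-style data (28×28 grayscale images with pixel values 0-255 and integer labels 0-9, for example from `tf.keras.datasets.mnist.load_data()`), preprocessing could scale pixels to [0, 1], add a channel axis, and one-hot encode the labels. The helper name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Hypothetical helper: scale, reshape and one-hot encode MNIST-style data."""
    # Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]
    x = images.astype("float32") / 255.0
    # Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
    x = x.reshape(-1, 28, 28, 1)
    # One-hot encode integer labels, as categorical_crossentropy expects
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y


# Usage with random stand-in data shaped like MNIST
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
fake_labels = np.array([3, 0, 9, 3])
x, y = preprocess(fake_images, fake_labels)
print(x.shape, y.shape)  # (4, 28, 28, 1) (4, 10)
```

Scaling to [0, 1] also matters later: the attack strengths ε = 0.1 to 0.3 are only meaningful relative to that pixel range.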

Model Creation

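from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Two convolution/pooling stages extract features; the dense layers map them to 10 digit classes
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2,2)),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# categorical_crossentropy expects one-hot encoded labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# x_train, y_train, x_test, y_test come from the loading/preprocessing step
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))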

...

Epoch 5/5 
469/469 ━━━━━━━━━━━━━━━━━━━━ 15s 33ms/step - accuracy: 0.9923 - loss: 0.0244 - val_accuracy: 0.9903 - val_loss: 0.0301
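As a worked check on the architecture above, each layer's output shape and parameter count follow from simple arithmetic: a 3×3 convolution without padding shrinks each spatial side by 2, a 2×2 max-pool halves it (rounding down), and a layer's parameters are its weights plus one bias per output unit. The sketch below recomputes these by hand; the totals should agree with what `model.summary()` reports for this model, but they were not taken from the project's notebook.

```python
def conv_params(kernel, in_ch, out_ch):
    # kernel*kernel weights per input channel for each filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit
    return in_units * out_units + out_units


side, layers = 28, []
side -= 2                                  # Conv2D(32, 3x3), no padding: 28 -> 26
layers.append(("conv2d_32", (side, side, 32), conv_params(3, 1, 32)))
side //= 2                                 # MaxPooling2D(2x2): 26 -> 13
layers.append(("maxpool_1", (side, side, 32), 0))
side -= 2                                  # Conv2D(64, 3x3): 13 -> 11
layers.append(("conv2d_64", (side, side, 64), conv_params(3, 32, 64)))
side //= 2                                 # MaxPooling2D(2x2): 11 -> 5 (rounded down)
layers.append(("maxpool_2", (side, side, 64), 0))
flat = side * side * 64                    # Flatten: 5 * 5 * 64 = 1600 features
layers.append(("flatten", (flat,), 0))
layers.append(("dense_128", (128,), dense_params(flat, 128)))
layers.append(("dense_10", (10,), dense_params(128, 10)))

for name, shape, params in layers:
    print(f"{name:<10} {str(shape):<14} {params:>7}")
total = sum(p for _, _, p in layers)
print("total params:", total)  # total params: 225034
```

Most of the parameters (204,928) sit in the first dense layer, because it connects all 1,600 flattened features to 128 units.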

By the end of training, the model reaches 99.23% accuracy on the training data and 99.03% on the test data (val_accuracy). To check the model, let's classify a sample image from the test dataset.

Our model is working as expected!
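Classifying a single test image amounts to adding a batch dimension, running the model, and taking the index of the largest softmax probability. A minimal sketch, where `predict_label` is a hypothetical helper and `predict_fn` stands in for `model.predict` (any function mapping a batch of images to class probabilities):

```python
import numpy as np


def predict_label(predict_fn, image):
    """Hypothetical helper: return (predicted digit, confidence) for one image."""
    batch = image[np.newaxis, ...]              # (28, 28, 1) -> (1, 28, 28, 1)
    probs = np.asarray(predict_fn(batch))[0]    # softmax output for the single image
    label = int(np.argmax(probs))
    return label, float(probs[label])


# Stand-in for model.predict that always puts 90% of the probability on class 7
def fake_predict(batch):
    probs = np.full((len(batch), 10), 0.1 / 9)
    probs[:, 7] = 0.9
    return probs


image = np.zeros((28, 28, 1), dtype="float32")
print(predict_label(fake_predict, image))  # (7, 0.9)
```

With the trained Keras model, `predict_label(model.predict, x_test[0])` can be compared against the true label `np.argmax(y_test[0])`.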

Implementing Evasion Attack (FGSM)

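First, we wrap our Keras model in ART's TensorFlowV2Classifier, which gives ART's attacks and defences a common interface for predictions and loss gradients. Note that the wrapper references the original model rather than a copy, so training through it (as in the defense step below) updates the original model's weights.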

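import tensorflow as tf
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import TensorFlowV2Classifier

# Pixel range of the preprocessed inputs (assumes images were scaled to [0, 1])
min_pixel_value, max_pixel_value = 0.0, 1.0

classifier = TensorFlowV2Classifier(
    model=model,
    nb_classes=10,
    input_shape=(28, 28, 1),
    loss_object=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),  # needed when AdversarialTrainer later calls fit()
    clip_values=(min_pixel_value, max_pixel_value),
)

# Create Evasion(FGSM) attack
epsilon = 0.1
attack = FastGradientMethod(estimator=classifier, eps=epsilon)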
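FGSM moves each input one step of size ε in the direction of the sign of the loss gradient, then clips back to the valid pixel range: x_adv = clip(x + ε · sign(∇x J(x, y))). ART's FastGradientMethod (with its default L∞ norm) applies this update using gradients of the wrapped Keras model. To make the mechanics visible without TensorFlow, the sketch below applies the same update to a small linear softmax classifier, whose input gradient has the closed form (softmax(xW + b) − y)Wᵀ. The linear model, its random weights, and the function names are illustrative assumptions, not the project's CNN.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the class axis
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(W, b, x, y):
    # Mean categorical cross-entropy of a linear softmax model on flattened inputs
    return float(-np.mean(np.sum(y * log_softmax(x @ W + b), axis=1)))


def fgsm(W, b, x, y, eps, clip=(0.0, 1.0)):
    """FGSM on a linear softmax classifier (illustrative stand-in for the CNN)."""
    p = np.exp(log_softmax(x @ W + b))     # class probabilities
    grad_x = (p - y) @ W.T                 # gradient of each sample's loss w.r.t. its input
    x_adv = x + eps * np.sign(grad_x)      # one signed-gradient step of size eps
    return np.clip(x_adv, *clip)           # keep pixels inside the valid range


rng = np.random.default_rng(0)
W = rng.normal(size=(784, 10)) * 0.05      # hypothetical weights for 28*28 = 784 inputs
b = np.zeros(10)
x = rng.uniform(0.0, 1.0, size=(8, 784))   # 8 flattened stand-in "images"
y = np.eye(10)[rng.integers(0, 10, size=8)]  # random one-hot labels

clean_loss = cross_entropy(W, b, x, y)
for eps in (0.1, 0.2, 0.3):
    adv_loss = cross_entropy(W, b, fgsm(W, b, x, y, eps), y)
    print(f"eps={eps}: loss {clean_loss:.3f} -> {adv_loss:.3f}")
```

Every pixel changes by at most ε, yet the loss rises because all changes push in the direction that hurts the model most to first order. For the CNN, ART obtains the gradient by automatic differentiation instead of a closed form.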

With this in place, we can run our attacks. We assess three attack strengths, increasing the perturbation size ε (how much each pixel may be changed) to show how more noise in the input makes the model predict incorrectly. The findings are summarized as follows:

Weak attack intensity (ε = 0.1):
With a small amount of noise added, our CNN model still classifies the sample image correctly. However, its test accuracy drops from 99.03% on clean test data to 88.19%.
Medium attack intensity (ε = 0.2):
With a moderate amount of noise, the model struggles but still classifies the sample image correctly. This time, test accuracy falls sharply to 45.40%.
Strong attack intensity (ε = 0.3):
The image is heavily distorted and the model misclassifies it. Test accuracy falls to a mere 11.17%, barely above the 10% expected from random guessing over 10 classes.
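The three measurements above follow the same loop: build an attack for each ε, generate adversarial versions of the test set, and score the model on them. A framework-agnostic sketch with the hypothetical names `evaluate_under_attack`, `predict_fn` and `attack_fn`; with ART, `attack_fn` could be `lambda x, eps: FastGradientMethod(estimator=classifier, eps=eps).generate(x=x)` and `predict_fn` could be `classifier.predict`. The toy model and attack below exist only so the sketch runs on its own.

```python
import numpy as np


def accuracy(probs, y_onehot):
    # Fraction of samples whose most probable class equals the true class
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_onehot, axis=1)))


def evaluate_under_attack(predict_fn, attack_fn, x, y_onehot, epsilons):
    """Hypothetical helper: map each epsilon (0.0 = clean data) to accuracy."""
    results = {0.0: accuracy(predict_fn(x), y_onehot)}
    for eps in epsilons:
        x_adv = attack_fn(x, eps)              # adversarial copy of the whole set
        results[eps] = accuracy(predict_fn(x_adv), y_onehot)
    return results


# Toy stand-ins (not a CNN, not ART): a 1-feature, 2-class "model" with a
# threshold at 0.5, and an "attack" that pushes each input toward that threshold
def toy_predict(x):
    return np.eye(2)[(x[:, 0] > 0.5).astype(int)]


def toy_attack(x, eps):
    return np.clip(x + eps * np.sign(0.5 - x), 0.0, 1.0)


x = np.linspace(0.05, 0.95, 10).reshape(-1, 1)
y = toy_predict(x)                             # toy labels: the model is right on clean data
results = evaluate_under_attack(toy_predict, toy_attack, x, y, (0.1, 0.2, 0.3))
print(results)  # {0.0: 1.0, 0.1: 0.8, 0.2: 0.6, 0.3: 0.4}
```

As with the CNN, a larger ε pushes more inputs across the decision boundary, so accuracy falls monotonically.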

Our CNN model has shown us how its vulnerabilities can be exploited!

Findings

Comparing the clean and adversarial images shows how the attack perturbs the input to mislead the model.

This graph plots our model's accuracy against the perturbation size ε in the input data. Accuracy falls steeply as ε grows, showing how easily an undefended CNN like ours can be fooled.

Defense ~ Model Training with Adversarial Data

Having seen how vulnerable the model is, we now need to defend it against such evasion attacks. ART's AdversarialTrainer class retrains the model on a mix of clean inputs and adversarial examples generated by the attacks above.

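from art.defences.trainer import AdversarialTrainer
# attack, attack2, attack3: FGSM attacks with eps = 0.1, 0.2, 0.3 (attack2/attack3 are built like attack above)
trainer = AdversarialTrainer(classifier, attacks=[attack, attack2, attack3], ratio=0.5)
trainer.fit(x_train, y_train, nb_epochs=5, batch_size=128)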

As the code shows, all three attack strengths are used to make the model robust. With ratio=0.5, half of the samples in each training batch are replaced by their adversarial counterparts, so clean and adversarial data contribute equally to training.
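Conceptually, with `ratio=0.5` each training step swaps half of the batch for adversarial versions produced by one of the supplied attacks, while the labels stay the same. The NumPy sketch below illustrates that idea only; it is a simplified assumption, not ART's AdversarialTrainer source, and details such as choosing one attack at random per batch are assumptions here. `mix_batch` and the stand-in attacks are hypothetical.

```python
import numpy as np


def mix_batch(x_batch, attacks, ratio, rng):
    """Hypothetical sketch: replace a `ratio` share of a batch with adversarial samples."""
    n_adv = int(np.ceil(ratio * len(x_batch)))            # how many samples to replace
    attack = attacks[rng.integers(len(attacks))]          # assumed: one attack per batch
    idx = rng.choice(len(x_batch), size=n_adv, replace=False)
    x_mixed = x_batch.copy()
    x_mixed[idx] = attack(x_batch[idx])                   # labels stay unchanged
    return x_mixed, idx


# Stand-in "attacks": fixed signed perturbations of size eps (not real FGSM)
def make_attack(eps):
    return lambda x: np.clip(x + eps * np.sign(0.5 - x), 0.0, 1.0)


rng = np.random.default_rng(42)
attacks = [make_attack(e) for e in (0.1, 0.2, 0.3)]
x_batch = rng.uniform(0.0, 1.0, size=(128, 28, 28, 1))
x_mixed, idx = mix_batch(x_batch, attacks, ratio=0.5, rng=rng)
print(len(idx), "of", len(x_batch), "samples replaced")  # 64 of 128 samples replaced
```

Keeping half of each batch clean is what lets the model stay accurate on unmodified inputs while it learns to resist perturbed ones.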

Model Robustness Evaluation

After training, we evaluate the adversarially trained model on the original test data and on adversarial test data at each ε. The results are as follows:

Accuracy on original test data: 99.37%
Accuracy on adversarial test data with epsilon 0.1: 97.86%
Accuracy on adversarial test data with epsilon 0.2: 94.42%
Accuracy on adversarial test data with epsilon 0.3: 89.70%
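Putting the reported numbers side by side makes the effect of adversarial training concrete. The snippet below only does arithmetic on the accuracies quoted in this README; it does not rerun any experiment.

```python
# Test accuracy (%) under FGSM, as reported in this README
before = {0.1: 88.19, 0.2: 45.40, 0.3: 11.17}   # undefended model
after = {0.1: 97.86, 0.2: 94.42, 0.3: 89.70}    # after adversarial training

# Improvement in percentage points for each attack strength
gains = {eps: round(after[eps] - before[eps], 2) for eps in before}
for eps, gain in gains.items():
    print(f"eps={eps}: {before[eps]:.2f}% -> {after[eps]:.2f}% (+{gain:.2f} points)")
```

The gain grows with ε, so adversarial training helps most exactly where the undefended model failed worst.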

The adversarially trained model keeps high accuracy despite the attacks, and its accuracy on clean test data remains high at 99.37%. The difference before and after adversarial training can be seen in the graph below:
