Title

Network Must Have Spine

Who

Zezhi Wang (zwang251), Xiaoyan Zhao (xzhao58), Changhao Wu (cwu55)

Final Writeup

https://docs.google.com/document/d/15ZKrxLp29mBcNvalItXkKxljWbF6IoUSodBzlLijBM8/edit?usp=sharing

Introduction

In this project, we tackle a traditional problem in the deep learning area: image classification. Although deeper and larger network architectures have been proposed and have achieved better performance on image classification problems, emerging problems such as gradient vanishing and computation power consumption have worsened due to the growing number of parameters and network layers. Inspired by the spinal architecture of humans, H M Dipu Kabir et al. designed a new linear layer as a substitute for the fully connected layers of CNNs. With such a linear layer, a CNN can achieve state-of-the-art performance using fewer parameters, which, in turn, reduces the computation overhead.

This project reimplements the network architecture described in the paper, SpinalNet: Deep Neural Network with Gradual Input, and tests our implementation on datasets such as ImageNet and Google Quick Draw. Our reasons for choosing this paper are two-fold. First, the authors address an emerging and significant problem in the deep learning area: high computation overhead. People tend to build deeper networks with larger numbers of parameters to achieve better performance, which leads to large energy and hardware resource consumption. This paper demonstrates that with careful design of the network architecture and a reasonable number of parameters, performance can still beat the state of the art in most cases. Second, the design of this network is inspired by the function of the human nervous system, which we think is an interesting topic and a promising direction for designing neural networks.

Related Work

Several recent works rearchitect network structures to reduce parameters and improve computational efficiency. For example, Gao Huang et al. proposed a network architecture, DenseNet, that alleviates the vanishing gradient problem and reduces computation power consumption while maintaining performance comparable with, or even better than, the state of the art on common image classification benchmarks. The design principle of the architecture is to maximize connectivity between layers: Gao Huang et al. connect every layer directly to every other layer, so features extracted by earlier layers propagate easily to subsequent layers. Their experiments show that because subsequent layers can access the earlier feature maps directly, fewer parameters are required to achieve comparable performance.

As for existing implementations of SpinalNet, we found two code repositories: https://github.com/dipuk0506/SpinalNet and https://github.com/Mechachleopteryx/SpinalNet. Both implement SpinalNet in PyTorch, so we decided to reimplement it in TensorFlow.

Data

We will be using the ImageNet dataset. No significant preprocessing is required, as it can be obtained via the tensorflow-datasets Python package. It contains 1,281,167 training images, 100,000 test images, and 50,000 validation images; we may use only a fraction of them.

For the stretch goal, we would also use the Google Quick Draw dataset. This dataset contains 50 million drawings across more than 300 categories. The original dataset stores each drawing as a sequence of timestamped vectors, which would need some pre-processing; luckily, Google provides a preprocessed numpy bitmap version that we can use directly. We randomly selected 24 categories starting with different initial letters (q and x have no associated category), with approximately 120k drawings each (we would use only a fraction of them).
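As a sketch of that bitmap pre-processing (the helper name and the synthetic stand-in data are ours; in practice the array would come from `np.load` on a downloaded `.npy` class file):

```python
import numpy as np

def preprocess_bitmaps(flat, num_samples=None):
    """Reshape flattened 28x28 Quick Draw bitmaps to (N, 28, 28, 1)
    and scale pixel values from [0, 255] to [0, 1]."""
    if num_samples is not None:
        flat = flat[:num_samples]  # use only a fraction of each class
    return flat.reshape(-1, 28, 28, 1).astype(np.float32) / 255.0

# Synthetic stand-in for np.load("some_category.npy"):
fake = np.random.randint(0, 256, size=(100, 784), dtype=np.uint8)
batch = preprocess_bitmaps(fake, num_samples=32)
```

The same helper works unchanged on the real numpy bitmap files, since each row there is also a flattened 784-byte drawing.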

Methodology

Generally speaking, Spinal layers are narrow fully connected layers that take the place of the traditional hidden layers (the architecture is organized into an input row, an intermediate row of spinal sub-layers, and an output row) to reduce the number of parameters while improving performance. Each sub-layer receives only a portion of the input together with the previous sub-layer's output, so the input arrives gradually and repetitively, mimicking the way the human spinal cord relays signals.
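The gradual, repetitive input can be sketched in NumPy (the layer widths and the alternating halves here are illustrative; the actual implementation adds details such as dropout):

```python
import numpy as np

def spinal_fc_forward(x, weights, biases):
    """Forward pass of a spinal fully connected layer.

    The input vector is split into two halves. Each sub-layer sees one
    half of the input concatenated with the previous sub-layer's output,
    so information arrives gradually and repetitively. The final output
    concatenates every sub-layer's activations.
    """
    half = x.shape[-1] // 2
    first, second = x[:, :half], x[:, half:]
    outputs = []
    prev = np.zeros((x.shape[0], 0))  # the first sub-layer has no predecessor
    for i, (W, b) in enumerate(zip(weights, biases)):
        part = first if i % 2 == 0 else second  # alternate input halves
        z = np.concatenate([part, prev], axis=-1) @ W + b
        prev = np.maximum(z, 0.0)  # ReLU
        outputs.append(prev)
    return np.concatenate(outputs, axis=-1)

# Tiny usage example: 8-dim input, 4 sub-layers of width 4.
rng = np.random.default_rng(0)
in_dim, width, n_sub = 8, 4, 4
Ws = [rng.normal(size=(in_dim // 2 + (width if i else 0), width))
      for i in range(n_sub)]
bs = [np.zeros(width) for _ in range(n_sub)]
out = spinal_fc_forward(rng.normal(size=(2, in_dim)), Ws, bs)
# out has shape (batch, n_sub * width) = (2, 16)
```

Note that only the first sub-layer's weight matrix is smaller (no predecessor output to concatenate); every later sub-layer takes half the input plus the previous sub-layer's activations.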

There are two approaches we will use to train our model. The first is training from scratch: we plan to train in batches with the Adam optimizer, as usual. The second is transfer learning. The authors mention that adding SpinalNet on top of ResNet initially performs worse than vanilla ResNet, possibly because the additional layers produce smaller gradients. Therefore, as a stretch goal, we plan to pull some pre-trained models from the Keras collection and then train them with Spinal layers.
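One possible shape for that transfer-learning setup, sketched with Keras (the spinal-head widths are made up, and `weights=None` avoids a download here; in practice one would pass `weights="imagenet"` to load pre-trained weights):

```python
import tensorflow as tf

def spinal_head(features, num_classes, half_width=256, layer_width=128):
    """Attach a spinal fully connected head to a backbone's pooled features."""
    first = features[:, :half_width]
    second = features[:, half_width:2 * half_width]
    x1 = tf.keras.layers.Dense(layer_width, activation="relu")(first)
    x2 = tf.keras.layers.Dense(layer_width, activation="relu")(
        tf.keras.layers.Concatenate()([second, x1]))
    x3 = tf.keras.layers.Dense(layer_width, activation="relu")(
        tf.keras.layers.Concatenate()([first, x2]))
    x4 = tf.keras.layers.Dense(layer_width, activation="relu")(
        tf.keras.layers.Concatenate()([second, x3]))
    merged = tf.keras.layers.Concatenate()([x1, x2, x3, x4])
    return tf.keras.layers.Dense(num_classes, activation="softmax")(merged)

backbone = tf.keras.applications.ResNet50(
    weights=None,            # weights="imagenet" in practice
    include_top=False,       # drop the original classifier head
    pooling="avg",           # global average pooling -> (None, 2048)
    input_shape=(224, 224, 3))
backbone.trainable = False   # freeze the pre-trained layers
outputs = spinal_head(backbone.output, num_classes=24)
model = tf.keras.Model(backbone.input, outputs)
```

This sketch slices only the first 512 pooled features into the spinal head; how much of the backbone's output to feed the spine, and in what order, is a design choice we would need to tune.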

The hardest part of implementing this model is the Spinal layers themselves, and how to combine them with other networks. The design is inspired by the human nervous system, which may make the architecture a little hard for us to fully grasp given our limited knowledge of biology. Combining Spinal layers with existing models also requires a certain degree of understanding of models like ResNet, VGG, etc. Moreover, since we may use some pre-trained models as black boxes, integrating them with Spinal layers may introduce further confusion. Hence, we expect these to be the most challenging parts of the implementation.

Metrics

Accuracy is the appropriate metric for this project, and we expect an improvement in accuracy when we add Spinal layers to a CNN. We will also train via transfer learning from pre-trained models, using the larger datasets mentioned in the Data section, such as ImageNet and Google Quick Draw.

In the original work, the authors demonstrated their model by running several benchmark datasets through PyTorch CNNs with and without the SpinalNet fully connected (FC) layer. They analyzed results on MNIST, Fashion-MNIST, KMNIST, QMNIST, EMNIST, CIFAR-10, CIFAR-100, Caltech-101, Bird225, Stanford Cars, SVHN, CINIC-10, STL-10, Oxford 102 Flowers, and Fruits 360. To ensure a fair comparison, they eliminated all other factors by making the Spinal FC layer the only difference between models. They quantify their results by reporting accuracy, error reduction, and the number of parameters each model uses.
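The error-reduction number can be computed from two accuracy figures (a trivial helper; we are assuming this definition, the relative reduction of the error rate, matches the one used in the paper):

```python
def error_reduction(baseline_acc, spinal_acc):
    """Relative reduction in error rate when switching to the spinal FC head.

    E.g. going from 98.0% to 98.5% accuracy removes roughly a quarter
    of the remaining error.
    """
    base_err = 1.0 - baseline_acc
    spinal_err = 1.0 - spinal_acc
    return (base_err - spinal_err) / base_err
```

This makes small accuracy gains on already-accurate models easier to interpret: moving from 98.0% to 98.5% is only half a point of accuracy but a 25% error reduction.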

Ethics

Our model serves as a classification tool that is helpful, for example, for reading handwritten input. More generally, SpinalNet requires fewer parameters to reach satisfactory results and is therefore more efficient for training on large datasets, which also makes it more environmentally friendly. We picked ImageNet and Google Quick Draw as our datasets, which consist of photographs and crowd-sourced drawings. Generally speaking, such datasets need to be collected in a privacy-preserving manner: the data should be anonymized, with any information that could identify individuals removed. For example, if digit images were taken from people's home addresses, they would need to be shuffled, with some critical digits removed, so that no personal information could be extracted. All of the benchmarks we picked are well-known, well-explored datasets that have been used by many networks. SpinalNet can serve as a building block of many deep neural networks and does not have any direct applications with controversial ethical use.

Division of labor

The workload is divided evenly among the team members. We plan to read the paper, discuss the architecture, implement the code, and work on the writeups with equal amounts of effort.


Updates


Reflection

Introduction

In this project, we tackle a traditional problem in the deep learning area: image classification. Although deeper and larger network architectures have been proposed and have achieved better performance on image classification problems, emerging problems such as gradient vanishing and computation power consumption have worsened due to the growing number of parameters and network layers. Inspired by the spinal architecture of humans, H M Dipu Kabir et al. designed a new linear layer as a substitute for the fully connected layers of CNNs. With such a linear layer, a CNN can achieve state-of-the-art performance using fewer parameters, which, in turn, reduces the computation overhead. This project reimplements the network architecture described in the paper, SpinalNet: Deep Neural Network with Gradual Input, and tests our implementation on datasets such as ImageNet and Google Quick Draw. Our reasons for choosing this paper are two-fold. First, the authors address an emerging and significant problem in the deep learning area: high computation overhead. People tend to build deeper networks with larger numbers of parameters to achieve better performance, which leads to large energy and hardware resource consumption. This paper demonstrates that with careful design of the network architecture and a reasonable number of parameters, performance can still beat the state of the art in most cases. Second, the design of this network is inspired by the function of the human nervous system, which we think is an interesting topic and a promising direction for designing neural networks.

Challenges

After we finished building the network based on the architecture provided in the paper, we used the MNIST dataset to test it. Although we set the same hyperparameters, including the learning rate, the CNN's kernel size, and the stride length, the network's loss did not decrease. We suspect this was caused by the different default parameter initializations that PyTorch and TensorFlow apply to their layers. After we tuned the learning rate and tried different scales and distributions for weight initialization, we were finally able to train SpinalNet on MNIST. Once we had verified the architecture, we switched the dataset to Google Quick Draw. However, the full dataset was too large to train on in a short period of time, so to get a preliminary result we picked 24 classes from the dataset and processed the data again. This way, we were able to finish training in an hour.
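The initialization mismatch can be made concrete: PyTorch's `nn.Linear` defaults to Kaiming uniform with a=sqrt(5), which works out to a bound of 1/sqrt(fan_in), while Keras `Dense` defaults to Glorot uniform. A small sketch of the two bounds (the example fan sizes are ours):

```python
import math

def pytorch_linear_bound(fan_in):
    """PyTorch nn.Linear default: Kaiming uniform with a=sqrt(5),
    which simplifies to U(-1/sqrt(fan_in), 1/sqrt(fan_in))."""
    return 1.0 / math.sqrt(fan_in)

def keras_dense_bound(fan_in, fan_out):
    """Keras Dense default: Glorot uniform,
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))

# For a wide-in, narrow-out layer the two bounds differ noticeably,
# e.g. fan_in=512, fan_out=20: PyTorch ~0.044 vs. Keras ~0.106.
```

Matching the PyTorch behavior in Keras would mean passing an explicit `kernel_initializer` (e.g. a uniform `VarianceScaling`) rather than relying on the default.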

Insights

We have completed the implementation of the VGG and SpinalVGG models and run both on the Google Quick Draw dataset. We currently reach an accuracy of 89.94% for SpinalVGG, which is only slightly higher than the model without the spinal layers. Although the results are on the right track, we expect to see larger accuracy improvements from adding the spinal layers. The authors of the paper indicated that models with spinal layers show more of an advantage as more epochs are trained, so we plan to train for more epochs and see how the results develop.

Plan

We are on track with our project: we have finished the implementation of VGG and SpinalVGG and run brief experiments to make sure they work as expected. In the remaining time this semester, we will spend more effort figuring out the transfer learning approach. The authors substituted the final feed-forward layer of pre-trained models from torchvision, and we found this is not as simple with the Keras pre-trained models for TensorFlow. We are not currently considering any changes of direction, as we feel we are on the right track toward our end goal.
