Title
Network Must Have Spine
Who
Zezhi Wang (zwang251), Xiaoyan Zhao (xzhao58), Changhao Wu (cwu55)
Final Writeup
https://docs.google.com/document/d/15ZKrxLp29mBcNvalItXkKxljWbF6IoUSodBzlLijBM8/edit?usp=sharing
Introduction
In this project, we tackle a classic problem in deep learning: image classification. Although deeper and larger network architectures have achieved ever-better classification performance, problems such as vanishing gradients and computational cost have worsened as parameter counts and layer counts grow. Inspired by the architecture of the human spinal cord, H M Dipu Kabir et al. designed a new linear layer as a substitute for the fully connected layers of CNNs. With such a layer, a CNN can achieve state-of-the-art performance with fewer parameters, which in turn reduces computation overhead.
This project reimplements the network architecture described in the paper SpinalNet: Deep Neural Network with Gradual Input, and tests our implementation on datasets such as ImageNet and Google Quick Draw. Our reasons for choosing this paper are two-fold. First, the authors address a significant emerging problem in deep learning: high computation overhead. People tend to build deeper networks with ever more parameters to achieve better performance, which leads to large energy and hardware resource consumption. This paper demonstrates that, with a careful network design and a reasonable number of parameters, performance can still beat the state of the art in most cases. Second, the design of the network is inspired by the human nervous system, which we think is an interesting topic and a good direction for designing neural networks.
Related Work
Several recent works redesign network structures to reduce parameter counts and improve computational efficiency. For example, Gao Huang et al. proposed DenseNet, an architecture that alleviates the vanishing gradient problem and reduces computation while matching or exceeding state-of-the-art performance on common image classification benchmarks. Its design principle is to maximize connectivity between layers: every layer is connected directly to every other layer, so features extracted by earlier layers propagate easily to later ones. Experiments show that, because later layers can directly access earlier feature maps, fewer parameters are needed to achieve comparable performance.
In terms of existing implementations of SpinalNet, we found two code repositories: https://github.com/dipuk0506/SpinalNet and https://github.com/Mechachleopteryx/SpinalNet. Both implement SpinalNet in PyTorch, so we decided to reimplement it in TensorFlow.
Data
We will be using the ImageNet dataset. No significant preprocessing is required, as it can be obtained via the tensorflow-datasets Python package. It contains 1,281,167 training images, 100,000 test images, and 50,000 validation images. We may use only a fraction of it.
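A sketch of how we expect to obtain a fraction of ImageNet via tensorflow-datasets (tfds requires the ImageNet archives to be downloaded manually; split slicing such as `train[:5%]` is how we would take "some fraction" of the data; the image size and batch size are our own illustrative choices):

```python
import tensorflow as tf

def preprocess(image, label, size=224):
    # Resize to a fixed input size and scale pixel values to [0, 1].
    image = tf.image.resize(image, (size, size))
    return tf.cast(image, tf.float32) / 255.0, label

def imagenet_fraction(split="train[:5%]", batch_size=64):
    # Imported here so the preprocessing above can be used standalone.
    import tensorflow_datasets as tfds
    ds = tfds.load("imagenet2012", split=split, as_supervised=True)
    return ds.map(preprocess).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```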
For the stretch goal, we would also use the Google Quick Draw dataset. This dataset contains 50 million drawings across more than 300 categories. The original dataset consists of timestamped stroke vectors, which would require some preprocessing. Luckily, Google provides a preprocessed numpy bitmap version that we can use directly. We randomly selected 24 categories, each starting with a different initial letter (q and x have no associated category), with approximately 120k drawings each (we will use only a fraction of them).
Methodology
Generally speaking, Spinal layers are fully connected layers acting as hidden layers (the traditional hidden-layer structure is split into an input row, an intermediate row, and an output row) that reduce the number of parameters while improving performance. The layer takes its input gradually and repetitively, mimicking how the human spinal cord receives sensory input.
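Based on our reading of the paper, a Spinal fully connected block can be sketched in TensorFlow roughly as follows. The input is split into two halves, and each sub-layer sees one half of the input concatenated with the previous sub-layer's output ("gradual input"); the widths and number of sub-layers here are illustrative choices, not the authors' exact settings:

```python
import tensorflow as tf

class SpinalFC(tf.keras.layers.Layer):
    """Sketch of a Spinal FC block: split the flattened input into two
    halves and feed them gradually through a chain of narrow sub-layers."""

    def __init__(self, layer_width=20, num_sublayers=4, num_classes=10):
        super().__init__()
        self.num_sublayers = num_sublayers
        self.sublayers = [
            tf.keras.layers.Dense(layer_width, activation="relu")
            for _ in range(num_sublayers)
        ]
        # The output layer sees the concatenation of every sub-layer output.
        self.out = tf.keras.layers.Dense(num_classes)

    def call(self, x):
        half = x.shape[-1] // 2
        first, second = x[:, :half], x[:, half:half * 2]
        prev = self.sublayers[0](first)
        outputs = [prev]
        for i in range(1, self.num_sublayers):
            # Alternate input halves, each joined with the previous output.
            part = second if i % 2 == 1 else first
            prev = self.sublayers[i](tf.concat([part, prev], axis=-1))
            outputs.append(prev)
        return self.out(tf.concat(outputs, axis=-1))
```

Each sub-layer is narrow, so even though the chain is longer than a single dense head, the total parameter count stays small.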
There are two approaches we would use to train our model. The first is training from scratch: we plan to train the model in batches with the Adam optimizer, as usual. The second is transfer learning. The authors mention that adding Spinal layers to ResNet initially performs worse than vanilla ResNet, possibly because the additional layers shrink the gradients. Therefore, as a stretch goal, we plan to pull some pre-trained models from Keras and then train the Spinal layers on top of them.
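For the transfer-learning route, a rough sketch of attaching a Spinal-style head to a frozen Keras ResNet50 might look like the following; the head widths and four-sub-layer structure are our assumptions, not the authors' exact configuration:

```python
import tensorflow as tf

def build_spinal_resnet(num_classes=24, spinal_width=128, weights="imagenet"):
    # Pre-trained backbone with its classification head removed;
    # global average pooling yields a (batch, 2048) feature vector.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, pooling="avg",
        input_shape=(224, 224, 3))
    base.trainable = False  # freeze the backbone for transfer learning

    x = base.output
    half = x.shape[-1] // 2
    first, second = x[:, :half], x[:, half:]

    # Spinal-style head: alternate feature halves, chained sub-layers.
    prev = tf.keras.layers.Dense(spinal_width, activation="relu")(first)
    outs = [prev]
    for i in range(1, 4):
        part = second if i % 2 == 1 else first
        prev = tf.keras.layers.Dense(spinal_width, activation="relu")(
            tf.keras.layers.Concatenate()([part, prev]))
        outs.append(prev)

    logits = tf.keras.layers.Dense(num_classes)(
        tf.keras.layers.Concatenate()(outs))
    return tf.keras.Model(base.input, logits)
```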
The hardest part of implementing this model is the Spinal layers and how to combine them with other networks. The design of the network is inspired by the human nervous system, so it may be somewhat hard for us to gain a clear understanding of the architecture given our limited knowledge of biology. Combining the layers with existing models also requires a certain understanding of models like ResNet, VGG, etc. In addition, we may pull some models as pre-trained black boxes, which could introduce more confusion when we try to integrate them with Spinal layers. Hence we think these would be the most challenging parts of the implementation.
Metrics
Accuracy is the appropriate metric for this project, and we expect an improvement in accuracy when we add Spinal layers to a CNN. We will train via transfer learning from pretrained models using the larger datasets mentioned in the Data section, such as ImageNet and Google Quick Draw.
In the original project, the authors demonstrated their model by running several benchmark datasets through PyTorch CNNs with and without the SpinalNet fully connected (FC) layer. They analyzed results on MNIST, Fashion-MNIST, KMNIST, QMNIST, EMNIST, CIFAR-10, CIFAR-100, Caltech-101, Bird225, Stanford Cars, SVHN, CINIC-10, STL-10, Oxford 102 Flower, and Fruits 360. To ensure a fair comparison, the authors eliminated all other factors by letting the Spinal FC layer be the only difference. They quantified the results by reporting accuracy, error reduction, and the number of parameters each model uses.
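A small sketch of the comparison metrics we would compute; the `error_reduction` definition here (the fraction of the baseline's errors that the Spinal model eliminates) is one common formulation and our assumption about the paper's metric:

```python
def accuracy(preds, labels):
    """Top-1 accuracy: fraction of predictions matching their labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def error_reduction(baseline_acc, spinal_acc):
    """Fraction of the baseline model's errors eliminated by the Spinal model."""
    baseline_err = 1.0 - baseline_acc
    spinal_err = 1.0 - spinal_acc
    return (baseline_err - spinal_err) / baseline_err
```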
Ethics
Our model serves as a classification tool, useful for example in reading handwritten characters and drawings. More generally, SpinalNet requires fewer parameters to reach satisfactory results and is therefore more efficient for training on large datasets, which also makes it more environmentally friendly. We picked ImageNet and Google Quick Draw as our datasets. Generally speaking, datasets of handwriting or drawings need to be collected in a privacy-preserving manner: the data should be anonymized, with any information that could identify individuals removed. For example, if digit images were taken from people's home addresses, they would need to be shuffled and some critical digits removed so that no personal information could be extracted. All of the benchmarks we picked are well-known, well-explored datasets that have been used with various networks. SpinalNet itself is a building block for many deep neural networks and has no direct application with controversial ethical uses.
Division of labor
The workload is divided evenly among the team members. We plan to read the paper, discuss the architecture, implement the code, and work on the writeups with equal amounts of effort.
Built With
- python
- tensorflow

