Code for an RNA transformer model. This transformer model can be used to predict RNA 3D structure from an RNA sequence: given a sequence, the model outputs a 3D point (x, y, z) for each nucleotide.
The goal of studying this model architecture is to understand how RNA sequences can be modeled with a transformer. We are already familiar with modeling the English language using a transformer encoder, but modeling RNA sequences with a transformer encoder is something curious to explore. Unlike the generic transformer architecture described in the "Attention Is All You Need" paper, this RNAModel has the following major differences that help it model RNA sequences.
- Outer product mean helps to learn pairwise relationships between elements of the RNA sequence.
- It highlights the complementarity and dependency relationships between nucleotides.
- It also expands and compresses the feature space to capture complex patterns.
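As a concrete illustration of the points above, here is a minimal sketch of an outer-product-mean block in PyTorch. The class name, layer sizes, and projection dimensions are illustrative assumptions, not the actual RibonanzaNet implementation: per-position features are first compressed, an outer product is formed for every pair (i, j), and the flattened result is expanded up to the pairwise feature dimension.

```python
import torch
import torch.nn as nn

class OuterProductMean(nn.Module):
    """Sketch of an outer-product-mean block (names and sizes are illustrative)."""

    def __init__(self, d_model=256, c=32, d_pair=64):
        super().__init__()
        self.proj_a = nn.Linear(d_model, c)       # compress per-nucleotide features
        self.proj_b = nn.Linear(d_model, c)
        self.proj_out = nn.Linear(c * c, d_pair)  # expand back to pair features

    def forward(self, s):                          # s: (B, L, d_model)
        a = self.proj_a(s)                         # (B, L, c)
        b = self.proj_b(s)                         # (B, L, c)
        # outer product between every pair of positions i, j
        outer = torch.einsum('bic,bjd->bijcd', a, b)  # (B, L, L, c, c)
        outer = outer.flatten(start_dim=-2)           # (B, L, L, c*c)
        return self.proj_out(outer)                   # (B, L, L, d_pair)
```

The compress-then-expand path (`d_model -> c` followed by `c*c -> d_pair`) is the feature-space compression and expansion mentioned above.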
- This module enriches pairwise representations by incorporating triangular relationships between nucleotides.
- It helps to capture non-local interactions.
These two techniques are described well in the blog post here. (Link)
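The triangular relationships described above can be sketched as a triangular multiplicative update over the pairwise representation. This is an assumed "outgoing edges" variant with illustrative names and sizes, not the document's exact code: each pair (i, j) is updated from the two other edges of every triangle i-k-j, which is how non-local interactions enter the pairwise features.

```python
import torch
import torch.nn as nn

class TriangleMultiplicativeUpdate(nn.Module):
    """Sketch of a triangular multiplicative update (sizes are illustrative)."""

    def __init__(self, d_pair=64, c=32):
        super().__init__()
        self.norm_in = nn.LayerNorm(d_pair)
        self.left = nn.Linear(d_pair, c)
        self.right = nn.Linear(d_pair, c)
        self.left_gate = nn.Linear(d_pair, c)
        self.right_gate = nn.Linear(d_pair, c)
        self.norm_out = nn.LayerNorm(c)
        self.proj_out = nn.Linear(c, d_pair)
        self.out_gate = nn.Linear(d_pair, d_pair)

    def forward(self, z):                                      # z: (B, L, L, d_pair)
        z = self.norm_in(z)
        a = torch.sigmoid(self.left_gate(z)) * self.left(z)    # gated edges (i, k)
        b = torch.sigmoid(self.right_gate(z)) * self.right(z)  # gated edges (j, k)
        # combine the two edges of each triangle i-k-j: sum over k
        t = torch.einsum('bikc,bjkc->bijc', a, b)
        t = self.proj_out(self.norm_out(t))
        return torch.sigmoid(self.out_gate(z)) * t             # gated output
```

Because the sum runs over every intermediate position k, information can flow between pairs (i, j) that never attend to each other directly.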
- Linear layers: PyTorch uses a method called Kaiming uniform initialization by default.
- Embeddings: for the nn.Embedding layer, PyTorch initializes the weights from a standard normal distribution.
- LayerNorm: nn.LayerNorm layers initialize their weights to 1 and biases to 0 by default.
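These defaults are easy to check directly; the snippet below constructs one layer of each type and verifies the LayerNorm defaults stated above (the sizes are arbitrary).

```python
import torch
import torch.nn as nn

linear = nn.Linear(16, 16)   # weight: Kaiming uniform initialization by default
emb = nn.Embedding(10, 16)   # weight: drawn from a standard normal by default
ln = nn.LayerNorm(16)        # weight = 1, bias = 0 by default

print(ln.weight.eq(1).all().item(), ln.bias.eq(0).all().item())  # True True
```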
- The original RibonanzaNet decoder has an output dimension of 2 because it was used to predict the reactivity of the nucleotides.
- We update it to 3 because we are training the model for 3D RNA structure prediction. The new model outputs 3 values, for x, y, and z respectively.
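The change amounts to swapping the final linear layer, as sketched below; `d_model` is an illustrative hidden size, not the actual RibonanzaNet value.

```python
import torch
import torch.nn as nn

d_model = 256  # hidden size is illustrative

# Original head: 2 reactivity values per nucleotide
reactivity_head = nn.Linear(d_model, 2)

# Updated head: (x, y, z) coordinates per nucleotide
structure_head = nn.Linear(d_model, 3)

hidden = torch.randn(2, 7, d_model)          # (batch, sequence length, d_model)
print(structure_head(hidden).shape)          # torch.Size([2, 7, 3])
```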
- The original outer product mean module was implemented using an einsum (Einstein summation) expression.
- We have updated it to do the same using simple tensor multiplications.
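The snippet below illustrates the kind of rewrite described above on the pairwise outer product: the einsum expression and a plain broadcasting multiplication produce the same tensor. The shapes are small and arbitrary, for demonstration only.

```python
import torch

B, L, c = 2, 5, 4
a = torch.randn(B, L, c)
b = torch.randn(B, L, c)

# einsum form of the pairwise outer product
via_einsum = torch.einsum('bic,bjd->bijcd', a, b)

# same result with plain broadcasting:
# (B, L, 1, c, 1) * (B, 1, L, 1, c) broadcasts to (B, L, L, c, c)
via_broadcast = a[:, :, None, :, None] * b[:, None, :, None, :]

print(torch.allclose(via_einsum, via_broadcast))  # True
```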
For the first experiment I used the training data of the Kaggle competition itself; here is the (Link). For the second experiment I used 10,000 samples from a publicly available dataset of 400,000 RNA sequences. It took around 24 hours to train the model on these 10,000 samples.
| Specification | Value |
|---|---|
| GPU Model | Quadro RTX 8000 |
| CUDA Version | 12.2 |
| GPU Memory | 49152 MiB (48 GB) |
| Experiment Name | No. of training samples | LB score (TM-score) |
|---|---|---|
| Experiment 1 | 844 | 0.161 |
| Experiment 2 | 10000 | 0.279 |
