This project implements a pipeline to analyze influenza virus sequences and predict potential mutations using a codon-based Graph Neural Network (GNN). The pipeline:
- Reads metadata from a CSV file and sequences from a FASTA file.
- Normalizes and filters metadata based on available sequences.
- Parses collection dates and selects a baseline sequence (earliest date).
- Performs pairwise sequence alignment (demo).
- Converts nucleotide sequences into codon-based graphs (nodes are one-hot encoded codons with sequential edges).
- Trains a Graph Convolutional Network (GCN) to predict mutation probabilities (dummy labels in this demo).
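The codon-graph conversion described above can be sketched in plain Python. This is a minimal, dependency-free sketch, not the code in main.py. The following are all assumptions: the helper name `codon_graph`, the lexicographic ACGT ordering of the 64-codon vocabulary, the all-zero encoding for codons that contain ambiguous bases such as N, and the U-to-T conversion. The 64-entry vocabulary is consistent with the `in_channels=64` setting used for MutationGCN.

```python
from itertools import product

# Assumed vocabulary: all 4^3 = 64 codons in lexicographic ACGT order.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_graph(seq):
    """Hypothetical helper: turn a nucleotide sequence into (node_features, edge_index).

    - Each node is one codon, one-hot encoded over the 64-codon vocabulary.
    - Codons containing ambiguous bases (e.g. N) become all-zero vectors (an assumption).
    - A trailing partial codon (len(seq) % 3 != 0) is dropped.
    - Consecutive codons are linked in both directions, so edge_index lists each
      undirected edge twice, as PyTorch Geometric expects for undirected graphs.
    """
    seq = seq.upper().replace("U", "T")  # assumed: treat RNA input as DNA
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

    x = []
    for codon in codons:
        row = [0.0] * len(CODONS)
        idx = CODON_INDEX.get(codon)
        if idx is not None:
            row[idx] = 1.0
        x.append(row)

    sources, targets = [], []
    for i in range(len(codons) - 1):
        sources += [i, i + 1]
        targets += [i + 1, i]
    return x, [sources, targets]
```

With torch and torch_geometric installed, the two lists could then be wrapped as `Data(x=torch.tensor(x), edge_index=torch.tensor(edge_index, dtype=torch.long))` using `Data` from `torch_geometric.data`. Storing each edge in both directions lets a GCN layer pass messages both upstream and downstream along the sequence.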
🚀 This is a demo pipeline. In a production setting, the alignment, label generation, and training steps would need to be refined.
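One way to refine label generation is to replace the dummy zero labels with per-codon labels derived from the baseline sequence. The sketch below is hypothetical: the name `codon_mutation_labels` and the labelling rule are assumptions, not the project's method. It marks a codon 1.0 when it differs from the baseline codon at the same position. It also assumes the two sequences have already been aligned to equal length in the same reading frame.

```python
def codon_mutation_labels(baseline, sequence):
    """Hypothetical label generator: one [label] row per complete codon of `sequence`.

    A codon is labelled 1.0 if it differs from the baseline codon at the same
    position and 0.0 otherwise. Assumes both sequences are already aligned
    (equal length, same reading frame); gap characters are compared literally.
    """
    if len(baseline) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    baseline, sequence = baseline.upper(), sequence.upper()
    labels = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(sequence) - len(sequence) % 3, 3):
        ref = baseline[start:start + 3]
        alt = sequence[start:start + 3]
        labels.append([1.0 if ref != alt else 0.0])
    return labels
```

The result has shape (num_codons, 1), matching the dummy `graph.y`. It could therefore be assigned with `graph.y = torch.tensor(codon_mutation_labels(baseline_seq, seq))`, where `baseline_seq` and `seq` are placeholder names.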
git clone https://github.com/your-repo/Influenza-GNN.git
cd Influenza-GNN
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
pip install -r requirements.txt

The main dependencies are:
- biopython (for sequence handling and alignment)
- torch and torch_geometric (for the GNN implementation)
- pandas (for metadata processing)

If torch_geometric fails to install, refer to the PyG Installation Guide.
Two input files are required:
- FASTA file (sequences.fasta): contains the virus sequences.
- Metadata CSV (metadata.csv): must include an "Accession" column and a "Collection_Date" column.
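The loading, filtering, and baseline-selection steps can be sketched with the standard library alone. The project itself uses pandas for metadata; this sketch swaps in the `csv` module to stay self-contained. The following are assumptions: the helper names (`read_fasta`, `parse_date`, `select_baseline`, `load_inputs`), the rule that an accession is the first whitespace-delimited token of a FASTA header, and the accepted date formats.

```python
import csv
from datetime import datetime

# Assumed accepted date formats, tried in order; partial dates like "2019-03" or
# "2019" are common in influenza metadata. Not taken from main.py.
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def read_fasta(lines):
    """Parse FASTA lines into {accession: sequence}; the accession is the first header token."""
    sequences, acc, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if acc is not None:
                sequences[acc] = "".join(chunks)
            header = line[1:].split()
            acc, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if acc is not None:
        sequences[acc] = "".join(chunks)
    return sequences


def parse_date(text):
    """Return a datetime for the first format that matches, or None if none do."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def select_baseline(metadata_rows, sequences):
    """Drop rows lacking a matching sequence or a parseable date; return the earliest accession."""
    dated = []
    for row in metadata_rows:
        acc = (row.get("Accession") or "").strip()
        date = parse_date(row.get("Collection_Date") or "")
        if acc in sequences and date is not None:
            dated.append((date, acc))
    if not dated:
        raise ValueError("no metadata row matches a sequence with a parseable date")
    return min(dated)[1]  # ties on date are broken alphabetically by accession


def load_inputs(fasta_path="sequences.fasta", metadata_path="metadata.csv"):
    """Read both input files and return (sequences, metadata_rows, baseline_accession)."""
    with open(fasta_path) as handle:
        sequences = read_fasta(handle)
    with open(metadata_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return sequences, rows, select_baseline(rows, sequences)
```

Partial dates such as "2018" are parsed as January 1 of that year. This can make a year-only record look earlier than a fully dated one from the same year, so stricter handling may be preferable for real data.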
python main.py

Running the script will:
✅ Load & normalize sequences
✅ Filter metadata based on available sequences
✅ Parse collection dates & select baseline sequence
✅ Perform pairwise sequence alignment (demo)
✅ Convert sequences to codon-based graphs
✅ Train a simple two-layer GNN model
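The project relies on Biopython for the demo alignment. To show what a global pairwise alignment computes, here is a minimal Needleman-Wunsch implementation in plain Python. The function name `global_align` and the linear scoring (match +1, mismatch -1, gap -1) are assumptions, not values taken from main.py.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    Uses O(len(a) * len(b)) time and memory, so it suits short demo inputs only.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For full-length segments, Biopython's `Bio.Align.PairwiseAligner` is the better choice. It is much faster and supports affine gap penalties, which model indels more realistically than the linear penalty used here.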
Outputs:
- Trained GNN model (Model.pt): can be used for inference.
- Printed logs: alignment results, training progress, and mutation probabilities.
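The model is configured as `MutationGCN(in_channels=64, hidden_channels=32, num_classes=1)`. The layer details are not shown here, so the following rests on an assumption: that the model stacks two standard GCN convolution layers, with a ReLU in between and a sigmoid on the single output channel. Under that assumption, consider a codon graph with N nodes, adjacency matrix A (A[i, i+1] = A[i+1, i] = 1 for consecutive codons), and one-hot features X. It maps to per-codon mutation probabilities P as follows (bias terms omitted):

```latex
\begin{aligned}
\tilde{A} &= A + I, \qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
\hat{A} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2} \\
H &= \mathrm{ReLU}\big(\hat{A}\,X\,W^{(1)}\big), \qquad
X \in \{0,1\}^{N \times 64},\; W^{(1)} \in \mathbb{R}^{64 \times 32} \\
P &= \sigma\big(\hat{A}\,H\,W^{(2)}\big), \qquad
W^{(2)} \in \mathbb{R}^{32 \times 1},\; P \in (0,1)^{N \times 1}
\end{aligned}
```

Each row of P is the predicted mutation probability for one codon position. With real labels, a per-node binary cross-entropy between P and `graph.y` would be a natural training loss.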
- Update file paths inside main.py:
fasta_file = "sequences.fasta"
metadata_file = "metadata.csv"
- Modify GNN hyperparameters:
model = MutationGCN(in_channels=64, hidden_channels=32, num_classes=1)
optimizer = optim.Adam(model.parameters(), lr=0.01)
- Define mutation labels (dummy labels used in this demo):
graph.y = torch.zeros((num_nodes, 1), dtype=torch.float)

To load a pre-trained model (Model.pt):
from main import load_model
model = load_model("Model.pt")

We welcome contributions! To contribute:
- Fork the repository and create a new branch:
git checkout -b feature-name
- Make changes and commit:
git commit -m "Added feature X"
- Push to GitHub and create a pull request.
- If you encounter any bugs, open an issue with a detailed description.
This project is licensed under the MIT License.
For questions, reach out via:
📧 Email: duashmita@gmail.com
🔗 GitHub Issues: Open an Issue

Happy coding! 🚀