This repository contains an implementation of Phylo2Vec which includes:
- cfg/: Example configuration files.
- data/: Placeholder folder to contain sequence files in FASTA format.
- examples/: Example notebooks for different datasets.
- hc/: Phylogenetic tree optimisation via hill climbing.
  - Branch length and nucleotide substitution model optimisation relies on RAxML-NG.
- tests/: Placeholder folder for unit tests.
- trees/: Placeholder folder to contain tree files as Newick strings.
- utils/: Utility functions, including definitions of Phylo2Vec and transforms from commonly used tree formats to Phylo2Vec (and vice versa).
A quick demo of hill-climbing optimisation with Phylo2Vec is available in the demo.ipynb notebook.
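For readers who want the gist of hill climbing before opening the notebook, here is a minimal sketch of the general technique on an integer vector. It is not the code in hc/. It assumes that entry i of a Phylo2Vec vector takes a value in {0, ..., 2i}; check utils/ for the definition this repository actually uses. The names `is_valid`, `hill_climb` and `toy_loss` are hypothetical, and the toy loss stands in for the real objective, which depends on RAxML-NG.

```python
import random


def is_valid(v):
    """Assumed constraint: entry i is an integer in {0, ..., 2*i}."""
    return all(0 <= x <= 2 * i for i, x in enumerate(v))


def hill_climb(v, loss, n_iter=1000, seed=0):
    """Greedy hill climbing: resample one entry, keep the change if the loss drops.

    Requires len(v) >= 2, since entry 0 can only take the value 0.
    """
    rng = random.Random(seed)
    best, best_loss = list(v), loss(v)
    for _ in range(n_iter):
        i = rng.randrange(1, len(best))   # pick an entry other than entry 0
        cand = list(best)
        cand[i] = rng.randint(0, 2 * i)   # stay inside the assumed valid range
        cand_loss = loss(cand)
        if cand_loss < best_loss:         # accept strict improvements only
            best, best_loss = cand, cand_loss
    return best, best_loss


def toy_loss(v, target=(0, 2, 1, 5, 3)):
    """Hypothetical stand-in objective: number of entries that differ from a target."""
    return sum(a != b for a, b in zip(v, target))


start = [0, 0, 0, 0, 0]
best, best_loss = hill_climb(start, toy_loss)
print(best, best_loss)
```

Because only strict improvements are accepted, the search can stall in a local optimum. Restarting from several random valid vectors is a common remedy.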
A more minimalistic demo with an updated definition of Phylo2Vec is available on Colab:
To reproduce the environment, run:
conda env create -f env.yml
To run hill climbing-based optimisation using Phylo2Vec, run:
conda activate phylo
python -m hc.main
- Download a RAxML-NG binary from https://github.com/amkozlov/raxml-ng. On Windows, consider using the Windows Subsystem for Linux.
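Before launching a run, it can help to confirm that the RAxML-NG binary is reachable. The sketch below assumes the executable is named `raxml-ng` and sits on your PATH; the repository may expect it elsewhere, so adjust accordingly. `find_raxml_ng` is a hypothetical helper, not part of this codebase.

```python
import shutil


def find_raxml_ng(name="raxml-ng"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_raxml_ng()
if path is None:
    print("raxml-ng was not found on PATH; download a binary and add its folder to PATH.")
else:
    print(f"Found RAxML-NG at {path}")
```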
The following datasets were used:
- primates: https://evolution.gs.washington.edu/book/datasets.html
- fluA: https://github.com/4ment/phylostan/tree/master/examples
- M501: DS2 dataset in https://github.com/zcrabbit/vbpi-gnn/tree/main/data/hohna_datasets_fasta
- h3n2_na_20, zika: https://github.com/neherlab/treetime_examples
- yeast: https://cran.r-project.org/web/packages/phangorn/index.html (comes with pre-loaded datasets including yeast)
As mentioned in the submission, we plan to add more optimisation schemes using Phylo2Vec, e.g., MCTS or gradient descent.