Paper

TM-Vec: template modeling vectors for fast homology detection and alignment: https://www.biorxiv.org/content/10.1101/2022.07.25.501437v1

Embed sequences with TM-Vec

Installation

First, create a conda environment with Python >= 3.9. If you are running on CPU only, use:

conda create -n tmvec "python>=3.9" -c pytorch

Once your conda environment is created, install tmvec via:

pip install git+https://github.com/valentynbez/tmvec.git

If you are using a GPU, you may need to reinstall the GPU build of PyTorch. See the PyTorch website for details.

Run TM-Vec from the command line

If the computer is connected to the internet, all models are downloaded automatically. Otherwise, the models must be downloaded manually and the paths to them specified.

To build a database from a FASTA file, use:

tmvec build-db \
    --input-fasta small_embed.fasta \
    --output db_test/small_fasta

To query sequences against a database, use:

tmvec search \
    --query small_embed.fasta \
    --database db_test/small_fasta.npz \
    --output db_test/result.tsv

We suggest making the first runs on smaller batches with an internet connection. After the first run, the models are stored in the cache directory, and their paths can then be passed to the CLI manually if the compute nodes have no internet access.
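The search command takes the database as an .npz file, which is NumPy's archive format. A minimal sketch for inspecting such a file before searching it is given here. The helper name describe_npz is hypothetical and not part of tmvec, and the names of the arrays stored in a TM-Vec database are not documented in this README, so the code lists whatever it finds instead of assuming specific keys.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive.

    Hypothetical helper for illustration; it is not part of tmvec.
    """
    summary = {}
    # allow_pickle=False refuses to unpickle object arrays, which is the
    # safer default when the file comes from an untrusted source.
    with np.load(path, allow_pickle=False) as archive:
        for name in archive.files:
            try:
                array = archive[name]
            except ValueError:
                # Object arrays cannot be read without allow_pickle=True.
                summary[name] = None
                continue
            summary[name] = (array.shape, str(array.dtype))
    return summary


# Example call (the path is taken from the search command in this README):
#   for name, info in describe_npz("db_test/small_fasta.npz").items():
#       print(name, info)
```

An entry of None marks an array that could only be loaded with allow_pickle=True, which should be enabled only for files you trust.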

Embed sequences with pLMs (experimental)

To run this command, first install the embed extras with pip install -e git+https://github.com/user/project.git#egg=tmvec[embed].

Available models:

ProtT5
ESM
Ankh

tmvec embed --input-fasta small_embed.fasta \
    --model-type esm \
    --model-path facebook/esm2_t6_8M_UR50D \
    --cache-dir cache \
    --output-file small-embed.h5py

The --model-path parameter can be either a Hugging Face repository ID or a path on the local filesystem. If a repository is given, the model is downloaded to --cache-dir.
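The distinction between the two kinds of model-path value can be sketched as follows. This is only an illustration of the idea, assuming that a value counts as local when it exists on disk and is otherwise treated as a repository ID. The function classify_model_path is hypothetical, and tmvec's own resolution logic may differ.

```python
from pathlib import Path


def classify_model_path(model_path):
    """Return ("local", absolute_path) or ("hub", repo_id).

    Hypothetical helper: a value that exists on disk is treated as a local
    model directory or file; anything else is assumed to be a repository ID.
    """
    candidate = Path(model_path).expanduser()
    if candidate.exists():
        return "local", str(candidate.resolve())
    return "hub", model_path


# Example call with the repository ID used in this README:
#   classify_model_path("facebook/esm2_t6_8M_UR50D")
```

Under this assumption, a local directory that happens to share its name with a repository ID takes precedence, so it is worth using an unambiguous path on offline nodes.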

CPU/GPU difference

For CPU execution, we use ONNX protein language models, which give a slight speedup. For GPU execution, we use a standard ProtT5 model with torch.compile, which appears to be faster than ONNX.
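The rule described here can be sketched as a small device check. The function pick_backend and the two labels it returns are hypothetical names used only for illustration, and tmvec's actual selection logic may differ. The only library call involved is torch.cuda.is_available(), PyTorch's standard check for a usable GPU.

```python
import importlib.util


def pick_backend(cuda_available=None):
    """Return "torch-compile" when a GPU is usable, otherwise "onnx".

    Hypothetical helper. If cuda_available is None, the function asks
    PyTorch, and falls back to CPU when PyTorch is not installed.
    """
    if cuda_available is None:
        cuda_available = False
        if importlib.util.find_spec("torch") is not None:
            import torch

            cuda_available = torch.cuda.is_available()
    return "torch-compile" if cuda_available else "onnx"


# Example call:
#   print(pick_backend())
```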
