mmseqs createindex --split not generating correct number of splits #432

@nick-youngblut

Expected Behavior

I expect --split 16 for mmseqs createindex to generate 16 *.idx files. Instead, I'm getting 18:

mmseqs_tax_target/mmseqs_tax.db.idx.0
mmseqs_tax_target/mmseqs_tax.db.idx.1
mmseqs_tax_target/mmseqs_tax.db.idx.10
mmseqs_tax_target/mmseqs_tax.db.idx.11
mmseqs_tax_target/mmseqs_tax.db.idx.12
mmseqs_tax_target/mmseqs_tax.db.idx.13
mmseqs_tax_target/mmseqs_tax.db.idx.14
mmseqs_tax_target/mmseqs_tax.db.idx.15
mmseqs_tax_target/mmseqs_tax.db.idx.16
mmseqs_tax_target/mmseqs_tax.db.idx.17
mmseqs_tax_target/mmseqs_tax.db.idx.2
mmseqs_tax_target/mmseqs_tax.db.idx.3
mmseqs_tax_target/mmseqs_tax.db.idx.4
mmseqs_tax_target/mmseqs_tax.db.idx.5
mmseqs_tax_target/mmseqs_tax.db.idx.6
mmseqs_tax_target/mmseqs_tax.db.idx.7
mmseqs_tax_target/mmseqs_tax.db.idx.8
mmseqs_tax_target/mmseqs_tax.db.idx.9

Pipeline software (e.g., Snakemake) generally requires keeping track of all important output files a step produces; otherwise, untracked output files can be deleted accidentally, which is causing downstream problems (e.g., segmentation faults in mmseqs taxonomy).
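As a possible pipeline-side workaround, the split files could be discovered after createindex finishes instead of being predicted from the --split value. The following is only a minimal sketch, assuming the `<db>.idx.<N>` naming seen in the listing above; `find_index_splits` and `unexpected_splits` are hypothetical helper names, not part of MMseqs2 or Snakemake.

```python
import re
from pathlib import Path


def find_index_splits(db_path):
    """Return existing '<db_path>.idx.<N>' files, sorted by the numeric suffix N.

    Assumes the naming pattern from the listing above; files with a
    non-numeric suffix after '.idx.' are ignored.
    """
    db = Path(db_path)
    pattern = re.compile(re.escape(db.name) + r"\.idx\.(\d+)")
    found = []
    for candidate in db.parent.glob(db.name + ".idx.*"):
        match = pattern.fullmatch(candidate.name)
        if match:
            found.append((int(match.group(1)), candidate))
    # Sort numerically so that idx.2 comes before idx.10.
    return [path for _, path in sorted(found)]


def unexpected_splits(db_path, requested):
    """Return split files whose numeric suffix is >= the requested --split value."""
    return [
        path
        for path in find_index_splits(db_path)
        if int(path.name.rsplit(".", 1)[1]) >= requested
    ]
```

Calling `find_index_splits("mmseqs_tax_target/mmseqs_tax.db")` after the indexing step would list whatever split files actually exist, and `unexpected_splits(..., 16)` would flag any beyond the requested count. In Snakemake, this kind of after-the-fact discovery usually pairs with a `directory()` output or a checkpoint rather than a fixed list of file names.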

Steps to Reproduce (for bugs)

mmseqs createindex --threads 8 --split 16 mmseqs_tax.db mmseqs_tax_target/tmp/

Your Environment

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2021.1.19            h06a4308_1
gawk                      5.1.0                h516909a_0    conda-forge
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libidn2                   2.3.0                h516909a_0    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libunistring              0.9.10               h14c3975_0    conda-forge
mmseqs2                   13.45111             h95f258a_1    bioconda
openssl                   1.1.1k               h7f98852_0    conda-forge
pigz                      2.6                  h27826a3_0    conda-forge
seqkit                    0.15.0                        0    bioconda
seqtk                     1.3                  h5bf99c6_3    bioconda
wget                      1.20.1               h22169c7_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge

OS: Ubuntu 18.04.5
