Due to a recent cyberattack, our primary dbCAN web server is currently offline, and you will not be able to access the online database. Our IT team is actively working to resolve the issue. We apologize for any inconvenience this may cause.
In the meantime, you can still obtain the dbCAN database from our AWS S3 backup using one of the following methods:
1. Use the run_dbcan database command (recommended):
run_dbcan database --db_dir db --aws_s3
This command downloads and organizes the database files automatically.
2. Download via wget (individual files only) or the AWS CLI:
wget cannot download an entire folder from an S3 bucket; it can only fetch individual files. To download every file, either list the files and fetch them one by one, or use the AWS CLI. To download with wget, specify each file's URL directly, for example:
wget https://dbcan.s3.us-west-2.amazonaws.com/db_v5-2_9-13-2025/some_file
To download the entire folder, use the AWS CLI:
aws s3 cp s3://dbcan/db_v5-2_9-13-2025/ ./db --recursive
For more details on database downloads, please refer to our documentation.
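If neither run_dbcan nor the AWS CLI is available, the "list the files, then fetch them one by one" approach can be scripted. The sketch below uses only the Python standard library and the standard S3 ListObjectsV2 REST request (`?list-type=2&prefix=...`). It assumes that the dbcan bucket allows anonymous listing, which this page does not confirm; if the listing request is refused, use one of the commands above instead. The function names are illustrative and are not part of run_dbcan.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

BUCKET_URL = "https://dbcan.s3.us-west-2.amazonaws.com"
PREFIX = "db_v5-2_9-13-2025/"
# XML namespace used by S3 ListObjectsV2 responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_text):
    """Return (keys, continuation_token) from one ListObjectsV2 response page."""
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = None
    if truncated == "true":
        token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return keys, token


def object_url(key, bucket_url=BUCKET_URL):
    """Build the per-file URL, i.e. what you would pass to wget."""
    return f"{bucket_url}/{urllib.parse.quote(key)}"


def list_keys(bucket_url=BUCKET_URL, prefix=PREFIX):
    """Page through the listing until S3 reports no more results."""
    keys, token = [], None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = f"{bucket_url}/?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            page, token = parse_listing(resp.read())
        keys.extend(page)
        if not token:
            return keys


def download_all(dest="db", bucket_url=BUCKET_URL, prefix=PREFIX):
    """Download every object under `prefix` into `dest`, keeping subfolders."""
    for key in list_keys(bucket_url, prefix):
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        target = Path(dest) / key[len(prefix):]
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(object_url(key, bucket_url), target)
```

Calling `download_all()` would mirror the prefix into ./db, roughly like the `aws s3 cp ... --recursive` command above; pagination is handled because a single listing response returns at most 1,000 keys.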
If you have any questions or need help, feel free to open an issue.
10/20/2025:
- SignalP6.0 Topology Annotation: Added support for SignalP6.0 signal peptide prediction. Use the --run_signalp flag with the CAZyme_annotation command to enable topology annotation. Results are automatically added to the overview.tsv file.
- Global Logging System: Implemented a comprehensive logging system with --log-level, --log-file, and --verbose options for easier debugging and monitoring.
- Database Download Command: Added a new database command for easy database downloading. It supports both HTTP and AWS S3 sources (use the --aws_s3 flag for faster downloads). Use --cgc/--no-cgc to control whether CGC-related databases are downloaded.
- Code Structure Improvements: Continued refactoring with object-oriented programming, improved modularity, and centralized configuration management.
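The database and logging options above can be combined when scripting a download. This is a minimal sketch that assembles the argument list from the documented flags only. It assumes the global logging options are accepted after the subcommand and that a value such as "DEBUG" is valid for --log-level; both are assumptions, so check the command's help output for the exact placement and accepted values.

```python
import subprocess


def with_logging(cmd, log_level=None, log_file=None, verbose=False):
    """Append the global logging options (placement after the subcommand is assumed)."""
    cmd = list(cmd)
    if log_level:
        cmd += ["--log-level", log_level]
    if log_file:
        cmd += ["--log-file", log_file]
    if verbose:
        cmd.append("--verbose")
    return cmd


def database_cmd(db_dir="db", aws_s3=True, cgc=None):
    """Build `run_dbcan database`; cgc=None leaves the tool's default CGC behaviour."""
    cmd = ["run_dbcan", "database", "--db_dir", db_dir]
    if aws_s3:
        cmd.append("--aws_s3")  # use the AWS S3 source instead of HTTP
    if cgc is not None:
        cmd.append("--cgc" if cgc else "--no-cgc")
    return cmd


def run(cmd):
    """Run the command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, `run(with_logging(database_cmd(cgc=False), log_file="dbcan_download.log"))` would skip the CGC-related databases and write a log file (the file name here is hypothetical). Keeping the builders separate from `run` makes the exact command easy to print or test before anything is executed.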
5/12/2025:
The dev-dbcan branch is used to test new features and fix issues. After testing, it will be merged into the main branch, and the Docker, Conda, and PyPI packages will be updated. To use these beta features, replace the code folder (dbcan) in your installed package with the dbcan folder from the dev-dbcan branch.
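Before replacing the code folder, you need to know where the installed dbcan package lives. The sketch below locates it with the standard `importlib.util.find_spec` call and swaps in a new folder while keeping a `.bak` backup of the released code. It assumes the package is importable as `dbcan`; the helper names and the path of your dev-dbcan checkout are hypothetical.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dir(name="dbcan"):
    """Return the folder of an installed regular package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    return Path(list(spec.submodule_search_locations)[0])


def swap_folder(new_src, target):
    """Move `target` aside to `<target>.bak`, then copy `new_src` into its place."""
    target = Path(target)
    backup = target.with_name(target.name + ".bak")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(target), str(backup))
    shutil.copytree(new_src, target)
    return backup
```

A typical call would be `swap_folder("path/to/dev-dbcan/dbcan", package_dir("dbcan"))`; to return to the released version, delete the copied folder and rename the `.bak` folder back.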
3/16/2025:
- Rewrote the structure of run_dbcan 4.0 (suggested by Haidong) using object-oriented programming (OOP) to improve maintainability and readability.
- Added a new function, cgc_circle, which visualizes CGCs across the genome.
Future plans: Add prediction of food consumption through CAZymes. If you have suggestions, please contact Dr. Yanbin Yin (yyin@unl.edu), Xinpeng Zhang (xzhang55@huskers.unl.edu), and Dr. Haidong Yi (hyi@stjude.org).
Notice
This is the updated version of run_dbcan 4.0. Many changes have been made, as described in https://run-dbcan.readthedocs.io/en/latest/. From now on, this repository is the official run_dbcan site, and the run_dbcan 4.0 site will no longer be maintained.
run_dbcan is the standalone version of the dbCAN3 annotation tool for automated CAZyme annotation. It incorporates pyHMMER (replacing HMMER for better performance), Diamond, and dbCAN-sub to annotate CAZyme families, and it integrates CAZyme Gene Cluster (CGC) and substrate predictions.
The tool provides the following main commands:
- database: Download dbCAN databases (supports HTTP and AWS S3)
- CAZyme_annotation: Annotate CAZymes using Diamond, pyHMMER, and dbCAN-sub
- gff_process: Generate GFF files for CGC identification
- cgc_finder: Identify CAZyme Gene Clusters (CGCs)
- substrate_prediction: Predict substrate specificities of CGCs
- cgc_circle_plot: Generate circular plots for CGCs
- easy_CGC: Complete CGC analysis pipeline (annotation + GFF processing + CGC identification)
- easy_substrate: Complete CGC analysis with substrate prediction
- Pfam_null_cgc: Annotate null genes in CGCs using Pfam
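The two easy_* commands bundle several of the individual steps listed above. The sketch below only models that composition as an ordered lookup table, with the step order taken from the descriptions above. It is an illustration, not run_dbcan's internal implementation, and it deliberately omits each step's input and output options because they are not described here.

```python
# Ordered steps behind each pipeline command, as described in the list above.
EASY_CGC = ["CAZyme_annotation", "gff_process", "cgc_finder"]
PIPELINES = {
    "easy_CGC": EASY_CGC,
    "easy_substrate": EASY_CGC + ["substrate_prediction"],
}


def expand(command):
    """Return the individual run_dbcan subcommands a command stands for."""
    return list(PIPELINES.get(command, [command]))


def step_argvs(command):
    """Skeleton argv per step; real runs also need each step's own options."""
    return [["run_dbcan", step] for step in expand(command)]
```

Thinking of the pipelines this way can help when you only want to rerun a single stage, for example substrate_prediction, instead of repeating a whole easy_substrate run.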
All commands support global logging options: --log-level, --log-file, and --verbose.
For usage discussions, visit our issue tracker. To learn more, read the run_dbcan documentation. If you're interested in contributing, whether through issues or pull requests, please review our contribution guide.
Please cite the following dbCAN publications if you use run_dbcan in your research:
dbCAN3: automated carbohydrate-active enzyme and substrate annotation
Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin
Nucleic Acids Research, 2023, gkad328, doi: 10.1093/nar/gkad328.
dbCAN2: a meta server for automated carbohydrate-active enzyme annotation
Han Zhang, Tanner Yohe, Le Huang, Sarah Entwistle, Peizhi Wu, Zhenglu Yang, Peter K Busk, Ying Xu, Yanbin Yin
Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W95–W101, doi: 10.1093/nar/gky418.
dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation
Le Huang, Han Zhang, Peizhi Wu, Sarah Entwistle, Xueqiong Li, Tanner Yohe, Haidong Yi, Zhenglu Yang, Yanbin Yin
Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D516–D521, doi: 10.1093/nar/gkx894.
