cdhit
How to run cd-hit-454 with a very large file
I am trying to use cd-hit-454 to remove duplicate sequences from a file that is 850992 KB in size. The command below fails with the error that follows it.
cd-hit-454 -i /home/kumarm/CD-HIT/marge1.fasta -o /home/kumarm/CD-HIT/454_reads_95 -c 0.99 -n 10 -d 0 -M 0 -T 8 -B 1
ERROR: Fatal Error: in diag_test_aapn_est, MAX_DIAG reached Program halted !!
#define MAX_SEQ 655360
#define MAX_DIAG (MAX_SEQ<<1) // MAX_DIAG be twice of MAX_SEQ
You would need to edit the file below, increase MAX_SEQ (MAX_DIAG is defined as twice MAX_SEQ, so it grows along with it), and recompile.
https://github.com/weizhongli/cdhit/blob/97ece86dc11f43e87b1ecb1f335c13dbc3ec341e/cdhit-common.h
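Before picking a new value, it may help to know how long the longest record in the input actually is. The following is a minimal Python sketch, not part of CD-HIT, that scans a FASTA file and compares the longest record against the default quoted above. It assumes the limit is tied to sequence length, as the name MAX_SEQ suggests; the helper names `max_diag` and `longest_record` are hypothetical and exist only for this illustration.

```python
import sys

DEFAULT_MAX_SEQ = 655360  # value quoted from cdhit-common.h in this thread


def max_diag(max_seq):
    """Mirror the header's definition: MAX_DIAG is MAX_SEQ << 1 (twice MAX_SEQ)."""
    return max_seq << 1


def longest_record(lines):
    """Return (header, length) of the longest record in FASTA-formatted lines."""
    best_header, best_len = None, 0
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: close out the previous one first.
            if header is not None and length > best_len:
                best_header, best_len = header, length
            header, length = line[1:], 0
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            length += len(line)
    # Close out the final record.
    if header is not None and length > best_len:
        best_header, best_len = header, length
    return best_header, best_len


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        header, length = longest_record(handle)
    print(f"longest record: {header} ({length} characters)")
    print(f"default MAX_SEQ = {DEFAULT_MAX_SEQ}, "
          f"MAX_DIAG = {max_diag(DEFAULT_MAX_SEQ)}")
    if length > DEFAULT_MAX_SEQ:
        print("longest record exceeds the default MAX_SEQ")
```

Saved under a name of your choice (say `longest_fasta.py`, a hypothetical filename), it can be run as `python longest_fasta.py /home/kumarm/CD-HIT/marge1.fasta`. If the longest record is far beyond anything a 454 read should be, the input may contain a malformed or concatenated record, which is worth checking before raising the limit.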