
how to run cd-hit-454 with very large file

Open. sekhwal opened this issue 6 years ago • 1 comment

I am trying to use cd-hit-454 to remove duplicate sequences from a file that is 850992 KB in size, using the following command, and it fails with the error shown below it.

cd-hit-454 -i /home/kumarm/CD-HIT/marge1.fasta -o /home/kumarm/CD-HIT/454_reads_95 -c 0.99 -n 10 -d 0 -M 0 -T 8 -B 1

ERROR: Fatal Error: in diag_test_aapn_est, MAX_DIAG reached Program halted !!

sekhwal, Jul 19 '19 21:07

#define MAX_SEQ 655360
#define MAX_DIAG (MAX_SEQ<<1)              // MAX_DIAG be twice of MAX_SEQ

You would need to edit the file below, increase MAX_SEQ (MAX_DIAG is derived from it, as shown above), and recompile.

https://github.com/weizhongli/cdhit/blob/97ece86dc11f43e87b1ecb1f335c13dbc3ec341e/cdhit-common.h

tseemann, Oct 02 '19 06:10