Hi there! In jermp/sshash#39, @jermp suggested that I use ggcat to produce input datasets for sshash.
I tried using ggcat, but unfortunately something seems wrong:
$ gzip -d se.ust.k31.fa.gz
$ ggcat build -k 31 -j 8 --eulertigs se.ust.k31.fa
...
Final output saved to: output.fasta.lz4
$ lz4 -d output.fasta.lz4
Decoding file output.fasta
Error 68 : Unfinished stream
This run uses se.ust.k31.fa.gz as the input dataset for ggcat. Ultimately, I want to use ggcat to compute eulertigs of Homo_sapiens.GRCh38.dna.toplevel.fa.gz for k=127, but with that dataset I also get Unfinished stream errors, and the resulting file is much smaller than I anticipated. Could you please advise whether I'm doing anything wrong here?