Add GC content parameter --mingc --maxgc #30
|
Yeah, the performance is an issue. I will make it an optional parameter then. |
|
I updated the code to check if |
|
I have made some changes to how you implemented this. Would you agree that this is equivalent to your code? |
|
Yeah, I will run some benchmarks using huge |
|
Hi @wdecoster, I ran some tests on a 44G FASTQ file from Drosophila melanogaster ONT sequencing and found some interesting results.
I found that the GC filter does not significantly affect the run time. The |
|
Oh yes indeed, that is interesting. I agree that the GC calculation doesn't slow things down when it is not needed, so that is great. Out of curiosity, did you also benchmark it with a min or max GC specified? It seems gunzip does a far better job than the decompression done in Rust. I wonder if that is because the decompression then runs in a separate process with its output piped in, but I don't know how things are implemented in flate2. Things like this are also precisely why chopper initially only read from stdin. Unix tools are intended to do one thing, and do it well, and piping one tool into the next is a great approach. I will add a note regarding your findings to the README. |
add note on speed of decompression while piping
|
Yeah I added the test for
Maybe I will deal with this problem the |

Hi @wdecoster ,
Recently, I added parameters for a GC content filter,
--mingc and --maxgc, for use in my own project. I also added a testGC.fastq for testing under /chopper/test-data. It is very useful for dealing with high-GC bacterial ONT sequencing.
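To make the idea concrete, here is a minimal sketch of what a GC content filter along these lines could look like. It is not the code from this PR: the function names `gc_fraction` and `passes_gc_filter` are hypothetical, and it assumes the bounds are given as fractions between 0.0 and 1.0, with 0.0 and 1.0 meaning "no filtering".

```rust
/// Fraction of G/C bases in a read, between 0.0 and 1.0.
/// An empty read is treated as having a GC fraction of 0.0.
fn gc_fraction(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    // Count upper- and lower-case G and C; all other bytes (A, T, N, ...) are ignored.
    let gc = seq
        .iter()
        .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count();
    gc as f64 / seq.len() as f64
}

/// Hypothetical filter: keep a read when its GC fraction lies in [min_gc, max_gc].
/// Assumption: bounds are fractions, and 0.0 / 1.0 mean "no GC filtering".
fn passes_gc_filter(seq: &[u8], min_gc: f64, max_gc: f64) -> bool {
    // With the assumed defaults, skip the GC computation entirely.
    if min_gc <= 0.0 && max_gc >= 1.0 {
        return true;
    }
    let gc = gc_fraction(seq);
    gc >= min_gc && gc <= max_gc
}
```

For example, `passes_gc_filter(b"GGCCATAT", 0.4, 0.6)` keeps the read, since 4 of its 8 bases are G or C. The early return is one way to keep the filter from costing anything when neither bound is set.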
Hope you like my modifications.