Closed
Description
Prerequisites
- make sure you're using the latest version by seqkit version
- read the usage
Describe your issue
I have a FASTQ file from which I would like to subsample very specific sequences by ID. Additionally, some sequences should be subsampled multiple times. However, using seqkit grep with --pattern-file, it extracts each pattern only once.
seqkit grep -f id_list.txt mock.fq
[INFO] 3 patterns loaded from file
The file contains 4 patterns (3 distinct IDs, one of which is listed twice), but only 3 are loaded. Would it be possible to add a parameter so that the patterns are not reduced to unique patterns?
In contrast, seqtk does not restrict the extraction to unique IDs:
seqtk subseq mock.fq id_list.txt
All 4 patterns are used, so the output contains 4 sequences.
mock.fq
@seq1
GATCGATCGA
+
IIIIIIIIII
@seq2
AGCTAGCTAG
+
IIIIIIIIII
@seq3
TACGTACGTA
+
IIIIIIIIII
@seq4
CGATCGATCG
+
IIIIIIIIII
@seq5
ATCGATCGAT
+
IIIIIIIIII
@seq6
GCTAGCTAGC
+
IIIIIIIIII
@seq7
CATGCATGCA
+
IIIIIIIIII
@seq8
TGCATGCATG
+
IIIIIIIIII
@seq9
AGCTAGCTAG
+
IIIIIIIIII
@seq10
ATCGATCGAT
+
IIIIIIIIII
id_list.txt
seq1
seq1
seq2
seq3
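Until such a parameter exists, one possible workaround is to do the extraction in a small script. The following is only a sketch of the requested behaviour, not part of seqkit or seqtk: `read_fastq` and `extract_with_duplicates` are hypothetical helper names, and the code assumes 4-line FASTQ records whose ID is the first whitespace-delimited token after `@`.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        lines = [header] + [handle.readline() for _ in range(3)]
        # Assumption: the ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        # Normalise line endings so every record ends with exactly one newline.
        record = "".join(line.rstrip("\n") + "\n" for line in lines)
        yield read_id, record


def extract_with_duplicates(fastq_handle, ids):
    """Return each matching record once per occurrence of its ID in `ids`."""
    counts = Counter(ids)  # e.g. {"seq1": 2, "seq2": 1, "seq3": 1}
    records = []
    for read_id, record in read_fastq(fastq_handle):
        # IDs that are not listed have a count of 0 and are skipped.
        records.extend([record] * counts[read_id])
    return records


# Demo on the first four records of mock.fq and the IDs from id_list.txt.
mock_fq = (
    "@seq1\nGATCGATCGA\n+\nIIIIIIIIII\n"
    "@seq2\nAGCTAGCTAG\n+\nIIIIIIIIII\n"
    "@seq3\nTACGTACGTA\n+\nIIIIIIIIII\n"
    "@seq4\nCGATCGATCG\n+\nIIIIIIIIII\n"
)
id_list = ["seq1", "seq1", "seq2", "seq3"]
selected = extract_with_duplicates(io.StringIO(mock_fq), id_list)
print("".join(selected), end="")
```

With the files above, this emits seq1 twice, followed by seq2 and seq3. Note that records come out in the order they appear in the FASTQ file, not in the order of the ID list, and duplicated records share the same read name.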