Add option to disable interleaving when combining word lists #63
Description
Is your feature request related to a problem? Please describe.
When combining multiple word lists with tidy, the tool currently interleaves words from the input files (taking word 1 from file1, word 1 from file2, word 2 from file1, word 2 from file2, etc.). While this behavior might be useful in some scenarios, there's no way to append word lists sequentially while preserving the original order and removing duplicates.
Current behavior:
- dict1.txt contains:
- admin
- root
- user
- test
- dict2.txt contains:
- password
- guest
- admin
- backup
tidy --no-sort -o combined.txt dict1.txt dict2.txt
Output:
- admin
- password
- root
- guest
- user
- admin (interleaved)
- test
- backup
Note: "admin" appears twice because interleaving happens before deduplication
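For anyone who wants to see the reported ordering without running tidy, here is a minimal sketch that mimics it with standard shell tools. It assumes a POSIX `paste` and uses a scratch directory; it only reproduces the order shown above and says nothing about how tidy implements interleaving internally.

```shell
# Recreate the two sample lists from this issue in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' admin root user test > "$workdir/dict1.txt"
printf '%s\n' password guest admin backup > "$workdir/dict2.txt"

# With a newline delimiter, paste emits line 1 of each file, then line 2 of
# each file, and so on. No deduplication is applied here.
interleaved=$(paste -d '\n' "$workdir/dict1.txt" "$workdir/dict2.txt")
printf '%s\n' "$interleaved"
```

The primary list's priority order is lost as soon as the second file's entries are woven in, which is the problem described above.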
Describe the solution you'd like
Add a new flag (e.g., --no-interleave or --sequential) that processes input files sequentially:
- Process the first file completely
- Then process the second file (skipping duplicates)
- Continue with remaining files
Expected behavior with new flag:
tidy --no-sort --no-interleave -o combined.txt dict1.txt dict2.txt
Output:
- admin
- root
- user
- test
- password
- guest
- backup
(dict1 complete, then dict2 without duplicates, preserving order)
Use case
This feature is particularly useful for penetration testing and CTF scenarios where:
- Word lists are often sorted by frequency/relevance (most common words first)
- Users want to combine a primary high-quality list with supplementary lists
- Preserving the priority order of the primary list is critical for efficient fuzzing/bruteforcing
- Example: Combining rockyou.txt (ordered by frequency) with custom company-specific wordlists
Describe alternatives you've considered
Current workarounds require using external tools:
# Using awk (not cross-platform, lacks tidy's other features)
cat dict1.txt dict2.txt | awk '!seen[$0]++'
# Using grep (very slow with large files)
cat dict1.txt > combined.txt && grep -Fxv -f dict1.txt dict2.txt >> combined.txt
These workarounds don't benefit from tidy's other powerful features like:
- --minimum-word-length / --maximum-word-length
- --remove-prefix / --remove-suffix
- --lowercase
- --remove-nonascii
- Attribute analysis
Additional context
The --no-sort flag already disables alphabetical sorting, which is great. This new flag would be a natural extension that gives users full control over input processing order.
Suggested implementation approach:
- Add a --no-interleave / --sequential flag
- When enabled, process input files one at a time instead of alternating
- Maintain the existing deduplication logic
- Works in combination with --no-sort for full order preservation
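To make the intended semantics concrete, here is a rough sketch of the sequential behavior using awk on the sample files from this issue. `merge_sequential` is a hypothetical helper name, not part of tidy, and this is only an illustration of the desired output order, not a proposal for how the flag should be implemented in tidy's codebase.

```shell
# Hypothetical helper: emulates the proposed sequential mode outside of tidy.
merge_sequential() {
  # awk reads its file arguments in the order given, and the seen[] array
  # persists across files, so each line is printed only on first appearance.
  awk '!seen[$0]++' "$@"
}

# Recreate the two sample lists from this issue in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' admin root user test > "$workdir/dict1.txt"
printf '%s\n' password guest admin backup > "$workdir/dict2.txt"

merged=$(merge_sequential "$workdir/dict1.txt" "$workdir/dict2.txt")
printf '%s\n' "$merged"
```

On the sample files this yields all of dict1 in its original order, followed by the dict2 entries that were not already present, which matches the expected output listed earlier.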
Would you be willing to submit a PR for this feature?
Not at the moment