Add option to disable interleaving when combining word lists #63

@JulianGR

Is your feature request related to a problem? Please describe.

When combining multiple word lists with tidy, the tool currently interleaves words from the input files (taking word 1 from file1, word 1 from file2, word 2 from file1, word 2 from file2, etc.). While this behavior might be useful in some scenarios, there's no way to append word lists sequentially while preserving the original order and removing duplicates.

Current behavior:

  • dict1.txt contains:
    • admin
    • root
    • user
    • test
  • dict2.txt contains:
    • password
    • guest
    • admin
    • backup

tidy --no-sort -o combined.txt dict1.txt dict2.txt

Output:

  • admin
  • password
  • root
  • guest
  • user
  • admin (interleaved)
  • test
  • backup

Note: "admin" appears twice because interleaving happens before deduplication
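
To make the ordering concrete, here is a minimal Python sketch of round-robin interleaving as described above. It is only an illustration of the reported ordering, not tidy's actual code; the function name `interleave` and the sentinel `_MISSING` are made up for this example.

```python
from itertools import zip_longest

# Sentinel used to pad shorter lists; hypothetical name for this sketch.
_MISSING = object()

def interleave(*word_lists):
    """Yield words round-robin: word 1 of each list, then word 2, and so on."""
    for row in zip_longest(*word_lists, fillvalue=_MISSING):
        for word in row:
            if word is not _MISSING:
                yield word

dict1 = ["admin", "root", "user", "test"]
dict2 = ["password", "guest", "admin", "backup"]

# No deduplication is applied here, so "admin" shows up twice,
# in the same positions as in the output reported above.
for word in interleave(dict1, dict2):
    print(word)
```

Plain interleaving pushes dict1's fourth word down to seventh place, which is what breaks a frequency-ordered list.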

Describe the solution you'd like

Add a new flag (e.g., --no-interleave or --sequential) that processes input files sequentially:

  1. Process the first file completely
  2. Then process the second file (skipping duplicates)
  3. Continue with remaining files

Expected behavior with new flag:

tidy --no-sort --no-interleave -o combined.txt dict1.txt dict2.txt

Output:

  • admin
  • root
  • user
  • test
  • password
  • guest
  • backup

(dict1 complete, then dict2 without duplicates, preserving order)
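
The requested behavior can be sketched in a few lines of Python. This is a hedged illustration of the three steps above, not a proposal for how tidy should implement it; `combine_sequential` is a hypothetical name.

```python
from itertools import chain

def combine_sequential(*word_lists):
    """Concatenate lists in order, keeping only the first occurrence of each word."""
    seen = set()
    combined = []
    # chain.from_iterable exhausts the first list before moving to the next.
    for word in chain.from_iterable(word_lists):
        if word not in seen:
            seen.add(word)
            combined.append(word)
    return combined

dict1 = ["admin", "root", "user", "test"]
dict2 = ["password", "guest", "admin", "backup"]

for word in combine_sequential(dict1, dict2):
    print(word)
```

With the sample lists, this prints the seven words in the expected order shown above: all of dict1, then the words from dict2 that have not been seen yet.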

Use case

This feature is particularly useful for penetration testing and CTF scenarios where:

  • Word lists are often sorted by frequency/relevance (most common words first)
  • Users want to combine a primary high-quality list with supplementary lists
  • Preserving the priority order of the primary list is critical for efficient fuzzing/bruteforcing
  • Example: Combining rockyou.txt (ordered by frequency) with custom company-specific wordlists

Describe alternatives you've considered

Current workarounds require using external tools:

# Using awk (not cross-platform, lacks tidy's other features)
cat dict1.txt dict2.txt | awk '!seen[$0]++'

# Using grep (very slow with large files; does not remove duplicates within each file)
cat dict1.txt > combined.txt && grep -Fxv -f dict1.txt dict2.txt >> combined.txt

These workarounds don't benefit from tidy's other powerful features like:

  • --minimum-word-length / --maximum-word-length
  • --remove-prefix / --remove-suffix
  • --lowercase
  • --remove-nonascii
  • Attribute analysis

Additional context

The --no-sort flag already disables alphabetical sorting, which is great. This new flag would be a natural extension that gives users full control over input processing order.

Suggested implementation approach:

  • Add --no-interleave / --sequential flag
  • When enabled, process input files one at a time instead of alternating
  • Maintain the existing deduplication logic
  • Works in combination with --no-sort for full order preservation
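
As a rough sketch of that approach (in Python, purely for illustration), the flag would only change the order in which words are read, while a single deduplication step stays the same in both modes. The names `ordered_words`, `combine`, and the `interleave` parameter are assumptions for this example and do not correspond to anything in tidy's codebase.

```python
from itertools import chain, zip_longest

_MISSING = object()  # pads shorter lists during round-robin reading

def ordered_words(word_lists, interleave=True):
    """Return words either round-robin (interleaved) or one list at a time."""
    if interleave:
        rows = zip_longest(*word_lists, fillvalue=_MISSING)
        return (w for row in rows for w in row if w is not _MISSING)
    return chain.from_iterable(word_lists)

def combine(word_lists, interleave=True):
    """Apply the same first-occurrence deduplication regardless of read order."""
    # dict preserves insertion order, so fromkeys keeps the first occurrence.
    return list(dict.fromkeys(ordered_words(word_lists, interleave)))

dict1 = ["admin", "root", "user", "test"]
dict2 = ["password", "guest", "admin", "backup"]

print(combine([dict1, dict2], interleave=True))
print(combine([dict1, dict2], interleave=False))
```

Keeping the read order separate from deduplication means the new flag would not need its own deduplication path.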

Would you be willing to submit a PR for this feature?

Not at the moment
