Performance notes / ideas

## Ideas for speeding this up

- [x] Use sparse matrices
- [x] Faster k-mer counting / comparison algorithms (update: I used suffix arrays, although [the way I used them](https://github.com/fedarko/wotplot/blob/1f92745c36ec47d090a7372612b8ae977ddfe4c4/wotplot/_make.py#L83-L250) could probably be sped up)
- [ ] Use input sequence size to determine which k-mer counting method to use (the older naive, memory-inefficient method is faster for small inputs than the suffix-array-based method)
- [ ] Do the conversions of `s1`, `s2`, and `rc(s2)` to bytes at the start of matrix construction, and then do everything thereafter in bytes? (At the very least, don't convert both `s2` and `rc(s2)` to bytes separately; that's silly.)
- [ ] Use Cython / etc.
- [ ] Support FASTA files as input and then process them in chunks (or similar), which would remove the need to store massively long sequences in memory (not sure how this would work with pydivsufsort, though)
- [ ] Use rolling hashes? See https://bioinformatics.stackexchange.com/questions/19/are-there-any-rolling-hash-functions-that-can-hash-a-dna-sequence-and-its-revers (and also LJA paper)
- [x] Replace `rc()` function with `str.maketrans`: https://bioinformatics.stackexchange.com/questions/3583/what-is-the-fastest-way-to-get-the-reverse-complement-of-a-dna-sequence-in-pytho
- If both sequences are equal (i.e. we're creating a self dot plot), use this to speed up dot plot construction. Some ideas:
  - [x] Don't bother creating an extra suffix array
  - [ ] Only fill in one triangle of the matrix, since the upper and lower triangles of a self dot plot should be symmetric? (this might be hard to do using the suffix array approach, though)
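
To make the `str.maketrans` and bytes-conversion items above concrete, here is a minimal sketch. It is not wotplot's actual code: the function names `rc` and `encode_inputs` are made up for illustration, and it assumes uppercase `ACGT` input with no degenerate nucleotides. The point is that `rc(s2)` can be derived directly from the bytes of `s2`, so each input string only gets encoded once.

```python
# Hypothetical helpers for illustration; not taken from wotplot.
# Assumes sequences contain only uppercase A, C, G, T.

# Translation tables are built once and reused for every call.
_COMPLEMENT_STR = str.maketrans("ACGT", "TGCA")
_COMPLEMENT_BYTES = bytes.maketrans(b"ACGT", b"TGCA")


def rc(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT_STR)[::-1]


def encode_inputs(s1: str, s2: str):
    """Encode s1 and s2 once, then derive rc(s2) in bytes.

    This avoids encoding s2 and rc(s2) separately: the reverse
    complement is computed from the already-encoded s2.
    """
    s1_b = s1.encode("ascii")
    s2_b = s2.encode("ascii")
    rc_s2_b = s2_b.translate(_COMPLEMENT_BYTES)[::-1]
    return s1_b, s2_b, rc_s2_b
```

For example, `encode_inputs("ACGT", "AACG")` yields the bytes of both inputs plus the bytes of `rc("AACG")`, without ever building `rc(s2)` as a `str`.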
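
The "only fill in one triangle" idea can be sketched with the naive dictionary-based approach (not the suffix array one, where it may be harder, as noted above). This is an illustration under assumptions, not wotplot's implementation: `self_forward_matches` is a hypothetical name, cells are identified by k-mer start positions `(i, j)` rather than by wotplot's actual matrix coordinates, and only forward matches are handled.

```python
from collections import defaultdict


def self_forward_matches(s: str, k: int) -> set:
    """Return all (i, j) with s[i:i+k] == s[j:j+k].

    Only pairs with j >= i are enumerated directly; each one is
    mirrored to (j, i), since forward matches in a self dot plot
    are symmetric in the start positions.
    """
    # Group the start positions of each distinct k-mer.
    positions = defaultdict(list)
    for i in range(len(s) - k + 1):
        positions[s[i:i + k]].append(i)

    matches = set()
    for pos in positions.values():
        for idx, i in enumerate(pos):
            # pos is sorted ascending, so pos[idx:] covers j >= i only.
            for j in pos[idx:]:
                matches.add((i, j))
                matches.add((j, i))  # mirrored cell in the other triangle
    return matches
```

Whether this saves much in practice depends on how the sparse matrix is assembled; the mirroring step still has to write both cells, so the gain is mainly in the match-finding loop.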
