Feature request: empirical sequencing-error-rate estimation

## Background

I'm the maintainer of [atropos](https://github.com/jdidion/atropos), a fork of cutadapt that I'm now winding down. Before archiving, I'm surfacing a few features atropos has that cutadapt doesn't, in case any are interesting upstream.

## Proposal

Add an empirical per-base error-rate estimator — either as a standalone `cutadapt error` subcommand or as an automatic pre-pass that informs the default adapter-match error tolerance (`-e`).

### Two possible methods

1. **Quality-based**: sum per-base `10^(-Q/10)` across the input and divide by the base count. Streams in a single pass and needs no reference, but the estimate is only as accurate as the quality scores themselves, so miscalibrated qualities bias it.
2. **[Wang et al. 2012](https://doi.org/10.1186/1471-2105-13-185) shadow regression**: regress mismatching-read count against unique-read count across read-length prefixes, solve for per-base error. Works without an alignment reference.
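
A minimal sketch of the quality-based method, to make the arithmetic concrete. This is not the atropos code: the function name `mean_error_rate` is hypothetical, and it assumes plain four-line FASTQ records with Phred+33 quality encoding.

```python
import io


def mean_error_rate(fastq_lines, phred_offset=33):
    """Average per-base error probability implied by the quality strings.

    Assumptions (not taken from atropos or cutadapt): records are exactly
    four lines each, and qualities are Phred-encoded with the given offset.
    """
    total_error = 0.0
    total_bases = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        for ch in line.rstrip("\r\n"):
            q = ord(ch) - phred_offset
            total_error += 10 ** (-q / 10)  # Phred score -> error probability
            total_bases += 1
    return total_error / total_bases if total_bases else float("nan")


# Usage: one read of four bases, all at Q20 ('5' in Phred+33)
example = io.StringIO("@r1\nACGT\n+\n5555\n")
print(mean_error_rate(example))
```

Because it only accumulates two running totals, this works on a stream of any size; an open file handle can be passed in place of the `StringIO`.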

### Why this is useful

Users choosing `-e` currently guess. If cutadapt could tell them "the empirical error rate is 1.2%; your `-e 0.1` is well above that", stringency choices would be much easier to justify, and obvious run-quality problems would be easier to catch.
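
As an illustration of what such a report could look like, here is a small sketch. The helper `describe_tolerance` and its wording are hypothetical and are not part of cutadapt or atropos.

```python
def describe_tolerance(estimated_rate, max_error_rate):
    """Build a one-line message comparing an estimated per-base error rate
    with the adapter-match tolerance passed via -e (hypothetical helper)."""
    relation = "above" if max_error_rate > estimated_rate else "at or below"
    return (
        f"Estimated sequencing error rate: {estimated_rate:.1%}; "
        f"-e {max_error_rate:g} is {relation} that."
    )


print(describe_tolerance(0.012, 0.1))
```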

### Prior art

- atropos `error` subcommand implements both: https://github.com/jdidion/atropos/blob/master/atropos/commands/error/__init__.py
- The shadow-regression implementation in atropos currently shells out to R; a pure-Python/Cython port would be cleaner for cutadapt.

Happy to help if useful.
