add streaming uniwig alongside current batch parallel implementation#236

Merged
nsheff merged 1 commit into dev from streaming-uniwig on Mar 6, 2026

Conversation


@nsheff nsheff commented Feb 28, 2026

Adds a streaming mode to uniwig (--streaming) that computes coverage counts with O(smooth_size) memory instead of O(chromosome_size). Processes BED input line-by-line using a sliding VecDeque window, supports all three count types (start/end/core), outputs WIG or bedGraph, and handles gzip transparently. Works with stdin/stdout for piping.
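To make the sliding-window idea concrete, here is a minimal sketch (not the gtars source; names and signatures are illustrative) of smoothed start counts over a stream of starts sorted by position. Each start `s` contributes +1 to every position in `[s - smooth, s + smooth]`, and the `VecDeque` only ever holds roughly `2 * smooth + 1` counts, which is where the O(smooth_size) memory bound comes from:

```rust
use std::collections::VecDeque;

// Illustrative sketch of streaming smoothed start counts.
// Assumes input starts are sorted by position, as sorted BED guarantees.
fn smoothed_start_counts(starts: &[u32], smooth: u32) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    let mut window: VecDeque<u32> = VecDeque::new(); // counts for positions base..
    let mut base: u32 = 0; // genomic position of window.front()
    for &s in starts {
        let lo = s.saturating_sub(smooth);
        // Positions before lo can no longer receive contributions: flush them.
        while base < lo {
            if let Some(c) = window.pop_front() {
                if c > 0 {
                    out.push((base, c)); // sparse: zeros are never emitted
                }
            }
            base += 1;
        }
        // Add this start's contribution to positions lo..=s+smooth.
        for p in lo..=s + smooth {
            let idx = (p - base) as usize;
            while window.len() <= idx {
                window.push_back(0);
            }
            window[idx] += 1;
        }
    }
    // Drain whatever remains in the window.
    while let Some(c) = window.pop_front() {
        if c > 0 {
            out.push((base, c));
        }
        base += 1;
    }
    out
}

fn main() {
    // Two starts at 5 and 7 with smooth = 2 overlap on positions 5..=7.
    println!("{:?}", smoothed_start_counts(&[5, 7], 2));
    // [(3, 1), (4, 1), (5, 2), (6, 2), (7, 2), (8, 1), (9, 1)]
}
```

The same flush-then-extend pattern would apply to end and core counts; only the positions each record contributes to change.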

A key addition is sparse output — the existing batch mode always writes every position from 1 to chrom_size, even where counts are zero. Streaming defaults to emitting only non-zero positions, which dramatically reduces output size (1.2 GB vs 5.8 GB on 10M records) and is the main reason it's faster despite using a single thread.
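To visualize what sparse output looks like on disk, here is a sketch (illustrative, assuming step size 1; not the gtars writer) of fixedStep WIG emission: each maximal run of non-zero positions gets its own `fixedStep` header, and zero regions are simply absent from the file, which is where the size savings come from:

```rust
// Illustrative sparse fixedStep WIG emission: `runs` is a list of
// (start_position, values) pairs, one per cluster of non-zero counts.
fn sparse_wig(chrom: &str, runs: &[(u32, Vec<u32>)]) -> String {
    let mut out = String::new();
    for (start, vals) in runs {
        // One header per run of consecutive non-zero positions.
        out.push_str(&format!("fixedStep chrom={chrom} start={start} step=1\n"));
        for v in vals {
            out.push_str(&format!("{v}\n"));
        }
    }
    out
}

fn main() {
    // Two signal clusters separated by a long zero gap: only 7 value lines
    // are written, no matter how wide the gap between them is.
    let runs = vec![(100, vec![1, 2, 1]), (5000, vec![3, 3, 2, 1])];
    print!("{}", sparse_wig("chr1", &runs));
}
```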

The --dense flag controls gap handling: 0 (default) = sparse, -1 = fully dense (matching batch behavior), and any positive N fills only gaps ≤N bases wide. A setting of 100 is essentially free.
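The gap policy reduces to one decision per zero gap between non-zero clusters. A sketch of that decision as described above (function name is illustrative, not the gtars internals):

```rust
// --dense semantics:  0 -> sparse, never fill;  -1 -> always fill (batch-
// compatible dense output);  N > 0 -> fill only gaps of at most N bases.
fn should_fill_gap(dense: i64, gap_len: u64) -> bool {
    match dense {
        -1 => true,
        n if n > 0 => gap_len <= n as u64,
        _ => false, // 0 = sparse
    }
}

fn main() {
    // With --dense 100, a 40 bp gap is filled with zeros, while a 10 kb
    // gap ends the current output block and starts a new one.
    println!("{}", should_fill_gap(100, 40)); // true
    println!("{}", should_fill_gap(100, 10_000)); // false
}
```

Filling small gaps keeps adjacent clusters in one fixedStep block, which is why a modest N costs almost nothing in output size.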

Also fixes a bug in the batch path where all 3 count types were always computed regardless of -u flag — this was inflating batch runtimes by ~2.5x.
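The shape of that fix is simply gating the work on the requested count type (a hypothetical helper, not the actual gtars code):

```rust
// Illustrative: compute only the count type requested via -u instead of
// unconditionally running all three, which inflated batch runtimes.
fn count_types_for(u: &str) -> Vec<&'static str> {
    match u {
        "start" => vec!["start"],
        "end" => vec!["end"],
        "core" => vec!["core"],
        _ => vec!["start", "end", "core"], // e.g. an "all" mode
    }
}

fn main() {
    println!("{:?}", count_types_for("start")); // ["start"]
}
```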

Usage

```shell
# Stream a BED file, output start counts as WIG
gtars uniwig --streaming -f regions.bed --chromref hg38.chrom.sizes -u start

# Pipe from stdin, output to stdout as bedGraph
cat sorted.bed | gtars uniwig --streaming --stdout -y bedgraph --chromref hg38.chrom.sizes

# Dense output (fill all gaps between data clusters)
gtars uniwig --streaming -f regions.bed --chromref hg38.chrom.sizes --dense -1

# Fill only small gaps (≤100 bp)
gtars uniwig --streaming -f regions.bed --chromref hg38.chrom.sizes --dense 100
```

Benchmarks

Synthetic BED data: 10M records across 24 chromosomes at full hg38 sizes, run with `--smoothsize 25 --stepsize 1 -u start -y wig`:

| Mode | Cores | Time | Memory | Output size |
| --- | --- | --- | --- | --- |
| Batch | 6 | 42.8 s | 8.0 GB | 5.8 GB (dense) |
| Streaming sparse | 1 | 28.1 s | 3.9 MB | 1.2 GB |
| Streaming dense | 1 | 205.6 s | 1.5 GB | 5.8 GB |

Sparse streaming is faster (28s vs 43s), uses 2000x less memory (4 MB vs 8 GB), and produces 5x smaller output (1.2 GB vs 5.8 GB) — all on a single thread vs batch's 6 cores. Dense-for-dense, per-core throughput is roughly equivalent.

@nsheff nsheff marked this pull request as ready for review February 28, 2026 13:00

nsheff commented Feb 28, 2026

@donaldcampbelljr take a look at this; I'm proposing we replace the batch implementation with this streaming implementation. I wanted to leave them side by side at first to do the benchmarking.

@donaldcampbelljr (Member) commented:

> Fill only small gaps (≤100bp)
>
> `gtars uniwig --streaming -f regions.bed --chromref hg38.chrom.sizes --dense 100`

Could you offer more clarity on how this may affect downstream tools that use these file types for input? I'm struggling to visualize how the output file is affected and am concerned it might break some use cases. I didn't see any output test files that demonstrate the different output options for quick comparison/grokking of how this is working.

@nsheff nsheff merged commit 1960aed into dev Mar 6, 2026
@nsheff nsheff deleted the streaming-uniwig branch March 6, 2026 12:09