Skip to content

Expose more gtars functionality via python#241

Merged
nsheff merged 5 commits intodevfrom
overlaps-updates
Mar 10, 2026
Merged

Expose more gtars functionality via python#241
nsheff merged 5 commits intodevfrom
overlaps-updates

Conversation

@nsheff
Copy link
Copy Markdown
Member

@nsheff nsheff commented Mar 6, 2026

Working on improving performance of bedshift; need some work on gtars guts to python.

@nsheff nsheff requested a review from nleroy917 March 6, 2026 22:44
@nsheff nsheff marked this pull request as ready for review March 6, 2026 22:44
Copy link
Copy Markdown
Member

@nleroy917 nleroy917 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just really a small question inside MultiChromOverlapper

/// **Note:** All operations produce regions with `rest: None`. Metadata from
/// the `rest` field of input regions is not preserved — operations like `reduce()`
/// merge multiple regions, so there is no unambiguous `rest` to carry forward.
pub trait IntervalRanges {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious what the motivation was to make this a trait? Is it implemented elsewhere?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make the trait... this PR isn't introducing the trait, it was there.

Looks like @sanghoonio created the IntervalRanges trait in commit 6f465f2 on Feb 18, with the initial 5 GenomicRanges operations, and followed up with a second commit on Feb 20

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it doesn't make sense for this to be a trait, at this point, but I'll let @sanghoonio respond.

return RegionSet::from(Vec::<Region>::new());
}

let index = other.clone().into_multi_chrom_overlapper(OverlapperType::AIList);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought... this operation can be expensive since you're building the index. Does it make more sense for this function to take as input a MultiChromOverlapper? That way if you are performing many intersect calls, you can build the index once and pass in by reference?

I guess just proposing to decouple the indexing building from the intersection?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is a legitimate concern. but it points at something more fundamental. we need to re-evaluate the whole trait/regionset/MCO relationships, which I can't really tackle at this point. So will address this in the future.

@nsheff nsheff merged commit 76753b7 into dev Mar 10, 2026
@nsheff nsheff deleted the overlaps-updates branch March 10, 2026 16:12
sanghoonio added a commit that referenced this pull request Apr 19, 2026
Surgically reverts the four per-file summary statistics added in #254
(`calc_inter_peak_spacing`, `calc_peak_clusters`, `calc_density_vector`,
`calc_density_homogeneity`) and the `--cluster-radii` / `--cluster-min-size`
CLI flags that drove them. On review these APIs are either thin scalar
reductions over existing primitives (`calc_neighbor_distances`,
`region_distribution_with_chrom_sizes`) or — in the case of
`calc_peak_clusters` — parameterized by a biologically-underdetermined
stitching radius. Not load-bearing for any current consumer.

## Removed

- gtars-genomicdist: 4 trait methods and 4 result structs (`SpacingStats`,
  `ClusterStats`, `DensityVector`, `DensityHomogeneity`), ~28 unit tests.
- gtars-cli: `--cluster-radii`, `--cluster-min-size` flags and the four
  JSON output fields in `gtars genomicdist`.
- gtars-python: `inter_peak_spacing()`, `peak_clusters()`, `density_vector()`,
  `density_homogeneity()` on `PyRegionSet`; 4 `#[pyclass]` result wrappers;
  `.pyi` stubs; 11 pytest cases.
- gtars-r: `interPeakSpacing`, `peakClusters`, `densityVector`,
  `densityHomogeneity` S4 generics and methods; 4 `r_calc_*` Rust wrappers;
  8 `man/*.Rd` files; NAMESPACE exports; testthat cases.
- gtars-wasm: `interPeakSpacing`, `peakClusters`, `densityVector`,
  `densityHomogeneity` methods on `JsRegionSet`.

## Kept (also from #254, orthogonally good)

- `IntervalRanges::gaps(chrom_sizes)` rewrite. Real bug fix — the old
  signature couldn't emit trailing gaps and was silently wrong at every
  chromosome end. The breaking API change stands.
- `cluster()` binding parity: R `clusterRegions(maxGap)` and WASM
  `cluster(max_gap)`. These expose a pre-existing Rust primitive
  (`IntervalRanges::cluster` from #241) to R and WASM for parity with
  Python. Independent of the stats wrapper critique.
- `calcDinuclFreq` R wrapper arity fix (side fix in #254).
- Version bumps (genomicdist 0.8.0, cli/python/wasm/R 0.9.0). The `gaps()`
  signature change is breaking on its own and independently warrants
  the minor bump.

## Verification

- `cargo test -p gtars-genomicdist` — 275 passed, 0 failed
- `cargo check --workspace` — clean
- `cargo check -p gtars-js --target wasm32-unknown-unknown` — clean
- `pytest tests/test_regionset.py` — 17 passed
- `Rscript -e 'devtools::test()'` — 259 passed, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants