Repeatable unidimensional data clustering inspired by Ckmeans.1d.dp
Install the gem and add it to the application's Gemfile by executing:

```sh
bundle add ckmeans
```

If Bundler is not being used to manage dependencies, install the gem by executing:
```sh
gem install ckmeans
```

```ruby
# Fixed cluster count (K known in advance)
Ckmeans::Clusterer.new(data, 3).clusters
Ckmedian::Clusterer.new(data, 3).clusters

# Automatic K selection (tries K from kmin to kmax, picks optimal)
Ckmeans::Clusterer.new(data, 1, 10).clusters
Ckmedian::Clusterer.new(data, 1, 10).clusters
```

- Ckmeans - Minimizes squared distances (L2). Good for normally distributed data.
- Ckmedian - Minimizes absolute distances (L1). More robust to outliers and data bursts.
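The difference between the two cost functions can be seen on a single cluster. The sketch below is plain Ruby and does not call the gem; the helper names (`squared_cost`, `absolute_cost`, `mean`, `median`) are hypothetical and exist only for this illustration. It shows that the mean minimizes the squared (L2) cost while the median minimizes the absolute (L1) cost, and that a single outlier pulls the mean much further than the median.

```ruby
# Illustration only: plain Ruby, not part of the ckmeans gem's API.

# Sum of squared distances (L2 cost) from every value to a center.
def squared_cost(values, center)
  values.sum { |v| (v - center)**2 }
end

# Sum of absolute distances (L1 cost) from every value to a center.
def absolute_cost(values, center)
  values.sum { |v| (v - center).abs }
end

def mean(values)
  values.sum.to_f / values.size
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

cluster = [10, 11, 12, 13, 100] # one tight group plus a single outlier

puts "mean:   #{mean(cluster)}"   # prints "mean:   29.2"
puts "median: #{median(cluster)}" # prints "median: 12.0"

# The mean is the better center under the squared cost...
puts squared_cost(cluster, mean(cluster)) < squared_cost(cluster, median(cluster))   # true
# ...while the median is the better center under the absolute cost.
puts absolute_cost(cluster, median(cluster)) < absolute_cost(cluster, mean(cluster)) # true
```

Here the outlier drags the mean to 29.2, far from the tight group, while the median stays at 12.0. This is the usual intuition for preferring an L1 cost when the data contains bursts or outliers.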
```ruby
# For clean numerical data
temperatures = [20.1, 20.2, 25.5, 25.6, 30.1, 30.2]
Ckmeans::Clusterer.new(temperatures, 1, 5).clusters
# => [[20.1, 20.2], [25.5, 25.6], [30.1, 30.2]]

# For data with outliers (e.g., photo timestamps with bursts)
timestamps = photos.map(&:taken_at).map(&:to_i)
Ckmedian::Clusterer.new(timestamps, 1, 20).clusters
```

By default, both algorithms use a fast heuristic for estimating K. For datasets with many duplicates, tight clusters, or outliers, use :stable for more robust estimation:
```ruby
# Stable estimation (uses statistical mixture models)
Ckmeans::Clusterer.new(data, 1, 10, :stable).clusters
Ckmedian::Clusterer.new(data, 1, 10, :stable).clusters
```

When to use :stable:
- Small to medium datasets (< 1000 points)
- Many duplicate values
- Clusters with very different sizes
- Photo/event timeline clustering (bursts and gaps)
Expert users: :stable is an alias for :gmm (Gaussian Mixture Model) in Ckmeans and :lmm (Laplace Mixture Model) in Ckmedian.
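For readers unfamiliar with the two mixture families, the sketch below compares a single Gaussian component with a single Laplace component of equal variance. It is plain Ruby and says nothing about how the gem fits its models; `gaussian_pdf` and `laplace_pdf` are hypothetical helper names introduced for this illustration. The point is only that a Laplace density assigns far more probability to points far from the center, which is one common reason Laplace-based models tolerate outliers better.

```ruby
# Illustration only: plain Ruby, not part of the ckmeans gem's API.

# Gaussian density with mean mu and standard deviation sigma.
def gaussian_pdf(x, mu, sigma)
  z = (x - mu) / sigma.to_f
  Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math::PI))
end

# Laplace density with location mu and scale b (variance is 2 * b**2).
def laplace_pdf(x, mu, b)
  distance = (x - mu).abs
  Math.exp(-distance / b.to_f) / (2.0 * b)
end

sigma = 1.0
b = sigma / Math.sqrt(2) # chosen so both densities have variance 1

# Compare the densities near the center and far out in the tail.
[0.0, 1.0, 5.0].each do |x|
  g = gaussian_pdf(x, 0.0, sigma)
  l = laplace_pdf(x, 0.0, b)
  puts format("x=%.1f  gaussian=%.3e  laplace=%.3e", x, g, l)
end
```

At five standard deviations from the center the Laplace density is still hundreds of times larger than the Gaussian one, so a stray point is much less "surprising" under a Laplace component.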
The gem is available as open source under the terms of the LGPL v3 License.