datatasketches

Version: v0.1.0-RC1
Published: Dec 31, 2025 License: Apache-2.0 Imports: 0 Imported by: 0

README


Apache® DataSketches™ Core Go Library Component

This is the core Go component of the DataSketches library. It contains some of the sketching algorithms and can be accessed directly from user applications.

Note that parallel core components provide C++, Java, and Python implementations of the same sketch algorithms, including datasketches-cpp and datasketches-java.

Please visit the main DataSketches website for more information.

If you are interested in making contributions to this site please see our Community page for how to contact us.

Major Sketches

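| Type | Implementation | Status |
|---|---|---|
| Cardinality | CpcSketch | ⚠️ |
| Cardinality | HllSketch | ⚠️ |
| Cardinality | ThetaSketch | ⚠️ |
| Cardinality | TupleSketch | ⚠️ |
| Quantiles | CormodeDoublesSketch | |
| Quantiles | CormodeItemsSketch | |
| Quantiles | KllDoublesSketch | |
| Quantiles | KllFloatsSketch | |
| Quantiles | KllSketch | ⚠️ |
| Quantiles | ReqFloatsSketch | |
| Frequencies | FreqLongsSketch | ⚠️ |
| Frequencies | FreqItemsSketch | ⚠️ |
| Frequencies | CountMinSketch | ⚠️ |
| Sampling | ReservoirLongsSketch | |
| Sampling | ReservoirItemsSketch | |
| Sampling | VarOptItemsSketch | |
| Membership | BloomFilter | 🚧 |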

Specialty Sketches

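| Type | Interface Name | Status |
|---|---|---|
| Cardinality/FM85 | UniqueCountMap | |
| Cardinality/Tuple | FdtSketch | |
| Cardinality/Tuple | ArrayOfDoublesSketch | |
| Cardinality/Tuple | DoubleSketch | |
| Cardinality/Tuple | IntegerSketch | |
| Cardinality/Tuple | ArrayOfStringsSketch | |
| Cardinality/Tuple | EngagementTest3 | |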

❌ = Not yet implemented

⚠️ = Implemented but not officially released

🚧 = In progress

Build & Runtime Dependencies

This code requires Go 1.24.

Compilation and Test

Tests can be run using the go test command:

go test ./...

A Dockerfile is also provided with the environment needed to build and test the project:

./build/Dockerfile
./build/run-docker-test.sh

Documentation

Overview

Package datatasketches is the parent package for all sketch families and common code areas.

The Sketching Core Library provides a range of stochastic streaming algorithms that are particularly useful when integrating this technology into systems that must deal with massive data. The library is designed to be easy to use, highly performant, and memory efficient.
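
To make the idea of a stochastic streaming algorithm concrete, here is a minimal, self-contained sketch of a K-Minimum-Values (KMV) distinct-count estimator. It is an illustration only and does not use or mirror this library's API: the names `kmvSketch`, `newKMV`, `Update`, `Estimate`, and `mix64` are hypothetical, and the SplitMix64 finalizer stands in for whatever hash function a production sketch would use. The point it shows is that the memory used is bounded by `k` no matter how many items are streamed.

```go
package main

import (
	"fmt"
	"sort"
)

// two64 is 2^64, used to map a 64-bit hash onto the interval [0, 1).
const two64 = 18446744073709551616.0

// mix64 is the SplitMix64 finalizer, used here as a stand-in hash function.
func mix64(x uint64) uint64 {
	x += 0x9e3779b97f4a7c15
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
	x = (x ^ (x >> 27)) * 0x94d049bb133111eb
	return x ^ (x >> 31)
}

// kmvSketch (hypothetical name) retains the k smallest distinct hash values.
type kmvSketch struct {
	k    int
	mins []uint64 // sorted ascending, at most k entries
}

func newKMV(k int) *kmvSketch {
	return &kmvSketch{k: k, mins: make([]uint64, 0, k+1)}
}

// Update hashes the item and keeps the hash only if it is among the k smallest.
func (s *kmvSketch) Update(item uint64) {
	h := mix64(item)
	i := sort.Search(len(s.mins), func(j int) bool { return s.mins[j] >= h })
	if i < len(s.mins) && s.mins[i] == h {
		return // already retained: duplicates do not change the sketch
	}
	if i == s.k {
		return // larger than all k retained values
	}
	s.mins = append(s.mins, 0)
	copy(s.mins[i+1:], s.mins[i:])
	s.mins[i] = h
	if len(s.mins) > s.k {
		s.mins = s.mins[:s.k]
	}
}

// Estimate returns the exact count while fewer than k hashes are retained,
// and otherwise (k-1) divided by the k-th smallest hash scaled to [0, 1).
func (s *kmvSketch) Estimate() float64 {
	if len(s.mins) < s.k {
		return float64(len(s.mins))
	}
	kth := float64(s.mins[s.k-1]) / two64
	return float64(s.k-1) / kth
}

func main() {
	s := newKMV(1024)
	for i := uint64(0); i < 100000; i++ {
		s.Update(i)
		s.Update(i) // repeated items are ignored
	}
	fmt.Printf("retained=%d estimate=%.0f\n", len(s.mins), s.Estimate())
}
```

With `k = 1024` the relative standard error of this estimator is roughly 1/sqrt(k), or about 3%, while the sketch stores at most 1024 hash values.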

Directories

Path Synopsis
Package filters provides probabilistic membership data structures for efficient set membership testing with controlled false positive rates.
Package frequencies is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items.
Package hll is dedicated to streaming algorithms that enable estimation of the cardinality of a stream of items.
binomialproportionsbounds
Package binomialproportionsbounds computes an approximation to the Clopper-Pearson confidence interval for a binomial proportion.
Package kll is an implementation of a very compact quantiles sketch with a lazy compaction scheme and nearly optimal accuracy per retained quantile.
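
The filters synopsis above mentions membership testing with a controlled false positive rate. As a rough illustration of that idea, the following self-contained Bloom filter sizes itself from an expected item count `n` and a target false positive probability `p` using the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2. It is a sketch under stated assumptions, not the filters package's implementation: the names `bloomFilter`, `newBloomFilter`, `Add`, `Contains`, and `hashPair` are hypothetical, and FNV-1a with double hashing is an assumed hashing scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// bloomFilter (hypothetical name) is a bit array probed by k hash functions.
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash probes per item
}

// newBloomFilter sizes the filter for n expected items and a target
// false positive probability p (assumes n > 0 and 0 < p < 1).
func newBloomFilter(n int, p float64) *bloomFilter {
	m := uint64(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
	k := uint64(math.Round(float64(m) / float64(n) * math.Ln2))
	if k < 1 {
		k = 1
	}
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashPair returns two hashes used for double hashing: probe i is h1 + i*h2.
func hashPair(item string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(item))
	h1 := h.Sum64()
	// Derive a second hash by mixing the first; force it odd so it is never zero.
	h2 := h1 ^ (h1 >> 33)
	h2 *= 0xff51afd7ed558ccd
	h2 ^= h2 >> 33
	return h1, h2 | 1
}

// Add sets the k bits selected for the item.
func (f *bloomFilter) Add(item string) {
	h1, h2 := hashPair(item)
	for i := uint64(0); i < f.k; i++ {
		idx := (h1 + i*h2) % f.m
		f.bits[idx/64] |= uint64(1) << (idx % 64)
	}
}

// Contains reports false only if the item was definitely never added.
func (f *bloomFilter) Contains(item string) bool {
	h1, h2 := hashPair(item)
	for i := uint64(0); i < f.k; i++ {
		idx := (h1 + i*h2) % f.m
		if f.bits[idx/64]&(uint64(1)<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloomFilter(1000, 0.01)
	f.Add("apple")
	f.Add("banana")
	fmt.Println(f.Contains("apple"))  // always true: a Bloom filter has no false negatives
	fmt.Println(f.Contains("cherry")) // usually false, but a false positive is possible
}
```

The trade-off is one-sided: an added item is always reported as present, while an item that was never added is occasionally reported as present at a rate governed by `m`, `k`, and the number of inserted items.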
