Skip to content

San7o/hll.h

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hll.h
=====

Configurable, header-only implementation of HyperLogLog in C99.

Author: Giovanni Santini
Mail: giovanni.santini@proton.me
License: MIT


Documentation
-------------

This header-only C99 library implements the HyperLogLog algorithm
for approximating the cardinality (number of distinct elements) of
large multisets.

The algorithm uses a hash function to map input elements to uniformly
distributed bit patterns and records the maximum run of leading zeros
across many independent registers. From this, it computes an estimate
of the set’s cardinality with low memory footprint and tunable accuracy.

Features:
  - Header-only, portable C99 implementation
  - Adjustable precision/space-accuracy tradeoff
  - Suitable for large-scale data streams

Reference:
  S. Heule, M. Nunkesser, A. Hall,
  "HyperLogLog in Practice: Algorithmic Engineering of a State of
   The Art Cardinality Estimation Algorithm"


Usage
-----

Do this:

  #define HLL_IMPLEMENTATION

before you include this file in *one* C or C++ file to create the
implementation.

i.e. it should look like this:

  #include ...
  #include ...
  #include ...
  #define HLL_IMPLEMENTATION
  #include "hll.h"

You can tune the library by #defining certain values. See the
"Config" comments under "Configuration" section in the header.

You need to initialize a hll_t with `hll_init` and eventually
destroy it with `hll_destroy`. To add a value, use `hll_add`. To
get an estimate of the cardinality of the values, use
`hll_count`. See the function definitions for more information.

Check some examples at the end of the header.


Code
----

The official git repository of micro-hll.h is hosted at:

    https://github.com/San7o/micro-hll.h

This is part of a bigger collection of header-only C99 libraries
called "micro-headers", contributions are welcome:

    https://github.com/San7o/micro-headers



TODO
----

The library should be tested more. Right now, some precision values
do not work well so the optimal precision should be handpicked
manually.
Only the dense representation is implemented.
Additionally, no empirical bias correction is applied.

About

Configurable, header-only implementation of HyperLogLog in C99.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors