A simple implementation of a Bloom Filter data structure in C for fast dictionary word lookups.
A Bloom Filter is a space-efficient probabilistic data structure that tells you if an element is "definitely not in a set" or "possibly in a set". It can have false positives but never false negatives.
- Fast word existence checking
- Memory efficient (uses ~1.4 MB for 1M+ words)
- 1% false positive rate
- Uses djb2 and sdbm hash functions
bloom_filter.c- Main implementationbloom_filter.h- Header file with definitionswords.txt- Dictionary file (~21MB)
gcc bloom_filter.c -lm -o bloom_filterRun the program:
./bloom_filterThen type words to check if they exist in the dictionary:
Enter a word to search in dictionary: hello
(True Positive) Word hello exist.
Definition: used as a greeting or to begin a phone conversation
Time taken: 0.001234 seconds
Enter a word to search in dictionary: asdfgh
(True Negative) Word asdfgh does not exist.
Time taken: 0.000567 seconds
- True Positive: Word exists in both bloom filter and dictionary
- False Positive: Bloom filter says yes, but word not actually in dictionary (rare)
- True Negative: Word definitely doesn't exist
- Expected elements: 1,000,000
- False positive rate: 1% (0.01)
- Hash functions: 7 (calculated automatically)
- Memory usage: ~1.4 MB
- Dictionary words are loaded and hashed into a bit array
- When you search a word, it's hashed the same way
- If any required bits are 0, the word definitely doesn't exist
- If all required bits are 1, the word probably exists (checked against actual dictionary)
This makes lookups extremely fast while using minimal memory!