New product!
We're excited to introduce Jobcurator, an open-source Machine Learning library designed to clean, normalize, structure, compress, and sample large job datasets before they reach your AI, ranking, or search systems.
Jobcurator acts as a data curation layer, sitting upstream of your pipelines to ensure job data quality from the very first step.

Why is it a big deal for HrFlow.ai users?
At HrFlow.ai, our mission is to solve unemployment, one API at a time, and high-quality job data is a critical part of the equation.
If you work as a Data Scientist, Data Engineer, or Product Manager on a job board, aggregator, or programmatic distribution platform, you already know the problem:
- Job feeds are messy, highly redundant, poorly structured, and hard to scale efficiently.
- Raw job data is rarely ready for intelligent processing.
Jobcurator was built to fix this.
It transforms noisy job streams into high-quality, deduplicated, and diverse datasets, ready to power:
- AI ranking & matching
- Job recommendation systems
- Search engines
- Analytics pipelines
The result: better data in, better AI out.
✨ What's included in this release?
Jobcurator provides a powerful yet lightweight toolkit to process job feeds at scale.
Core capabilities
- Clean & organize job data: remove duplicates and normalize information
- AI-ready outputs: optimized for ranking, matching, and search systems
- Smart compression: keep the best jobs while preserving diversity
- Robust similarity detection: multi-probe LSH with geo-aware clustering
- Incremental processing: handle new batches over time
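As an illustration of the "clean & organize" step, here is a minimal Python sketch of text normalization followed by exact-duplicate removal. It is not Jobcurator's API: `normalize_text`, `job_key`, `drop_exact_duplicates` and the choice of key fields (`title`, `company`, `location`, `description`) are hypothetical, and the library's real behavior is described in its GitHub documentation.

```python
import hashlib
import re
import unicodedata

# Hypothetical set of fields that identify "the same job".
KEY_FIELDS = ("title", "company", "location", "description")

def normalize_text(text: str) -> str:
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())

def job_key(job: dict) -> str:
    """Stable hash of the normalized key fields (missing fields count as empty)."""
    parts = [normalize_text(str(job.get(field, ""))) for field in KEY_FIELDS]
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()

def drop_exact_duplicates(jobs: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalized job, preserving input order."""
    seen: set[str] = set()
    unique = []
    for job in jobs:
        key = job_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique

if __name__ == "__main__":
    feed = [
        {"title": "Data Scientist", "company": "Acme", "location": "Paris"},
        {"title": "data  scientist!", "company": "ACME", "location": "Paris"},
        {"title": "Data Engineer", "company": "Acme", "location": "Paris"},
    ]
    print(len(drop_exact_duplicates(feed)))  # 2
```

Exact hashing only catches postings that are identical after normalization; reworded copies of the same job need a similarity backend on top of it.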
Flexible architecture
- Multiple backends available:
  - SimHash
  - MinHash
  - FAISS
- Optional outlier detection
- Supports local or SQL-based storage for incremental processing
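SimHash is one of the backends listed above. To give an idea of how this family of backends flags near-duplicates, here is a generic Charikar-style SimHash sketch in Python: each job text becomes a 64-bit fingerprint, and two jobs are treated as near-duplicates when their fingerprints differ in only a few bits. The function names and the `max_distance=3` threshold are illustrative assumptions, not Jobcurator's implementation or defaults.

```python
import hashlib
import re

BITS = 64

def token_hash(token: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def simhash(text: str) -> int:
    """Charikar-style SimHash over lowercase word tokens, all weights equal to 1."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the positive votes win.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def is_near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    return hamming(simhash(a), simhash(b)) <= max_distance

if __name__ == "__main__":
    print(is_near_duplicate("Senior Data Scientist - Paris", "Paris: senior data scientist"))  # True
```

Because this sketch weighs every word equally and ignores word order, both titles in the usage line yield the same fingerprint. Comparing every pair of jobs is quadratic; LSH-style bucketing, such as the multi-probe LSH mentioned above, is what avoids that at scale.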
Built for scale
- Handles millions of jobs
- Memory-friendly: no heavy infrastructure required
- No GPUs, no embeddings needed
- Tiny footprint: ~255 KB installed
- Optimized for large-scale processing with minimal overhead
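Incremental processing with SQL-based storage can be approximated with the Python standard library alone. The sketch below keeps the keys of already-seen jobs in a SQLite table, so each new batch only surfaces jobs never seen before, and (when given a file path) the seen-set lives on disk rather than in RAM. The table layout, `open_store`, `filter_new` and the title-plus-company key are hypothetical choices for illustration; Jobcurator's own storage schema may differ.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; pass a file path such as 'jobs_seen.db' to persist across runs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_jobs (job_key TEXT PRIMARY KEY)")
    return conn

def default_key(job: dict) -> str:
    # Hypothetical key: normalized title + company.
    title = str(job.get("title", "")).strip().lower()
    company = str(job.get("company", "")).strip().lower()
    return f"{title}|{company}"

def filter_new(conn: sqlite3.Connection, batch: list[dict], key_fn=default_key) -> list[dict]:
    """Return jobs from this batch whose key was never stored, and remember them."""
    fresh = []
    for job in batch:
        key = key_fn(job)
        already_seen = conn.execute(
            "SELECT 1 FROM seen_jobs WHERE job_key = ?", (key,)
        ).fetchone()
        if already_seen is None:
            conn.execute("INSERT INTO seen_jobs (job_key) VALUES (?)", (key,))
            fresh.append(job)
    conn.commit()  # persist this batch's keys before the next batch arrives
    return fresh

if __name__ == "__main__":
    store = open_store()
    monday = [{"title": "Data Scientist", "company": "Acme"},
              {"title": "data scientist ", "company": "ACME"}]
    tuesday = [{"title": "Data Scientist", "company": "Acme"},
               {"title": "ML Engineer", "company": "Acme"}]
    print(len(filter_new(store, monday)))  # 1
    print([j["title"] for j in filter_new(store, tuesday)])  # ['ML Engineer']
```

Inserting each key as soon as it is accepted means duplicates inside the same batch are caught too, not only duplicates across batches.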
🧠 How does it work?
Jobcurator can be integrated easily into existing data pipelines.
At a high level:
- Ingest raw job data (job feeds, aggregators, crawlers, APIs)
- Choose a similarity backend (SimHash, MinHash, FAISS, etc.)
- Run deduplication and smart compression
- Output a clean, structured, and diverse job dataset
- Send the curated data to your AI, search, or ranking systems
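The "run deduplication and smart compression" step can be sketched as a greedy, diversity-aware selection: rank jobs by a quality score, then keep a job only if it is not too similar to any job already kept, until a budget is reached. This is a generic technique shown in Python under stated assumptions (a precomputed `quality` field, word-bigram Jaccard similarity, a 0.6 cutoff, and the hypothetical helpers `shingles`, `jaccard`, `compress`); it is not necessarily how Jobcurator scores or compresses jobs.

```python
import re

def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Word k-grams of the lowercased text (one short tuple if fewer than k words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def compress(jobs: list[dict], budget: int, max_similarity: float = 0.6) -> list[dict]:
    """Greedy diversity-aware selection: best quality first, skip near-copies of kept jobs."""
    if budget <= 0:
        return []
    kept: list[dict] = []
    kept_shingles: list[set] = []
    for job in sorted(jobs, key=lambda j: j.get("quality", 0.0), reverse=True):
        sh = shingles(f"{job.get('title', '')} {job.get('description', '')}")
        if all(jaccard(sh, other) < max_similarity for other in kept_shingles):
            kept.append(job)
            kept_shingles.append(sh)
            if len(kept) == budget:
                break
    return kept

if __name__ == "__main__":
    jobs = [
        {"title": "Senior Data Scientist", "description": "Python machine learning Paris", "quality": 0.9},
        {"title": "Senior Data Scientist", "description": "Python machine learning Paris office", "quality": 0.8},
        {"title": "Backend Engineer", "description": "Go microservices Lyon", "quality": 0.7},
        {"title": "Product Manager", "description": "B2B SaaS roadmap", "quality": 0.5},
    ]
    print([j["title"] for j in compress(jobs, budget=2)])  # ['Senior Data Scientist', 'Backend Engineer']
```

The second posting is dropped because it shares 6 of its 7 bigrams with the first. The pairwise check costs O(n*k) for k kept jobs, so a larger system would compare each candidate only against jobs from the same similarity bucket.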
Jobcurator works out of the box with sensible defaults, and can be progressively customized as your needs grow.
Code examples and advanced configurations are available in the GitHub documentation.

💡 Useful Links
- Jobcurator full documentation on GitHub
- Sign up for HrFlow.ai
- API Authentication
- Create and configure a Board
🗣️ Our roadmap is public!
Looking for a specific feature or improvement?
Submit a request or upvote an existing one here:







Once configured, the Hiring Agent will process your talent pool, search based on job criteria, and deliver a ranked list of matching profiles, ready for evaluation and outreach.













