
brandonhimpfen/awesome-synthetic-data


Awesome Synthetic Data

DOI

📌 This repository is archived with Zenodo and can be cited using the DOI above.

A curated list of tools, models, datasets, and resources for generating, evaluating, and applying synthetic data — artificial data created to augment, protect, or replace real-world datasets for AI, analytics, and research.

Support ongoing maintenance and curation via GitHub Sponsors.

Contents

Synthetic Data Generators

  • SDV (Synthetic Data Vault) – Widely used framework for generating synthetic tabular, relational (multi-table), and time-series data.
  • Gretel – Tools for privacy-preserving synthetic tabular and text data using ML/DL models.
  • Synthesized.io – Synthetic tabular data generation with differential privacy.
  • ydata-synthetic – GAN-based synthetic data generator for tabular and time-series data.
  • CTGAN – GAN-based framework from SDV for high-quality tabular synthetic data.
  • Copulas – Library for modeling multivariate distributions for synthetic data generation.
  • Synthetic Data from Hugging Face – LLM-based text generation for building domain-specific corpora.

Privacy & Compliance-Focused Tools

  • OpenDP SmartNoise – Differential privacy tools for generating and evaluating synthetic data.
  • Mostly AI – Commercial platform for privacy-preserving tabular synthetic data.
  • Tonic.ai – Developer-focused synthetic data tool with privacy constraints.
  • Hazy – Enterprise-grade platform for secure synthetic data pipelines.

Simulation Engines

  • CARLA Simulator – Autonomous driving simulator for synthetic sensor data.
  • AirSim – Drone, robotics, and autonomous vehicle simulation.
  • NVIDIA Isaac Sim – High-fidelity robotics simulation with synthetic data generation.
  • Unreal Engine – Game engine widely used in research to render synthetic visual datasets.
  • Unity Perception – Toolkit for generating synthetic computer vision datasets in Unity.

Image, Video & Multimodal Generators

Evaluation & Benchmarking

Datasets

Learning Resources

Related Awesome Lists

Contribute

Contributions are welcome. Please ensure your submission fully follows the requirements outlined in CONTRIBUTING.md, including formatting, scope alignment, and category placement.

Pull requests that do not adhere to the contribution guidelines may be closed.

License

CC0
