🔒 pg_anon

Anonymization tool for PostgreSQL.
Share real data without exposing personal or confidential information.

✨ Overview

pg_anon helps you safely clone your production database for testing or development.
During the process, all sensitive fields are replaced with realistic but fake values —
keeping your data structure intact and your privacy protected.

⚙️ Requirements

Python: 3.11+
PostgreSQL: 9.6+
PostgreSQL client utilities (must match the server’s major version):
- pg_dump – used to export the database schema
- pg_restore – used to restore the database schema into the target database

For details, see: Installation & Configuration
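
The version-matching requirement above can be checked before running a dump. The following is a minimal sketch and not part of pg_anon: `major_version` and `client_matches_server` are hypothetical helper names, and the strings passed to them are assumed to look like the output of `pg_dump --version` and `SHOW server_version`. It relies on the PostgreSQL convention that the major version has two components before version 10 (for example 9.6) and one component from version 10 onward.

```python
import re


def major_version(version_text: str) -> str:
    """Extract the PostgreSQL major version from a version string.

    Before PostgreSQL 10 the major version has two components (e.g. "9.6");
    from 10 onward it is the first component only (e.g. "15").
    """
    match = re.search(r"(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    first, second = match.group(1), match.group(2)
    if int(first) < 10 and second is not None:
        return f"{first}.{second}"
    return first


def client_matches_server(client_text: str, server_text: str) -> bool:
    """Return True when the client utility and the server share a major version."""
    return major_version(client_text) == major_version(server_text)
```

In practice the first argument could come from running `pg_dump --version` and the second from querying the server, but how those strings are obtained is left out of this sketch.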

🧩 Terminology

Term	Description
Personal (sensitive) data	Data that must not be shared with third parties. Includes personal or confidential business information.
Source database	The original database that contains sensitive data.
Target database	An empty database where anonymized data will be restored.
Meta-dictionary	A Python file describing rules for detecting sensitive data. Created manually and used as the basis for generating the sensitive dictionary during scanning.
Prepared sensitive dictionary	A Python file that defines which tables and fields contain sensitive data and how to anonymize them. Created automatically or manually.
Prepared non-sensitive dictionary	A Python file listing schemas, tables, and fields without sensitive data. Used to speed up repeated scans.
Table dictionary	A Python file listing tables. Used to include or exclude tables from dump & restore operations.
Create-dict (scan)	The process of scanning the source database to detect sensitive fields and create dictionary files.
Dump	Exporting data from the source database into files using a dictionary. This is where anonymization occurs.
Restore	Importing anonymized data from files into the target database.
Anonymization (masking)	Full process of cloning and sanitizing data (`dump → restore`), replacing sensitive values with random or hashed ones.
Anonymization function	A PostgreSQL function (built-in or from `anon_funcs` schema) that replaces sensitive values with random or hashed data. New functions can be added to extend anonymization logic.
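
To illustrate what replacing a value with hashed data means, here is a minimal Python sketch. It is not how pg_anon works internally: pg_anon applies PostgreSQL functions inside the database, while `hash_value`, the salt, and the sample e-mail address below are hypothetical and exist only for this example.

```python
import hashlib


def hash_value(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 hex digest (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


masked = hash_value("alice@example.com", "salt_word")
# The same input and salt always give the same digest, so equal source
# values stay equal after masking and joins on such columns keep working.
print(len(masked))  # SHA-256 hex digests are always 64 characters long
```

Random replacement, by contrast, produces a different value on every run, so it does not preserve equality between rows.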

🚀 Quick Start

Before you start

This guide creates a privileged database user and sets up test databases populated with sample data.

Follow this quick start guide in a non-production environment.

Prerequisites:

- A working PostgreSQL instance
- PostgreSQL client utilities installed

If running as a non-superuser

Ensure you can install the pgcrypto extension in your quick start source database.

1. Preparing pg_anon

git clone https://github.com/TantorLabs/pg_anon.git pg_anon
cd pg_anon
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m pg_anon --version

2. Preparing DB environment

Create a DB user for the quick start guide

sudo su - postgres -c "psql -p 5432 -U postgres -c \"CREATE USER anon_test_user WITH PASSWORD 'mYy5RexGsZ' SUPERUSER;\""

Prepare the SQL script to initialize the databases

cat > /tmp/db_env.sql << 'EOL'
DROP DATABASE IF EXISTS pg_anon_quick_start_source_db;
CREATE DATABASE pg_anon_quick_start_source_db
WITH
OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;

DROP DATABASE IF EXISTS pg_anon_quick_start_target_db;
CREATE DATABASE pg_anon_quick_start_target_db
WITH
OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;
EOL

Initialize source and target databases

sudo chown postgres:postgres /tmp/db_env.sql
sudo su - postgres -c "psql -p 5432 -U postgres -f /tmp/db_env.sql"

Load test environment into source DB

cp "$(pwd)/tests/sql/init_env.sql" /tmp/init_env.sql
sudo chown postgres:postgres /tmp/init_env.sql
sudo su - postgres -c "psql -p 5432 -d pg_anon_quick_start_source_db -U postgres -f /tmp/init_env.sql"

3. Initializing the service schema for pg_anon

python3 -m pg_anon --mode=init \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_source_db \
	--db-host=127.0.0.1 \
	--db-port=5432

4. Scan your source database

python3 -m pg_anon --mode=create-dict \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_source_db \
	--db-host=127.0.0.1 \
	--db-port=5432 \
	--meta-dict-file=tests/input_dict/test_meta_dict.py \
	--output-sens-dict-file=test_sens_dict_output.py \
	--output-no-sens-dict-file=test_no_sens_dict_output.py \
	--processes=2

5. Visualizing anonymization rules

Run pg_anon in view-fields mode to see which fields will be anonymized and which fields will be dumped as-is.

python3 -m pg_anon --mode=view-fields \
	--db-host=127.0.0.1 \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_source_db \
	--db-port=5432 \
	--prepared-sens-dict-file=test_sens_dict_output.py \
	--fields-count=20

Run pg_anon in view-data mode to preview anonymized data in a specific table.

python3 -m pg_anon --mode=view-data \
	--db-host=127.0.0.1 \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_source_db \
	--db-port=5432 \
	--prepared-sens-dict-file=test_sens_dict_output.py \
	--schema-name=public \
	--table-name=contracts \
	--limit=10 \
	--offset=0

6. Create an anonymized backup

python3 -m pg_anon --mode=dump \
	--db-host=127.0.0.1 \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_source_db \
	--db-port=5432 \
	--prepared-sens-dict-file=test_sens_dict_output.py \
	--output-dir=/tmp/quick_start_dump \
	--clear-output-dir

7. Restore the anonymized backup into the target DB

python3 -m pg_anon --mode=restore \
	--db-host=127.0.0.1 \
	--db-port=5432 \
	--db-user=anon_test_user \
	--db-user-password=mYy5RexGsZ \
	--db-name=pg_anon_quick_start_target_db \
	--input-dir=/tmp/quick_start_dump \
	--drop-custom-check-constr \
	--verbose=debug
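
The dump and restore steps above can also be driven from a script. The sketch below only assembles the two command lines from flags already shown in this guide; `build_command` is a hypothetical helper, not part of pg_anon, and it uses the current Python interpreter in place of `python3`. The `subprocess.run` call is left commented out because it needs the quick start databases to exist.

```python
import subprocess
import sys

# Connection settings used throughout this quick start guide.
CONNECTION = {
    "db-host": "127.0.0.1",
    "db-port": "5432",
    "db-user": "anon_test_user",
    "db-user-password": "mYy5RexGsZ",
}


def build_command(mode: str, db_name: str, extra: list[str]) -> list[str]:
    """Assemble a pg_anon command line as an argument list (no shell quoting needed)."""
    args = [sys.executable, "-m", "pg_anon", f"--mode={mode}"]
    args += [f"--{key}={value}" for key, value in CONNECTION.items()]
    args.append(f"--db-name={db_name}")
    return args + extra


dump_cmd = build_command(
    "dump",
    "pg_anon_quick_start_source_db",
    [
        "--prepared-sens-dict-file=test_sens_dict_output.py",
        "--output-dir=/tmp/quick_start_dump",
        "--clear-output-dir",
    ],
)
restore_cmd = build_command(
    "restore",
    "pg_anon_quick_start_target_db",
    ["--input-dir=/tmp/quick_start_dump", "--drop-custom-check-constr"],
)

for cmd in (dump_cmd, restore_cmd):
    print(" ".join(cmd))
    # Uncomment to execute; check=True stops the chain if a step fails.
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and running the dump before the restore mirrors the `dump → restore` order described in the terminology.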

📘 Documentation Index

Section	Description
💽 Installation & Configuration	How to install and configure `pg_anon`
⚙️ How It Works	How the anonymization process works in `pg_anon`
🛠️ Debugging	How to debug the anonymization process
💬 FAQ	Common questions and troubleshooting tips
📚 SQL Functions Library	Built-in SQL functions for anonymization
🔌 API	Available endpoints, request/response formats, and usage examples
💡 Contributing	How to contribute to the project

📘 Operations

Operation	Description
🏗️ Init	Initialize the `anon_funcs` schema with SQL functions. It is used by the scan and dump processes
🔍 Create-dict (Scan)	Analyze your database and detect sensitive data
💾 Dump	Export and anonymize data using prepared dictionaries
📂 Restore	Load anonymized data into a target database
🔬 View Fields	Inspect anonymized fields or test the anonymization pipeline
📊 View Data	Inspect anonymized data or test the anonymization pipeline

📘 Dictionary Schemas

Dictionary type	Description
🗂️ Meta Dictionary	Structure of the meta-dictionary used for scanning
🔐 Sensitive Dictionary	Structure of sensitive dictionaries
📋 Non-sensitive Dictionary	Structure of non-sensitive dictionaries
📑 Tables dictionary	Dictionary structure for partial dump/restore operations
