-
Notifications
You must be signed in to change notification settings - Fork 56
ποΈ Create SQL Script for Database Schema Validation and StatisticsΒ #7865
Copy link
Copy link
Closed
Description
π― Objective
Create a comprehensive SQL script that validates the database schema by counting all tables and views, extracting sample data from each, and generating statistics reports. This script will provide a baseline for documentation validation and schema health checks.
Recommended Agent: @stack-specialist
π Background
Currently, we lack automated tools to:
- Count and list all tables and views in the database
- Extract representative sample data for validation
- Generate statistics about schema composition
- Validate data consistency and quality
- Provide baseline metrics for documentation
This script will be the foundation for validating DATABASE_VIEW_INTELLIGENCE_CATALOG.md and other documentation.
π Current State
Schema Size:
full_schema.sql: 12,934 linesrefresh-all-views.sql: ~28 materialized views- PostgreSQL 16 database with extensive view hierarchy
Missing Capabilities:
- No automated schema inventory
- No sample data extraction tool
- No statistics generation
- No baseline for validation
β Acceptance Criteria
- SQL script created at
service.data.impl/src/main/resources/schema-validation.sql - Script counts all tables (base tables, not views)
- Script counts all views (regular and materialized separately)
- For each table/view: name, type, row count, column count
- Sample data extraction (top 5-10 rows) for each table/view
- Statistics: total size, largest tables, empty tables/views
- Output formatted as JSON or CSV for easy processing
- Documentation added to README-SCHEMA-MAINTENANCE.md
- Script tested against actual database
π οΈ Implementation Guidance
Script Structure
-- schema-validation.sql
-- Database Schema Validation and Statistics Report
-- Generated: [timestamp]
-- Part 1: Count all objects
\echo '=== DATABASE OBJECT COUNTS ==='
-- Count base tables
SELECT 'Base Tables' AS object_type, COUNT(*) AS count
FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- Count regular views
SELECT 'Regular Views' AS object_type, COUNT(*) AS count
FROM information_schema.views
WHERE table_schema = 'public';
-- Count materialized views
SELECT 'Materialized Views' AS object_type, COUNT(*) AS count
FROM pg_matviews
WHERE schemaname = 'public';
-- Part 2: List all tables with row counts
\echo '\n=== TABLE INVENTORY WITH ROW COUNTS ==='
DO $$
DECLARE
r RECORD;
row_count INTEGER;
BEGIN
FOR r IN
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename
LOOP
EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.tablename) INTO row_count;
RAISE NOTICE 'Table: %.% | Rows: %', r.schemaname, r.tablename, row_count;
END LOOP;
END $$;
-- Part 3: List all views with row counts
\echo '\n=== VIEW INVENTORY WITH ROW COUNTS ==='
DO $$
DECLARE
r RECORD;
row_count INTEGER;
BEGIN
-- Regular views
FOR r IN
SELECT schemaname, viewname AS name
FROM pg_views
WHERE schemaname = 'public'
ORDER BY viewname
LOOP
BEGIN
EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.name) INTO row_count;
RAISE NOTICE 'View: %.% | Rows: %', r.schemaname, r.name, row_count;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'View: %.% | ERROR: %', r.schemaname, r.name, SQLERRM;
END;
END LOOP;
-- Materialized views
FOR r IN
SELECT schemaname, matviewname AS name
FROM pg_matviews
WHERE schemaname = 'public'
ORDER BY matviewname
LOOP
BEGIN
EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.name) INTO row_count;
RAISE NOTICE 'Materialized View: %.% | Rows: %', r.schemaname, r.name, row_count;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Materialized View: %.% | ERROR: %', r.schemaname, r.name, SQLERRM;
END;
END LOOP;
END $$;
-- Part 4: Sample data extraction
\echo '\n=== SAMPLE DATA FROM KEY VIEWS ==='
-- Sample from each important view (top 5 rows)
-- Add specific queries for key views
SELECT * FROM view_riksdagen_politician LIMIT 5;
SELECT * FROM view_riksdagen_party LIMIT 5;
SELECT * FROM view_riksdagen_vote_data_ballot_politician_summary_daily
WHERE vote_date >= CURRENT_DATE - INTERVAL '7 days' LIMIT 5;
-- Part 5: Schema statistics
\echo '\n=== SCHEMA STATISTICS ==='
-- Largest tables by row count
SELECT
schemaname,
tablename,
n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC
LIMIT 20;
-- Tables with no data
SELECT tablename
FROM pg_tables t
LEFT JOIN pg_stat_user_tables s ON t.tablename = s.relname
WHERE t.schemaname = 'public'
AND (s.n_live_tup = 0 OR s.n_live_tup IS NULL)
ORDER BY tablename;
-- View dependency count
SELECT
dependent_view.relname AS view_name,
COUNT(DISTINCT source_table.relname) AS dependency_count
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class AS dependent_view ON pg_rewrite.ev_class = dependent_view.oid
JOIN pg_class AS source_table ON pg_depend.refobjid = source_table.oid
WHERE dependent_view.relkind IN ('v', 'm')
GROUP BY dependent_view.relname
ORDER BY dependency_count DESC
LIMIT 20;Enhanced Features
JSON Output Option:
-- Generate JSON report
COPY (
SELECT json_build_object(
'tables', (SELECT json_agg(t) FROM (
SELECT tablename, n_live_tup AS rows
FROM pg_stat_user_tables
WHERE schemaname = 'public'
) t),
'views', (SELECT json_agg(v) FROM (
SELECT viewname
FROM pg_views
WHERE schemaname = 'public'
) v),
'materialized_views', (SELECT json_agg(mv) FROM (
SELECT matviewname
FROM pg_matviews
WHERE schemaname = 'public'
) mv)
)
) TO '/tmp/schema_report.json';CSV Output Option:
-- Export to CSV
COPY (
SELECT 'TABLE' AS type, tablename AS name, n_live_tup AS rows
FROM pg_stat_user_tables
WHERE schemaname = 'public'
UNION ALL
SELECT 'VIEW', viewname, NULL
FROM pg_views
WHERE schemaname = 'public'
ORDER BY name
) TO '/tmp/schema_inventory.csv' WITH (FORMAT CSV, HEADER);Usage Instructions
# Run the validation script
psql -U postgres -d cia_dev -f service.data.impl/src/main/resources/schema-validation.sql > schema_report.txt 2>&1
# Generate JSON report
psql -U postgres -d cia_dev -c "SELECT json_build_object(...)" > schema_report.json
# Generate CSV inventory
psql -U postgres -d cia_dev -c "COPY (...) TO STDOUT WITH CSV HEADER" > schema_inventory.csvπ Expected Output
-
Text Report (
schema_report.txt):- Object counts summary
- Full table/view inventory with row counts
- Sample data from key views
- Statistics (largest tables, empty objects, dependencies)
-
JSON Report (
schema_report.json):- Structured data for programmatic processing
- Complete schema metadata
- Suitable for automated validation
-
CSV Inventory (
schema_inventory.csv):- Spreadsheet-compatible format
- Easy to compare with documentation
- Suitable for gap analysis
π Related Files
- README-SCHEMA-MAINTENANCE.md - Add usage documentation here
- full_schema.sql - Schema to validate
- refresh-all-views.sql - Views to count
π Success Metrics
- Script successfully runs against database
- Accurate counts of all objects
- Sample data extracted without errors
- Reports generated in multiple formats
- Documentation updated with usage instructions
- Baseline established for validation work
Estimated Effort: 4-6 hours
Priority: High
Domain: database, validation, automation
Reactions are currently unavailable