🗄️ Create SQL Script for Database Schema Validation and Statistics

## 🎯 Objective

Create a comprehensive SQL script that validates the database schema by counting all tables and views, extracting sample data from each, and generating statistics reports. This script will provide a baseline for documentation validation and schema health checks.

**Recommended Agent**: @stack-specialist

## 📋 Background

Currently, we lack automated tools to:
- Count and list all tables and views in the database
- Extract representative sample data for validation
- Generate statistics about schema composition
- Validate data consistency and quality
- Provide baseline metrics for documentation

This script will be the foundation for validating DATABASE_VIEW_INTELLIGENCE_CATALOG.md and other documentation.

## 📊 Current State

**Schema Size**:
- `full_schema.sql`: 12,934 lines
- `refresh-all-views.sql`: ~28 materialized views
- PostgreSQL 16 database with extensive view hierarchy

**Missing Capabilities**:
- No automated schema inventory
- No sample data extraction tool
- No statistics generation
- No baseline for validation

## ✅ Acceptance Criteria

- [ ] SQL script created at `service.data.impl/src/main/resources/schema-validation.sql`
- [ ] Script counts all tables (base tables, not views)
- [ ] Script counts all views (regular and materialized separately)
- [ ] For each table/view: name, type, row count, column count
- [ ] Sample data extraction (top 5-10 rows) for each table/view
- [ ] Statistics: total size, largest tables, empty tables/views
- [ ] Output formatted as JSON or CSV for easy processing
- [ ] Documentation added to README-SCHEMA-MAINTENANCE.md
- [ ] Script tested against actual database

## 🛠️ Implementation Guidance

### Script Structure

```sql
-- schema-validation.sql
-- Database Schema Validation and Statistics Report
-- Generated: [timestamp]

-- Part 1: Count all objects
\echo '=== DATABASE OBJECT COUNTS ==='

-- Count base tables
SELECT 'Base Tables' AS object_type, COUNT(*) AS count
FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';

-- Count regular views
SELECT 'Regular Views' AS object_type, COUNT(*) AS count
FROM information_schema.views
WHERE table_schema = 'public';

-- Count materialized views
SELECT 'Materialized Views' AS object_type, COUNT(*) AS count
FROM pg_matviews
WHERE schemaname = 'public';

-- Part 2: List all tables with row counts
\echo '\n=== TABLE INVENTORY WITH ROW COUNTS ==='

DO $$
DECLARE
    r RECORD;
    row_count INTEGER;
BEGIN
    FOR r IN 
        SELECT schemaname, tablename 
        FROM pg_tables 
        WHERE schemaname = 'public'
        ORDER BY tablename
    LOOP
        EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.tablename) INTO row_count;
        RAISE NOTICE 'Table: %.% | Rows: %', r.schemaname, r.tablename, row_count;
    END LOOP;
END $$;

-- Part 3: List all views with row counts
\echo '\n=== VIEW INVENTORY WITH ROW COUNTS ==='

DO $$
DECLARE
    r RECORD;
    row_count INTEGER;
BEGIN
    -- Regular views
    FOR r IN 
        SELECT schemaname, viewname AS name
        FROM pg_views 
        WHERE schemaname = 'public'
        ORDER BY viewname
    LOOP
        BEGIN
            EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.name) INTO row_count;
            RAISE NOTICE 'View: %.% | Rows: %', r.schemaname, r.name, row_count;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'View: %.% | ERROR: %', r.schemaname, r.name, SQLERRM;
        END;
    END LOOP;
    
    -- Materialized views
    FOR r IN 
        SELECT schemaname, matviewname AS name
        FROM pg_matviews 
        WHERE schemaname = 'public'
        ORDER BY matviewname
    LOOP
        BEGIN
            EXECUTE format('SELECT COUNT(*) FROM %I.%I', r.schemaname, r.name) INTO row_count;
            RAISE NOTICE 'Materialized View: %.% | Rows: %', r.schemaname, r.name, row_count;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'Materialized View: %.% | ERROR: %', r.schemaname, r.name, SQLERRM;
        END;
    END LOOP;
END $$;

-- Part 4: Sample data extraction
\echo '\n=== SAMPLE DATA FROM KEY VIEWS ==='

-- Sample from each important view (top 5 rows)
-- Add specific queries for key views
SELECT * FROM view_riksdagen_politician LIMIT 5;
SELECT * FROM view_riksdagen_party LIMIT 5;
SELECT * FROM view_riksdagen_vote_data_ballot_politician_summary_daily 
WHERE vote_date >= CURRENT_DATE - INTERVAL '7 days' LIMIT 5;

-- Part 5: Schema statistics
\echo '\n=== SCHEMA STATISTICS ==='

-- Largest tables by row count
SELECT 
    schemaname,
    tablename,
    n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC
LIMIT 20;

-- Tables with no data
SELECT tablename
FROM pg_tables t
LEFT JOIN pg_stat_user_tables s ON t.tablename = s.relname
WHERE t.schemaname = 'public' 
  AND (s.n_live_tup = 0 OR s.n_live_tup IS NULL)
ORDER BY tablename;

-- View dependency count
SELECT 
    dependent_view.relname AS view_name,
    COUNT(DISTINCT source_table.relname) AS dependency_count
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class AS dependent_view ON pg_rewrite.ev_class = dependent_view.oid
JOIN pg_class AS source_table ON pg_depend.refobjid = source_table.oid
WHERE dependent_view.relkind IN ('v', 'm')
GROUP BY dependent_view.relname
ORDER BY dependency_count DESC
LIMIT 20;
```

### Enhanced Features

**JSON Output Option**:
```sql
-- Generate JSON report
COPY (
  SELECT json_build_object(
    'tables', (SELECT json_agg(t) FROM (
      SELECT tablename, n_live_tup AS rows 
      FROM pg_stat_user_tables 
      WHERE schemaname = 'public'
    ) t),
    'views', (SELECT json_agg(v) FROM (
      SELECT viewname 
      FROM pg_views 
      WHERE schemaname = 'public'
    ) v),
    'materialized_views', (SELECT json_agg(mv) FROM (
      SELECT matviewname 
      FROM pg_matviews 
      WHERE schemaname = 'public'
    ) mv)
  )
) TO '/tmp/schema_report.json';
```

**CSV Output Option**:
```sql
-- Export to CSV
COPY (
  SELECT 'TABLE' AS type, tablename AS name, n_live_tup AS rows
  FROM pg_stat_user_tables
  WHERE schemaname = 'public'
  UNION ALL
  SELECT 'VIEW', viewname, NULL
  FROM pg_views
  WHERE schemaname = 'public'
  ORDER BY name
) TO '/tmp/schema_inventory.csv' WITH (FORMAT CSV, HEADER);
```

### Usage Instructions

```bash
# Run the validation script
psql -U postgres -d cia_dev -f service.data.impl/src/main/resources/schema-validation.sql > schema_report.txt 2>&1

# Generate JSON report
psql -U postgres -d cia_dev -c "SELECT json_build_object(...)" > schema_report.json

# Generate CSV inventory
psql -U postgres -d cia_dev -c "COPY (...) TO STDOUT WITH CSV HEADER" > schema_inventory.csv
```

## 📊 Expected Output

1. **Text Report** (`schema_report.txt`):
   - Object counts summary
   - Full table/view inventory with row counts
   - Sample data from key views
   - Statistics (largest tables, empty objects, dependencies)

2. **JSON Report** (`schema_report.json`):
   - Structured data for programmatic processing
   - Complete schema metadata
   - Suitable for automated validation

3. **CSV Inventory** (`schema_inventory.csv`):
   - Spreadsheet-compatible format
   - Easy to compare with documentation
   - Suitable for gap analysis

## 🔗 Related Files

- [README-SCHEMA-MAINTENANCE.md](https://github.com/Hack23/cia/blob/master/service.data.impl/README-SCHEMA-MAINTENANCE.md) - Add usage documentation here
- [full_schema.sql](https://github.com/Hack23/cia/blob/master/service.data.impl/src/main/resources/full_schema.sql) - Schema to validate
- [refresh-all-views.sql](https://github.com/Hack23/cia/blob/master/service.data.impl/src/main/resources/refresh-all-views.sql) - Views to count

## 📊 Success Metrics

- Script successfully runs against database
- Accurate counts of all objects
- Sample data extracted without errors
- Reports generated in multiple formats
- Documentation updated with usage instructions
- Baseline established for validation work

**Estimated Effort**: 4-6 hours  
**Priority**: High  
**Domain**: database, validation, automation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🗄️ Create SQL Script for Database Schema Validation and Statistics #7865

🎯 Objective

📋 Background

📊 Current State

✅ Acceptance Criteria

🛠️ Implementation Guidance

Script Structure

Enhanced Features

Usage Instructions

📊 Expected Output

🔗 Related Files

📊 Success Metrics

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

🗄️ Create SQL Script for Database Schema Validation and Statistics #7865

Description

🎯 Objective

📋 Background

📊 Current State

✅ Acceptance Criteria

🛠️ Implementation Guidance

Script Structure

Enhanced Features

Usage Instructions

📊 Expected Output

🔗 Related Files

📊 Success Metrics

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions