…umentation Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
…OIN strategy Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
This PR creates a new database view view_riksdagen_politician_decision_pattern to track individual politician decision patterns from proposal data, complementing the existing party-level decision flow view added in the same changelog version. The view enables politician-level proposal success rate analysis, committee specialization identification, and legislative productivity tracking.
Key Changes
- New database view: Aggregates decision data by politician, committee, decision_type, and time period with approval/rejection rates and committee activity metrics
- Performance optimization: Adds index on `document_person_reference_da_0(person_id)` for efficient politician-specific queries
- Comprehensive documentation: Adds 248 lines of documentation including purpose, column descriptions, 5 example queries, and intelligence framework integration
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| service.data.impl/src/main/resources/db-changelog-1.35.xml | Added view creation changeSet (151 lines), performance index changeSet, and post-flight validation for the politician decision pattern view |
| DATABASE_VIEW_INTELLIGENCE_CATALOG.md | Added comprehensive documentation entry for the new view including column definitions, example queries, and intelligence applications; updated metadata and inventory counts |
| `decision_types` | TEXT | Comma-separated decision types | 'Motion, Proposition' |
| `earliest_decision_date` | DATE | First decision in period | '2024-10-01' |
| `latest_decision_date` | DATE | Last decision in period | '2024-10-31' |
Documentation Terminology Mismatch with View
The documentation uses "decisions" terminology (e.g., line 1295: "Total decisions in period"), which matches the view's column names. However, this creates an inconsistency with the related view_riksdagen_party_decision_flow documentation, which likely uses "proposals" terminology.
Additionally, the example values show "Motion, Proposition" for decision_types (line 1303), but given that the view GROUPs BY decision_type, this column will only ever contain a single value, not a comma-separated list. The example is misleading.
Recommendation:
- Update the example for `decision_types` to show a single value like 'Motion' or 'Proposition', not 'Motion, Proposition'
- Add a note that `decision_types` will always be a single value equal to the `decision_type` column due to the GROUP BY clause
Suggested change:
Current:
| `decision_types` | TEXT | Comma-separated decision types | 'Motion, Proposition' |
| `earliest_decision_date` | DATE | First decision in period | '2024-10-01' |
| `latest_decision_date` | DATE | Last decision in period | '2024-10-31' |
Proposed:
| `decision_types` | TEXT | Decision type (always a single value) | 'Motion' |
| `earliest_decision_date` | DATE | First decision in period | '2024-10-01' |
| `latest_decision_date` | DATE | Last decision in period | '2024-10-31' |
> **Note:** The `decision_types` column will always contain a single value equal to the `decision_type` column, due to the GROUP BY clause in the view definition.
Fixed in commit 76c75f7. Removed decision_types column from documentation since it was removed from the view along with decision_type from GROUP BY. Added note explaining the view aggregates all decision types together.
| `referred_back_decisions` | BIGINT | Decisions referred back | 3 |
| `other_decisions` | BIGINT | Other decision outcomes | 2 |
| `approval_rate` | NUMERIC(5,2) | Percentage approved | 71.11 |
| `rejection_rate` | NUMERIC(5,2) | Percentage rejected | 17.78 |
Unclear Documentation for committees_active Column
Line 1302 describes committees_active as "Number of committees active in" with example value "3", but this description is incomplete and potentially misleading due to the window function implementation.
Issue: The view uses a window function COUNT(DISTINCT dpd.committee) OVER (PARTITION BY pd.person_id) for this column (view line 447), which means:
- It counts the total number of distinct committees the politician has worked in across ALL time periods
- This value is repeated for each row of the same politician
- It is NOT specific to the aggregation period (month/year/committee) of the current row
Recommendation: Update the description to clarify, e.g.:
"Total number of distinct committees the politician is active in (across all time periods in the dataset)"
Suggested change:
Current:
| `rejection_rate` | NUMERIC(5,2) | Percentage rejected | 17.78 |
Proposed:
| `rejection_rate` | NUMERIC(5,2) | Percentage rejected | 17.78 |
| `committees_active` | INTEGER | Total number of distinct committees the politician is active in (across all time periods in the dataset) | 3 |
<comment>
Create indexes for performance optimization of politician decision pattern queries

Index on person_id + date for efficient temporal queries by politician.
Index Comment Inaccurate
The comment on line 522 states "Index on person_id + date for efficient temporal queries by politician" but the actual index created on line 528-530 is only on person_id, not a composite index including a date column.
Actual index: CREATE INDEX IF NOT EXISTS idx_person_ref_person_id ON document_person_reference_da_0(person_id)
This is a single-column index, not a composite index with date.
Recommendation: Update the comment to accurately reflect the index structure:
"Index on person_id for efficient politician-specific queries"
Note: If temporal queries are important, consider whether a composite index on (person_id, date_column) would provide better performance, though the current index is still useful for filtering by person_id.
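The composite-index idea in the note above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (whose planner may behave differently), and the table `person_ref`, the column `decision_date`, and the index name `idx_person_date` are hypothetical, not names from the actual schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_ref (person_id TEXT, decision_date TEXT);
-- Hypothetical composite index: equality column first, range column second.
CREATE INDEX idx_person_date ON person_ref(person_id, decision_date);
""")

# Ask the planner how it would run a per-politician temporal query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT person_id, decision_date FROM person_ref "
    "WHERE person_id = ? AND decision_date >= ?",
    ("p1", "2024-10-01"),
).fetchall()

# The last field of each plan row is the human-readable detail text.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

Placing the equality column (`person_id`) before the range column lets a single index lookup serve both predicates. Whether this pays off on the real table would need to be checked with `EXPLAIN` against PostgreSQL.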
Suggested change:
Current: Index on person_id + date for efficient temporal queries by politician.
Proposed: Index on person_id for efficient politician-specific queries.
Fixed in commit 76c75f7. Updated comment to accurately reflect single-column index on person_reference_id (not a composite index with date column).
  ON dprc.hjid = dsc.document_person_reference_co_1

INNER JOIN document_person_reference_da_0 dpr
  ON dpr.document_person_reference_li_1 = dprc.hjid

-- Join to person_data for politician identification
INNER JOIN person_data pd
  ON pd.id = dpr.person_id
[nitpick] Join Type Inconsistency with Related View
The politician decision pattern view uses INNER JOIN for the person reference tables (lines 471-479), while the party decision flow view (lines 186-190) uses LEFT JOIN for the same tables. This inconsistency means:
- Party view: Includes proposals even without person/party attribution (LEFT JOIN)
- Politician view: Excludes proposals without person attribution (INNER JOIN)
While the INNER JOIN is logical for a politician-focused view (we need person_id), this creates a data discrepancy where the same proposal data will be counted differently in the two views.
Recommendation: Add a comment explaining why INNER JOIN is used here vs. LEFT JOIN in the party view, to document this intentional difference for future maintainers.
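The discrepancy described above can be reproduced with a toy dataset. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` rather than PostgreSQL, and the tables `proposal` and `person_ref` are simplified, hypothetical stand-ins for the real join chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proposal (id INTEGER PRIMARY KEY, committee TEXT);
CREATE TABLE person_ref (proposal_id INTEGER, person_id TEXT);
INSERT INTO proposal VALUES (1, 'FiU'), (2, 'FiU'), (3, 'SkU');
-- Proposal 3 deliberately has no person attribution.
INSERT INTO person_ref VALUES (1, 'p1'), (2, 'p2');
""")

# Politician-style view: INNER JOIN drops proposals without a person.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM proposal p "
    "INNER JOIN person_ref r ON r.proposal_id = p.id"
).fetchone()[0]

# Party-style view: LEFT JOIN keeps unattributed proposals.
left_count = conn.execute(
    "SELECT COUNT(*) FROM proposal p "
    "LEFT JOIN person_ref r ON r.proposal_id = p.id"
).fetchone()[0]

print(inner_count, left_count)  # 2 3
```

The gap between the two counts is exactly the set of proposals without person attribution, which is the difference a maintainer comment in the changelog would need to explain.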
) AS rejection_rate,

-- Decision type variety
STRING_AGG(DISTINCT dpd.decision_type, ', ' ORDER BY dpd.decision_type) AS decision_types,
Window Function with GROUP BY Conflict
Line 447 uses a window function COUNT(DISTINCT dpd.committee) OVER (PARTITION BY pd.person_id) to calculate committees_active, but this is incompatible with the GROUP BY clause on lines 489-499 which groups by multiple dimensions including committee.
Problem: The window function partitions only by person_id, but the GROUP BY includes committee, decision_type, decision_month, etc. This will cause the committees_active value to be repeated for each row of the same person, which is the total across ALL their grouped rows, not specific to the current aggregation level.
Expected behavior: Each row represents a person's activity in a specific committee during a specific month. The committees_active column will show the person's total committee count across all time, which is misleading in this context.
Solution: Either:
- Remove the window function and calculate `committees_active` at query time when needed
- Make it a simple count without the window function (though this would always be 1 given the GROUP BY includes committee)
- Document that this represents total committees across all periods, not for the specific aggregation period
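The first option, computing the committee count at query time, might look like the sketch below. It is a minimal illustration assuming Python's built-in `sqlite3` in place of PostgreSQL and a hypothetical flattened `decision` table; the real view joins several tables to obtain these columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decision (person_id TEXT, committee TEXT, decision_month TEXT);
INSERT INTO decision VALUES
  ('p1', 'FiU', '2024-09'),
  ('p1', 'FiU', '2024-10'),
  ('p1', 'SkU', '2024-10'),
  ('p1', 'KU',  '2024-10'),
  ('p2', 'FiU', '2024-10');
""")

# Distinct committees per politician across all periods.
overall = dict(conn.execute(
    "SELECT person_id, COUNT(DISTINCT committee) "
    "FROM decision GROUP BY person_id"
))

# Distinct committees per politician within each month.
monthly = {
    (person, month): count
    for person, month, count in conn.execute(
        "SELECT person_id, decision_month, COUNT(DISTINCT committee) "
        "FROM decision GROUP BY person_id, decision_month"
    )
}

print(overall)  # {'p1': 3, 'p2': 1}
print(monthly[("p1", "2024-09")], monthly[("p1", "2024-10")])  # 1 3
```

Keeping the count in a separate query makes the scope explicit: the caller chooses between an all-time figure and a per-period figure instead of inheriting one value repeated on every row.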
STRING_AGG(DISTINCT dpd.decision_type, ', ' ORDER BY dpd.decision_type) AS decision_types,

-- Latest and earliest dates for the aggregation
MIN(dd.made_public_date) AS earliest_decision_date,
STRING_AGG with ORDER BY on Non-Grouped Column
Line 450 uses STRING_AGG(DISTINCT dpd.decision_type, ', ' ORDER BY dpd.decision_type), which orders by dpd.decision_type. However, since we're using DISTINCT and the GROUP BY clause already includes dpd.decision_type (line 495), this STRING_AGG will only aggregate a single value per group.
Impact: Each grouped row will have only one decision_type value (because we group by it), so decision_types will always contain just that single decision_type, not a comma-separated list of multiple types.
Solution:
- Remove `dpd.decision_type` from the GROUP BY clause (line 495) if you want to aggregate multiple decision types per person/committee/month
- Or remove the `decision_types` column since it's redundant with the `decision_type` grouping column
Note: The party decision flow view (line 116) groups by decision_type and doesn't attempt to aggregate it, which is more consistent.
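The effect described in this comment can be shown on a small dataset. The sketch assumes Python's built-in `sqlite3`, whose `GROUP_CONCAT` plays the role of PostgreSQL's `STRING_AGG` here, and a hypothetical flattened `decision` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decision (person_id TEXT, committee TEXT, decision_type TEXT);
INSERT INTO decision VALUES
  ('p1', 'FiU', 'Motion'),
  ('p1', 'FiU', 'Proposition'),
  ('p1', 'FiU', 'Motion');
""")

# Grouping by decision_type: each group can only contain its own type,
# so the aggregated column just repeats the grouping column.
grouped_by_type = conn.execute(
    "SELECT decision_type, GROUP_CONCAT(DISTINCT decision_type) "
    "FROM decision "
    "GROUP BY person_id, committee, decision_type "
    "ORDER BY decision_type"
).fetchall()

# Without decision_type in GROUP BY, the aggregate collects every type.
# SQLite does not guarantee the concatenation order, so compare as a set.
without_type = conn.execute(
    "SELECT GROUP_CONCAT(DISTINCT decision_type) "
    "FROM decision GROUP BY person_id, committee"
).fetchone()[0]
types_collected = set(without_type.split(","))

print(grouped_by_type)  # [('Motion', 'Motion'), ('Proposition', 'Proposition')]
print(types_collected)
```

Only the second query yields a genuine multi-value list, which is why the two options in the comment are alternatives: drop the grouping column or drop the aggregate.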
Fixed in commit 76c75f7. Removed decision_type from both SELECT and GROUP BY clauses since grouping by it made the STRING_AGG redundant (would always contain a single value). View now aggregates all decision types together at politician-committee-month granularity.
@copilot apply changes based on the comments in this thread. Also follow https://github.com/Hack23/cia/blob/master/service.data.impl/README-SCHEMA-MAINTENANCE.md and update full-schema.sql. Copilot has access to the Postgres setup: https://github.com/Hack23/cia/blob/master/.github/workflows/copilot-setup-steps.yml
…pdate index comment
- Removed decision_type from SELECT and GROUP BY to fix STRING_AGG issue where it would only aggregate a single value
- Updated index comment to accurately reflect single-column index on person_id (not person_id + date)
- Updated documentation to remove decision_type column and add note about aggregation scope
- View now aggregates all decision types together at politician-committee-month granularity
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…fix person_id reference
- Added view_riksdagen_politician_decision_pattern to full_schema.sql (line 8945)
- Added idx_person_ref_person_id index to full_schema.sql (line 11770)
- Fixed changelog to use person_reference_id instead of person_id (correct column name in document_person_reference_da_0 table)
- Updated index definition to use person_reference_id
- Verified build succeeds
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
All review comments addressed in commits 76c75f7 and 86e835b. Updated |
- Removed WHERE clause filtering on decision_type in query #4 (Ministry Proposal Support Patterns)
- Query now shows all decision types aggregated by ministry/committee
- All example queries now use only columns that exist in the view
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
- Changed SELECT to use 'pd.id AS person_id' (person_data table has 'id' column, not 'person_id')
- Updated WHERE clause to use 'pd.id IS NOT NULL'
- Updated GROUP BY to use 'pd.id'
- Now matches full_schema.sql implementation exactly
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
✅ Create Politician Decision Pattern View - FINAL
All Issues Resolved ✅
Review Comments:
Final Implementation
View Structure:
Files:
- `db-changelog-1.35.xml` - Changelog with correct column references
- `DATABASE_VIEW_INTELLIGENCE_CATALOG.md` - Complete documentation
- `full_schema.sql` - View and index definitions
Commits:
Testing:
Intelligence Value
⭐⭐⭐⭐⭐ VERY HIGH - Enables politician effectiveness tracking and committee specialization analysis.
Status: READY FOR MERGE
Original prompt
This section details on the original issue you should resolve
<issue_title>Create Politician Decision Pattern View from DOCUMENT_PROPOSAL_DATA</issue_title>
<issue_description>## 🎯 Objective
Create a database view tracking individual politician decision patterns from DOCUMENT_PROPOSAL_DATA, enabling analysis of politician-level proposal success rates, committee work effectiveness, and legislative productivity.
📋 Background
While `view_riksdagen_politician_document` tracks document authorship, we lack decision outcome intelligence at the politician level. This view complements Issue #7918 (party-level) by providing individual politician decision analytics.
Use Cases:
Context from Documentation:
📊 Current State
✅ Acceptance Criteria
- `view_riksdagen_politician_decision_pattern` database view
- `db-changelog-1.35.xml` (same changelog as Create Party Decision Flow View from DOCUMENT_PROPOSAL_DATA #7918)
🛠️ Implementation Guidance
Files to Modify:
- `service.data.impl/src/main/resources/db-changelog-1.35.xml` (append changeSet)
- `DATABASE_VIEW_INTELLIGENCE_CATALOG.md` - Add under "Politician Views" section
Sample Query:
Edge Cases:
🤖 Recommended Agent
Agent: @hack23-intelligence-operative
Rationale: Requires political science expertise to design meaningful politician-level decision metrics and integrate with existing politician intelligence views.
For implementation, the Intelligence Operative will:
`view_riksdagen_politician_summary`
📚 Related Documentation
🏷️ Labels
feature, database, intelligence, politician-analysis...