Inspiration
Tax season is dreaded by millions of Americans. The current solutions fall into two camps:
- DIY software that bombards users with 100+ confusing questions and offers no real guidance
- Professional CPAs charging $300-500+ for simple returns, creating an accessibility barrier
I asked myself: What if filing taxes were as simple as having a conversation?
The inspiration came from witnessing my friends struggle with tax software, making critical mistakes because they didn't understand terms like "AGI," "standard deduction," or "filing status." I realized that with modern AI, I could build a system that:
- Understands tax law as well as a CPA
- Extracts data from W-2s and other documents automatically
- Explains calculations in plain English
- Fills official IRS forms without errors
- Costs less than existing tax software
Province was born from the belief that everyone deserves access to accurate, affordable tax filing.
What It Does
Province is a comprehensive AI tax filing platform with five revolutionary capabilities:
1. Conversational Tax Intake
Instead of endless form fields, Province's AI agent has a natural conversation with taxpayers:
Agent: "Hi! Let's get your taxes done. What's your filing status?"
User: "I'm single"
Agent: "Great! Do you have any children or dependents?"
User: "Two kids, ages 5 and 8"
Agent: "Perfect! They qualify for the Child Tax Credit. Now, do you have your W-2?"
The agent:
- Asks one question at a time to avoid overwhelming users
- Explains why each piece of information is needed
- Validates responses in real-time
- Adapts the conversation flow based on the user's situation
- Maintains conversational state across sessions
2. Intelligent Document Processing
Province automatically extracts tax data from uploaded documents:
- Bedrock Data Automation: Uses an Intelligent Document Processing pipeline to extract data accurately from both PDF and PNG/JPEG documents
- Document Types: W-2, 1099-INT, 1099-MISC, and other tax documents
- Smart Validation: Verifies extracted data against IRS requirements
- Knowledge Base: Cross-references IRS tax rules stored in ElasticSearch for accuracy
Example: Upload a W-2 → Province extracts wages, federal withholding, employer info, and state data—all validated against IRS formats.
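A minimal sketch of the kind of format validation described above, assuming the extracted W-2 data arrives as a plain dictionary. The key names (employer_ein, employee_ssn, wages, federal_withholding) and the validate_w2 helper are hypothetical and are not taken from Province's actual schema.

```python
import re
from decimal import Decimal, InvalidOperation

EIN_PATTERN = re.compile(r"^\d{2}-\d{7}$")        # employer ID: NN-NNNNNNN
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # SSN: NNN-NN-NNNN


def validate_w2(extracted: dict) -> list[str]:
    """Return a list of problems; an empty list means the data looks well-formed."""
    problems = []
    if not EIN_PATTERN.match(str(extracted.get("employer_ein", ""))):
        problems.append("employer_ein must look like NN-NNNNNNN")
    if not SSN_PATTERN.match(str(extracted.get("employee_ssn", ""))):
        problems.append("employee_ssn must look like NNN-NN-NNNN")
    for name in ("wages", "federal_withholding"):
        try:
            amount = Decimal(str(extracted.get(name)))
        except InvalidOperation:
            problems.append(f"{name} is not a number")
            continue
        if not amount.is_finite():
            problems.append(f"{name} is not a number")
        elif amount < 0:
            problems.append(f"{name} cannot be negative")
    return problems
```

Returning a list of messages rather than raising on the first error lets the agent report every problem with a document in one reply.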
3. AI-Powered Form Filling with Zero Manual Mapping
This is where Province truly innovates. Traditional tax software requires developers to manually map every field on every form, a process that takes hours per form and breaks whenever the IRS updates forms.
My breakthrough: An autonomous FormMapping pipeline that uses agentic reasoning to map PDF form fields automatically:
Upload IRS Form 1040 (141 fields)
↓
Lambda extracts field names (f1_01, f1_02, c1_1, etc.)
↓
FormMappingAgent analyzes with Claude 3.5 Sonnet
↓
Generates semantic mapping:
"taxpayer_first_name" → "topmostSubform[0].Page1[0].f1_04[0]"
"wages_line_1a" → "topmostSubform[0].Page1[0].f1_32[0]"
↓
Validates & caches mapping in DynamoDB
↓
Future forms fill instantly in ~100ms
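Once a mapping is cached, filling a form reduces to a dictionary lookup, which is why it can be fast. The sketch below illustrates that step under assumptions: resolve_field_values is a hypothetical helper, the sample taxpayer values are made up, and the actual writing of values into the PDF is left out.

```python
def resolve_field_values(mapping: dict[str, str], tax_data: dict[str, object]) -> dict[str, str]:
    """Translate semantic keys into PDF field names, skipping keys with no data."""
    filled = {}
    for semantic_name, pdf_field in mapping.items():
        value = tax_data.get(semantic_name)
        if value is None:
            continue
        # Money amounts are rendered with two decimals; everything else as plain text.
        filled[pdf_field] = f"{value:.2f}" if isinstance(value, float) else str(value)
    return filled


# The two mapping entries shown in the pipeline above.
cached_mapping = {
    "taxpayer_first_name": "topmostSubform[0].Page1[0].f1_04[0]",
    "wages_line_1a": "topmostSubform[0].Page1[0].f1_32[0]",
}
values = resolve_field_values(
    cached_mapping, {"taxpayer_first_name": "ALEX", "wages_line_1a": 55151.93}
)
```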
4. Tax Calculation Engine
Province calculates taxes with CPA-level accuracy:
- Applies 2024 tax brackets based on filing status
- Calculates standard deductions ($14,600 for Single, $29,200 for MFJ)
- Computes Child Tax Credit ($2,000 per qualifying child)
- Determines refund or amount owed
- Every calculation is explained with references to IRS rules
Example output:
Based on your wages of $55,151.93 and filing status Single:
- Adjusted Gross Income (AGI): $55,151.93
- Standard Deduction: $14,600.00
- Taxable Income: $40,551.93
- Tax Liability: $4,634.23
- Federal Withholding: $16,606.17
→ REFUND: $11,971.94
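The numbers in this example can be reproduced with a short bracket calculation. This is a sketch, not Province's calculation engine: it covers only the Single filing status, applies the 2024 rate schedule directly, and ignores credits such as the Child Tax Credit. The IRS Tax Table used for incomes under $100,000 can give a slightly different result than the rate schedule.

```python
from decimal import Decimal, ROUND_HALF_UP

# 2024 rate schedule for Single filers: (upper bound of bracket, marginal rate).
SINGLE_2024 = [
    (Decimal("11600"), Decimal("0.10")),
    (Decimal("47150"), Decimal("0.12")),
    (Decimal("100525"), Decimal("0.22")),
    (Decimal("191950"), Decimal("0.24")),
    (Decimal("243725"), Decimal("0.32")),
    (Decimal("609350"), Decimal("0.35")),
    (None, Decimal("0.37")),  # no upper bound on the top bracket
]
STANDARD_DEDUCTION_SINGLE_2024 = Decimal("14600")
CENT = Decimal("0.01")


def tax_on(taxable_income: Decimal) -> Decimal:
    """Apply each marginal rate to the slice of income that falls in its bracket."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in SINGLE_2024:
        if upper is None or taxable_income <= upper:
            tax += (taxable_income - lower) * rate
            break
        tax += (upper - lower) * rate
        lower = upper
    return tax.quantize(CENT, rounding=ROUND_HALF_UP)


wages = Decimal("55151.93")
withholding = Decimal("16606.17")
taxable = max(wages - STANDARD_DEDUCTION_SINGLE_2024, Decimal("0"))
liability = tax_on(taxable)
refund = withholding - liability
print(taxable, liability, refund)  # 40551.93 4634.23 11971.94
```

Decimal is used instead of float so that cent-level rounding is explicit.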
5. Multi-Agent Architecture
Province uses specialized AI agents, each expert in their domain:
- Intake Agent: Collects filing information conversationally
- Tax Planner Agent: Calculates taxes using IRS rules
- FormMapping Agent: Maps PDF fields using agentic reasoning
- Review Agent: Validates completed returns for errors
All agents collaborate using AWS Bedrock Agent runtime with Claude 3.5 Sonnet.
How I Built It
Technology Stack
Frontend (Next.js 15 + TypeScript)
- Framework: Next.js 15 with App Router and Turbopack
- UI Components: Custom component library built on Radix UI primitives
- Styling: Tailwind CSS 4 with custom design system
- Authentication: Clerk for organization-based access control
- PDF Rendering: PDF.js for client-side form viewing
- State Management: React hooks + server components
- Real-time: WebSocket integration for live agent responses
Key components:
- Form1040Viewer: Cursor-style version navigator with tooltips
- MainEditor: Multi-tab interface for documents and forms
- StartScreen: Dashboard with calendar, deadlines, and past filings
- TaxFormsViewer: Interactive PDF viewer with field annotations (Phase 2)
Backend (Python 3.11 + FastAPI)
- Framework: FastAPI with async/await throughout
- AWS Services:
  - AWS Bedrock: Claude 3.5 Sonnet v2 for all AI operations
  - AWS Bedrock Data Automation: Intelligent Document Processing for document extraction
  - DynamoDB: NoSQL storage for form mappings, documents, user data
  - S3: Object storage with versioning for forms and templates
  - Lambda: Serverless processing for form template ingestion
  - ElasticSearch: Knowledge base with IRS tax rules and regulations
  - EventBridge: S3 event triggers for automated processing
- PDF Processing: PyMuPDF (fitz) for form field extraction and filling
- Agent Framework: Strands SDK for multi-agent orchestration
Infrastructure (AWS CDK)
- Infrastructure as Code: AWS CDK (TypeScript) for reproducible deployments
- Multi-Region: Bedrock cross-region inference (us-east-1, us-west-2)
- Rate Limiting: Handles 2 RPM Bedrock limits with exponential backoff
- Cost Optimization: DynamoDB pay-per-request, S3 Intelligent-Tiering
Key Technical Innovations
1. Agentic Form Mapping
Traditional approach (broken):
# Manual mapping - breaks on every IRS form update
field_mapping = {
    "first_name": "f1_01",
    "last_name": "f1_02",
    # ... 139 more manual mappings
}
Province's agentic approach:
class FormMappingAgent:
    def map_form_fields(self, form_type, tax_year, fields):
        # Phase 1: Initial comprehensive mapping
        mapping = self._initial_mapping(form_type, tax_year, fields)
        # Phase 2: Gap analysis
        coverage = self._calculate_coverage(mapping, fields)
        # Phase 3: Iterative gap filling until 90%+ coverage
        while coverage < 90:
            gaps = self._identify_gaps(mapping, fields)
            mapping = self._fill_gaps(gaps, mapping)
            coverage = self._calculate_coverage(mapping, fields)
        # Phase 4: Validation
        self._validate_mapping(mapping, fields)
        return mapping  # Cached forever in DynamoDB
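The helper methods are not shown in this writeup. One plausible reading, sketched below, is that coverage is the percentage of extracted PDF fields that appear as a mapping target. The function names, the fill_gaps callback, and the max_rounds cap are assumptions added for illustration. The cap guards against looping forever when the model stops making progress.

```python
from typing import Callable


def calculate_coverage(mapping: dict[str, str], fields: list[str]) -> float:
    """Percentage of PDF fields that some semantic name points at."""
    if not fields:
        return 100.0
    mapped = set(mapping.values())
    return 100.0 * sum(1 for f in fields if f in mapped) / len(fields)


def identify_gaps(mapping: dict[str, str], fields: list[str]) -> list[str]:
    """PDF fields that no semantic name points at yet, in their original order."""
    mapped = set(mapping.values())
    return [f for f in fields if f not in mapped]


def map_with_gap_filling(
    mapping: dict[str, str],
    fields: list[str],
    fill_gaps: Callable[[list[str], dict[str, str]], dict[str, str]],
    target: float = 90.0,
    max_rounds: int = 5,
) -> dict[str, str]:
    """Call fill_gaps until coverage reaches the target or the round budget runs out."""
    for _ in range(max_rounds):
        if calculate_coverage(mapping, fields) >= target:
            break
        mapping = fill_gaps(identify_gaps(mapping, fields), mapping)
    return mapping
```

In a real pipeline fill_gaps would be the LLM call; in tests it can be a plain function.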
2. Conversational State Management
Province maintains rich conversation state across multiple interactions:
conversation_state = {
    "session_id": "tax_session_20241019_153000",
    "filing_status": "Single",
    "dependents": 2,
    "w2_data": {
        "wages": 55151.93,
        "federal_withholding": 16606.17,
        "employer": "TechCorp Inc"
    },
    "tax_calculation": {
        "agi": 55151.93,
        "taxable_income": 40551.93,
        "refund": 11971.94
    },
    "filled_form": {
        "version": "v033",
        "filled_at": "2025-10-19T15:30:47Z"
    }
}
The agent uses this state to:
- Skip already-answered questions
- Provide context-aware responses
- Resume interrupted sessions
- Track progress through the filing process
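As a rough illustration of how saved state can drive the conversation, the sketch below picks the first unanswered step. The REQUIRED_STEPS list and both helper functions are hypothetical; the real agent decides what to ask through its prompt and tools rather than a fixed table.

```python
from typing import Optional

# Ordered intake steps: (state key, question to ask when that key is missing).
REQUIRED_STEPS = [
    ("filing_status", "What's your filing status?"),
    ("dependents", "Do you have any children or dependents?"),
    ("w2_data", "Do you have your W-2?"),
]


def next_question(state: dict) -> Optional[str]:
    """Return the first unanswered question, or None when intake is complete."""
    for key, question in REQUIRED_STEPS:
        # Test key presence rather than truthiness so that 0 dependents counts as answered.
        if key not in state:
            return question
    return None


def progress(state: dict) -> str:
    """Summarize how many intake steps are already answered, e.g. '2/3'."""
    done = sum(1 for key, _ in REQUIRED_STEPS if key in state)
    return f"{done}/{len(REQUIRED_STEPS)}"
```

Because the answers live in the state dictionary, a resumed session starts from the next missing key instead of the beginning.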
3. Lambda-Triggered Form Processing Pipeline
User uploads PDF → S3 EventBridge trigger → Lambda invokes
↓
FormTemplateProcessor extracts 141 fields
↓
Calls FormMappingAgent (Claude 3.5 Sonnet)
↓
Generates semantic mapping with 90%+ coverage
↓
Saves to DynamoDB with metadata
↓
Form ready for instant filling (100ms)
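The entry point of this pipeline could look like the sketch below. It assumes the Lambda receives the S3 "Object Created" event as delivered by EventBridge, where the bucket name and object key sit under the detail key. The handler body, the return shape, and the sample bucket and key are illustrative only, and the field extraction and mapping steps are left as a comment.

```python
def handler(event: dict, context: object) -> dict:
    """Pull the uploaded object's location out of an EventBridge S3 event."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    if not key.lower().endswith(".pdf"):
        # Ignore uploads that cannot be form templates.
        return {"status": "skipped", "reason": "not a PDF", "key": key}
    # Hypothetical next steps: extract fields, call the mapping agent, save to DynamoDB.
    return {"status": "queued", "bucket": bucket, "key": key}


sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "forms"}, "object": {"key": "templates/f1040.pdf"}},
}
result = handler(sample_event, None)
```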
4. Multi-Agent Collaboration with Strands
Province uses the Strands SDK to orchestrate multiple specialized agents:
tax_service = TaxService()
tax_service.agent = Agent(
    model="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    system_prompt=agent_instructions,
    tools=[
        ingest_documents_tool,      # W-2 processing
        calc_1040_tool,             # Tax calculations
        fill_form_tool,             # Form filling
        save_document_tool,         # Document storage
        manage_state_tool,          # Conversation state
        list_version_history_tool   # Version tracking
    ]
)
# Agent automatically chooses which tools to use
response = await tax_service.agent.invoke_async(user_message)
Agent decision-making flow:
- User: "I uploaded my W-2"
- Agent selects ingest_documents_tool
- Tool extracts wages and withholding
- Agent asks: "I found wages of $55,151. Is this correct?"
- User: "Yes, what's my refund?"
- Agent selects calc_1040_tool
- Tool calculates: Refund = $11,971.94
- Agent responds with breakdown
5. ElasticSearch Knowledge Base Integration
Province doesn't hallucinate tax rules. It references actual IRS documentation:
User question → Bedrock Agent → ElasticSearch retrieval
↓
Finds IRS Publication 17, Section 3.2.1
↓
"Standard deduction for Single filers in 2024 is $14,600"
↓
Agent responds with source citation
Knowledge Base Contents:
- IRS tax rules and publications
- Tax brackets for all filing statuses
- Deduction and credit eligibility rules
- Form instructions (1040, W-2, 1099-INT, etc.)
Challenges I Ran Into
1. AWS Bedrock Rate Limits (2 RPM)
Problem: Claude 3.5 Sonnet v2 has a strict 2 requests per minute limit, causing form mapping to fail when processing multiple fields.
Solution: Implemented retry logic with exponential backoff.
Result: Successfully processes forms with 5+ iterations, respecting rate limits.
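The original retry code is not included in this writeup, so the following is a generic sketch of exponential backoff with jitter rather than the project's implementation. ThrottledError is a stand-in exception. In a real Bedrock client the equivalent would be something like a botocore ClientError whose error code is ThrottlingException. The delay parameters are arbitrary defaults.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for whatever exception the model client raises when rate limited."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Retry call() on throttling, doubling the wait each time up to max_delay."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep function is injectable so the retry schedule can be tested without actually waiting.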
2. PDF Form Field Mapping
Problem: IRS forms use cryptic field names like topmostSubform[0].Page1[0].f1_32[0] instead of semantic names like "wages_line_1a". Manual mapping was error-prone and broke on form updates.
Solution: Built the FormMappingAgent using agentic reasoning:
- Phase 1: AI analyzes all 141 fields comprehensively
- Phase 2: Identifies unmapped gaps in coverage
- Phase 3: Iteratively fills gaps until 90%+ coverage
- Phase 4: Validates mapping before caching
Result: Achieved 100% accuracy on Form 1040 with zero manual intervention.
3. Unreliable PDF Field Labels
Problem: PyMuPDF's nearby_label extraction was unreliable, often returning incorrect or empty labels.
Solution: Used a multi-signal approach:
- Field number patterns: f1_32 → likely line 1a (field 32)
- Y-position: Sort fields top-to-bottom to infer order
- Page context: Page 1 = personal info, Page 2 = tax calculations
- AI reasoning: Claude understands tax form structure
Result: AI correctly maps fields even with bad labels by using context.
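The non-AI signals can be derived from the field name and position alone. The sketch below parses the page and sequence number out of names like the ones shown in this writeup and sorts fields into reading order. The regular expression, the FieldSignal tuple, and the sample y-coordinates are assumptions. Treating the f and c prefixes as text field and checkbox is also a guess based on the example names.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "topmostSubform[0].Page1[0].f1_32[0]" (optionally with nested subforms).
FIELD_NAME = re.compile(
    r"Page(?P<page>\d+)\[\d+\]\.(?:.*\.)?(?P<kind>[fc])\d+_(?P<seq>\d+)\[\d+\]$"
)


class FieldSignal(NamedTuple):
    name: str
    page: int
    kind: str   # "f" or "c", taken from the field name prefix
    seq: int    # running field number within the page
    y: float    # vertical position on the page, as reported by the PDF library


def extract_signals(name: str, y: float) -> Optional[FieldSignal]:
    """Parse page, prefix, and sequence number from a field name; None if it does not match."""
    match = FIELD_NAME.search(name)
    if match is None:
        return None
    return FieldSignal(name, int(match["page"]), match["kind"], int(match["seq"]), y)


def reading_order(fields: list[tuple[str, float]]) -> list[str]:
    """Sort parseable fields by page, then top-to-bottom, then sequence number."""
    signals = [s for s in (extract_signals(n, y) for n, y in fields) if s is not None]
    return [s.name for s in sorted(signals, key=lambda s: (s.page, s.y, s.seq))]
```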
4. Form Versioning and S3 Organization
Problem: S3 stores objects flatly, making version management complex. How do we track 33 versions of Form 1040 without chaos?
Solution: Designed a hierarchical S3 key structure:
filled_forms/{taxpayer_name}/{form_type}/{tax_year}/v{NNN}_{form_type}_{timestamp}.pdf
Benefits:
- Organized: Easy to list all versions for a taxpayer
- Sortable: Version numbers ensure correct ordering
- Queryable: Fast lookups by form type and year
- Scalable: Supports millions of users
Version API:
GET /api/v1/forms/1040/{engagement_id}/versions?tax_year=2024
Response:
{
"total_versions": 33,
"versions": [
{"version": "v033", "last_modified": "2025-10-19 15:20:47", "size": 338186},
{"version": "v032", "last_modified": "2025-10-19 15:15:32", "size": 337912},
...
]
}
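A small sketch of how keys in this layout could be built and how the next version number could be derived from existing keys. The helper names, the timestamp format, and the sample taxpayer name are assumptions; only the key template itself comes from the design above.

```python
import re
from datetime import datetime, timezone

# Captures the zero-padded version number at the start of the file name.
VERSION_IN_KEY = re.compile(r"/v(\d{3,})_[^/]+\.pdf$")


def build_key(taxpayer_name: str, form_type: str, tax_year: int, version: int, when: datetime) -> str:
    """Fill in the key template; zero-padding keeps lexical and numeric order aligned."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return f"filled_forms/{taxpayer_name}/{form_type}/{tax_year}/v{version:03d}_{form_type}_{stamp}.pdf"


def next_version(existing_keys: list[str]) -> int:
    """One more than the highest version found, or 1 when nothing exists yet."""
    versions = [int(m.group(1)) for k in existing_keys if (m := VERSION_IN_KEY.search(k))]
    return max(versions, default=0) + 1


key = build_key("jane_doe", "1040", 2024, 33, datetime(2025, 10, 19, 15, 20, 47, tzinfo=timezone.utc))
```

Listing objects under the filled_forms/{taxpayer_name}/{form_type}/{tax_year}/ prefix would then return every version for one return.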
Accomplishments That I'm Proud Of
1. 100% Form Filling Accuracy
I achieved 21/21 fields filled correctly on Form 1040—matching the accuracy of manual data entry, but 100x faster.
2. Conversational UX
Most "conversational" tax software still feels robotic. Province's agents:
- Ask one question at a time
- Explain tax concepts in plain English
- Validate responses and provide helpful feedback
- Maintain context across the entire filing session
What I Learned
1. Agentic AI > Single-Shot Prompting
Key insight: Complex tasks need iterative reasoning, not one-shot prompts.
Before (failed):
"Map all 141 fields on this form in one response"
→ Result: 40% coverage, many errors
After (success):
Phase 1: Map comprehensively → 60% coverage
Phase 2: Identify gaps → 20 unmapped fields
Phase 3: Fill gaps iteratively → 90%+ coverage
Phase 4: Validate → 100% accuracy
2. Context > Instructions
Key insight: AI performs better with rich context than verbose instructions.
Bad prompt:
"Extract the first name from field f1_04. It should be in ALL CAPS..."
(500 words of instructions)
Good prompt:
Here's Form 1040 with 141 fields sorted by position:
[field_name, type, page, y_position, nearby_label]
f1_04, Text, Page 1, y=95.3, "First name"
...
Map to semantic names using IRS form line numbers.
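A prompt in this shape can be generated from the extracted field list. The sketch below is one way to do it, assuming each field is a dictionary with name, type, page, y, and label keys. Those key names and the build_mapping_prompt helper are hypothetical.

```python
def build_mapping_prompt(form_type: str, fields: list[dict]) -> str:
    """Render the fields as a compact table, sorted by page and vertical position."""
    ordered = sorted(fields, key=lambda f: (f["page"], f["y"]))
    rows = [
        f'{f["name"]}, {f["type"]}, Page {f["page"]}, y={f["y"]:.1f}, "{f["label"]}"'
        for f in ordered
    ]
    return "\n".join(
        [
            f"Here's Form {form_type} with {len(fields)} fields sorted by position:",
            "[field_name, type, page, y_position, nearby_label]",
            *rows,
            "Map to semantic names using IRS form line numbers.",
        ]
    )


prompt = build_mapping_prompt(
    "1040",
    [{"name": "f1_04", "type": "Text", "page": 1, "y": 95.3, "label": "First name"}],
)
```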
3. PDF Forms Are Messy
Key insight: Don't trust field labels. Use multiple signals.
IRS forms have:
- Cryptic field names: topmostSubform[0].Page1[0].f1_32[0]
- Unreliable labels: nearby_label often empty or wrong
- Inconsistent numbering: f1_32 might be line 1a or line 3b depending on form
Solution: Combine field number, Y-position, page number, and AI reasoning.
What's Next for Province
Phase 1: State Tax Returns
Goal: Support all 50 state tax returns.
Phase 2: Prior Year Returns & Amendments
Goal: File taxes for previous years and amend errors.
Built With
- agentcore
- amazon-web-services
- bedrock
- dynamodb
- lambda
- s3
- strands

