Inspiration

Every year, hundreds of thousands of immigration applications are delayed or denied — not because of ineligibility, but because of small, avoidable mistakes. A date of birth entered as MM/DD on one form and DD/MM on another. A middle name present on a passport but missing on a petition. First and last name transposed between documents. These aren't rare edge cases — they're the norm for anyone assembling a multi-document filing.

We've seen it firsthand. Immigration attorneys spend hours manually cross-referencing forms before submission. Families wait months for responses only to receive an RFE (Request for Evidence) over a single mismatched field. The problem isn't complexity — it's that humans are bad at finding inconsistencies across dozens of pages of dense, repetitive text.

We asked: what if AI could do that cross-referencing in seconds?

What it does

DocsGuard is an AI-powered document consistency analyzer built for immigration applicants, attorneys, and paralegals. You upload any combination of immigration forms — passports, I-485s, I-130s, N-648s, G-1450s, birth certificates, employment authorization cards — and within seconds the system tells you exactly what doesn't match and why it matters.

Here's what happens under the hood:

  1. Document ingestion. Users upload up to 10 documents at once through a simple drag-and-drop interface. Files are encrypted and stored per-user in Amazon S3. Supported formats include PDF, JPEG, PNG, and TIFF — anything that could come out of a scanner or a government agency portal.

  2. Field extraction. AWS Textract processes each document using FORMS analysis mode, which extracts key-value pairs natively — it doesn't just read text, it understands which label belongs to which value. Family Name: Smith. Date of Birth: 02/05/1999. A-Number: 546780743.

  3. Cross-document comparison. Amazon Nova (multimodal) receives both the extracted text and rendered page images for every document. It analyzes every possible pair and compares field by field — names, dates, identification numbers, addresses, employer information, signature dates.

  4. Smart inconsistency detection. The system distinguishes real problems from formatting noise. A-123 456 789 and A123456789 are the same number — not flagged. Smith and Smithh are different names — flagged as HIGH severity. First and last name appearing in swapped order between two forms is detected as a transposition, not a name mismatch.

How we built it

We split the project into three layers — frontend, backend, and cloud infrastructure — and built each one to be independently deployable.

Frontend — Vue 3 + Vite
The user interface is a single-page application built with Vue 3, Pinia for state management, and Vue Router for navigation. Authentication state (Cognito JWT tokens) is managed globally through a Pinia store, with route guards that redirect unauthenticated users before any protected view loads. All API calls go through a centralized Axios instance that automatically attaches the Bearer token to every request. The app is containerized with a multi-stage Docker build — Node.js compiles the Vite bundle, then a minimal nginx:alpine image serves the static files with SPA fallback routing for deep links.
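
The guard logic can be sketched as a small helper that checks the token's expiry claim before allowing navigation. This is an illustrative sketch, not the actual store code — `tokenIsValid` and `guard` are hypothetical names, and the real app wires this into vue-router's `beforeEach` with the token read from the Pinia store.

```javascript
// Sketch of the route-guard check (hypothetical helpers; the real app reads
// the token from a Pinia store and registers a vue-router guard).
function tokenIsValid(jwt, nowSeconds = Math.floor(Date.now() / 1000)) {
  try {
    const payloadB64 = jwt.split('.')[1];
    const payload = JSON.parse(
      Buffer.from(payloadB64, 'base64url').toString('utf8')
    );
    return typeof payload.exp === 'number' && payload.exp > nowSeconds;
  } catch {
    return false; // malformed token → treat as unauthenticated
  }
}

// Guard shape: redirect to /login when a protected route lacks a valid token.
function guard(to, token) {
  if (to.requiresAuth && !tokenIsValid(token)) return { path: '/login' };
  return true;
}
```

The point of checking expiry client-side is purely UX — the server still verifies every token cryptographically; this only avoids rendering a protected view that is guaranteed to fail.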

Backend — Node.js + Express
The API layer is a clean, layered Express application: routes → auth middleware → controllers → services. Every protected route passes through a requireAuth middleware that validates the Cognito JWT using aws-jwt-verify before the request reaches any business logic. The service layer is fully decoupled — S3, Textract, Bedrock, and Cognito each have their own service module.
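
The middleware's shape can be sketched as follows. The verifier is injected here so the sketch is self-contained and testable; in the real app it would be built with aws-jwt-verify's `CognitoJwtVerifier`. Names are illustrative, not the actual code.

```javascript
// Sketch of the requireAuth middleware (hypothetical names; the verifier
// stands in for aws-jwt-verify's CognitoJwtVerifier).
function makeRequireAuth(verifier) {
  return async function requireAuth(req, res, next) {
    const header = req.headers['authorization'] || '';
    const token = header.startsWith('Bearer ') ? header.slice(7) : null;
    if (!token) {
      return res.status(401).json({ error: 'Missing bearer token' });
    }
    try {
      // verify() throws on invalid, expired, or mis-issued tokens.
      req.user = await verifier.verify(token);
      next();
    } catch {
      res.status(401).json({ error: 'Invalid token' });
    }
  };
}
```

Because the middleware runs before any controller, no business logic ever sees an unauthenticated request.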

The document processing pipeline works like this:

  1. Multer handles multipart file uploads with in-memory buffering and server-side MIME type filtering.
  2. Files are stored in Amazon S3 under a documents/{userId}/{uuid} key path, encrypted server-side with AES-256, so documents from different users are always isolated.
  3. AWS Textract (FORMS analysis mode) extracts structured key-value pairs from each document — it understands form layouts natively, not just raw text.
  4. pdfjs-dist + node-canvas renders PDF pages to images at high resolution for visual analysis.
  5. Both the extracted text and rendered images are passed together to Amazon Nova via Bedrock, giving the model two complementary signals for each document.

Amazon Bedrock + Nova
The analysis prompt follows a strict two-step protocol: first extract all named fields from each document, then compare every pair. We invested significant time in prompt engineering — particularly in teaching the model to normalize ID numbers (strip A- prefixes, ignore separators) before comparing them, while still treating single-character name differences as real inconsistencies. The response is constrained to a specific JSON schema.

On top of the model response, we added two deterministic post-processing passes in code: a normalization filter that removes formatting-only false positives from Nova's output, and a name transposition detector that catches first/last name swaps purely through string comparison — not relying on the model for that judgment at all.
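
The two deterministic passes can be sketched as pure functions. This is illustrative logic under the rules described above (normalize IDs before comparing; detect first/last swaps by string equality), not the exact production code.

```javascript
// Strip separators and prefixes so A-123 456 789 and A123456789 compare equal.
function normalizeId(value) {
  return value.toUpperCase().replace(/[^A-Z0-9]/g, '');
}

// Pass 1: drop issues where the values only differ in formatting.
function isFormattingOnlyMismatch(a, b) {
  return normalizeId(a) === normalizeId(b);
}

// Pass 2: catch John Smith vs Smith John without asking the model.
function detectTransposition(docA, docB) {
  const swapped =
    docA.firstName === docB.lastName && docA.lastName === docB.firstName;
  if (swapped && docA.firstName !== docA.lastName) {
    return { type: 'NAME_TRANSPOSITION', severity: 'HIGH' };
  }
  return null;
}
```

Note that `Smith` vs `Smithh` survives normalization — a real one-character difference is never suppressed, only separator and prefix noise.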

Bedrock Guardrails
We configured a custom Guardrail that blocks any request unrelated to document analysis, filters harmful content, and anonymizes PII (emails, phone numbers) in responses. The guardrail runs on every single Bedrock invocation.

Authentication — Amazon Cognito
User registration and login flow through Cognito's USER_PASSWORD_AUTH flow. The backend calls Cognito directly via the AWS SDK — no third-party auth libraries. Access tokens, ID tokens, and refresh tokens are returned on login, and the access token is verified cryptographically on every API call.

Infrastructure as Code — AWS CloudFormation
Every AWS resource is defined in CloudFormation templates, with numbered deployment scripts:

01_s3.sh — encrypted S3 bucket with versioning and lifecycle policies
02_iam.sh — least-privilege IAM policy scoped to the specific bucket and Bedrock model ARNs
03_bedrock.sh — Guardrail definition with topic blocks, content filters, and PII anonymization rules
04_cognito.sh — User Pool with password policy and JWT configuration
05_ecr.sh — ECR repositories for frontend and backend images
06/07_push.sh — Docker build and push scripts for both services

The entire stack can be stood up from scratch in a fresh AWS account with a handful of sequential script calls.

Challenges we ran into

Getting Nova to read immigration forms reliably

Immigration forms are not clean PDFs. They're scanned at varying resolutions, printed on dot-matrix printers from the 90s, stamped, handwritten in margins, and photocopied multiple generations deep. Getting Nova to consistently extract the right value from the right field — and not hallucinate or merge adjacent fields — required far more prompt engineering than we expected. We went through a dozen iterations before landing on a two-step protocol (extract first, compare second) that produced stable, structured output.

False positives from ID number formatting

Early versions of the system flagged nearly every A-Number as inconsistent. A-123 456 789 on one form and A123456789 on another are the same number — but Nova would confidently report them as a mismatch. We had to build a normalization layer that strips formatting characters from all ID-type fields before comparison, then re-inject the "cleaned" result back into the issue list. Getting that logic right without accidentally suppressing real differences (like a digit actually being wrong) was a careful balancing act.

Name transpositions — Nova's blind spot

We discovered that Nova would sometimes mark first and last name fields as "consistent" when they were actually transposed — John Smith on one form and Smith John on the other. The same characters, different fields. From the model's perspective they "matched." We had to build a completely separate, deterministic transposition detector that runs after the model response, compares extracted first and last names pairwise across documents, and injects a HIGH-severity issue when a swap is detected. This was the moment we learned not to trust a language model for comparisons that are fundamentally string equality problems.

Native dependencies in Docker on Alpine Linux

The canvas package — required by pdfjs-dist for rendering PDF pages to images in Node.js — depends on Cairo, Pango, libjpeg, and several other native C libraries. On Alpine Linux these aren't pre-installed and don't have prebuilt binaries, so npm install triggers a full native compilation from source. Getting this to build correctly in a multi-stage Docker image — with build tools in the builder stage and only the runtime .so files in the final image — took several failed builds to get right.

Bedrock response parsing in production

Nova returns analysis as structured JSON — but not always cleanly. In edge cases the model would wrap the JSON in markdown code fences, add a preamble sentence before the opening brace, or occasionally return a response that was valid JSON but didn't match our expected schema. We had to add a regex extraction pass to pull the JSON object out of whatever wrapper Nova put around it, plus a fallback error path when the model returned something unparseable. In production, unpredictability in model output is a first-class engineering problem, not an edge case.
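
The extraction pass can be sketched as follows — strip any markdown fence, then pull out the outermost JSON object and fall back to `null` when nothing parses. This is an illustrative sketch of the approach, not the exact regex used in production.

```javascript
// Pull a JSON object out of whatever wrapper the model put around it:
// markdown code fences, preamble sentences, or nothing at all.
function extractJson(raw) {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch {
    return null; // caller takes the fallback error path
  }
}
```

Returning `null` rather than throwing keeps the fallback path explicit: the caller decides whether to retry the invocation or surface a "could not analyze" error to the user.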

Keeping infrastructure stateless and reproducible

We wanted every AWS resource — S3 bucket, IAM policy, Cognito user pool, Bedrock guardrail, ECR repositories — to be fully reproducible from code, with no manual console steps. Threading CloudFormation stack outputs as inputs into downstream stacks (the S3 bucket ARN flowing into the IAM policy, the IAM user flowing into the Bedrock stack) required careful ordering and Fn::ImportValue references. One wrong dependency and a stack update would fail silently or leave partial state behind.
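
The cross-stack wiring looks roughly like the fragment below — the S3 stack exports the bucket ARN and the IAM stack imports it. Export and logical names here are hypothetical; only the Outputs/Export and Fn::ImportValue mechanics are the real CloudFormation constructs.

```yaml
# --- S3 stack: export the bucket ARN for downstream stacks ---
Outputs:
  DocumentsBucketArn:
    Value: !GetAtt DocumentsBucket.Arn
    Export:
      Name: docsguard-documents-bucket-arn

# --- IAM stack: scope the policy to the imported bucket ARN ---
Resources:
  BackendPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: ["s3:GetObject", "s3:PutObject"]
            Resource: !Sub
              - "${BucketArn}/*"
              - BucketArn:
                  Fn::ImportValue: docsguard-documents-bucket-arn
```

Because an export cannot be deleted while another stack imports it, CloudFormation itself enforces the deployment ordering the numbered scripts encode.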

Accomplishments that we're proud of

A hybrid AI + deterministic analysis pipeline that actually works

The thing we're most proud of is not using AI for everything. The system uses Nova for what language models are genuinely good at — reading complex, variable-format documents and extracting structured meaning from them. But for the comparison logic — is this value the same as that value — we use deterministic code. The normalization pass, the false-positive filter, and the name transposition detector are all pure functions with no model involvement. The result is an analysis pipeline that is both flexible enough to handle any document layout and predictable enough to trust in a legal context. That combination was hard to get right and we think it's the right architectural call.

Real accuracy on real immigration documents

It's easy to demo AI on clean, well-formatted test data. We tested DocsGuard on actual immigration forms — G-1450, N-648, I-485 — the kind you get from government portals and law firm scanners. The system correctly catches A-Number digit mismatches, date format inconsistencies, name spelling differences, and field transpositions on documents that look nearly identical to a tired human reviewer. Seeing it flag a real error on a real form, the first time, without any tuning for that specific form type — that was the moment the project felt real.

Infrastructure that goes from zero to deployed in five commands

Every single AWS resource — S3 bucket, IAM policy scoped to least privilege, Cognito user pool, Bedrock guardrail, ECR repositories, ECS cluster, and ALB — is defined as CloudFormation and deployed by numbered shell scripts. A teammate who has never seen the repo before can run 01_s3.sh through 06_push_frontend.sh in order and have a fully working cloud environment. No manual console steps, no tribal knowledge, no "don't forget to click that checkbox." For a hackathon project, that level of infrastructure discipline is something we deliberately invested in and are genuinely proud of.

Bedrock Guardrails protecting real PII

Our users are uploading some of the most sensitive documents they own — passports, social security cards, immigration filings. We took that seriously. The Bedrock Guardrail we built blocks any request outside the document analysis context, anonymizes emails and phone numbers in model responses, and hard-blocks SSNs and credit card numbers from ever appearing in output. It's not a checkbox — it's a real layer of defense configured specifically for the data our users share with us.

Going from personal frustration to working product in a weekend

The inspiration for this project came from a real experience — one of our teammates spent months navigating the naturalization process, terrified that a small paperwork mistake would cost years of delays. Taking that frustration and turning it into something that could genuinely help other people going through the same process — and shipping a working version of it under hackathon conditions — is the accomplishment we're most proud of when we step back and look at the full picture.

What we learned

AI is a bad comparator but an excellent reader

The biggest engineering lesson of the project: never ask a language model to decide if two values are equal. That sounds obvious in hindsight, but when you're building an AI-powered system, the temptation is to let the model handle everything end-to-end. We learned — through actual bugs in production — that Nova would confidently declare A-123 456 789 and A123456789 as different, or call John Smith and Smith John consistent. String comparison belongs in code. Field extraction and semantic understanding belong in the model. Drawing that line clearly made the system dramatically more reliable.

Prompt structure is an engineering discipline, not just writing

We spent more time on the Bedrock prompt than on any other single component. The breakthrough was treating it like a state machine: Step 1 extracts, Step 2 compares, the output schema is a contract. When we gave the model an open-ended instruction ("find inconsistencies"), results were unpredictable. When we gave it a strict protocol with explicit rules, a defined output format, and examples of what not to flag, it became consistent enough to ship. Prompt engineering at this level is closer to API design than to creative writing.
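
Treating the output schema as a contract means validating it in code before anything downstream consumes it. A minimal sketch, with illustrative field names (the actual schema has more fields):

```javascript
// Reject model output that doesn't match the expected contract, so malformed
// responses fail fast instead of corrupting the report. Field names are
// illustrative, not the production schema.
const SEVERITIES = new Set(['LOW', 'MEDIUM', 'HIGH']);

function validateAnalysis(obj) {
  if (!obj || !Array.isArray(obj.issues)) return false;
  return obj.issues.every(
    (i) =>
      typeof i.field === 'string' &&
      typeof i.documentA === 'string' &&
      typeof i.documentB === 'string' &&
      SEVERITIES.has(i.severity)
  );
}
```

A response that fails validation is treated exactly like an unparseable one — it never reaches the user.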

Multimodal models need high-quality inputs

Garbage in, garbage out applies especially hard to vision models. Early in the project we were rendering PDF pages at 1.5x scale and compressing them as JPEG before sending to Nova. The model was missing single-character differences — an extra letter at the end of a name — because those characters were literally blurry in the image. Switching to higher resolution renders and lossless formats made a meaningful difference in accuracy. The model is only as good as what it can actually see.

Infrastructure as code pays off immediately, not eventually

We've worked on projects where "we'll clean up the infra later" meant nothing was ever reproducible. For DocsGuard we made the call on day one to write every resource as CloudFormation and never click through the console. That decision paid off the same day — when we needed to stand up a second environment for testing, it was one command. When a stack got into a broken state during development, we tore it down and rebuilt it cleanly in minutes. The upfront cost of writing the templates was smaller than the time it would have saved us once.

Handling PII forces you to think differently about every design decision

When the data flowing through your system includes passport numbers, alien registration numbers, and dates of birth, every architectural decision has a new dimension. Where does this data touch disk? Who can query this S3 path? What gets logged to CloudWatch? We found ourselves revisiting decisions we'd normally make automatically — like logging request bodies for debugging — and choosing differently because of the sensitivity of the content. Building for PII from the start is far easier than retrofitting privacy into a system that wasn't designed for it.

A real problem is the best technical forcing function

When the use case is abstract, it's easy to build something that looks good in a demo but falls apart on real data. Because we had a concrete, lived experience behind this project — an actual person who went through the naturalization process — we tested against actual immigration forms from day one. That forced us to confront real edge cases: scanned documents with skewed alignment, forms where the same field has three different labels depending on the version, handwritten values next to printed ones. The realness of the problem made us build something that handles the real world, not just the happy path.

What's next for DocsGuard

Form-type awareness and field-specific validation rules

Right now DocsGuard treats every document as a generic collection of fields and compares them universally. The next evolution is making the system form-aware — recognizing that this is an I-485, that this is a DS-260, and applying validation rules specific to each form type. Some fields are only present on certain forms and their absence is expected, not an inconsistency. Some fields have format requirements defined by USCIS that we can validate independently, not just comparatively. Form recognition would let us go from "these two values differ" to "this value violates the requirement for this specific field on this specific form."
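
A form-aware rules layer could look roughly like this. Everything here is a hypothetical sketch of the planned feature — the rule table, the A-Number pattern (7–9 digits), and the field names are illustrative, not official USCIS definitions.

```javascript
// Hypothetical per-form rules: which fields must be present, and what
// format they must have independent of any cross-document comparison.
const FORM_RULES = {
  'I-485': {
    requiredFields: ['aNumber', 'dateOfBirth'],
    formats: { aNumber: /^A?\d{7,9}$/ }, // illustrative pattern only
  },
  'DS-260': {
    requiredFields: ['dateOfBirth'], // no A-Number expected → absence is fine
    formats: {},
  },
};

function validateForm(formType, fields) {
  const rules = FORM_RULES[formType];
  if (!rules) return [];
  const issues = [];
  for (const name of rules.requiredFields) {
    if (!(name in fields)) issues.push({ field: name, problem: 'missing' });
  }
  for (const [name, pattern] of Object.entries(rules.formats)) {
    if (name in fields && !pattern.test(fields[name].replace(/[\s-]/g, ''))) {
      issues.push({ field: name, problem: 'bad format' });
    }
  }
  return issues;
}
```

The key design point is that these checks run per document, so "this value violates the requirement for this field on this form" needs no second document to compare against.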

Full petition package analysis

Immigration attorneys don't file two documents — they file packages of 15, 20, sometimes 30 documents at once. The current system handles up to 10 files, but the real workflow is analyzing an entire petition package in a single submission and receiving a consolidated report organized by filing section: identity documents, financial documents, supporting evidence. We want DocsGuard to understand the structure of a complete filing, not just individual document pairs.

Attorney collaboration workflow

A solo applicant using DocsGuard is one use case. An immigration attorney reviewing a client's package with that client in the room is another — and it's where the real volume is. We want to add a shareable review link: the attorney uploads a document set, shares a read-only link with the client, and both parties can annotate issues, mark corrections as resolved, and track the state of the review through to filing. The audit trail of who confirmed what and when has real legal value.

Exportable PDF report for the filing record

The analysis results need to leave the browser. Attorneys need to attach a consistency review summary to the client file, show it to a paralegal, or keep it as evidence of due diligence. A one-click PDF export of the full analysis report — formatted cleanly with document names, field comparisons, severity levels, and recommendations — is the obvious next step and something real users asked for immediately.

Automatic document deletion after analysis

Users are uploading passports and immigration filings. There is no reason DocsGuard needs to retain those documents after the analysis completes. The next version will automatically delete files from S3 the moment the analysis response is returned — the data touches our infrastructure for seconds, not days. Combined with an explicit data retention policy displayed to users before upload, this is both the right thing to do and a meaningful trust signal for users sharing sensitive documents with a service they just discovered.
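
The delete-after-analysis flow reduces to a try/finally around the analysis call. A minimal sketch with the S3 client injected so the shape is testable — in the real service this would be the AWS SDK client, and the names here are illustrative.

```javascript
// Run the analysis, then delete the object no matter what — even if the
// analysis throws, the document never outlives the request.
async function analyzeAndForget(s3, bucket, key, analyze) {
  try {
    return await analyze(bucket, key);
  } finally {
    await s3.deleteObject({ Bucket: bucket, Key: key });
  }
}
```

Putting the delete in `finally` is the whole point: a failed analysis must not become the path by which a passport scan quietly persists in the bucket.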

AWS WAF, Amazon Macie, and enterprise-grade security

Law firms operate under strict compliance requirements. To serve them, DocsGuard needs to match that bar: AWS WAF on the Application Load Balancer with managed OWASP rules, Amazon Macie scanning uploaded documents for unexpected PII patterns, VPC endpoints so all traffic between services stays off the public internet, and a full CloudTrail audit log of every API call. These aren't features — they're table stakes for any firm that has a data processing agreement to maintain.

Expanding beyond immigration

The core capability — cross-document field consistency analysis — is not specific to immigration. Loan applications require the same name, income, and address across a dozen documents. Legal discovery requires consistency across sworn statements and exhibits. Medical prior authorization requires matching diagnoses across referral forms. Immigration is where we started because it's where we felt the pain most directly. But the engine underneath DocsGuard applies anywhere humans fill out the same information multiple times and inconsistencies carry real consequences.
