PolicyPulse
What if AI could actually stress-test a law before it passes?
Inspiration
A new tariff drops. You Google it. Page one: a CNN explainer written for nobody in particular, three Reddit threads where everyone is confidently wrong, and a sponsored ad for an immigration lawyer in Tucson. Nowhere, absolutely nowhere, does anyone actually tell you what this means for your rent, your job, or your grocery bill.
That's just how policy works right now. A 200-page bill becomes a tweet, the tweet gets misquoted on cable news, and six months later you're wondering why your health insurance costs went up. We thought there had to be a better way to understand this stuff before it actually affects people.
So we built PolicyPulse. Seven AI agents that pull real government data, argue with each other, and give you a clear picture of how a policy actually hits your wallet. Oh, and we also made the agents pay for premium data using Bitcoin. At a hackathon. On two hours of sleep. Because why not.
What It Does
You type something like "How will the new tariff policy affect my small business importing electronics from China?" and PolicyPulse actually answers it, not with a news summary or a vague explainer, but with real economic reasoning backed by live government data.
Here is what happens under the hood. A Classifier powered by Google ADK and Gemini reads your question and figures out what kind of policy you're asking about. An Analyst Agent then goes and makes 12 to 18 real API calls to sources like FRED, the Bureau of Labor Statistics, Semantic Scholar, and the Congressional Budget Office. Not cached data. Not summaries. Actual numbers from actual government databases.
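To make the routing step concrete, here is a minimal sketch of what a Stage 0 classifier does. The domain names and keyword heuristic are illustrative stand-ins, not the actual Gemini-backed classifier; the real system asks an LLM rather than matching keywords.

```python
from dataclasses import dataclass

# Hypothetical domain table -- the real classifier's categories may differ.
POLICY_DOMAINS = {
    "tariff": ["tariff", "import", "export", "trade"],
    "labor": ["minimum wage", "overtime", "union"],
    "housing": ["zoning", "mortgage", "eviction"],
    "healthcare": ["insurance", "medicare", "medicaid"],
}

@dataclass
class Classification:
    domain: str
    matched_terms: list

def classify(question: str) -> Classification:
    """Cheap keyword fallback; the production path uses gemini-2.5-flash."""
    q = question.lower()
    for domain, terms in POLICY_DOMAINS.items():
        hits = [t for t in terms if t in q]
        if hits:
            return Classification(domain=domain, matched_terms=hits)
    return Classification(domain="general", matched_terms=[])

print(classify("How will the new tariff policy affect my small business "
               "importing electronics from China?").domain)  # tariff
```

The classification then determines which downstream data sources the Analyst Agent prioritizes.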
Then if the free data isn't enough, a Premium Data Agent steps in and autonomously pays for access to gated legal databases and econometric models using Bitcoin Lightning micropayments. You literally watch the payment happen on screen. Ten satoshis. The agent just does it, no human approval needed.
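The L402 handshake behind that moment is simple at the HTTP level: the server answers 402 with a challenge containing a macaroon and a Lightning invoice; the client pays the invoice, gets back the payment preimage, and presents both as proof. A hedged sketch of the client-side parsing (the header shape follows Lightning Labs' Aperture; the macaroon and invoice strings below are fake placeholders):

```python
import re

def parse_l402_challenge(www_authenticate: str):
    """Extract the macaroon and BOLT11 invoice from an L402 challenge,
    e.g.  L402 macaroon="<base64>", invoice="<bolt11>"."""
    mac = re.search(r'macaroon="([^"]+)"', www_authenticate)
    inv = re.search(r'invoice="([^"]+)"', www_authenticate)
    if not (mac and inv):
        raise ValueError("not an L402 challenge")
    return mac.group(1), inv.group(1)

def l402_authorization(macaroon: str, preimage_hex: str) -> str:
    """The payment preimage, revealed by settling the invoice, proves payment."""
    return f"L402 {macaroon}:{preimage_hex}"

challenge = 'L402 macaroon="AgEEbHNhdA==", invoice="lnbcrt100n1fakeinvoice"'
macaroon, invoice = parse_l402_challenge(challenge)
# ...the agent pays `invoice` over Lightning and receives the preimage...
print(l402_authorization(macaroon, "deadbeef" * 8))
```

Retrying the original request with that `Authorization` header is what turns the 402 into a 200.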
After that, four Sector Agents covering Labor, Housing, Consumer, and Business impacts all run at the same time. Each one digs into its area, builds structured findings, and is required to explain the mechanism behind every single claim it makes. Not just "wages will go up" but exactly why and by how much based on what data. Finally a Synthesis Agent reads all four reports, finds where they agree, flags where they conflict, and produces a unified impact report with an animated Sankey diagram showing how the economic effects flow.
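Because the four sector analyses are independent of each other, they can fan out concurrently. A minimal sketch of that fan-out (the `analyze_sector` body is a placeholder for the real LLM and API work):

```python
import asyncio

async def analyze_sector(name: str, question: str) -> dict:
    await asyncio.sleep(0)  # stands in for real tool calls and reasoning
    return {"sector": name, "question": question, "findings": []}

async def run_sector_agents(question: str) -> list[dict]:
    sectors = ["labor", "housing", "consumer", "business"]
    # gather preserves input order, so reports line up with `sectors`
    return await asyncio.gather(
        *(analyze_sector(s, question) for s in sectors)
    )

reports = asyncio.run(run_sector_agents("Raise the minimum wage to $20"))
print([r["sector"] for r in reports])  # ['labor', 'housing', 'consumer', 'business']
```

Running the slowest agent's latency, rather than the sum of all four, is what keeps the full pipeline near 30 seconds.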
The whole pipeline streams live to your screen so you can actually watch the agents working in real time. Start to finish in about 30 seconds.
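The live view rides on Server-Sent Events, which is just a line-oriented text format over a kept-open HTTP response. A sketch of one event frame as our backend might emit it (the event name and payload fields are illustrative; in the real app these strings are yielded from a FastAPI `StreamingResponse`):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one SSE frame: an `event:` line, a `data:` line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_event("tool_call", {"agent": "analyst",
                                "tool": "fred",
                                "series": "CPIAUCSL"})
print(frame)
```

A browser `EventSource` dispatches each frame to a listener keyed by the `event:` name, which is how every tool call and payment shows up on screen as it happens.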
How We Built It
| Layer | Tech |
|---|---|
| Agent Orchestration | LangGraph state machines + ReAct agents |
| Stage 0 Classifier | Google ADK with gemini-2.5-flash for SZNS Solutions track |
| Backend | FastAPI, Python 3.11, Pydantic, fully async |
| Data | FRED, BLS, Census, BEA, Semantic Scholar, OpenAlex, Tavily |
| Lightning / L402 | Bitcoin regtest + litd + Aperture + lnget, fully real payments |
| Frontend | React 19, Next.js, D3 Sankey diagram, TypeScript |
| Streaming | Server-Sent Events so every tool call and payment shows live |
| Infra | One `docker compose up` spins up the entire stack |
The key design decision that made everything work: agents don't pass raw text to each other. They pass typed data objects with confidence levels and evidence citations attached. And every single causal claim has to include a mechanism. You can't say "X leads to Y" without saying how. That one rule is what separates this from a chatbot that sounds smart but can't actually back anything up.
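As an illustration of that rule, here is roughly what one of those typed objects looks like as a Pydantic model. The field names are ours for this sketch, not the project's exact schema, but the key property is real: construction fails if the mechanism is missing.

```python
from pydantic import BaseModel, Field, field_validator

class CausalClaim(BaseModel):
    """Illustrative shape of the structured findings agents exchange."""
    claim: str
    mechanism: str                      # *how* X leads to Y; may not be empty
    confidence: float = Field(ge=0.0, le=1.0)
    evidence: list[str]                 # source citations backing the claim

    @field_validator("mechanism")
    @classmethod
    def mechanism_required(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("every causal claim must state its mechanism")
        return v
```

Trying to build a `CausalClaim` with an empty mechanism raises a `ValidationError`, so an agent that can't explain *how* simply can't emit the claim.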
Challenges We Ran Into
The first big problem was that our agents started making up economics. The Consumer agent, when analyzing a tariff, cheerfully invented something called "wage ripple effects," basically claiming tariffs would increase people's income. They don't. Tariffs raise prices; they don't hand out raises. We had to build a BILATERAL vs. PURE_COST classification system and add a rule that if a transmission channel doesn't actually exist, the agent has to say so explicitly rather than speculate.
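The fix boils down to a whitelist of transmission channels per policy type. A simplified sketch (the channel table here is a toy version of the real classification):

```python
from enum import Enum

class ChannelType(Enum):
    BILATERAL = "bilateral"   # policy moves both prices and incomes
    PURE_COST = "pure_cost"   # policy moves prices only

# Illustrative rule table: which transmission channels actually exist.
# A tariff has no wage channel, so a "wage ripple" claim is rejected
# outright instead of being speculated about.
VALID_CHANNELS = {
    ("tariff", "consumer_prices"): ChannelType.PURE_COST,
    ("minimum_wage", "wages"): ChannelType.BILATERAL,
}

def check_channel(policy: str, channel: str):
    """Return the channel type, or None when no such channel exists."""
    return VALID_CHANNELS.get((policy, channel))

print(check_channel("tariff", "wages"))  # None -> agent must say "no such channel"
```

A `None` result forces the agent to state explicitly that the channel doesn't exist, which is exactly the behavior the rule in the paragraph above demands.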
The Lightning stack on Apple Silicon was its own adventure. Most Lightning Docker images simply don't support arm64. We ended up building Aperture and lnget from source, ran into Go module issues that prevented normal installation, and wrote a bootstrap script to set up the entire Bitcoin and Lightning environment in the correct sequence. Getting that sequence right took more attempts than we'd like to admit.
The third issue was subtler and cost us the most time. When API calls failed, the system was silently swapping in placeholder data and continuing like nothing happened. So we'd get results that looked reasonable but were actually just fallbacks from error handling. The fix sounds obvious in hindsight: when something fails, say it failed. Don't hide it.
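The shape of the fix is a thin wrapper around every fetch that records failure explicitly instead of substituting defaults. A hedged sketch (function and field names are ours, not the project's):

```python
def fetch_series(source: str, fetch) -> dict:
    """Wrap a data fetch so failures surface in the final report.

    `fetch` is any zero-arg callable hitting a real API. On failure we
    return an explicit failure record -- never plausible-looking numbers.
    """
    try:
        return {"source": source, "ok": True, "data": fetch()}
    except Exception as exc:
        # The old behavior silently returned placeholder data here.
        return {"source": source, "ok": False, "error": str(exc)}

def flaky_fetch():
    raise TimeoutError("timed out")

result = fetch_series("FRED", flaky_fetch)
print(result)  # {'source': 'FRED', 'ok': False, 'error': 'timed out'}
```

Downstream agents then check `ok` and must either route around the gap or disclose it in their findings, so a failed call can never masquerade as data.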
What We're Proud Of
The Lightning payments are genuinely real and that still feels wild to us. An AI agent hits a gated data endpoint, gets told to pay up via HTTP 402, pays a real Lightning invoice, receives a cryptographic token called a macaroon, and uses that token to get the data. No credit card. No API key. No human in the loop. We believe this is one of the first working demos of an AI agent autonomously paying for data mid-task at a hackathon.
We're also proud of the eval framework, and specifically that it saved us from embarrassing ourselves. We built a suite of 9 evaluation sets covering rubric compliance, tool usage patterns, and hallucination detection. Right before the demo, it flagged that our tariff analysis was somehow concluding that tariffs raise wages. It caught the bug. We fixed it. Without evals that thing would have gone on stage and been very publicly wrong.
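To give a flavor of what one hallucination check looks like, here is a toy version of the rule class that caught the tariff-wage bug. The rule table and claim format are simplified illustrations, not our actual eval code:

```python
# Directions of effect that have no valid economic mechanism for a policy.
FORBIDDEN = [
    ("tariff", "wages", "increase"),  # tariffs have no wage-raising channel
]

def lint_claims(policy: str, claims: list[tuple[str, str]]) -> list[str]:
    """claims: (variable, direction) pairs extracted from a report."""
    return [
        f"{policy}: '{var} {direction}' has no valid mechanism"
        for var, direction in claims
        if (policy, var, direction) in FORBIDDEN
    ]

flags = lint_claims("tariff", [("consumer_prices", "increase"),
                               ("wages", "increase")])
print(flags)  # flags only the impossible wage claim
```

A non-empty flag list fails the eval run, which is what stopped the buggy analysis from reaching the stage.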
And honestly the epistemic structure of the whole system is something we feel good about. Every agent has to show its work: the claim, the mechanism, the confidence level, the evidence, the assumptions. The Synthesis agent can read across all four sector reports and identify real contradictions rather than just mashing everything together.
What We Learned
The biggest lesson is that how agents communicate matters more than how smart they are. When we switched from passing text between agents to passing structured Pydantic objects, the whole system became more reliable overnight. You can actually build disagreement detection and confidence aggregation on top of structured data. You can't do any of that with a blob of text.
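Here is a minimal sketch of why structure enables disagreement detection: with typed findings you can compare claimed directions per variable across agents, which is meaningless over free text. Field names are illustrative.

```python
def find_conflicts(reports: list[dict]) -> list[tuple[str, str, str]]:
    """Return (variable, agent_a, agent_b) wherever two agents claim
    different directions of effect for the same variable."""
    seen: dict[str, tuple[str, str]] = {}   # variable -> (agent, direction)
    conflicts = []
    for r in reports:
        for var, direction in r["claims"].items():
            if var in seen and seen[var][1] != direction:
                conflicts.append((var, seen[var][0], r["agent"]))
            else:
                seen[var] = (r["agent"], direction)
    return conflicts

labor = {"agent": "labor", "claims": {"employment": "stable"}}
business = {"agent": "business", "claims": {"employment": "decrease"}}
print(find_conflicts([labor, business]))  # [('employment', 'labor', 'business')]
```

The Synthesis Agent does something richer than this, but the principle is the same: conflicts become data you can reason over instead of prose you have to paper over.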
We also came away believing L402 is ready for real use. The payment protocol itself is clean and well-designed. The hard part was the infrastructure setup, not the protocol. Lightning Labs' tooling is solid enough to work in a hackathon environment, which says a lot.
What's Next for PolicyPulse
The immediate next step is connecting more real premium data sources to the L402 payment layer. The infrastructure is production-ready; we just need more data providers on the other end. We also want to add user profiles so the analysis is personalized to your job, income, and location rather than giving everyone the same generic output.
Longer term, the pattern we built here applies way beyond economic policy. Any domain where AI agents need access to paid data, whether that's legal research, medical literature, financial data, or satellite imagery, could use this same LangGraph plus L402 architecture. We want to clean it up and publish it as an open reference implementation so other teams can build on it.
And eventually we want to take the Lightning integration to mainnet. Real sats, real marketplace, data providers setting their own prices, agents shopping across sources. That's the vision.
The Agents Disagreed. Here's What Happened.
Most AI project demos show you the happy path where everything works perfectly and the output looks great. We want to show you something more interesting.
When we ran PolicyPulse on "Raise the federal minimum wage to $20 per hour," the Labor Agent and the Business Agent came to completely different conclusions and neither one would back down.
The Labor Agent, pulling from FRED wage elasticity data, argued that job losses would be modest. Its reasoning was that in low-wage labor markets, employers already have outsized bargaining power over workers, which keeps wages artificially below where a competitive market would set them. In that situation, a wage floor doesn't kill jobs the way basic supply and demand would suggest. The economic literature from the past 15 years backs this up pretty strongly.
The Business Agent looked at the same policy and saw a 38% wage shock hitting industries like food service that operate on thin margins. Its argument was that businesses in that situation either automate, cut hours, or close. It pulled Census data on business survival rates to support the claim.
Both agents had high confidence. Both had real citations. Neither was wrong.
Here is what made us proud of this moment: the Synthesis Agent didn't average the two answers together or pick the one that sounded more authoritative. It recognized that both claims are correct under different conditions. In dense urban labor markets, the Labor Agent's logic wins. In rural areas with fewer employers and thinner margins, the Business Agent's logic wins. The final report said both things, explained when each one applies, and flagged it as an unresolved geographic tension.
Most AI systems would have smoothed that over and given you a confident-sounding single answer. PolicyPulse showed you the real disagreement because that is actually more useful. Economic policy is genuinely complicated. The honest answer is sometimes "it depends, and here is exactly what it depends on."
We built a system that respects that complexity instead of hiding it.
Built at HooHacks 2026 by Praneeth Gunasekaran, Rudra Desai, Pratham Jangra, and Samank Gupta.
Built With
- aperture
- bea-api
- bitcoin-lightning-(lnd)
- bitcoin-lightning-network
- bls-api
- census-bureau-api
- d3.js
- docker
- fastapi
- fred-api
- gemini
- gemini-2.5-flash
- go
- google-adk
- google-cloud
- gpt-4o
- l402
- langgraph
- lnd
- lnget
- next.js
- node.js
- openalex-api
- pydantic
- python
- react
- semantic-scholar-api
- server-sent-events
- tailwind-css
- tavily-api
- typescript
- websockets