Inspiration

Enterprise accounting systems are fundamentally inefficient. Small and medium-sized businesses still rely on manual data entry, brittle integrations, and fragmented dashboards that create costly errors and delay financial decisions. Traditional software assumes perfectly structured inputs, yet real-world financial data is messy: receipts are photographed, invoices are emailed, and bank statements are scattered across disconnected systems.

At the same time, modern AI tools have largely remained trapped inside the text box paradigm. Chatbots can provide advice, but they rarely interact directly with real financial workflows or execute financial operations.

We were inspired to build something fundamentally different: an autonomous financial executive powered by multimodal AI.

AutoBooks Finance is designed as a CFO copilot that can see financial documents, hear voice instructions, speak back to users, and initiate treasury operations. Instead of simply answering questions, it becomes an active financial assistant capable of translating visual financial information into structured accounting operations and initiating secure on-chain transactions when authorized.

This approach moves AI beyond passive assistance toward a real-time financial operating layer for modern businesses.


What it does

AutoBooks Finance is a next-generation multimodal AI agent built for the UI Navigator category of the Gemini Live Agent Challenge.

The system observes financial interfaces visually, extracts structured financial data using Gemini multimodal capabilities, applies deterministic accounting validation rules, and can initiate treasury operations through blockchain infrastructure.

Zero-DOM Visual UI Navigation

The agent visually observes financial documents, browser screens, invoices, and receipts and extracts structured accounting data without relying on DOM access or APIs.

Using Gemini's multimodal vision, the system interprets raw screenshots or document images similarly to how a human accountant would, reading totals, vendors, transaction amounts, and metadata directly from visual context.

Because it operates visually, the system is resilient to DOM changes, UI redesigns, and broken integrations.


Deterministic Financial Guardrails

Autonomy without safety is risky in financial systems. AutoBooks Finance introduces a deterministic guardrail engine that translates extracted financial data into computable accounting validation rules aligned with IFRS-inspired accounting structures.

Every financial action must pass rule validation such as:

  • Ownership validation
  • Transaction threshold checks
  • Double-entry accounting verification
  • Classification validation against accounting categories

If anomalies are detected, such as ownership mismatches, suspicious totals, or incomplete financial records, the system triggers a deterministic halt and requires human confirmation before proceeding.

This layered validation approach helps reduce the risk of incorrect financial records being generated by AI outputs.
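
As a rough illustration of the four checks listed above, the following Python sketch validates a journal entry before it is recorded. The `Line` and `Entry` types, the category set, and the 1000.00 threshold are assumptions made for this example; the write-up does not specify the actual rule definitions.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical values for illustration only.
ALLOWED_CATEGORIES = {"revenue", "expense", "asset", "liability", "equity"}
APPROVAL_THRESHOLD = Decimal("1000.00")


@dataclass(frozen=True)
class Line:
    account: str
    category: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


@dataclass(frozen=True)
class Entry:
    owner_id: str
    lines: tuple


def check_entry(entry, acting_user_id):
    """Return a list of rule violations; an empty list means the entry may proceed."""
    violations = []
    # Ownership validation: the entry must belong to the acting user.
    if entry.owner_id != acting_user_id:
        violations.append("ownership_mismatch")
    # Double-entry verification: total debits must equal total credits.
    total_debit = sum((line.debit for line in entry.lines), Decimal("0"))
    total_credit = sum((line.credit for line in entry.lines), Decimal("0"))
    if not entry.lines or total_debit != total_credit:
        violations.append("unbalanced_entry")
    # Threshold check: large amounts require human confirmation.
    if total_debit > APPROVAL_THRESHOLD:
        violations.append("over_threshold")
    # Classification validation against known accounting categories.
    if any(line.category not in ALLOWED_CATEGORIES for line in entry.lines):
        violations.append("unknown_category")
    return violations
```

Any non-empty result would correspond to the deterministic halt described above. Amounts use `Decimal` rather than floats so that the debit and credit comparison is exact.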


Multimodal Omnichannel CFO Interface

AutoBooks Finance provides a multimodal user interface.

Users can interact with the system through:

  • Visual input (upload receipts, screenshots, invoices)
  • Voice commands
  • Natural language chat
  • WhatsApp messaging via Twilio

For example, a founder in the field can photograph a receipt, send it through WhatsApp, and receive a voice response confirming the recorded accounting entry. If the system blocks a transaction due to guardrails, the founder can review or authorize the action through a voice confirmation command.

This creates a real-time CFO interface accessible from anywhere.


Autonomous Web3 Treasury Execution

Insights alone are not enough; financial systems must also execute.

AutoBooks Finance connects accounting intelligence with on-chain treasury operations on the Celo blockchain.

Using natural language commands, users can instruct the system to:

  • Execute USDC payroll payments
  • Distribute dividends to shareholder wallets
  • Perform automated treasury transactions

Transactions are signed and broadcast through a secure Web3 microservice after passing validation checks and user authorization, enabling automated financial execution without relying solely on traditional banking APIs.
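
The gate in front of signing could look something like the sketch below, written in Python for consistency with the backend. The `PayoutRequest` type, its field names, and the returned payload shape are all hypothetical; the actual signing and broadcasting happen in the Node.js microservice and are not shown.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PayoutRequest:
    recipient: str                # wallet address, treated as an opaque string here
    amount_usdc: Decimal
    validated: bool               # set once the guardrail checks have passed
    authorized_by: Optional[str]  # user who approved the transfer, if any


def release_for_signing(req):
    """Return a payload for the signing service, or raise if a precondition fails."""
    if not req.validated:
        raise PermissionError("validation checks have not passed")
    if req.authorized_by is None:
        raise PermissionError("no user authorization recorded")
    if req.amount_usdc <= 0:
        raise ValueError("amount must be positive")
    # Hypothetical payload shape; amounts are sent as strings to avoid float rounding.
    return {
        "asset": "USDC",
        "to": req.recipient,
        "amount": str(req.amount_usdc),
        "approved_by": req.authorized_by,
    }
```

Keeping this check separate from the signing code means a request that skipped validation or authorization never reaches the wallet.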


How we built it

The system is built as a modular architecture hosted across multiple cloud environments, with the core AI infrastructure deployed on Google Cloud.

IFRS Backend (Google Cloud Run + Cloud SQL)

The financial intelligence engine runs on Google Cloud using:

  • Django 5
  • Celery task queues
  • PostgreSQL database

This backend integrates with Google's Gemini models through the Google GenAI SDK, enabling multimodal document analysis and reasoning.

The backend processes visual financial data, validates it through deterministic accounting rules, and produces structured accounting records that are stored in Cloud SQL.


Frontend UI (Next.js)

The frontend dashboard is built using Next.js 15 and provides a modern financial control interface.

Key capabilities include:

  • Real-time financial dashboards
  • Audit trails through the Ledger Overwatch system
  • Document uploads for visual processing
  • Treasury transaction monitoring
  • Multimodal interaction endpoints

Web3 Treasury Node

A separate Node.js microservice handles blockchain execution.

This service manages:

  • Celo wallet infrastructure
  • Transaction signing
  • Secure execution of USDC transfers
  • Treasury automation logic

The microservice communicates with the IFRS backend to execute validated financial operations.


System Flow

  1. User uploads a receipt, screenshot, or invoice
  2. Gemini multimodal vision extracts financial data
  3. Accounting validation rules check the interpretation
  4. Journal entries are generated and stored
  5. Optional treasury commands trigger Web3 execution
  6. Blockchain transaction hashes are returned and logged in the audit trail

This architecture ensures traceability, validation, and controlled execution.
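
The flow above can be sketched as a single function that stops at the first failed validation. This is a minimal outline, not the project's code: the four callables (`extract`, `validate`, `store`, `execute`) are placeholders for the Gemini call, the rule engine, the database write, and the Web3 service, and the audit record fields are assumed.

```python
def process_document(image_bytes, extract, validate, store, execute=None):
    """Run one uploaded document through the flow and return an audit record."""
    audit = {"status": "pending", "violations": [], "entry_id": None, "tx_hash": None}

    data = extract(image_bytes)           # step 2: vision model extracts fields
    audit["violations"] = validate(data)  # step 3: deterministic rule checks
    if audit["violations"]:
        # Nothing is stored or executed when a rule fails.
        audit["status"] = "halted"
        return audit

    audit["entry_id"] = store(data)       # step 4: persist the journal entry
    if execute is not None:
        audit["tx_hash"] = execute(data)  # steps 5-6: optional treasury action
    audit["status"] = "recorded"
    return audit
```

Passing the steps in as functions keeps the ordering testable without a model, a database, or a blockchain node.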


Challenges we ran into

The biggest challenge was balancing AI autonomy with strict financial safety.

Financial systems cannot tolerate hallucinations or ambiguous outputs. We needed a system that could leverage the reasoning power of Gemini while still enforcing deterministic validation.

To solve this, we engineered a two-layer decision architecture:

  1. Gemini performs visual interpretation and reasoning
  2. A deterministic rule engine validates every output

If the reasoning layer produces ambiguous or conflicting interpretations, the system halts automatically for human review.
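
One way to detect conflicting interpretations is to compare several candidate extractions of the same document and halt when they disagree. The sketch below assumes the reasoning layer is queried more than once and returns plain dictionaries; both the sampling strategy and the field names are assumptions, since the write-up does not say how ambiguity is detected.

```python
def reconcile(candidates, fields=("vendor", "total")):
    """Return ("accept", record) when all candidates agree, else ("halt", reasons)."""
    if not candidates:
        return "halt", ["no_interpretation"]
    reasons = []
    for field in fields:
        # Collect the distinct values the candidates gave for this field.
        values = {candidate.get(field) for candidate in candidates}
        if None in values:
            reasons.append(f"missing_{field}")
        elif len(values) > 1:
            reasons.append(f"conflicting_{field}")
    if reasons:
        return "halt", reasons
    return "accept", {field: candidates[0][field] for field in fields}
```

A "halt" result would route the document to human review rather than letting the system pick one interpretation.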

Another challenge was designing the system to operate without DOM access, forcing the AI to rely entirely on visual understanding rather than structured web data.

This required careful prompt engineering, structured schemas, and robust validation layers.
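
To illustrate what a structured schema buys, here is a small sketch that parses a model's JSON output into a typed record and rejects anything that deviates. The field list is a hypothetical schema chosen for the example, not the one used in the project.

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical schema: the fields a receipt extraction is expected to return.
REQUIRED_FIELDS = ("vendor", "date", "currency", "total")


def parse_extraction(raw):
    """Parse model output into a typed record, raising ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    try:
        # Go through str() so JSON numbers and numeric strings are both accepted.
        total = Decimal(str(data["total"]))
    except InvalidOperation as exc:
        raise ValueError("total is not a number") from exc
    if not total.is_finite() or total < 0:
        raise ValueError("total must be a non-negative amount")
    return {
        "vendor": str(data["vendor"]).strip(),
        "date": str(data["date"]),
        "currency": str(data["currency"]).upper(),
        "total": total,
    }
```

Rejecting malformed output at this boundary means the accounting rules only ever see records with the expected fields and types.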


Accomplishments that we're proud of

We successfully built a system that demonstrates a multimodal AI workflow beyond traditional chat interfaces.

AutoBooks Finance allows a user to:

  • Show the system a financial document
  • Speak a command
  • Receive a spoken response
  • Initiate financial operations from the same interface

The combination of visual UI navigation, voice interaction, deterministic accounting validation, and blockchain treasury execution illustrates a new approach to enterprise financial automation.

We are particularly proud that the system can initiate real blockchain-based financial operations using stablecoins, allowing the AI system to move beyond advisory roles toward controlled financial execution.


What we learned

One of the most important lessons from building this project was that visual UI navigation can be more resilient than traditional API integrations.

Many automation systems fail when:

  • APIs change
  • UI structures update
  • Class names or DOM elements shift

By using Gemini's visual interpretation capabilities, our system operates more like a human accountant: it simply reads the information directly from the screen.

This can significantly improve long-term reliability for automation workflows.

We also learned that combining probabilistic AI reasoning with deterministic validation systems is essential when deploying AI in high-stakes environments like finance.


What's next for AutoBooks Finance

Our roadmap focuses on expanding the autonomy and intelligence of the platform.

Upcoming developments include:

  • Autonomous treasury optimization strategies such as yield farming idle corporate liquidity
  • Expansion of the accounting engine to support multi-jurisdictional tax compliance
  • Advanced financial forecasting using multimodal financial data
  • Autonomous workflow automation across enterprise financial systems
  • Deeper blockchain treasury management capabilities

Our long-term vision is to create a highly automated financial operating system for small and medium-sized enterprises, where AI systems assist with accounting, treasury, and compliance workflows while keeping humans in the decision loop.
