
[Umbrella] SurrealDB: Architecture Decision & Known Challenges #372

@lfnovo

🗄️ SurrealDB: Why We Use It & Known Challenges

This is a tracking issue that documents our database architecture decision and groups related issues. Unlike other umbrella issues focused on community work, this one explains why we made this choice and tracks challenges we're actively working on.

Why SurrealDB?

Open Notebook chose SurrealDB for several strategic reasons:

1. Multi-model in One

SurrealDB combines document store, graph database, and relational features. This matters for Open Notebook because:

  • Sources are documents with metadata
  • Notebooks ↔ Sources relationships are graph-like
  • Vector embeddings are built in: we already use them extensively for semantic search
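
Semantic search over embeddings usually ranks records by cosine similarity to a query embedding. SurrealQL exposes this score as `vector::similarity::cosine`; the sketch below computes the same score in plain Python so the idea is visible without a running database. The `source` table and `embedding` field in the query string are hypothetical names, not Open Notebook's actual schema.

```python
import math

# Illustrative SurrealQL only; the table and field names are hypothetical.
SEMANTIC_SEARCH_QUERY = """
SELECT id, vector::similarity::cosine(embedding, $query_vec) AS score
FROM source
ORDER BY score DESC
LIMIT 5;
"""


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict, top_k: int = 5) -> list:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Doing this in the database instead of in application code avoids shipping every embedding over the wire for each search.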

2. Graph-First Future

We plan to significantly expand the use of graph relationships in the product. SurrealDB's native graph capabilities will enable:

  • Complex knowledge connections between sources
  • Cross-notebook relationships
  • Semantic linking of concepts
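
Graph relationships pay off when a question spans several hops, for example "which other sources share a concept with this one?". SurrealQL expresses such hops with arrow syntax over `RELATE`d edges; the Python sketch below performs the equivalent breadth-first walk over an in-memory edge list. The `mentions` edge, the `concept` records, and all IDs are hypothetical names chosen for illustration.

```python
from collections import deque

# Illustrative SurrealQL for a two-hop lookup (hypothetical table and edge names):
#   RELATE source:a->mentions->concept:rag;
#   SELECT ->mentions->concept<-mentions<-source AS related FROM source:a;


def related_within(edges: list[tuple[str, str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search over undirected edges.

    Returns every node reachable from `start` in 1..max_hops hops,
    mapped to its hop distance.
    """
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)

    del distances[start]  # the start node is not "related" to itself
    return distances
```

With edges `source:a - concept:rag - source:b`, a two-hop query from `source:a` finds `source:b` through the shared concept, which is the kind of cross-source link described above.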

3. AI-Oriented Database

SurrealDB is explicitly focused on AI use cases and actively evolving in this direction:

  • Native vector embeddings (which we already use)
  • ML-friendly query patterns
  • Continuous improvements for AI workloads

4. Frontend-Accessible (like Firebase/Supabase)

SurrealDB can be accessed directly from the frontend, similar to Firebase or Supabase. This enables:

  • Real-time subscriptions from the UI
  • Simplified architecture
  • Future possibilities for offline-first features

5. Simplified Infrastructure

Traditional stacks require multiple services:

Typical stack:          Open Notebook stack:
├── Postgres            └── SurrealDB (does it all)
├── Redis (cache)           ├── Data storage
├── Celery (jobs)           ├── Background jobs (surreal-commands)
└── Vector DB               ├── Vector embeddings
                            └── Real-time subscriptions

This means:

  • Easier self-hosting: one database to manage
  • Simpler Docker setup: fewer containers
  • Lower resource usage: important for local/privacy-focused users

Known Trade-offs

We're aware of these challenges:

| Challenge | Impact | Our Approach |
|---|---|---|
| Younger ecosystem | Fewer tutorials, smaller community | We document more, contribute back |
| Transaction conflicts | Verbose error logs under concurrency | Already handled (see below) |
| Performance tuning | Less established best practices | Profile and optimize as we go |
| Enterprise readiness | Questions about production scale | Monitor closely, have fallback plan |

About Transaction Conflicts

This is a known issue that the SurrealDB team is actively addressing in upcoming releases. On our side:

  • Already solved: We use Tenacity for automatic retries
  • Current impact: Mostly log verbosity, not actual failures
  • What we need: Better log management to reduce noise

This is not a blocker; it's a managed inconvenience.
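
The retry approach mentioned above follows a standard pattern: catch the conflict, wait with exponential backoff plus jitter, and try again a bounded number of times. With Tenacity this is typically a `@retry(...)` decorator combining `stop_after_attempt`, `wait_random_exponential`, and `retry_if_exception_type`. The stdlib-only sketch below shows the same mechanics without the dependency. `TransactionConflictError` is a hypothetical stand-in for however the driver surfaces a conflict, and this is not the actual surreal-commands code.

```python
import logging
import random
import time

logger = logging.getLogger("db.retry")


class TransactionConflictError(Exception):
    """Hypothetical stand-in for the error a driver raises on a write conflict."""


def run_with_retry(operation, max_attempts=5, base_delay=0.05, max_delay=2.0, sleep=time.sleep):
    """Call `operation()`; on a conflict, back off exponentially with full jitter and retry.

    Intermediate retries are logged at DEBUG and only the final failure at
    WARNING, which is one way to keep expected conflicts out of normal logs.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransactionConflictError:
            if attempt == max_attempts:
                logger.warning("transaction conflict persisted after %d attempts", attempt)
                raise
            # Full jitter: pick a random delay between 0 and the capped exponential bound.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            logger.debug("transaction conflict on attempt %d; retrying in %.3fs", attempt, delay)
            sleep(delay)
```

Injecting `sleep` keeps the function testable without real waiting; in normal use the default `time.sleep` applies. Jitter spreads out retries from concurrent writers so they are less likely to collide again on the same records.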

Current Decision

Stay with SurrealDB and work through the challenges.

Rationale

  1. Migration cost is high: it means rewriting the data layer and losing graph features
  2. Problems are addressable: transaction conflicts have workarounds
  3. Unique value: no other single DB gives us document + graph + jobs
  4. Aligned with our users: privacy-focused users prefer simpler infrastructure

When We'd Reconsider

  • Transaction conflicts become unworkable despite optimizations
  • Performance doesn't improve with query tuning
  • Critical security issue without timely fix
  • A clear alternative emerges with the same benefits and greater maturity

Issues Being Tracked

Critical

Investigation

Performance

Tooling

Related

How We're Addressing These

Transaction Conflicts (#362)

Current: Retry logic with Tenacity in surreal-commands
Planned:
- Confirm impact is log verbosity only (#373)
- Optimize concurrent write patterns
- Batch operations where possible
- Better log filtering
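
For the "better log filtering" item, Python's standard `logging.Filter` can drop known, already-retried conflict messages before they reach handlers. This sketch assumes conflict messages can be recognized by a substring; the marker text is a placeholder assumption, not the exact wording SurrealDB emits, so it would need to be checked against real logs.

```python
import logging


class ConflictNoiseFilter(logging.Filter):
    """Drop below-ERROR records that mention a transaction conflict.

    CONFLICT_MARKER is a placeholder assumption; replace it with the text
    actually seen in the logs.
    """

    CONFLICT_MARKER = "transaction conflict"

    def filter(self, record: logging.LogRecord) -> bool:
        is_conflict = self.CONFLICT_MARKER in record.getMessage().lower()
        # Returning False drops the record; ERROR and above always pass through.
        return not (is_conflict and record.levelno < logging.ERROR)


def install_conflict_filter(logger_name: str) -> logging.Filter:
    """Attach the filter to a named logger and return it."""
    noise_filter = ConflictNoiseFilter()
    logging.getLogger(logger_name).addFilter(noise_filter)
    return noise_filter
```

A filter attached to a logger only sees records logged on that exact logger, not records propagated up from its child loggers; to cover a whole subtree, attach the same filter to a handler instead.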

Performance (#351)

Current: Investigating query patterns
Planned:
- Add pagination to source listing
- Optimize indexes
- Consider caching hot paths
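
Pagination for source listing can be expressed in SurrealQL with `LIMIT` and `START`. The helper below is a sketch that builds such a query from a page number and page size. The `source` table, the `updated` sort field, and the 100-item cap are hypothetical choices; the page values are validated as integers before being placed in the query text so user input cannot inject SurrealQL.

```python
PAGE_SIZE_MAX = 100  # hypothetical upper bound to protect the database


def build_source_page_query(page: int, page_size: int = 20) -> str:
    """Build a SurrealQL query for one page of sources (`page` is 1-based)."""
    page = int(page)
    page_size = int(page_size)
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= PAGE_SIZE_MAX:
        raise ValueError(f"page_size must be between 1 and {PAGE_SIZE_MAX}")
    start = (page - 1) * page_size
    # A stable sort order keeps page boundaries consistent between requests.
    return f"SELECT * FROM source ORDER BY updated DESC LIMIT {page_size} START {start};"
```

Offset-based paging still walks past the skipped rows, so for very large libraries a keyset approach (filtering on the last `updated` value seen) may scale better; which one fits depends on how the listing is used.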

Visibility (#186)

Current: Logs only
Planned:
- Command monitor UI in advanced settings
- Show queue depth, success/failure rates
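
The numbers such a monitor would show reduce to simple aggregates over command records (in SurrealQL, roughly `SELECT status, count() FROM command GROUP BY status;` with a hypothetical `command` table). The sketch below computes queue depth and success/failure rates in Python; the status values (`new`, `running`, `completed`, `failed`) and the record shape are assumptions for illustration, not the surreal-commands schema.

```python
from collections import Counter
from dataclasses import dataclass

PENDING_STATUSES = {"new", "running"}  # hypothetical status names


@dataclass
class QueueStats:
    queue_depth: int     # commands waiting or in progress
    success_rate: float  # completed / finished; 0.0 if nothing has finished yet
    failure_rate: float  # failed / finished; 0.0 if nothing has finished yet


def summarize(commands: list[dict]) -> QueueStats:
    """Aggregate a list of command records, each with a "status" key."""
    counts = Counter(cmd["status"] for cmd in commands)
    depth = sum(counts[status] for status in PENDING_STATUSES)
    finished = counts["completed"] + counts["failed"]
    if finished == 0:
        return QueueStats(depth, 0.0, 0.0)
    return QueueStats(depth, counts["completed"] / finished, counts["failed"] / finished)
```

Rates are computed over finished commands only, so a burst of newly queued work raises the queue depth without distorting the success/failure ratio.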

Alternatives We Considered

| Option | What We'd Gain | What We'd Lose |
|---|---|---|
| PostgreSQL + pgvector | Maturity, ecosystem, proven scale | Graph queries, simple jobs (need Celery) |
| SQLite + LiteFS | Ultimate simplicity, zero config | Scale, concurrency, graph features |
| MongoDB + Redis + Celery | Familiar stack, lots of tooling | Simplicity, our infra advantage |
| Hybrid (Postgres + Neo4j) | Best of both worlds | Complexity, ops burden |

For Contributors

If you're experiencing SurrealDB issues:

  1. Check if it's known: look at the linked issues above
  2. Provide details: SurrealDB version, query patterns, data volume
  3. Share workarounds: if you found one, others benefit

If you want to help:

  • Performance profiling is welcome
  • Query optimization PRs appreciated
  • Documentation of patterns that work

🚀 Opportunity: Schema & Feature Optimization

We started using SurrealDB before fully understanding everything it could do.

This means there's significant opportunity to improve our current schema and leverage features we're not yet using. If you have SurrealDB expertise, you could help with:

  • Schema optimization: better table structures, indexes, relations
  • Query patterns: more idiomatic SurrealQL
  • Feature adoption: using capabilities we haven't explored yet
  • Performance tuning: identifying bottlenecks and fixes

This is a great way to contribute if you know SurrealDB well; you'd be directly improving the database layer that powers everything.

💬 We Want Your Input

Do you have experience with database infrastructure at scale?

We'd love to hear from you. This is an open discussion; we're not married to any particular solution. If you have:

  • Experience running SurrealDB in production
  • Suggestions for alternative architectures
  • Ideas for optimizations we haven't considered
  • War stories from similar migrations
  • SurrealDB expertise: schema design, query optimization, best practices

Please comment on this issue. We value practical experience over theoretical debates. Tell us what you've seen work (or fail) in real-world scenarios.

💡 SurrealDB community: we're actively sharing this in SurrealDB forums. If you landed here from there, welcome! We'd especially appreciate your database expertise.

References


Maintainer: @lfnovo

Last updated: 2026-01-01
