Overview

Purpose and Scope

This page provides an introduction to ArcadeDB, explaining its purpose as a multi-model database management system, its key architectural principles, and how its major components work together. For detailed information on specific subsystems, see the relevant sub-pages: Architecture covers the modular design, Database Engine details the core storage and query processing, Server Infrastructure describes the HTTP API and clustering, Client Interfaces explains connection options, and Development and Operations covers building and deploying ArcadeDB.

What is ArcadeDB?

ArcadeDB is a Multi-Model Database Management System that unifies multiple data paradigms—graph, document, key-value, time-series, vector embeddings, and geospatial—into a single, high-performance engine. Created by Luca Garulli (founder of OrientDB) and written from scratch in Java 21+, ArcadeDB is designed for extreme performance through "Low Level Java" programming techniques that minimize garbage collection overhead and maximize mechanical sympathy.

The system is fully transactional with ACID guarantees, supporting MVCC isolation, Write-Ahead Logging (WAL), and distributed replication with leader-based quorum consensus.

Sources: README.md:67-76, README.md:74-86, pom.xml:25-32

Key Characteristics

Multi-Model Design

ArcadeDB stores all data in a unified record-based format, allowing vertices, documents, key-value pairs, time-series data, and vector embeddings to coexist in the same database and participate in the same queries. This eliminates the impedance mismatch typically encountered when integrating disparate database technologies.

Data Model	Description	Primary Use Cases
Graph	Native vertices and edges with bidirectional links	Social networks, fraud detection, knowledge graphs
Document	Schemaless JSON documents with type definitions	Content management, catalogs, flexible schemas
Key/Value	Fast exact-match lookups by primary key	Caching, session storage, configuration
Time Series	Columnar storage with compression	IoT sensor data, metrics, monitoring
Vector	Similarity search with HNSW indexing	AI/ML embeddings, semantic search
Geospatial	Spatial indexing and proximity queries	Location services, routing, GIS

Sources: README.md:77-86

Polyglot Query Support

ArcadeDB natively understands five query languages, all executing against the same underlying engine:

Language	Description	Entry Point
SQL	Extended OrientDB SQL with graph extensions	`com.arcadedb.query.sql.parser` (ANTLR4-generated)
Cypher	OpenCypher with native and legacy execution paths	`com.arcadedb.query.opencypher` package
Gremlin	Apache TinkerPop 3.7.x graph traversal	`com.arcadedb.gremlin.ArcadeGraph`
GraphQL	Schema-first GraphQL queries	`com.arcadedb.graphql` package
MongoDB QL	Subset of MongoDB query operators	`com.arcadedb.mongodbw` wire protocol translator

Sources: README.md:88-94, engine/pom.xml:73-95, gremlin/pom.xml:102-138

Architectural Overview

Module Organization

ArcadeDB is structured as a Maven multi-module project with 22 modules organized into logical layers:

Core modules (pom.xml:126-148):

arcadedb-engine: Storage engine, schema, transactions, SQL/Cypher parsers
arcadedb-network: RemoteDatabase HTTP client, binary protocol
arcadedb-server: HttpServer (Undertow), ServerSecurity, ReplicatedDatabase
arcadedb-console: Interactive/batch Console CLI

Wire protocol modules (shaded JARs for plugin isolation):

arcadedb-mongodbw: MongoDB wire protocol adapter
arcadedb-postgresw: PostgreSQL wire protocol (JDBC-compatible)
arcadedb-redisw: Redis RESP protocol subset
arcadedb-bolt: Neo4j Bolt protocol for Cypher queries
arcadedb-grpcw: gRPC service implementation

UI and monitoring:

arcadedb-studio: React-based web UI with Cytoscape.js graph visualization
arcadedb-metrics: Prometheus-compatible metrics endpoint

Sources: pom.xml:126-148, server/pom.xml:70-81, mongodbw/pom.xml:31-33, postgresw/pom.xml:32-34, gremlin/pom.xml:32-34, package/pom.xml:145-261

Deployment Modes

ArcadeDB supports three primary deployment modes, each backed by a different implementation of the Database interface:

Embedded deployment uses LocalDatabase (engine/src/main/java/com/arcadedb/database/LocalDatabase.java) for direct file access with zero network overhead. The application directly instantiates DatabaseFactory and manages the database lifecycle.

Remote deployment uses RemoteDatabase (network/src/main/java/com/arcadedb/remote/RemoteDatabase.java) as an HTTP/JSON client that communicates with HttpServer (server/src/main/java/com/arcadedb/server/http/HttpServer.java). The server wraps databases in ServerDatabase to prevent unsafe operations like drop() or close() from remote clients.

Replicated deployment uses ReplicatedDatabase (server/src/main/java/com/arcadedb/server/ReplicatedDatabase.java) which wraps LocalDatabase and coordinates with HAServer (server/src/main/java/com/arcadedb/server/ha/HAServer.java) for leader election and quorum-based replication.
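
To make the remote mode more concrete, the following is a minimal sketch of the kind of HTTP/JSON request a client might send to the server. It uses only the JDK's `java.net.http` classes rather than RemoteDatabase itself. The endpoint path `/api/v1/command/{database}`, the JSON field names `language` and `command`, the use of HTTP Basic authentication, and the class name `HttpCommandSketch` are assumptions made for illustration and should be checked against the server version in use.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (but does not send) a command request.
public final class HttpCommandSketch {

    /** Builds a tiny JSON body; escapes only backslashes and double quotes. */
    public static String jsonBody(String sql) {
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"language\":\"sql\",\"command\":\"" + escaped + "\"}";
    }

    /** Assumed endpoint layout: POST http://host:port/api/v1/command/{database}. */
    public static HttpRequest buildCommand(String host, int port, String database,
                                           String user, String password, String sql) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // The database name is used as-is; a real client would URL-encode it.
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":" + port + "/api/v1/command/" + database))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(sql)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildCommand(
                "localhost", 2480, "mydb", "root", "playwithdata", "SELECT FROM Customer");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(jsonBody("SELECT FROM Customer"));
    }
}
```

Sending the request with `java.net.http.HttpClient` is left out so the sketch runs without a server. In practice RemoteDatabase handles these details on the caller's behalf.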

Sources: network/src/test/java/com/arcadedb/remote/RemoteDatabaseTest.java:41-51, Diagram 3 from system architecture

Data Storage and Transactions

Record-Based Storage Model

All data in ArcadeDB—regardless of model (graph vertex, document, time-series point)—is stored as a record identified by a Record ID (RID) in the format #bucket:position. Records are grouped into buckets (files on disk), which are the unit of parallelism for queries and transactions.

Key classes:

RID: Record identifier (engine/src/main/java/com/arcadedb/database/RID.java)
MutableDocument, ImmutableVertex, etc.: Record implementations
Bucket: Physical storage file containing records
PageManager: Page cache and async I/O coordinator
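
The #bucket:position format can be illustrated with a small standalone parser. This is a sketch only: `RidSketch` and its nested `Rid` record are hypothetical names and do not reproduce ArcadeDB's own RID class, and the choice of an `int` bucket id and a `long` position is an assumption.

```java
// Hypothetical illustration of the "#bucket:position" address format.
public final class RidSketch {

    /** A record address: which bucket, and where inside that bucket. */
    public record Rid(int bucketId, long position) {
        @Override
        public String toString() {
            return "#" + bucketId + ":" + position;
        }
    }

    /** Parses text of the form "#bucket:position", for example "#12:345". */
    public static Rid parse(String text) {
        if (text == null || !text.startsWith("#")) {
            throw new IllegalArgumentException("RID must start with '#': " + text);
        }
        int colon = text.indexOf(':');
        // There must be at least one digit on each side of the colon.
        if (colon < 2 || colon == text.length() - 1) {
            throw new IllegalArgumentException("Expected #bucket:position but got: " + text);
        }
        int bucketId = Integer.parseInt(text.substring(1, colon));
        long position = Long.parseLong(text.substring(colon + 1));
        return new Rid(bucketId, position);
    }

    public static void main(String[] args) {
        Rid rid = parse("#12:345");
        System.out.println(rid.bucketId() + " / " + rid.position() + " / " + rid);
    }
}
```

Because the bucket id is part of the address, locating a record needs no lookup by type name: the bucket identifies the file and the position identifies the slot within it.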

Transaction Lifecycle

ArcadeDB uses a two-phase commit protocol with MVCC (Multi-Version Concurrency Control):

Phase 1 (commit1stPhase()) acquires locks, validates that no other transaction has modified the same pages (optimistic locking), and prepares a WAL entry.

Phase 2 (commit2ndPhase()) writes to the WAL, updates in-memory page cache, queues pages for asynchronous disk flush, and releases locks. This ensures durability while maintaining high throughput through async I/O.
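
The two phases can be modeled with a toy example. The sketch below is not ArcadeDB code: it keeps page versions in a map, counts WAL entries instead of writing them, and omits locking and the asynchronous flush entirely. Only the method names `commit1stPhase` and `commit2ndPhase` come from the description above; everything else (`TwoPhaseCommitSketch`, `Tx`, `touch`, `begin`) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two commit phases; every name here is hypothetical.
public final class TwoPhaseCommitSketch {
    /** Stand-in for the page cache: page id -> current version. */
    public final Map<Integer, Integer> pageVersions = new HashMap<>();
    /** Stand-in for the write-ahead log: counts appended entries. */
    public int walEntries = 0;

    public final class Tx {
        // Version of each page when this transaction first touched it.
        private final Map<Integer, Integer> seen = new HashMap<>();

        public void touch(int pageId) {
            seen.putIfAbsent(pageId, pageVersions.getOrDefault(pageId, 0));
        }

        /** Phase 1: succeed only if no touched page changed in the meantime. */
        public boolean commit1stPhase() {
            for (Map.Entry<Integer, Integer> e : seen.entrySet()) {
                if (!pageVersions.getOrDefault(e.getKey(), 0).equals(e.getValue())) {
                    return false;
                }
            }
            return true;
        }

        /** Phase 2: log first, then publish the new page versions. */
        public void commit2ndPhase() {
            walEntries++;
            for (Map.Entry<Integer, Integer> e : seen.entrySet()) {
                pageVersions.put(e.getKey(), e.getValue() + 1);
            }
        }

        public boolean commit() {
            if (!commit1stPhase()) {
                return false;
            }
            commit2ndPhase();
            return true;
        }
    }

    public Tx begin() {
        return new Tx();
    }

    public static void main(String[] args) {
        TwoPhaseCommitSketch db = new TwoPhaseCommitSketch();
        Tx first = db.begin();
        Tx second = db.begin();
        first.touch(7);
        second.touch(7);
        System.out.println("first committed: " + first.commit());
        System.out.println("second committed: " + second.commit());
    }
}
```

In this model the second transaction fails validation because page 7 moved from version 0 to version 1 after it was first read. A caller would typically retry such a transaction from the start.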

Sources: Diagram 4 from system architecture, engine/pom.xml:1-237

Query Processing Pipeline

Language-Specific Parsers

Each query language has a dedicated parser that produces an execution plan:

Language	Parser Technology	Output Format	Execution
SQL	ANTLR4 grammar (engine/src/main/antlr4/com/arcadedb/query/sql/parser/SQLGrammar.g4)	`SelectStatement` AST	Step-based pipeline
Cypher (native)	ANTLR4 OpenCypher grammar	`PhysicalPlan` (cost-based optimizer)	Operator tree
Cypher (legacy)	OpenCypher parser	`MatchStatement` AST	Step-based (compatible)
Gremlin	TinkerPop bytecode	`GraphTraversal`	Strategy-based execution
GraphQL	GraphQL schema + SDL	Field resolvers	Direct DB access

Query Execution Flow

All query languages ultimately execute steps against the same TransactionContext, which provides MVCC-isolated reads from the page cache and index lookups. This unified execution layer allows cross-language features like SQL functions called from Cypher or Gremlin traversals accessing SQL-created indexes.
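
As a rough illustration of what a step-based pipeline looks like, the sketch below chains a scan, a filter, and a projection as pull-based iterators, so each step asks the previous one for rows only when needed. The class and method names are hypothetical and are not ArcadeDB's execution classes. Rows are plain maps, and transactions and indexes are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical pull-based pipeline: scan -> filter -> project.
public final class StepPipelineSketch {

    /** Filter step: passes through only the rows that match the predicate. */
    public static Iterator<Map<String, Object>> filter(
            Iterator<Map<String, Object>> upstream, Predicate<Map<String, Object>> where) {
        return new Iterator<>() {
            private Map<String, Object> pending;

            @Override
            public boolean hasNext() {
                while (pending == null && upstream.hasNext()) {
                    Map<String, Object> row = upstream.next();
                    if (where.test(row)) {
                        pending = row;
                    }
                }
                return pending != null;
            }

            @Override
            public Map<String, Object> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Map<String, Object> row = pending;
                pending = null;
                return row;
            }
        };
    }

    /** Projection step: extracts a single field from each row. */
    public static Iterator<Object> project(Iterator<Map<String, Object>> upstream, String field) {
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                return upstream.hasNext();
            }

            @Override
            public Object next() {
                return upstream.next().get(field);
            }
        };
    }

    /** Roughly "SELECT name FROM rows WHERE age > 30" expressed as three steps. */
    public static List<Object> namesOlderThan30(List<Map<String, Object>> rows) {
        Iterator<Map<String, Object>> scan = rows.iterator();
        Iterator<Map<String, Object>> filtered = filter(scan, r -> (Integer) r.get("age") > 30);
        Iterator<Object> names = project(filtered, "name");
        List<Object> out = new ArrayList<>();
        names.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("name", "Ann", "age", 41),
                Map.of("name", "Bob", "age", 25),
                Map.of("name", "Cy", "age", 33));
        System.out.println(namesOlderThan30(rows)); // prints [Ann, Cy]
    }
}
```

Because every step exposes the same iterator shape, steps produced by different front ends could in principle be combined freely, which is the property the unified execution layer relies on.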

Sources: Diagram 2 from system architecture, engine/pom.xml:73-95

Configuration and Runtime

Configuration System

ArcadeDB's behavior is controlled by GlobalConfiguration (engine/src/main/java/com/arcadedb/GlobalConfiguration.java), an enum with 150+ settings. Configuration sources are resolved in this order (highest priority first):

Command-line arguments (e.g., -Darcadedb.server.rootPassword=...)
Environment variables (e.g., arcadedb_server_rootPassword)
System properties set programmatically (e.g., System.setProperty())
Configuration file (config/server-configuration.json)
Default values (hardcoded in GlobalConfiguration enum)
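
The "highest priority first" rule amounts to returning the value from the first source that defines a key. The sketch below shows that lookup in isolation. It does not use GlobalConfiguration, and the class name, the `resolve` method, and the `example.cacheSize` key are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical first-match lookup across prioritized configuration sources.
public final class ConfigPrecedenceSketch {

    /** Returns the value from the first source (highest priority) that defines the key. */
    public static Optional<String> resolve(String key, List<Map<String, String>> sourcesByPriority) {
        for (Map<String, String> source : sourcesByPriority) {
            String value = source.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Sources listed from highest to lowest priority, mirroring the list above.
        Map<String, String> commandLine = Map.of();
        Map<String, String> environment = Map.of("example.cacheSize", "256");
        Map<String, String> systemProperties = Map.of();
        Map<String, String> configFile = Map.of("example.cacheSize", "128");
        Map<String, String> defaults = Map.of("example.cacheSize", "64");

        List<Map<String, String>> sources =
                List.of(commandLine, environment, systemProperties, configFile, defaults);

        // The environment value wins because no higher-priority source defines the key.
        System.out.println(resolve("example.cacheSize", sources).orElse("<unset>")); // prints 256
    }
}
```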

Configuration profiles provide preset tunings for different scenarios:

default: Balanced settings for general use
high-performance: Maximizes throughput (larger caches, more threads)
low-ram: Minimizes memory footprint for constrained environments
low-cpu: Reduces CPU usage (fewer threads, less parallelism)

Server Lifecycle

The ArcadeDBServer class (server/src/main/java/com/arcadedb/server/ArcadeDBServer.java) orchestrates server startup:

Load configuration from multiple sources
Initialize ServerSecurity from config/server-users.jsonl and config/server-groups.json
Start HttpServer (Undertow) on port 2480 (default)
Start optional HAServer for replication (if configured)
Load wire protocol plugins from plugins/ directory (MongoDB, Postgres, Redis, Bolt, gRPC)
Initialize default databases (if specified in config)
Register shutdown hook for graceful termination

Sources: Diagram 7 from system architecture, server/pom.xml:69-136

Distribution Variants

ArcadeDB releases include four pre-built distribution packages, assembled by package/pom.xml:42-143:

Variant	Included Modules	Size	Use Case
full	All modules (engine, server, console, studio, gremlin, graphql, all wire protocols, metrics)	~200 MB	Development, evaluation, full-featured deployments
minimal	Excludes `gremlin`, `redisw`, `mongodbw`, `graphql`	~150 MB	Production with SQL/Cypher only
headless	Excludes `gremlin`, `redisw`, `mongodbw`, `graphql`, `studio`	~140 MB	Server-only deployments without web UI
base	Engine, server, network only (no console, studio, wire protocols, metrics)	~100 MB	Minimal footprint, embedded or custom integrations

The Maven Shade Plugin (pom.xml:375-424) creates self-contained JARs for wire protocol modules, isolating dependencies to prevent classpath conflicts. For example, arcadedb-gremlin-shaded.jar bundles Apache TinkerPop dependencies without affecting other modules.

Custom distributions can be built using the Custom Package Builder script.

Sources: README.md:152-168, package/pom.xml:42-143, pom.xml:375-424

Technology Stack Summary

Layer	Technologies	Purpose
Core Language	Java 21+, `jdk.incubator.vector` (SIMD)	High-performance runtime with advanced JVM features
Build System	Maven 3.x, parent POM with 22 modules	Multi-module project management
Parsing	ANTLR4 (SQL, Cypher), JavaCC (legacy)	Query language lexing and parsing
Query Engines	Custom step-based execution, Apache TinkerPop 3.7.x, OpenCypher translator	Polyglot query support
Storage	Custom page-based engine with LZ4 compression, LSM-Tree indexes, Write-Ahead Log	Low-level file I/O with MVCC
Indexing	LSM-Tree (range queries), Hash (exact match), Lucene (full-text), JVector (similarity search)	Multi-index strategy
Networking	Undertow HTTP server, Netty (wire protocols), gRPC (Protocol Buffers)	Client-server communication
Web UI	React, Cytoscape.js, Webpack	Interactive graph visualization and query editor
Scripting	GraalVM Truffle (JavaScript, Python, Ruby)	Embedded polyglot scripting
Monitoring	Micrometer, Prometheus export, JMX	Metrics and observability
Testing	JUnit 5, Cucumber (OpenCypher TCK), Testcontainers	Unit, integration, and E2E tests
Packaging	Maven Assembly, Docker multi-platform builds	Distribution creation

Sources: pom.xml:48-104, engine/pom.xml:36-50, server/pom.xml:36-42, gremlin/pom.xml:36-46

Getting Started

Quick Start with Docker

The fastest way to explore ArcadeDB is to run the official Docker image. Once the container is running, access the web UI at http://localhost:2480 (default credentials: root / playwithdata).

Embedded Usage (Java)

Remote Client (Java)

Sources: README.md:131-148, network/src/test/java/com/arcadedb/remote/RemoteDatabaseTest.java:42-56

Next Steps

Architecture: Detailed module organization and layered design
Database Engine: Deep dive into storage, transactions, and query processing
Server Infrastructure: HTTP API, security, clustering, and monitoring
Client Interfaces: Console, Studio, RemoteDatabase, and Python bindings
Development and Operations: Building, testing, packaging, and deployment
