A fully functional SQL database engine built from scratch in Python with zero dependencies. Implements a complete pipeline from raw SQL text to query execution: lexer → parser → executor, with support for indexing, persistence, and an interactive REPL.
- SQL Lexer — Tokenizes raw SQL into a typed token stream. Handles keywords, identifiers, string/number/boolean literals, comparison operators, and comments.
- Recursive Descent Parser — Parses tokens into an AST with precedence climbing for expressions (OR → AND → NOT → comparisons → primary).
- Query Executor — Evaluates AST nodes against in-memory table storage. Supports DDL and DML operations with type coercion.
- Column Indexing — B-tree-style sorted index using
bisectfor O(log n) equality lookups and range scans. Automatically maintained on insert/update/delete. - Persistence — Tables serialize to JSON on every write and reload on startup from a configurable data directory.
- Interactive REPL — SQL shell with box-drawing table output, multi-line input, and meta-commands (
.tables,.schema,.help).
-- DDL
CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT);
DROP TABLE users;
CREATE INDEX idx_name ON users (name);
-- DML
INSERT INTO users VALUES (1, 'Alice', 30);
INSERT INTO users (name, age) VALUES ('Bob', 25);
SELECT * FROM users WHERE age > 25 ORDER BY name ASC LIMIT 10;
SELECT DISTINCT department FROM employees;
SELECT COUNT(*), AVG(salary), MIN(salary), MAX(salary) FROM employees;
UPDATE users SET age = 31 WHERE name = 'Alice';
DELETE FROM users WHERE id = 1;Data types: INT, FLOAT, TEXT, BOOL, NULL
Operators: =, !=, <>, <, >, <=, >=, AND, OR, NOT
Aggregates: COUNT(*), SUM(), AVG(), MIN(), MAX()
Clauses: WHERE, ORDER BY ASC|DESC, LIMIT, DISTINCT
SQL string
│
▼
┌─────────┐ ┌──────────┐ ┌──────────┐
│ Lexer │────▶│ Parser │────▶│ Executor │
│ │ │ │ │ │
│ Tokens │ │ AST │ │ Result │
└─────────┘ └──────────┘ └──────────┘
│
┌───────┴───────┐
│ │
┌───┴───┐ ┌─────┴─────┐
│ Table │ │ Index │
│ │ │ (bisect) │
└───┬───┘ └───────────┘
│
┌─────┴─────┐
│ JSON file │
│ (persist) │
└───────────┘
# Interactive REPL
python3 main.py
# Load demo data (10 employees, 3 departments) and start REPL
python3 main.py --demo
# Execute a statement directly
python3 main.py -e "SELECT 1 + 1"
# Use persistent storage
python3 main.py --data ./mydbsql> CREATE TABLE users (id INT, name TEXT, age INT);
Table 'users' created.
sql> INSERT INTO users VALUES (1, 'Alice', 30);
1 row inserted.
sql> INSERT INTO users VALUES (2, 'Bob', 25);
1 row inserted.
sql> SELECT * FROM users WHERE age > 20 ORDER BY name ASC;
┌────┬───────┬─────┐
│ id │ name │ age │
├────┼───────┼─────┤
│ 1 │ Alice │ 30 │
│ 2 │ Bob │ 25 │
└────┴───────┴─────┘
2 rows returned.
python3 -m pytest tests/ -v80 tests covering:
- Lexer — Token types, literals, operators, keywords, error handling
- Parser — All statement types, expression precedence, error recovery
- Database — End-to-end CRUD, aggregates, data types, persistence, indexing
sql-database-engine/
├── main.py # CLI entry point
├── src/
│ ├── types.py # DataType enum, Column, Schema
│ ├── ast_nodes.py # AST node dataclasses
│ ├── lexer.py # SQL tokenizer
│ ├── parser.py # Recursive descent parser
│ ├── table.py # Row storage + JSON persistence
│ ├── index.py # Sorted column index (bisect)
│ ├── executor.py # Query execution engine
│ ├── database.py # Top-level interface
│ └── repl.py # Interactive SQL shell
└── tests/
├── test_lexer.py
├── test_parser.py
└── test_database.py
The entire engine — lexer, parser, executor, storage, indexing, REPL — is built using only Python's standard library. No ORMs, no SQLite bindings, no third-party packages.