Skip to content

mohosy/sql-database-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sql-database-engine

A fully functional SQL database engine built from scratch in Python with zero dependencies. Implements a complete pipeline from raw SQL text to query execution: lexer → parser → executor, with support for indexing, persistence, and an interactive REPL.

Features

  • SQL Lexer — Tokenizes raw SQL into a typed token stream. Handles keywords, identifiers, string/number/boolean literals, comparison operators, and comments.
  • Recursive Descent Parser — Parses tokens into an AST with precedence climbing for expressions (OR → AND → NOT → comparisons → primary).
  • Query Executor — Evaluates AST nodes against in-memory table storage. Supports DDL and DML operations with type coercion.
  • Column Indexing — B-tree-style sorted index using bisect for O(log n) equality lookups and range scans. Automatically maintained on insert/update/delete.
  • Persistence — Tables serialize to JSON on every write and reload on startup from a configurable data directory.
  • Interactive REPL — SQL shell with box-drawing table output, multi-line input, and meta-commands (.tables, .schema, .help).

Supported SQL

-- DDL
CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT);
DROP TABLE users;
CREATE INDEX idx_name ON users (name);

-- DML
INSERT INTO users VALUES (1, 'Alice', 30);
INSERT INTO users (name, age) VALUES ('Bob', 25);
SELECT * FROM users WHERE age > 25 ORDER BY name ASC LIMIT 10;
SELECT DISTINCT department FROM employees;
SELECT COUNT(*), AVG(salary), MIN(salary), MAX(salary) FROM employees;
UPDATE users SET age = 31 WHERE name = 'Alice';
DELETE FROM users WHERE id = 1;

Data types: INT, FLOAT, TEXT, BOOL, NULL

Operators: =, !=, <>, <, >, <=, >=, AND, OR, NOT

Aggregates: COUNT(*), SUM(), AVG(), MIN(), MAX()

Clauses: WHERE, ORDER BY ASC|DESC, LIMIT, DISTINCT

Architecture

SQL string
  │
  ▼
┌─────────┐     ┌──────────┐     ┌──────────┐
│  Lexer  │────▶│  Parser  │────▶│ Executor │
│         │     │          │     │          │
│ Tokens  │     │   AST    │     │ Result   │
└─────────┘     └──────────┘     └──────────┘
                                      │
                              ┌───────┴───────┐
                              │               │
                          ┌───┴───┐     ┌─────┴─────┐
                          │ Table │     │   Index   │
                          │       │     │ (bisect)  │
                          └───┬───┘     └───────────┘
                              │
                        ┌─────┴─────┐
                        │ JSON file │
                        │ (persist) │
                        └───────────┘

Usage

# Interactive REPL
python3 main.py

# Load demo data (10 employees, 3 departments) and start REPL
python3 main.py --demo

# Execute a statement directly
python3 main.py -e "SELECT 1 + 1"

# Use persistent storage
python3 main.py --data ./mydb

REPL session

sql> CREATE TABLE users (id INT, name TEXT, age INT);
  Table 'users' created.

sql> INSERT INTO users VALUES (1, 'Alice', 30);
  1 row inserted.

sql> INSERT INTO users VALUES (2, 'Bob', 25);
  1 row inserted.

sql> SELECT * FROM users WHERE age > 20 ORDER BY name ASC;
┌────┬───────┬─────┐
│ id │ name  │ age │
├────┼───────┼─────┤
│ 1  │ Alice │ 30  │
│ 2  │ Bob   │ 25  │
└────┴───────┴─────┘
2 rows returned.

Testing

python3 -m pytest tests/ -v

80 tests covering:

  • Lexer — Token types, literals, operators, keywords, error handling
  • Parser — All statement types, expression precedence, error recovery
  • Database — End-to-end CRUD, aggregates, data types, persistence, indexing

Project Structure

sql-database-engine/
├── main.py              # CLI entry point
├── src/
│   ├── types.py         # DataType enum, Column, Schema
│   ├── ast_nodes.py     # AST node dataclasses
│   ├── lexer.py         # SQL tokenizer
│   ├── parser.py        # Recursive descent parser
│   ├── table.py         # Row storage + JSON persistence
│   ├── index.py         # Sorted column index (bisect)
│   ├── executor.py      # Query execution engine
│   ├── database.py      # Top-level interface
│   └── repl.py          # Interactive SQL shell
└── tests/
    ├── test_lexer.py
    ├── test_parser.py
    └── test_database.py

Zero Dependencies

The entire engine — lexer, parser, executor, storage, indexing, REPL — is built using only Python's standard library. No ORMs, no SQLite bindings, no third-party packages.

About

SQL database engine from scratch — lexer, recursive descent parser, query executor, column indexing, persistence, and interactive REPL (Python)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages