RDBMS

A modular, minimal relational database management system written in modern C++. RDBMS implements a minimal but end-to-end SQL pipeline—from lexical analysis through semantic binding, volcano-style query execution, and pluggable storage—designed as baseline infrastructure for researching and extending database internals state-of-the art in the future.

The storage engine is deliberately decoupled from the SQL front end, enabling future engine swaps (for example, append-only logs, B-Trees, or LSM-Trees) without rewriting the parser or execution layer.

Almost all existing RDBMS out there are too heavy for most of the projects and use-case that independent developers and small teams work on, on a daily basis. They are heavily bloated. So I implemented my own and it wasn't hard as I knew most of the functionality already. It might miss many important things, and get some things wrong. I also woulnd't expect this to be run in production in its current state or ever.

Design Goals

Goal	Implementation
Modularity	Strict boundaries between parser, catalog, binder, executor, and storage
Pluggable storage	`IStorageEngine` / `IScanner` abstract raw byte records; SQL layers never touch disk layout directly
Modern C++	C++20, RAII, smart pointers, `std::optional`, `std::variant`, `std::string_view`
Cross-platform	CMake build verified on macOS (Apple Clang) and Linux (GCC 11+ / Clang)
Test-driven	Google Test cases covering every major subsystem
Readable codebase	Headers separated from implementations; interfaces isolated under `include/rdbms/`

Architecture

RDBMS follows a classic layered, monolithic architecture. Each layer depends only on the abstractions below it.

flowchart TB
    subgraph client [Client Layer]
        REPL["rdbms_repl"]
    end

    subgraph frontend [SQL Front End]
        Lexer["Lexer"]
        Parser["Recursive Descent Parser"]
        AST["AST"]
        Binder["Semantic Binder"]
    end

    subgraph metadata [Metadata]
        Catalog["CatalogManager"]
    end

    subgraph execution [Execution Engine — Volcano Model]
        Planner["Plan Builder"]
        Scan["SequentialScanNode"]
        Filter["FilterNode"]
    end

    subgraph storage [Storage Layer]
        Codec["Tuple Codec"]
        Engine["InMemoryStorageEngine"]
    end

    REPL --> Lexer
    Lexer --> Parser
    Parser --> AST
    AST --> Binder
    Binder --> Catalog
    Binder --> Planner
    Planner --> Filter
    Filter --> Scan
    Scan --> Engine
    Scan --> Codec
    Engine --> Codec

Request lifecycle (SELECT)

SQL string
  → Lexer (tokens)
  → Parser (AST)
  → Binder (validates against CatalogManager, produces BoundStatement)
  → Plan tree: FilterNode → SequentialScanNode
  → IStorageEngine::GetScanner (raw bytes)
  → Tuple codec (deserialize)
  → Predicate evaluation (WHERE)
  → ASCII table formatting (stdout)

Request lifecycle (INSERT)

SQL string → Parser → Binder → Tuple codec (serialize) → IStorageEngine::InsertRecord

DDL (CREATE TABLE) updates both the catalog (schema metadata) and the storage engine (physical table slot).

Supported SQL (v1)

v1 intentionally supports a minimal ANSI subset. Keywords are case-insensitive. Trailing semicolons are optional.

CREATE TABLE

CREATE TABLE users (id INT, name VARCHAR);

Rule	Detail
Types	`INT`, `VARCHAR` only
Constraints	Table must not already exist; column names must be unique
Output	`OK`

INSERT

INSERT INTO users VALUES (1, 'Alice');
INSERT INTO users VALUES (2, 'Bob');

Rule	Detail
Literals	Integers (`42`) or single-quoted strings (`'Alice'`)
Arity	Value count must match column count
Types	Values must match column types (`INT` ↔ integer, `VARCHAR` ↔ string)
Output	`OK`

SELECT

SELECT id, name FROM users;
SELECT * FROM users;
SELECT id, name FROM users WHERE id = 1;
SELECT * FROM users WHERE name = 'Alice';

Rule	Detail
Projection	Explicit column list or `*`
WHERE	Optional; supports `column = literal` equality only
Output	ASCII table + row count, or `(0 rows)`

REPL controls

Command	Action
`exit` / `quit`	Exit the REPL

Not supported in v 0.1

UPDATE, DELETE, DROP, joins, aggregates, ORDER BY, LIMIT, transactions, indexes, network protocol, and multi-statement scripts.

Quick Start

Prerequisites

Requirement	Version
CMake	≥ 3.20
C++ compiler	C++20 (Apple Clang, GCC 11+, or Clang)
Git	Required for FetchContent (Google Test)

Build

git clone <repository-url>
cd RDBMS

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Run the REPL

./build/rdbms_repl

Example session:

Toy RDBMS v1. Type SQL, or 'exit' to quit.
rdbms> CREATE TABLE users (id INT, name VARCHAR);
OK
rdbms> INSERT INTO users VALUES (1, 'Alice');
OK
rdbms> SELECT * FROM users;
+----+-------+
| id | name  |
+----+-------+
| 1  | Alice |
+----+-------+
(1 row)
rdbms> exit

Run tests

ctest --test-dir build --output-on-failure

All 51 tests should pass.

Project Layout

RDBMS/
├── CMakeLists.txt              # Root build; static lib + rdbms_repl + tests
├── cmake/
│   └── CompilerWarnings.cmake  # -Wall -Wextra -Wpedantic (cross-platform)
├── include/rdbms/              # Public headers
│   ├── ast/                    # Abstract syntax tree + visitor
│   ├── binder/                 # Semantic binder + bound AST
│   ├── catalog/                # Schema metadata
│   ├── common/                 # Shared utilities (Expected<T,E>)
│   ├── execution/              # Volcano plan nodes + predicate eval
│   ├── parser/                 # Lexer + recursive descent parser
│   ├── repl/                   # QueryExecutor + result formatting
│   ├── storage/                # Storage interface + in-memory engine
│   └── types/                  # Tuple, Value, RecordID, DataType
├── src/                        # Implementations (mirrors include/)
├── test/                       # Google Test suites
└── src/main.cpp                # REPL entry point

Module Reference

Module	Key Types	Responsibility
Types	`Tuple`, `Value`, `RecordID`, `DataType`	Runtime value representation
Storage (interface)	`IStorageEngine`, `IScanner`	Physical record I/O boundary (table name + raw bytes only)
Storage (v1 impl)	`InMemoryStorageEngine`	In-memory `vector<uint8_t>` record store
Tuple codec	`encode_tuple`, `decode_tuple`	Schema-aware serialization between `Tuple` and byte payloads
Catalog	`CatalogManager`, `TableSchema`	In-memory schema registry (tables, columns, byte offsets)
AST	`Statement`, `Expr`, `AstVisitor`	Parsed SQL structure (`std::variant`-based)
Binder	`Binder`, `BoundStatement`	Resolves identifiers, validates types, produces bound AST
Execution	`PlanNode`, `SequentialScanNode`, `FilterNode`	Iterator-model (Volcano) query execution
Parser	`Lexer`, `Parser`	Hand-rolled tokenizer + recursive descent grammar
REPL	`QueryExecutor`, `format_table`	End-to-end orchestration and console output

Storage engine contract

The storage layer is intentionally ignorant of SQL. It operates on:

Table names (std::string)
Record IDs (RecordID{page_id, slot_id})
Opaque byte payloads (std::vector<uint8_t>)

class IStorageEngine {
public:
    virtual void Init() = 0;
    virtual void Shutdown() = 0;
    virtual void CreateTable(const std::string& table_name) = 0;
    virtual RecordID InsertRecord(const std::string& table_name,
                                const std::vector<uint8_t>& payload) = 0;
    virtual bool UpdateRecord(const std::string& table_name, RecordID rid,
                              const std::vector<uint8_t>& payload) = 0;
    virtual std::unique_ptr<IScanner> GetScanner(const std::string& table_name) = 0;
};

InMemoryStorageEngine is the v1 concrete implementation. UpdateRecord is implemented at the storage layer but not yet exposed via SQL.

Execution model

Every plan node implements the Volcano iterator interface:

class PlanNode {
public:
    virtual void Init() = 0;
    virtual std::optional<Tuple> Next() = 0;  // std::nullopt = EOF
};

v1 builds FilterNode → SequentialScanNode trees for SELECT statements with an optional WHERE clause.

Physical Data Model

Column byte offsets are assigned at table creation time and stored in the catalog.

Type	On-disk / in-memory layout
`INT`	8 bytes (`int64_t`) at column offset
`VARCHAR`	4-byte length prefix (`uint32_t`) + variable string data

Example schema users(id INT, name VARCHAR):

Column	Offset	Size (fixed prefix)
`id`	0	8 bytes
`name`	8	4-byte length + N bytes of string data

Build & Test

CMake targets

Target	Description
`rdbms`	Static library containing all engine components
`rdbms_repl`	Interactive REPL executable
`rdbms_tests`	Google Test runner (51 tests)

Compiler flags

Non-MSVC builds enable -Wall -Wextra -Wpedantic via cmake/CompilerWarnings.cmake.

Test coverage by area

Test file	Area
`build_pipeline_test.cpp`	Build / C++20 smoke
`types_test.cpp`	Core types + storage interface
`catalog_manager_test.cpp`	Schema CRUD + error paths
`ast_test.cpp`	AST construction + visitor
`binder_test.cpp`	Semantic validation
`plan_node_test.cpp`	PlanNode iterator contract
`execution_operators_test.cpp`	Scan + filter pipeline (mock storage)
`in_memory_storage_engine_test.cpp`	Storage engine + codec
`parser_test.cpp`	Lexer + parser (valid + invalid SQL)
`repl_test.cpp`	End-to-end QueryExecutor

Debug build

cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build

compile_commands.json is generated for clangd / IDE integration.

Roadmap

Planned extensions that align with the existing architecture:

Area	Enhancement
Storage	Persistent append-only log, B-Tree pages, or LSM-Tree backend behind `IStorageEngine`
SQL	`UPDATE` / `DELETE`, richer `WHERE` (comparisons, `AND` / `OR`)
Execution	`NestedLoopJoinNode`, projection node, simple cost-based planner
Catalog	JSON / binary persistence across restarts
Client	Wire protocol (PostgreSQL-compatible or custom RPC)
Durability	Write-ahead log (WAL), crash recovery

Contributions in these areas should preserve the storage boundary and avoid leaking SQL concepts into IStorageEngine.

Contributing

Workflow

Fork the repository and create a feature branch from main.
Implement changes with corresponding Google Tests.
Ensure the build is clean on your platform:

 cmake -B build -DCMAKE_BUILD_TYPE=Debug
 cmake --build build
 ctest --test-dir build --output-on-failure

Open a pull request with a clear description of the architectural impact.

Code standards

C++20 with RAII; no owning raw pointers.
Prefer std::optional, std::variant, and the project-local Expected<T, E> for error propagation.
Keep headers minimal; use forward declarations where possible.
Match existing module boundaries—do not couple the parser or executor to a specific storage implementation.
One logical concern per commit; write descriptive commit messages (no step-number prefixes required).

Adding a new storage engine

Subclass IStorageEngine and IScanner.
Operate exclusively on std::vector<uint8_t> payloads.
Use tuple_codec for serialization—do not duplicate layout logic.
Add a dedicated test file under test/.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
cmake		cmake
include/rdbms		include/rdbms
src		src
test		test
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

RDBMS

Table of Contents

Design Goals

Architecture

Request lifecycle (SELECT)

Request lifecycle (INSERT)

Supported SQL (v1)

CREATE TABLE

INSERT

SELECT

REPL controls

Not supported in v 0.1

Quick Start

Prerequisites

Build

Run the REPL

Run tests

Project Layout

Module Reference

Storage engine contract

Execution model

Physical Data Model

Build & Test

CMake targets

Compiler flags

Test coverage by area

Debug build

Roadmap

Contributing

Workflow

Code standards

Adding a new storage engine

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages