A modular, minimal relational database management system written in modern C++. RDBMS implements a minimal but end-to-end SQL pipeline—from lexical analysis through semantic binding, volcano-style query execution, and pluggable storage—designed as baseline infrastructure for researching and extending database internals state-of-the art in the future.
The storage engine is deliberately decoupled from the SQL front end, enabling future engine swaps (for example, append-only logs, B-Trees, or LSM-Trees) without rewriting the parser or execution layer.
Almost all existing RDBMS out there are too heavy for most of the projects and use-case that independent developers and small teams work on, on a daily basis. They are heavily bloated. So I implemented my own and it wasn't hard as I knew most of the functionality already. It might miss many important things, and get some things wrong. I also woulnd't expect this to be run in production in its current state or ever.
- Design Goals
- Architecture
- Supported SQL (v1)
- Quick Start
- Project Layout
- Module Reference
- Physical Data Model
- Build & Test
- Development History
- Roadmap
- Contributing
- License
| Goal | Implementation |
|---|---|
| Modularity | Strict boundaries between parser, catalog, binder, executor, and storage |
| Pluggable storage | IStorageEngine / IScanner abstract raw byte records; SQL layers never touch disk layout directly |
| Modern C++ | C++20, RAII, smart pointers, std::optional, std::variant, std::string_view |
| Cross-platform | CMake build verified on macOS (Apple Clang) and Linux (GCC 11+ / Clang) |
| Test-driven | Google Test cases covering every major subsystem |
| Readable codebase | Headers separated from implementations; interfaces isolated under include/rdbms/ |
RDBMS follows a classic layered, monolithic architecture. Each layer depends only on the abstractions below it.
flowchart TB
subgraph client [Client Layer]
REPL["rdbms_repl"]
end
subgraph frontend [SQL Front End]
Lexer["Lexer"]
Parser["Recursive Descent Parser"]
AST["AST"]
Binder["Semantic Binder"]
end
subgraph metadata [Metadata]
Catalog["CatalogManager"]
end
subgraph execution [Execution Engine — Volcano Model]
Planner["Plan Builder"]
Scan["SequentialScanNode"]
Filter["FilterNode"]
end
subgraph storage [Storage Layer]
Codec["Tuple Codec"]
Engine["InMemoryStorageEngine"]
end
REPL --> Lexer
Lexer --> Parser
Parser --> AST
AST --> Binder
Binder --> Catalog
Binder --> Planner
Planner --> Filter
Filter --> Scan
Scan --> Engine
Scan --> Codec
Engine --> Codec
SQL string
→ Lexer (tokens)
→ Parser (AST)
→ Binder (validates against CatalogManager, produces BoundStatement)
→ Plan tree: FilterNode → SequentialScanNode
→ IStorageEngine::GetScanner (raw bytes)
→ Tuple codec (deserialize)
→ Predicate evaluation (WHERE)
→ ASCII table formatting (stdout)
SQL string → Parser → Binder → Tuple codec (serialize) → IStorageEngine::InsertRecord
DDL (CREATE TABLE) updates both the catalog (schema metadata) and the storage engine (physical table slot).
v1 intentionally supports a minimal ANSI subset. Keywords are case-insensitive. Trailing semicolons are optional.
CREATE TABLE users (id INT, name VARCHAR);| Rule | Detail |
|---|---|
| Types | INT, VARCHAR only |
| Constraints | Table must not already exist; column names must be unique |
| Output | OK |
INSERT INTO users VALUES (1, 'Alice');
INSERT INTO users VALUES (2, 'Bob');| Rule | Detail |
|---|---|
| Literals | Integers (42) or single-quoted strings ('Alice') |
| Arity | Value count must match column count |
| Types | Values must match column types (INT ↔ integer, VARCHAR ↔ string) |
| Output | OK |
SELECT id, name FROM users;
SELECT * FROM users;
SELECT id, name FROM users WHERE id = 1;
SELECT * FROM users WHERE name = 'Alice';| Rule | Detail |
|---|---|
| Projection | Explicit column list or * |
| WHERE | Optional; supports column = literal equality only |
| Output | ASCII table + row count, or (0 rows) |
| Command | Action |
|---|---|
exit / quit |
Exit the REPL |
UPDATE, DELETE, DROP, joins, aggregates, ORDER BY, LIMIT, transactions, indexes, network protocol, and multi-statement scripts.
| Requirement | Version |
|---|---|
| CMake | ≥ 3.20 |
| C++ compiler | C++20 (Apple Clang, GCC 11+, or Clang) |
| Git | Required for FetchContent (Google Test) |
git clone <repository-url>
cd RDBMS
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build./build/rdbms_replExample session:
Toy RDBMS v1. Type SQL, or 'exit' to quit.
rdbms> CREATE TABLE users (id INT, name VARCHAR);
OK
rdbms> INSERT INTO users VALUES (1, 'Alice');
OK
rdbms> SELECT * FROM users;
+----+-------+
| id | name |
+----+-------+
| 1 | Alice |
+----+-------+
(1 row)
rdbms> exit
ctest --test-dir build --output-on-failureAll 51 tests should pass.
RDBMS/
├── CMakeLists.txt # Root build; static lib + rdbms_repl + tests
├── cmake/
│ └── CompilerWarnings.cmake # -Wall -Wextra -Wpedantic (cross-platform)
├── include/rdbms/ # Public headers
│ ├── ast/ # Abstract syntax tree + visitor
│ ├── binder/ # Semantic binder + bound AST
│ ├── catalog/ # Schema metadata
│ ├── common/ # Shared utilities (Expected<T,E>)
│ ├── execution/ # Volcano plan nodes + predicate eval
│ ├── parser/ # Lexer + recursive descent parser
│ ├── repl/ # QueryExecutor + result formatting
│ ├── storage/ # Storage interface + in-memory engine
│ └── types/ # Tuple, Value, RecordID, DataType
├── src/ # Implementations (mirrors include/)
├── test/ # Google Test suites
└── src/main.cpp # REPL entry point
| Module | Key Types | Responsibility |
|---|---|---|
| Types | Tuple, Value, RecordID, DataType |
Runtime value representation |
| Storage (interface) | IStorageEngine, IScanner |
Physical record I/O boundary (table name + raw bytes only) |
| Storage (v1 impl) | InMemoryStorageEngine |
In-memory vector<uint8_t> record store |
| Tuple codec | encode_tuple, decode_tuple |
Schema-aware serialization between Tuple and byte payloads |
| Catalog | CatalogManager, TableSchema |
In-memory schema registry (tables, columns, byte offsets) |
| AST | Statement, Expr, AstVisitor |
Parsed SQL structure (std::variant-based) |
| Binder | Binder, BoundStatement |
Resolves identifiers, validates types, produces bound AST |
| Execution | PlanNode, SequentialScanNode, FilterNode |
Iterator-model (Volcano) query execution |
| Parser | Lexer, Parser |
Hand-rolled tokenizer + recursive descent grammar |
| REPL | QueryExecutor, format_table |
End-to-end orchestration and console output |
The storage layer is intentionally ignorant of SQL. It operates on:
- Table names (
std::string) - Record IDs (
RecordID{page_id, slot_id}) - Opaque byte payloads (
std::vector<uint8_t>)
class IStorageEngine {
public:
virtual void Init() = 0;
virtual void Shutdown() = 0;
virtual void CreateTable(const std::string& table_name) = 0;
virtual RecordID InsertRecord(const std::string& table_name,
const std::vector<uint8_t>& payload) = 0;
virtual bool UpdateRecord(const std::string& table_name, RecordID rid,
const std::vector<uint8_t>& payload) = 0;
virtual std::unique_ptr<IScanner> GetScanner(const std::string& table_name) = 0;
};InMemoryStorageEngine is the v1 concrete implementation. UpdateRecord is implemented at the storage layer but not yet exposed via SQL.
Every plan node implements the Volcano iterator interface:
class PlanNode {
public:
virtual void Init() = 0;
virtual std::optional<Tuple> Next() = 0; // std::nullopt = EOF
};v1 builds FilterNode → SequentialScanNode trees for SELECT statements with an optional WHERE clause.
Column byte offsets are assigned at table creation time and stored in the catalog.
| Type | On-disk / in-memory layout |
|---|---|
INT |
8 bytes (int64_t) at column offset |
VARCHAR |
4-byte length prefix (uint32_t) + variable string data |
Example schema users(id INT, name VARCHAR):
| Column | Offset | Size (fixed prefix) |
|---|---|---|
id |
0 | 8 bytes |
name |
8 | 4-byte length + N bytes of string data |
| Target | Description |
|---|---|
rdbms |
Static library containing all engine components |
rdbms_repl |
Interactive REPL executable |
rdbms_tests |
Google Test runner (51 tests) |
Non-MSVC builds enable -Wall -Wextra -Wpedantic via cmake/CompilerWarnings.cmake.
| Test file | Area |
|---|---|
build_pipeline_test.cpp |
Build / C++20 smoke |
types_test.cpp |
Core types + storage interface |
catalog_manager_test.cpp |
Schema CRUD + error paths |
ast_test.cpp |
AST construction + visitor |
binder_test.cpp |
Semantic validation |
plan_node_test.cpp |
PlanNode iterator contract |
execution_operators_test.cpp |
Scan + filter pipeline (mock storage) |
in_memory_storage_engine_test.cpp |
Storage engine + codec |
parser_test.cpp |
Lexer + parser (valid + invalid SQL) |
repl_test.cpp |
End-to-end QueryExecutor |
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build buildcompile_commands.json is generated for clangd / IDE integration.
Planned extensions that align with the existing architecture:
| Area | Enhancement |
|---|---|
| Storage | Persistent append-only log, B-Tree pages, or LSM-Tree backend behind IStorageEngine |
| SQL | UPDATE / DELETE, richer WHERE (comparisons, AND / OR) |
| Execution | NestedLoopJoinNode, projection node, simple cost-based planner |
| Catalog | JSON / binary persistence across restarts |
| Client | Wire protocol (PostgreSQL-compatible or custom RPC) |
| Durability | Write-ahead log (WAL), crash recovery |
Contributions in these areas should preserve the storage boundary and avoid leaking SQL concepts into IStorageEngine.
- Fork the repository and create a feature branch from
main. - Implement changes with corresponding Google Tests.
- Ensure the build is clean on your platform:
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
ctest --test-dir build --output-on-failure- Open a pull request with a clear description of the architectural impact.
- C++20 with RAII; no owning raw pointers.
- Prefer
std::optional,std::variant, and the project-localExpected<T, E>for error propagation. - Keep headers minimal; use forward declarations where possible.
- Match existing module boundaries—do not couple the parser or executor to a specific storage implementation.
- One logical concern per commit; write descriptive commit messages (no step-number prefixes required).
- Subclass
IStorageEngineandIScanner. - Operate exclusively on
std::vector<uint8_t>payloads. - Use
tuple_codecfor serialization—do not duplicate layout logic. - Add a dedicated test file under
test/.