Last updated: August 21, 2025
In this tutorial, we’ll build a simple yet handy command-line database tool called simpledb from scratch, written in C on a Linux environment. Instead of building a complex relational system, we’ll use JSON files as a lightweight data store, allowing us to inspect and manipulate records directly with standard UNIX utilities.
We’ll walk through the high-level rationale, key design decisions, and a thorough testing approach.
The full source code, including the C implementation and a Bash script for in-depth testing, is available in our repository. We’ll see only relevant snippets here to keep the explanation clear and concise.
MySQL and other relational database engines are powerful solutions for large or highly structured applications, but their hardware requirements and complexity are typically prohibitive for very small or embedded projects.
In contrast, simpledb is extremely lightweight. On our Linux Mint 22 testing environment, its compiled binary occupies only 22KiB on disk, and its database operations consume about 1792KiB of RAM, as measured by /usr/bin/time -v. This negligible footprint makes simpledb suitable for embedded platforms, lightweight development environments, Bash scripting, cron jobs, and prototyping.
Each database table is a standard JSON file that we can parse, inspect, or modify using common command-line tools such as grep, join, sort, and jq. In this way, we follow the UNIX philosophy of “doing one thing well” by delegating more complex functionality – such as sophisticated filtering or joins – to external utilities, thereby keeping the simpledb codebase lean and easy to maintain.
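For instance, assuming we’ve captured the output of list (one JSON object per line) into a file, plain grep already gives us simple filtering, while jq handles more structured queries (the file name and records below are purely illustrative):

```shell
# Simulated output of: simpledb --db-path db list users
printf '%s\n' '{"id":"100","name":"John Doe"}' '{"id":"101","name":"Jane Roe"}' \
  > users.jsonl

# Filter records by field value with nothing but grep
grep '"id":"100"' users.jsonl
```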
In addition, it’s quite easy to convert simpledb’s data from JSON to CSV, which makes it possible to migrate data from simpledb to other databases. We’ll see some examples of CSV export in the final Bash script.
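As a sketch of how light that conversion can be for the flat, all-string records simpledb emits (one per line), two sed substitutions are enough; jq remains the more robust choice for anything fancier. The file names and records here are illustrative:

```shell
# Simulated per-line output of the list command
printf '%s\n' '{"id":"100","name":"John Doe"}' '{"id":"101","name":"Jane Roe"}' \
  > users.jsonl

# Drop the braces, then drop the "key": prefixes, keeping the quoted values
sed -E 's/[{}]//g; s/"[a-z_]+"://g' users.jsonl > users.csv
cat users.csv
```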
Let’s establish the core requirements and architectural principles that underlie the design of simpledb.
Our JSON-based design strikes a balance between simplicity and human readability. We also avoid the complexity of maintaining a proprietary or binary format, preferring full compatibility with version control systems and existing Linux utilities.
In keeping with a minimalist approach, simpledb doesn’t perform extensive data validation beyond what is strictly necessary. This keeps the codebase lean and promotes faster development.
Whenever we invoke simpledb without valid parameters, or explicitly ask for the parameter syntax with --help, it displays the following instructions:
$ ./simpledb --help
Usage:
./simpledb --db-path <PATH> COMMAND [ARGS...]
Commands:
list <table>
get <table> field=value
save <table> field1=value1 [field2=value2 ...]
delete <table> field=value
Options:
--db-path <PATH> Required. Path to the database directory.
Let’s take a closer look at the commands:
- list <table> prints every record in the table, one JSON object per line
- get <table> field=value prints the records whose field matches the given value
- save <table> field1=value1 [field2=value2 ...] inserts a new record or updates an existing one
- delete <table> field=value removes the matching records
Many simple and advanced examples of these commands are included in the final Bash script.
The dual behavior of save is known as upsert and is common in many databases and persistence frameworks. It simplifies record management by combining insert and update logic into a single command.
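The actual simpledb upsert operates on cJSON arrays; purely as a conceptual sketch, here is the same update-or-insert idea applied to a plain line-per-record file with awk (the file users.txt and its key-value layout are hypothetical, not part of simpledb):

```shell
# Hypothetical line-per-record store keyed by the first column
printf '%s\n' '100 John' '101 Jane' > users.txt

# upsert KEY VALUE: rewrite the file, replacing the record whose key
# matches, or appending a new record if no key matched
upsert() {
  awk -v k="$1" -v v="$2" '
    $1 == k { print k, v; found = 1; next }  # update the existing record
             { print }                       # keep all other records
    END      { if (!found) print k, v }      # insert if the key was absent
  ' users.txt > users.txt.tmp && mv users.txt.tmp users.txt
}

upsert 101 Janet   # updates the existing record with key 101
upsert 102 Jake    # inserts a brand-new record
cat users.txt
```

Note that, like simpledb itself, the sketch writes to a temporary file and renames it over the original.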
On successful operations, simpledb prints the appropriate JSON records or a success message. On failure due to malformed JSON, missing --db-path, invalid parameters, no matching records for delete or get, or other unexpected conditions, it prints an error message to stderr and exits with a non-zero status.
Each command begins by reading the contents of the relevant JSON file into memory and then parsing it using cJSON. This library provides a straightforward API for parsing and building JSON objects in C, keeping the code simple and free of excessive boilerplate.
We can use this command to install cJSON on a Debian or Ubuntu-based system:
$ sudo apt install libcjson-dev
As changes are made to the in-memory representation, simpledb stores the updated JSON. However, instead of writing directly to the final file, it first writes all changes to a temporary file and then performs an atomic rename operation. This renaming step replaces the old file with the newly written version in one go, preventing partial or corrupt writes in case of unexpected failures.
However, according to POSIX guarantees, the rename() system call is atomic only if both the source and target files reside on the same filesystem. This ensures that no intermediate state of the file is visible to other processes. Atomicity isn’t guaranteed on different mounted filesystems, networked filesystems such as NFS, or non-POSIX compliant filesystems. Simply put, simpledb is intended to run only on a UNIX-compatible filesystem such as ext4, XFS, ZFS, or Btrfs.
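We can observe the same pattern from the shell: mv relies on the rename(2) system call when source and destination are on the same filesystem, so readers always see either the old file or the complete new one (the file names below are illustrative):

```shell
# Write the new contents to a temporary file on the same filesystem...
printf '%s' '[{"id":"1","name":"John Doe"}]' > users.json.tmp

# ...then atomically replace the real table file in a single step
mv users.json.tmp users.json

cat users.json
```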
We’ll explore some core components of simpledb, focusing on data flow, command dispatch, JSON handling, and atomic writing.
The main() function is responsible for parsing the command-line arguments, validating the database path, and dispatching to the appropriate command function.
The program requires at least four command-line arguments (the program name itself counts as the first), and this is the expected format for calling it:
$ ./simpledb --db-path <PATH> command ...
This simplified diagram shows the main() execution flow in an intuitive way:
The code starts by checking for the minimum number of arguments:
[...]
if (argc < 4) {
    print_usage(argv[0]);
    return 1;
}
[...]
We then proceed with a loop to identify the database path and the command:
[...]
int i = 1;
for (; i < argc; i++) {
    if (strcmp(argv[i], "--db-path") == 0 || strcmp(argv[i], "-d") == 0) {
        if (i + 1 < argc) {
            db_path = argv[++i];
            continue;
        } else {
            fprintf(stderr, "Error: --db-path requires an argument\n");
            return 1;
        }
    } else {
        // This is likely the command
        command = argv[i];
        i++;
        break;
    }
}
[...]
The database path must exist and be a directory, which adds an extra layer of filesystem validation:
[...]
struct stat st;
if (stat(db_path, &st) != 0) {
    fprintf(stderr, "Error: Database path '%s' does not exist.\n", db_path);
    return 1;
}
if (!S_ISDIR(st.st_mode)) {
    fprintf(stderr, "Error: '%s' is not a directory.\n", db_path);
    return 1;
}
[...]
Each command has its own requirements for the number and format of arguments. For example, the get and delete commands expect exactly one additional argument of the form field=value, while the save command allows multiple field=value pairs. The command_args array stores these extra arguments so that the underlying command functions can handle them correctly:
[...]
char* command_args[MAX_COMMAND_ARGS];
int command_args_count = 0;
[...]
for (; i < argc && command_args_count < MAX_COMMAND_ARGS; i++) {
    command_args[command_args_count++] = argv[i];
}
[...]
We take care not to exceed MAX_COMMAND_ARGS (the maximum number of command arguments) so that we never write past the end of the array, avoiding a buffer overflow.
In the get and delete commands, the implementation explicitly searches for the = character to split the field from its corresponding value:
[...]
} else if (strcmp(command, "get") == 0) {
    // Expects: get <table> field=value
    if (command_args_count != 1) {
        print_usage(argv[0]);
        return 1;
    }
    char* eq = strchr(command_args[0], '=');
    if (!eq) {
        fprintf(stderr, "Error: Invalid get argument '%s'. Use field=value.\n", command_args[0]);
        return 1;
    }
    *eq = '\0';
    const char* field = command_args[0];
    const char* value = eq + 1;
    return command_get(db_path, table_name, field, value);
[...]
This is just a taste of the main() function, which acts as a dispatcher, calling the appropriate command function or printing usage instructions if the command isn’t recognized.
Our JSON data is stored in individual files. We implement two helper routines: load_table() to read and parse a table file, and save_table() to serialize it back to disk.
In this case, the code is quite compact, so there is no need for diagrams:
/* --------------------------------------------------------------------------
* Load the JSON array from <table>.json, or create an empty JSON array if file
* doesn't exist. Return a cJSON pointer, or NULL on error.
* -------------------------------------------------------------------------- */
static cJSON* load_table(const char* db_path, const char* table_name) {
    // Construct the file path: e.g., db_path/users.json
    char filepath[1024];
    snprintf(filepath, sizeof(filepath), "%s/%s.json", db_path, table_name);

    char* content = read_file(filepath);
    cJSON* root = NULL;
    if (content) {
        // Parse the JSON
        root = cJSON_Parse(content);
        free(content);
        // If parse failed or root is not an array, create a new array
        if (!root || !cJSON_IsArray(root)) {
            if (root) {
                cJSON_Delete(root);
            }
            root = cJSON_CreateArray();
        }
    } else {
        // File not found or not readable; let's assume it's just empty
        root = cJSON_CreateArray();
    }
    return root;
}
/* --------------------------------------------------------------------------
* Write the JSON array back to <table>.json (atomically).
* -------------------------------------------------------------------------- */
static int save_table(const char* db_path, const char* table_name, cJSON* root) {
    if (!root) return -1;

    char* print_buffer = cJSON_PrintUnformatted(root);
    if (!print_buffer) {
        return -1;
    }

    char filepath[1024];
    snprintf(filepath, sizeof(filepath), "%s/%s.json", db_path, table_name);

    int ret = write_file_atomic(filepath, print_buffer);
    free(print_buffer);
    return ret;
}
Here, write_file_atomic() ensures that data is never corrupted by partial writes:
/* --------------------------------------------------------------------------
* Utility: Write a temporary file, then rename it to ensure atomic updates.
* Returns 0 on success, non-zero on error.
* -------------------------------------------------------------------------- */
static int write_file_atomic(const char* filename, const char* data) {
    // Create a temp file name
    char temp_filename[1024];
    snprintf(temp_filename, sizeof(temp_filename), "%s.tmp", filename);

    // Write data to temp file
    FILE* fp = fopen(temp_filename, "wb");
    if (!fp) {
        return -1;
    }
    size_t len = strlen(data);
    if (fwrite(data, 1, len, fp) < len) {
        fclose(fp);
        remove(temp_filename);  // don't leave a partial temp file behind
        return -1;
    }
    // fclose() flushes buffered data, so check it for write errors too
    if (fclose(fp) != 0) {
        remove(temp_filename);
        return -1;
    }

    // Atomically rename the temp file to the actual file
    if (rename(temp_filename, filename) != 0) {
        remove(temp_filename);
        return -1;
    }
    return 0;
}
This straightforward approach combines cJSON parsing with atomic writes to ensure data integrity: an interrupted operation may leave a stale .tmp file behind, but never a partially written table file.
So far, we’ve examined the trickier parts of simpledb’s implementation in detail. We won’t go any further into the code, but rather step back and look at it as a whole.
In the testing folder of our repository, the script test_simpledb.sh provides a large, comprehensive test suite.
There are 45 self-explanatory tests. Here is an example:
[...]
echo "- Adding a new record with id=100 to 'users' in $DB1:"
$SIMPLEDB --db-path "$DB1" save users id=100 name="John Doe" age=30 [email protected]
echo "- Listing 'users' again (should show John):"
$SIMPLEDB --db-path "$DB1" list users
echo "- Updating record with id=100 (changing email):"
$SIMPLEDB --db-path "$DB1" save users id=100 [email protected]
echo "- Listing 'users' again (should show updated email):"
$SIMPLEDB --db-path "$DB1" list users
[...]
The test_simpledb.log file contains the expected output. Here’s an interesting snippet:
[...]
### 9) Simulating a 'join' between tables...
- Creating an 'orders' table in testdb1...
{"id":"1","order_id":"9001","user_id":"100","product":"Red Book","price":"15.00"}
{"id":"2","order_id":"9002","user_id":"101","product":"Blue Pen","price":"2.50"}
{"id":"3","order_id":"9003","user_id":"999","product":"Green Pencil","price":"1.00"}
- Listing 'orders':
{"id":"1","order_id":"9001","user_id":"100","product":"Red Book","price":"15.00"}
{"id":"2","order_id":"9002","user_id":"101","product":"Blue Pen","price":"2.50"}
{"id":"3","order_id":"9003","user_id":"999","product":"Green Pencil","price":"1.00"}
#### 9a) Converting 'users' to CSV (id,name,email) => users.csv
"100","John Doe","[email protected]"
"101","Alpha Tester","[email protected]"
"102","Beta Tester","[email protected]"
#### 9b) Converting 'orders' to CSV (order_id,user_id,product,price) => orders.csv
"9001","100","Red Book","15.00"
"9002","101","Blue Pen","2.50"
"9003","999","Green Pencil","1.00"
#### 9c) Sort both CSV files by their key for join.
- Sorted 'users':
"100","John Doe","[email protected]"
"101","Alpha Tester","[email protected]"
"102","Beta Tester","[email protected]"
- Sorted 'orders':
"9001","100","Red Book","15.00"
"9002","101","Blue Pen","2.50"
"9003","999","Green Pencil","1.00"
#### 9d) Join by user_id (users.id == orders.user_id)
(We have to specify that for 'users' the join field is column 1, for 'orders' it's column 2)
- Result of join (joined.csv):
"100","John Doe","[email protected]","9001","Red Book","15.00"
"101","Alpha Tester","[email protected]","9002","Blue Pen","2.50"
[...]
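The exact commands behind step 9d live in test_simpledb.sh; a hypothetical reconstruction with coreutils looks like this (the CSV data below is made up for illustration):

```shell
# Minimal stand-ins for the exported CSV files
printf '%s\n' '"100","Alice"' '"101","Bob"' > users.csv
printf '%s\n' '"9001","100","Book"' '"9002","101","Pen"' '"9003","999","Pencil"' > orders.csv

# join requires both inputs to be sorted on their join fields
sort -t, -k1,1 users.csv > users.sorted
sort -t, -k2,2 orders.csv > orders.sorted

# Match users column 1 against orders column 2; unmatched rows
# (like user_id 999) are silently dropped, as in an inner join
join -t, -1 1 -2 2 users.sorted orders.sorted > joined.csv
cat joined.csv
```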
Suppose we prefer an automated approach instead of manually inspecting the output. In that case, we can integrate our tests with a framework like Bats (Bash Automated Testing System) or a similar tool that allows us to specify the expected result for each command.
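Even before reaching for a framework, a few lines of plain shell can automate the comparison. In this sketch, assert_eq is our own hypothetical helper, not part of simpledb or Bats:

```shell
pass=0; fail=0

# assert_eq DESCRIPTION EXPECTED ACTUAL: count the check and report mismatches
assert_eq() {
  if [ "$2" = "$3" ]; then
    pass=$((pass + 1))
  else
    fail=$((fail + 1))
    echo "FAIL: $1 (expected '$2', got '$3')" >&2
  fi
}

# In a real run, the third argument would capture a simpledb invocation,
# e.g. "$($SIMPLEDB --db-path "$DB1" list users)"
assert_eq "echo works" "hello" "$(echo hello)"

echo "passed=$pass failed=$fail"
```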
In this article, we explored the rationale, design, and implementation of a minimal command-line database tool, simpledb, which stores data in the JSON format in a Linux environment.
By taking a JSON-based approach and using standard UNIX utilities for advanced manipulation, we demonstrated how to maintain a lightweight codebase that remains both highly portable and easy to inspect. With features such as upsert-like save command behavior, atomic file writes, and minimal data validation, simpledb shines as an ideal solution for small, embedded, or scripting scenarios where traditional relational databases may be overkill.
By combining simpledb with common tools like grep, jq, and join, we unleash the power of the broader UNIX ecosystem for filtering, formatting, and data analysis.
We also saw the importance of rigorous testing with a Bash script that demonstrates how simpledb handles everything from basic record creation to more advanced tasks like simulating joins.