DepState is a tool for detecting synchronization failure bugs in distributed database management systems.
The artifact consists of the following contents:
- `DepState/`: the tool binaries, the instrumented binaries for NDB, and the system configuration file for an NDB cluster. Please note that the instrumented mysqld file is compressed due to size limitations, so please unzip it before use.
- `ndb_log/`, `mariadb_log/`, `innodb_log/`: the tool's coverage metadata for the three databases under test.
- `readme.md`: describes how to use the tool to test the databases, including the corresponding configuration.
- `bug/`: describes the errors detected by DepState and contains the log file for each bug.
- `NDB cluster-container.md`: since we test in a distributed scenario, this file helps you build a MySQL NDB Cluster test environment with Docker containers.
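For example, a minimal decompression step, assuming the archive is shipped as `mysqld.zip` inside the DepState folder (the actual archive name and format may differ):

```bash
# Unpack the instrumented mysqld binary before use
# (archive name is an assumption; adjust to the file actually shipped)
cd DepState
unzip mysqld.zip
chmod +x mysqld   # the binary must be executable
```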
| # | DDBMS | Bug Type | Root Cause Analysis | Bug Status |
|---|---|---|---|---|
| 1 | MySQL NDB Cluster | Crash | Engine mishandles metadata synchronization and locking. | Confirmed |
| 2 | MySQL NDB Cluster | Crash | Concurrent cluster restart during node reboot causes state inconsistency. | Confirmed |
| 3 | MySQL NDB Cluster | Crash | Improper lock handling during node removal in synchronization. | Confirmed |
| 4 | MySQL NDB Cluster | Crash | Internal error occurs during signal transmission in nodes. | Confirmed |
| 5 | MySQL NDB Cluster | Crash | Improper metadata lock handling during synchronization with complex dependencies. | Confirmed |
| 6 | MySQL NDB Cluster | Crash | FindTable function failure during BACKUP. | Investigating |
| 7 | MySQL NDB Cluster | Crash | SimulatedBlock component's signal processing fails. | Confirmed |
| 8 | MySQL NDB Cluster | Crash | Data check fails; the specified table or table pointer cannot be found. | Investigating |
| 9 | MySQL NDB Cluster | Crash | SUMA bucket switch failure during asynchronous event processing. | Confirmed |
| 10 | MySQL NDB Cluster | Crash | Forced shutdown-induced signal processing error caused cascade node restarts. | Investigating |
| 11 | MySQL NDB Cluster | Crash | Some operations are not supported when synchronizing complex SQL queries. | Investigating |
| 12 | MySQL NDB Cluster | Hang | Timeout mechanism failure in NDB Cluster during complex query execution. | Confirmed |
| 13 | MySQL NDB Cluster | Hang | Failure in query plan generation and optimization when handling complex nested queries. | Confirmed |
| 14 | MySQL NDB Cluster | Hang | Transaction optimization enters an infinite loop. | Confirmed |
| 15 | MySQL NDB Cluster | Hang | ID allocation failure disrupts synchronization during node rejoin. | Confirmed |
| 16 | MySQL NDB Cluster | Hang | Synchronization fails during complex query processing. | Confirmed |
| 17 | MySQL NDB Cluster | Inconsistency | Failure to send synchronization signal in function. | Investigating |
| 18 | MySQL NDB Cluster | Inconsistency | Error occurs when updating automatic index statistics. | Investigating |
| 19 | MySQL InnoDB Cluster | Crash | Incompatible data types cause synchronization to fail. | Investigating |
| 20 | MySQL InnoDB Cluster | Crash | A data type conversion error after a network connection failure causes the node to exit. | Investigating |
| 21 | MariaDB Galera Cluster | Crash | Missing records and duplicate key conflicts in delete and update operations. | Investigating |
| 22 | MariaDB Galera Cluster | Inconsistency | Data type mismatches due to invalid default values are ignored by WSREP. | Investigating |
We conducted all experiments on a 64-bit Ubuntu 22.04 machine equipped with an AMD EPYC 7742 processor (128 cores @ 2.25 GHz) and 488 GiB of main memory.
The test environment is a distributed cluster consisting of one manager node, four data nodes, and four SQL clients. For a MySQL NDB Cluster, this corresponds to one ndb_mgm, four ndbd, and four mysqld processes, where each ndbd and its corresponding mysqld must be deployed on the same node, as configured in NDB cluster-container.md.
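As a rough sketch of how such a cluster can be brought up with Docker (the image, container names, and management-node IP below are illustrative assumptions; NDB cluster-container.md describes the actual setup):

```bash
# Illustrative sketch only; follow NDB cluster-container.md for the real setup.
docker network create cluster --subnet=192.172.10.0/24

# One management node
docker run -d --net=cluster --name=mgmd --ip=192.172.10.2 \
  mysql/mysql-cluster ndb_mgmd

# Four data nodes; per the artifact, each data node also hosts a mysqld
# (IPs match the control-main example below)
for i in 1 2 3 4; do
  docker run -d --net=cluster --name=ndbd$i --ip=192.172.10.$((8 + i)) \
    mysql/mysql-cluster ndbd
done
```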
The following steps assume that you have already built a cluster that meets the test conditions; we use NDB as an example:
First, on all nodes, replace mysql, mysqld, mysqladmin, ndbd, ndb_mgm, and ndb_mgmd in the existing environment with the ones in the DepState folder, and make sure that these files have executable permissions. The DepState folder contains the files after instrumentation.
Then, on all nodes, copy log_file.txt from the DepState folder to the /home/mine-code/ directory. If this path does not exist, please create it first, as in the sketch below.
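One way to create the directory inside a container, assuming `<containerID>` refers to the target container:

```bash
# Create /home/mine-code/ inside the container if it is missing
docker exec <containerID> mkdir -p /home/mine-code/
```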
docker cp DepState/log_file.txt <containerID>:/home/mine-code/
Subsequently, copy the control-main file from the DepState folder to the /home/mine-code/ directory of the manager node and the node-main file to the /home/mine-code/ directory of each data node.
docker cp DepState/control-main <containerID>:/home/mine-code/
docker cp DepState/node-main <containerID>:/home/mine-code/
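Since there are one manager container and four data-node containers, these copies can be scripted; a sketch with hypothetical container names (substitute your actual container IDs):

```bash
# Hypothetical container names; replace with your actual container IDs.
docker cp DepState/control-main mgmd:/home/mine-code/
for c in ndbd1 ndbd2 ndbd3 ndbd4; do
  docker cp DepState/log_file.txt $c:/home/mine-code/
  docker cp DepState/node-main    $c:/home/mine-code/
done
```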
Then, please copy the corresponding instrumented binaries into the corresponding directory of each node: the manager node needs ndb_mgm and ndb_mgmd replaced, and each data node needs ndbd, mysql, mysqladmin, and mysqld replaced.
docker cp DepState/ndb_mgm <containerID>:/bin/ndb_mgm
docker cp DepState/ndb_mgmd <containerID>:/bin/ndb_mgmd
docker cp DepState/ndbd <containerID>:/bin/ndbd
docker cp DepState/mysql <containerID>:/bin/mysql
docker cp DepState/mysqladmin <containerID>:/bin/mysqladmin
docker cp DepState/mysqld <containerID>:/bin/mysqld
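After copying, make sure the replaced binaries are executable; for example (trim the file list to the binaries actually present on each node's role):

```bash
# Restore executable permissions on the instrumented binaries
docker exec <containerID> chmod +x /bin/ndb_mgm /bin/ndb_mgmd /bin/ndbd \
  /bin/mysql /bin/mysqladmin /bin/mysqld
```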
Finally, please create a mytest database in the cluster; an empty one is fine.
create database mytest;
Then you can run DepState: run node-main on the four data nodes, then control-main on the manager node, and you are ready to start testing.
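For example, the database can be created from the command line on any SQL node (host, port, and credentials below are assumptions; use your own):

```bash
# Create the empty test database via the mysql client
mysql -h 127.0.0.1 -P 3306 -u root -p -e "CREATE DATABASE IF NOT EXISTS mytest;"
```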
On each data node:
./node-main <outfile> <dryrunflag> <timeout> <sql_max> <dbname> <SERVER_PORT>
for example:
./node-main /home/mine-code/output_v32-11 1 90 8 mytest103 101
./node-main /home/mine-code/output_v32-11 0 90 8 mytest103 101
./node-main /home/mine-code/output_v32-11 0 90 8 mytest103 101
./node-main /home/mine-code/output_v32-11 0 90 8 mytest103 101
-outfile:
Path to the result.
-dryrunflag:
Flag indicating whether the current node is responsible for database initialization (1 = yes, 0 = no).
-timeout:
Timeout for a single SQL statement.
-sql_max:
Number of SQL sequences included in one SQL scenario.
-dbname:
The name of the database to be tested.
-SERVER_PORT:
SQL service port number.
Only the first node needs to be responsible for table creation, so its dryrunflag is 1 and the rest are 0.
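A sketch of launching node-main on the four data-node containers, mirroring the example arguments above (container names are assumptions):

```bash
# dryrunflag = 1 only on the first data node, which creates the tables
docker exec -d ndbd1 /home/mine-code/node-main /home/mine-code/output_v32-11 1 90 8 mytest103 101
# dryrunflag = 0 on the remaining data nodes
for c in ndbd2 ndbd3 ndbd4; do
  docker exec -d $c /home/mine-code/node-main /home/mine-code/output_v32-11 0 90 8 mytest103 101
done
```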
On the manager node:
./control-main <max_test> <outfile> <max_timeout> <state> <row_or_table_len> <SERVER_IP_1> <SERVER_IP_2> <SERVER_IP_3> <SERVER_IP_4> <SERVER_PORT> <dbname> <time_num>
for example:
./control-main 10 /home/mine-code/output_v32-11 5 10 10 192.172.10.9 192.172.10.10 192.172.10.11 192.172.10.12 101 mytest103 10
-max_test:
Number of tests to run in one invocation.
-outfile:
Path to the results.
-max_timeout:
Number of execution failures that can be tolerated; once this many failures have occurred, the MySQL client is restarted.
-state:
Number of state-sequence mutations per round of testing, i.e., how many state-sequence mutations are run before moving on to the next SQL scenario.
-row_or_table_len:
For each SQL statement, the relative probability of mutating at the table level versus the column level.
-SERVER_IP_1:
IP address of the 1st data node.
-SERVER_IP_2:
IP address of the 2nd data node.
-SERVER_IP_3:
IP address of the 3rd data node.
-SERVER_IP_4:
IP address of the 4th data node.
-SERVER_PORT:
SQL service port number.
-dbname:
Database name to be tested.
-time_num:
Maximum loop count.
The test results are stored in the output file.
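Finally, a sketch of starting control-main on the manager container with the example arguments and inspecting the results (container name is an assumption):

```bash
# Start the controller on the manager node
docker exec mgmd /home/mine-code/control-main 10 /home/mine-code/output_v32-11 \
  5 10 10 192.172.10.9 192.172.10.10 192.172.10.11 192.172.10.12 101 mytest103 10

# Inspect the results written to the output file
docker exec mgmd cat /home/mine-code/output_v32-11
```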