Conversation
|
Some installation instructions are needed. |
matsduf
left a comment
There was a problem hiding this comment.
This PR is still marked as "draft". Is it really ready for installation?
There are other changes in the code that seems to be unrelated to Clickhouse.
- I find no installation instructions. Zonemaster-Backend installation document needs to be updated.
- Also the Zonemaster-Backend configuration document needs to be updated.
| (SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_critical, | ||
| (SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_error, | ||
| (SELECT count(*) FROM result_entries JOIN test_results ON result_entries.hash_id = test_results.hash_id AND level = ?) AS nb_warning, |
There was a problem hiding this comment.
Are these changes related to the addition of support of Clickhouse engine?
There are other changes in this module where it is not obvious that they are related to the Clickhouse change.
There was a problem hiding this comment.
I moved the changes to DB/Clickhouse.pm to avoid updating any logic related to the other engines. So this PR can stand by itself an be an experiment that can be reverted.
There was a problem hiding this comment.
Thanks. That makes it easier to review. I think it looks fine, but Travis does not. Some kind of documentation is needed (installation and configuration).
Yes. I marked it as "ready for review" and updated the description. |
tgreenx
left a comment
There was a problem hiding this comment.
Code looks fine.
I tried to test, and I successfully installed using the instructions in zonemaster/zonemaster#1228.
2023-12-07T16:27:29Z [622] [NOTICE] [Zonemaster::Backend::Config] Loading config: /home/tgreen/zonemaster/zonemaster-backend/backend_config.ini
2023-12-07T16:27:29Z [626] [NOTICE] [main] Daemon spawned
2023-12-07T16:27:30Z [626] [NOTICE] [Zonemaster::Backend::DB] Connecting to database 'DBI:mysql:database=zonemaster;host=127.0.0.1;port=9004' as user 'zonemaster'
However, as long as #1132 is not merged I can't go through with further testing:
$ ./script/zmtest zonemaster.net
error: method start_domain_test: 500 Internal Server Error at ./script/zmb line 678.
|
Release testing report – Success, no issues Rocky Linux 9.3/Clickhouse Merged For the unit tests, the procedure was incomplete. Unit tests needed a |
tgreenx
left a comment
There was a problem hiding this comment.
I trust the testing from @marc-vanderwal, otherwise I have reviewed the rest and it looks good to me.
Connections to Clickhouse are made through its MySQL interface. Clickhouse mostly supports standard SQL, however sometimes there are some deviations (for instance to UPDATE a row, or no native support for incremental indexes).
* Make UPDATE synchronous by setting `mutations_sync = 1` within the query, see <https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync> <https://clickhouse.com/docs/en/sql-reference/statements/alter#synchronicity-of-alter-queries> * Manually compute the number of affected rows, because this number is incorrect in Clickhouse and this is a feature, see <ClickHouse/ClickHouse#50970 (comment)>
Clickhouse "batch_id" column is a non-nullable UInt32 to avoid performance issue with nullable column, see <https://clickhouse.com/docs/en/sql-reference/data-types/nullable>. In comparaison the other database engines store an empty or NULL value when no "batch_id" is defined and an integer otherwise. +-------------+-------------+ | no batch_id | batch_id | +------------------+-------------+-------------+ | Clickhouse | 0 | UInt32 >= 1 | | Other DB engines | empty/NULL | integer | +------------------+-------------+-------------+
Clickhouse does not have a UNIQUE constraint per column. Therefore the check for user uniqueness is performed manually.
There is no COMMIT with Clickhouse. However there is no restriction either on the number of bind parameters. Therefore we can pass all domains in one pass via bind parameters. This has been successfully tested with a batch made of 2 millions domains (with Clickhouse only).
To avoid potential memory exhaustion.
|
I’m currently testing #1121 with the experimental Clickhouse database engine still enabled and it seems that I’m hitting issues related to the way Clickhouse implements ALTER TABLE … UPDATE that don’t make me feel comfortable at all with using a Clickhouse database backend in production. For example, I get warnings like the following: That line 362 is the equality comparison of sub test_state {
my ( $self, $test_id ) = @_;
my ( $progress ) = $self->dbh->selectrow_array(
q[
SELECT progress
FROM test_results
WHERE hash_id = ?
],
undef,
$test_id,
);
if ( !defined $progress ) {
die Zonemaster::Backend::Error::Internal->new( reason => 'job not found' );
}
if ( $progress == 0 ) {
return $TEST_WAITING;
}
elsif ( 0 < $progress && $progress < 100 ) {
return $TEST_RUNNING;
}
elsif ( $progress == 100 ) {
return $TEST_COMPLETED;
}
else {
die Zonemaster::Backend::Error::Internal->new( reason => 'state could not be determined' );
}
}But these “argument isn’t numeric” errors should in theory never happen, because the code fetches exactly one row and column and the datum is, according to the database schema, an integer. How do we end up with a string? Grepping Due to this, many tests either crash with “state could not be determined” or “illegal transition to COMPLETED”. It seems that Clickhouse doesn’t implement proper transaction isolation when updating a table. Yet, currently, the code does many updates to |
Not ready for inclusion yet, not even experimental.
More changes are required, in particular the way the queue is currently done in Backend. It should be separated and take into account the specifies of such a DBMS.
Purpose
Clickhouse is a vertical database engine that appears to be quite efficient when it comes to run Zonemaster with millions of domains
Context
Changes
How to test this PR
Setup a Clickhouse server
zonemasterdatabase and azonemasteruser:Configure and use Zonemaster-Backend
Adapt the
share/backend_config.inifile and usezmborzmtest.You could check any results by running some SQL commands with
clickhouse-client.Run the unit tests
Unit test should work when using Clickhouse.