fix flaky 03172_error_log_table_not_empty#66093
Conversation
|
EDIT: Already clarified the first point in #65604 (comment). THe second paragraph still holds.
Regarding removing the |
That is the line when the read from
|
It's not that the test only needs to wait for the data to be flushed. It needs to wait so that the data is collected as well. If you don't wait, no data will be collected. Forcing a flush doesn't collect the data, since it always relies on periodic intervals for that. |
Please define what you mean by collecting data? |
|
This is an automated comment for commit 78a2139 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
Collect the errors. This method is called by a background thread periodically. |
Oh, that is a problem. |
| errors_333=$($CLICKHOUSE_CLIENT -q "SELECT sum(value) FROM system.error_log WHERE code = 333") | ||
|
|
||
| # Throw three random errors: 111, 222 and 333 and wait for more than collect_interval_milliseconds to ensure system.error_log is flushed | ||
| # Throw three random errors: 111, 222 and 333 and call flush logs to ensure system.error_log is flushed |
There was a problem hiding this comment.
| # Throw three random errors: 111, 222 and 333 and call flush logs to ensure system.error_log is flushed | |
| # Throw three random errors: 111, 222 and 333, wait for more than collect_time_interval_milliseconds to ensure the errors are collected and then flush logs to ensure system.error_log is flushed |
| SELECT throwIf(true, 'error_log', toInt16(111)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 111 } | ||
| SELECT throwIf(true, 'error_log', toInt16(222)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 222 } | ||
| SELECT throwIf(true, 'error_log', toInt16(333)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 333 } | ||
| SELECT sleep(2) format NULL; |
There was a problem hiding this comment.
We need this. Without it, no errors are collected and tests fail close to 100% of the time:
+++ /home/ubuntu/ClickHouse/ClickHouse/tests/queries/0_stateless/03172_error_log_table_not_empty.stdout 2024-07-04 16:29:37.956423656 +0000
@@ -1,6 +1,6 @@
-1
-1
-1
-1
-1
-1
+0
+0
+0
+0
+0
+0| SELECT throwIf(true, 'error_log', toInt16(111)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 111 } | ||
| SELECT throwIf(true, 'error_log', toInt16(222)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 222 } | ||
| SELECT throwIf(true, 'error_log', toInt16(333)) SETTINGS allow_custom_error_code_in_throwif=1; -- { serverError 333 } | ||
| SELECT sleep(2) format NULL; |
There was a problem hiding this comment.
Same for this one. We actually need it.
system.error_log was developed based on system.metric_log which does the same thing. Both are periodic collectors that aggregate the metrics for the defined time window interval. They're different by nature to most other system logs. It doesn't make sense to collect and flush data during a |
|
I have restored timeouts. And marked the test as |
It took me a while to understand, you are right here. |
Here is the details
#65604 (comment)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
...
Documentation entry for user-facing changes
CI Settings (Only check the boxes if you know what you are doing):