Fix usage of wait_io_or_timeout from TLS layers#17
Merged
alexey-milovidov merged 1 commit into ClickHouse:3.1 on Jul 1, 2024
wait_io_or_timeout() accepts a timeout in milliseconds, while the options hold seconds, since they are just the plain MYSQL_OPT_READ_TIMEOUT/... values. The values that are in milliseconds are pvio->timeout[PVIO_*_TIMEOUT], so use them. Usually this is not a problem, but on an interrupt (i.e. a signal, EINTR) SSL_read() returns SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE, then wait_io_or_timeout() is called with the wrong timeout, which may cause a failure.
This was referenced Jul 1, 2024
azat added a commit to azat/ClickHouse that referenced this pull request on Jul 1, 2024
Occasionally, 02479_mysql_connect_to_self fails on CI [1].

  [1]: ClickHouse#50911

The problem was indeed the query profiler and EINTR, but not in the way you may think.

For such failures you may see the following trace in trace_log:

    contrib/openssl/crypto/bio/bss_sock.c:127::sock_read
    contrib/openssl/crypto/bio/bio_meth.c:121::bread_conv
    contrib/openssl/crypto/bio/bio_lib.c:285::bio_read_intern
    contrib/openssl/crypto/bio/bio_lib.c:311::BIO_read
    contrib/openssl/ssl/record/methods/tls_common.c:398::tls_default_read_n
    contrib/openssl/ssl/record/methods/tls_common.c:575::tls_get_more_records
    contrib/openssl/ssl/record/methods/tls_common.c:1122::tls_read_record
    contrib/openssl/ssl/record/rec_layer_s3.c:645::ssl3_read_bytes
    contrib/openssl/ssl/s3_lib.c:4527::ssl3_read_internal
    contrib/openssl/ssl/s3_lib.c:4551::ssl3_read
    contrib/openssl/ssl/ssl_lib.c:2343::ssl_read_internal
    contrib/openssl/ssl/ssl_lib.c:2357::SSL_read
    contrib/mariadb-connector-c/libmariadb/secure/openssl.c:729::ma_tls_read
    contrib/mariadb-connector-c/libmariadb/ma_tls.c:90::ma_pvio_tls_read
    contrib/mariadb-connector-c/libmariadb/ma_pvio.c:250::ma_pvio_read
    contrib/mariadb-connector-c/libmariadb/ma_pvio.c:297::ma_pvio_cache_read
    contrib/mariadb-connector-c/libmariadb/ma_net.c:373::ma_real_read
    contrib/mariadb-connector-c/libmariadb/ma_net.c:427::ma_net_read
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:192::ma_net_safe_read
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2138::mthd_my_read_query_result
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2212::mysql_real_query
    src/Common/mysqlxx/Query.cpp:56::mysqlxx::Query::executeImpl()
    src/Common/mysqlxx/Query.cpp:73::mysqlxx::Query::use()
    src/Processors/Sources/MySQLSource.cpp:50::DB::MySQLSource::Connection::Connection()

After which the connection will fail with:

    Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)

But if you take a look at ma_tls_read() you will see that it has proper retries for SSL_ERROR_WANT_READ (and EINTR is just a special case of it), yet for some reason it still fails.

The reason is the units of the read/write timeout: ma_tls_read() calls poll(read_timeout) in case of SSL_ERROR_WANT_READ, but it incorrectly assumes that the timeout is in milliseconds, while that timeout was in seconds. This bug has been fixed in [2], and now it works like a charm!

  [2]: ClickHouse/mariadb-connector-c#17

I've verified it by patching the openssl library:

    diff --git a/crypto/bio/bss_sock.c b/crypto/bio/bss_sock.c
    index 82f7be85ae..3d2f3926a0 100644
    --- a/crypto/bio/bss_sock.c
    +++ b/crypto/bio/bss_sock.c
    @@ -124,7 +124,24 @@ static int sock_read(BIO *b, char *out, int outl)
                 ret = ktls_read_record(b->num, out, outl);
             else
     # endif
    -            ret = readsocket(b->num, out, outl);
    +        {
    +            static int i = 0;
    +            static int j = 0;
    +            if (!(++j % 5))
    +            {
    +                fprintf(stderr, "sock_read: inject EAGAIN with ret=0\n");
    +                ret = 0;
    +                errno = EAGAIN;
    +            }
    +            else if (!(++i % 3))
    +            {
    +                fprintf(stderr, "sock_read: inject EAGAIN with ret=-1\n");
    +                ret = -1;
    +                errno = EAGAIN;
    +            }
    +            else
    +                ret = readsocket(b->num, out, outl);
    +        }
             BIO_clear_retry_flags(b);
             if (ret <= 0) {
                 if (BIO_sock_should_retry(ret))

And before this patch (well, not the patch itself, but the referenced patch in mariadb-connector-c), if you pass read_write_timeout=1 it will fail:

    SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)
    Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)

But after it, it always works:

    $ ch benchmark -c10 -q "SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connection_pool_size=1, connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)"
    ^CStopping launch of queries. SIGINT received.

    Queries executed: 478.

    localhost:9000, queries: 478, QPS: 120.171, RPS: 120.171, MiB/s: 0.001, result RPS: 120.171, result MiB/s: 0.001.

    0.000%          0.014 sec.
    10.000%         0.058 sec.
    20.000%         0.065 sec.
    30.000%         0.073 sec.
    40.000%         0.079 sec.
    50.000%         0.087 sec.
    60.000%         0.089 sec.
    70.000%         0.091 sec.
    80.000%         0.095 sec.
    90.000%         0.100 sec.
    95.000%         0.102 sec.
    99.000%         0.105 sec.
    99.900%         0.140 sec.
    99.990%         0.140 sec.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
P.S. this will fix the 02479_mysql_connect_to_self failure; I will write a better description in the PR updating submodules in the ClickHouse repo.
Upstream PR: mariadb-corporation#250