Skip to content

Make ClickHouse behaviour on multiple read from pipe more understandable#25960

Merged
alexey-milovidov merged 23 commits intoClickHouse:masterfrom
BoloniniD:pipe_reading
Jul 24, 2021
Merged

Make ClickHouse behaviour on multiple read from pipe more understandable#25960
alexey-milovidov merged 23 commits intoClickHouse:masterfrom
BoloniniD:pipe_reading

Conversation

@BoloniniD
Copy link
Copy Markdown
Contributor

@BoloniniD BoloniniD commented Jul 3, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
If file descriptor in File table is regular file - allow to read multiple times from it. It allows clickhouse-local to read multiple times from stdin (with multiple SELECT queries or subqueries) if stdin is a regular file like clickhouse-local --query "SELECT * FROM table UNION ALL SELECT * FROM table" ... < file. This closes #11124. Co-authored with @alexey-milovidov.

@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Jul 3, 2021
@BoloniniD BoloniniD changed the title First changes, nothing special Make ClickHouse behaviour on multiple read from pipe more understandable Jul 4, 2021
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test, smth like:

clickhouse local --structure 'key String' -q 'select * from table; select * from table;' <<<foo # fails with pipeline error

echo foo > /tmp/foo
./clickhouse local --structure 'key String' -q 'select * from table; select * from table;' --file /tmp/foo # ok

mkfifo /tmp/pipe
echo foo > /tmp/pipe &
./clickhouse local --structure 'key String' -q 'select * from table; select * from table;' --file /tmp/pipe # hangs because it opens via path, and it will hang for the second open of pipe, since there is no write endpoint.

Also right now the following does not works, while it should:

./clickhouse local --structure 'key String' -q 'select * from table; select * from table;' --file <(echo foo)

And the following does not work because of additional check in ITableFunctionFileLike::parseArguments

./clickhouse local --stacktrace -q "select * from file(0, 'TSV', 'key String');" <<<foo

@BoloniniD
Copy link
Copy Markdown
Contributor Author

Tried to reproduce the problem with

./clickhouse local --stacktrace -q "select * from file(0, 'TSV', 'key String');" <<<foo

As it turns out ITableFunctionFileLike::parseArguments fails to represent 0 in the arguments for the file as a String. If I use '0' instead, then ITableFunctionFileLike::parseArguments works fine. Is it really necessary to change the .safeGet<String>() check? @azat

@azat
Copy link
Copy Markdown
Member

azat commented Jul 8, 2021

As it turns out ITableFunctionFileLike::parseArguments fails to represent 0 in the arguments for the file as a String.

Right.

If I use '0' instead, then ITableFunctionFileLike::parseArguments works fine. Is it really necessary to change the .safeGet() check? @azat

Actually thinking about this a little bit more, and it should not accept custom fd, since this a security vulnerability (since any user may read from any file that the process keeps open)

Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please don't miss this comment - #25960 (everything except for File by numeric fd)

BoloniniD and others added 2 commits July 13, 2021 14:08
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
@BoloniniD BoloniniD force-pushed the pipe_reading branch 3 times, most recently from 46787fa to a34c23b Compare July 15, 2021 11:35
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh

SAMPLE_FILE="$CURDIR/01947_sample_data.csv"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it uses one file, it is not possible to run flaky check (runs tests in parallel multiple times) - https://clickhouse-test-reports.s3.yandex.net/25960/bab7681bc18fbaf7df49737ca72cfe70756a5f62/functional_stateless_tests_flaky_check_(address)/test_run.txt.out.log
To fix this, you can simply use mktemp, i.e. mktemp --tmpdir "$CURDIR" 01947_multiple_pipe_read_saple_data_XXXXXX.sh (completely untested, but you can check it using clickhouse-test --test-runs 100 01947_multiple_pipe_read. --jobs 8)

. "$CURDIR"/../shell_config.sh

SAMPLE_FILE="$CURDIR/01947_sample_data.csv"
STD_ERROR_CAPTURED="$CURDIR/01947_std_error_captured.log"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

else
{
// else if fd is not a regular file, then throw an Exception
throw Exception("Cannot read from a pipeline twice", ErrorCodes::CANNOT_READ_FROM_FILE_DESCRIPTOR);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do you call this pipeline not pipe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed error message


echo '******************'
echo 'Attempt to read twice from a pipeline'
${CLICKHOUSE_LOCAL} --structure 'key String' -q 'select * from table; select * from table;' <<<foo
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it is better not to use this in tests, since the behavior is different in different versions on bash:

$ echo -n "$BASH_VERSION: "; test -p /dev/fd/0 <<<foo && echo pipe || echo regular
5.1.8(1)-release: pipe
$ docker run --rm -it ubuntu:20.04 bash -c 'echo -n "$BASH_VERSION: "; test -p /dev/fd/0 <<<foo && echo pipe || echo regular'
5.0.17(1)-release: regular

You already have test with pipe (echo 1 | clickhouse-local), this is enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed that test

@azat
Copy link
Copy Markdown
Member

azat commented Jul 17, 2021

@BoloniniD BoloniniD marked this pull request as ready for review July 17, 2021 14:50
@alexey-milovidov alexey-milovidov self-assigned this Jul 18, 2021
@BoloniniD BoloniniD marked this pull request as draft July 18, 2021 12:55
@BoloniniD BoloniniD marked this pull request as ready for review July 24, 2021 17:28
alexey-milovidov added a commit that referenced this pull request Jul 24, 2021
@alexey-milovidov alexey-milovidov merged commit 456801c into ClickHouse:master Jul 24, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

Merged in #26777.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clickhouse-local freezes on multiple read from pipe

4 participants