-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Atomic rename for INTO OUTFILE TRUNCATE is broken: temp file path never reaches initOutputFormat #101726
Description
Found via ClickGap automated review. Please close or comment if this is incorrect or needs adjustment.
Describe what's wrong
When using INTO OUTFILE ... TRUNCATE, the original file is truncated directly (O_TRUNC) instead of writing to a temp file and renaming. If the query fails mid-execution, the original file content is lost.
Root cause: ClientBase.cpp:1230-1234 modifies a local variable out_file to a temp path but never updates the AST node query_with_output->out_file. The file writing pipeline in initOutputFormat reads the path from the AST, bypassing the temp path entirely.
Why we believe this is a bug: processOrdinaryQuery (ClientBase.cpp:1230-1234) generates a temp path in local variable out_file, but does NOT update the AST. initOutputFormat (ClientBase.cpp:709-710) reads the file path from the AST (query_with_output->out_file), which still contains the original path. The file is opened with O_TRUNC (line 734-735), destroying old content immediately. performAtomicRename (line 205-210) reads the original path from AST as tmp_file and renames it to out_file_if_truncated (also the original path) — a no-op.
Affected locations:
src/Client/ClientBase.cpp:1230— processOrdinaryQuery generates temp path but does not update ASTsrc/Client/ClientBase.cpp:709— initOutputFormat reads file path from AST, getting original pathsrc/Client/ClientBase.cpp:205— performAtomicRename reads original path from AST, renames to itself (no-op)
Impact: The atomic rename feature for INTO OUTFILE TRUNCATE does not work. If a query fails after partial data has been sent, the original file content is destroyed and replaced with partial results. The feature was intended to prevent data loss on query failure.
Does it reproduce on most recent release?
Yes — confirmed on current master (commit fc7ad06dc31f).
How to reproduce
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. "$CURDIR"/../shell_config.sh
OUT_FILE="${CLICKHOUSE_TMP}/test_atomic_error_preserve.out"
rm -f "${OUT_FILE}"
${CLICKHOUSE_CLIENT} --query="SELECT 'old_content' INTO OUTFILE '${OUT_FILE}' FORMAT RawBLOB"
echo "before_error: $(cat ${OUT_FILE})"
${CLICKHOUSE_CLIENT} --query="SELECT throwIf(number >= 1, 'forced error'), number FROM numbers(10) INTO OUTFILE '${OUT_FILE}' TRUNCATE FORMAT TSV SETTINGS max_block_size=1" 2>/dev/null
CONTENT=$(cat "${OUT_FILE}" 2>/dev/null)
if [ "$CONTENT" = "old_content" ]; then
echo "after_error: old_content_preserved"
else
echo "after_error: content_changed_to: $(cat ${OUT_FILE} 2>/dev/null | head -c 200)"
fi
rm -f "${OUT_FILE}"Expected behavior
before_error: old_content
after_error: old_content_preserved
Error message and/or stacktrace
before_error: old_content
after_error: content_changed_to: 0 0
Additional context
Open risks:
- The PR's own test (03362) only tests the happy path where queries succeed, so it cannot detect this bug
Suggested fix: Either update the AST node (query_with_output->out_file) to point to the temp file path so initOutputFormat writes to it, or pass the temp file path to initOutputFormat through a different mechanism (e.g., a member variable on ClientBase).
Analysis details: Confidence HIGH | Severity P1 | Testability: STATELESS_SQL
Found during automated review of PR #77181.
ClickGapAI · Confidence: HIGH · Severity: P1 · Finding: h_pr77181_001