Skip to content

[BugFix] Handle parquet exception caused by wrong s3 configuration#23606

Merged
stephen-shelby merged 4 commits intoStarRocks:mainfrom
letian-jiang:fix-s3
May 18, 2023
Merged

[BugFix] Handle parquet exception caused by wrong s3 configuration#23606
stephen-shelby merged 4 commits intoStarRocks:mainfrom
letian-jiang:fix-s3

Conversation

@letian-jiang
Copy link
Copy Markdown
Contributor

@letian-jiang letian-jiang commented May 17, 2023

Problem Summary:

Fixes # (issue)

Using a wrong s3 configuration (ak/sk/endpoint) will not be checked until the parquet writer tries to flush buffered data into the output stream. In this case, BufferedPageWriter.Close() throws an exception, which should be handled.

Moreover, the deconstructor of ParquetFileWriter causes SIGSEGV. We have to mark the file writer as closed and release its ownership to avoid calling its dtor.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto backported to target branch
    • 3.0
    • 2.5
    • 2.4
    • 2.3

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
Smith-Cruise
Smith-Cruise previously approved these changes May 18, 2023
stephen-shelby
stephen-shelby previously approved these changes May 18, 2023
_writer.release();

auto st = Status::IOError(fmt::format("{}: {}", "flush rowgroup error", e.what()));
LOG(WARNING) << st;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add _chunk_writer.reset(); before return?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
@letian-jiang letian-jiang dismissed stale reviews from stephen-shelby and Smith-Cruise via 7fd2491 May 18, 2023 06:34
@stephen-shelby stephen-shelby merged commit 672e3f5 into StarRocks:main May 18, 2023
Moonm3n pushed a commit to Moonm3n/starrocks that referenced this pull request May 23, 2023
…tarRocks#23606)

Using a wrong s3 configuration (ak/sk/endpoint) will not be checked
until the parquet writer tries to flush buffered data into the output
stream. In this case, `BufferedPageWriter.Close()` throws an exception,
which should be handled.

Moreover, the deconstructor of `ParquetFileWriter` causes SIGSEGV. We
have to mark the file writer as closed and release its ownership to
avoid calling its dtor.

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
Signed-off-by: Moonm3n <saxonzhan@gmail.com>
wxl24life pushed a commit to wxl24life/starrocks that referenced this pull request May 25, 2023
…tarRocks#23606)

Using a wrong s3 configuration (ak/sk/endpoint) will not be checked
until the parquet writer tries to flush buffered data into the output
stream. In this case, `BufferedPageWriter.Close()` throws an exception,
which should be handled.

Moreover, the deconstructor of `ParquetFileWriter` causes SIGSEGV. We
have to mark the file writer as closed and release its ownership to
avoid calling its dtor.

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
abc982627271 pushed a commit to abc982627271/starrocks that referenced this pull request Jun 5, 2023
…tarRocks#23606)

Using a wrong s3 configuration (ak/sk/endpoint) will not be checked
until the parquet writer tries to flush buffered data into the output
stream. In this case, `BufferedPageWriter.Close()` throws an exception,
which should be handled.

Moreover, the deconstructor of `ParquetFileWriter` causes SIGSEGV. We
have to mark the file writer as closed and release its ownership to
avoid calling its dtor.

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
dirtysalt pushed a commit that referenced this pull request Jun 20, 2023
Fixes #23606, where memory leaks are introduced by
`_chunk_writer->release()`

The modification on Arrow is to make sure
`RowGroupSerializer::column_writers_` is cleaned up even if exception
throws.

You may refer apache/arrow#35520 for more
details.

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
southernriver pushed a commit to southernriver/starrocks that referenced this pull request Nov 28, 2023
…tarRocks#23606)

Using a wrong s3 configuration (ak/sk/endpoint) will not be checked
until the parquet writer tries to flush buffered data into the output
stream. In this case, `BufferedPageWriter.Close()` throws an exception,
which should be handled.

Moreover, the deconstructor of `ParquetFileWriter` causes SIGSEGV. We
have to mark the file writer as closed and release its ownership to
avoid calling its dtor.

Signed-off-by: Letian Jiang <letian.jiang@outlook.com>
(cherry picked from commit 672e3f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants