-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Fix](Load) Use deferred file format properties for broker load without format (#55450) #55498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… load without format (apache#55450) User may not specify data format in broker load, so we can only infer the data format after listing the files. So we have to defer the initialization of file properties object --------- Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses an issue in broker load functionality where users may not specify a data format, requiring the system to infer the format after listing files. The PR introduces a deferred initialization pattern for file format properties to handle this scenario.
Key changes include:
- Introduction of
DeferredFileFormatPropertiesclass for delayed format property initialization - Updates to data description classes to use deferred properties when format is not specified
- MySQL load command refactoring to remove duplicate methods and fix typos
- Enhanced test coverage for HDFS load with default file format detection
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
hdfs_load_default_file_format.groovy |
New regression test for HDFS load with automatic file format detection |
LoadCommandTest.java |
Updated tests to handle DeferredFileFormatProperties |
MysqlLoadCommand.java |
Fixed typo in method name (handleMysqlLoadComand → handleMysqlLoadCommand) |
MysqlDataDescription.java |
Updated to use deferred file format properties |
LogicalPlanBuilder.java |
Explicitly sets CSV format for MySQL load |
NereidsDataDescription.java |
Updated to use deferred file format properties |
MysqlLoadManager.java |
Removed duplicate methods and refactored to use CSV properties directly |
BrokerLoadPendingTask.java |
Added logic to initialize deferred properties after listing files |
BrokerFileGroup.java |
Added method to initialize deferred properties based on file status |
FileFormatProperties.java |
Added factory method for creating deferred properties |
DeferredFileFormatProperties.java |
New wrapper class for deferred initialization of file format properties |
DataDescription.java |
Updated to use deferred file format properties |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // MySQL load only support csv, set it explicitly | ||
| properties.put("format", "csv"); |
Copilot
AI
Aug 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is calling put() on an ImmutableMap which will throw an UnsupportedOperationException. Since properties is created as ImmutableMap.of() on line 7848, it cannot be modified. Consider creating a mutable map or using a builder pattern.
| public boolean isBinaryFileFormat() { | ||
| // Must call initDeferredFileFormatPropertiesIfNecessary before |
Copilot
AI
Aug 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method assumes initDeferredFileFormatPropertiesIfNecessary has been called but doesn't verify it. If fileFormatProperties is still a DeferredFileFormatProperties instance that hasn't been initialized, this will incorrectly return false. Consider checking if it's a DeferredFileFormatProperties and either throw an exception or delegate to the actual implementation.
| public boolean isBinaryFileFormat() { | |
| // Must call initDeferredFileFormatPropertiesIfNecessary before | |
| // Defensive: check for uninitialized DeferredFileFormatProperties | |
| if (fileFormatProperties instanceof DeferredFileFormatProperties) { | |
| throw new IllegalStateException("DeferredFileFormatProperties must be initialized before calling isBinaryFileFormat()"); | |
| } |
| delegate = FileFormatProperties.createFileFormatProperties(this.formatName); | ||
| delegate.analyzeFileFormatProperties(origProperties, false); |
Copilot
AI
Aug 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The createFileFormatProperties method can throw an AnalysisException, but deferInit doesn't declare this exception in its signature. This will cause a compilation error. The method signature should include throws AnalysisException.
|
run buildall |
TPC-H: Total hot run time: 34144 ms |
TPC-DS: Total hot run time: 187798 ms |
ClickBench: Total hot run time: 32.91 s |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 34193 ms |
TPC-DS: Total hot run time: 186656 ms |
ClickBench: Total hot run time: 32.58 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
Related PR: #55498 Problem Summary: When importing a file via S3 load, if the file does not exist on S3, the load will report an error: label has already been used. 2025-09-22 23:00:57,177 WARN (pending-load-task-scheduler-pool-0|483) [LoadTask.exec():110] LOAD_JOB=1758553246283, error_msg={Unexpected failed to execute load task} java.lang.IllegalStateException: null at com.google.common.base.Preconditions.checkState(Preconditions.java:499) ~[guava-33.2.1-jre.jar:?] at org.apache.doris.nereids.load.NereidsLoadingTaskPlanner.plan(NereidsLoadingTaskPlanner.java:146) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadLoadingTask.init(LoadLoadingTask.java:138) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createTask(BrokerLoadJob.java:274) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createLoadingTask(BrokerLoadJob.java:311) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onPendingTaskFinished(BrokerLoadJob.java:204) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onTaskFinished(BrokerLoadJob.java:163) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadTask.exec(LoadTask.java:102) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.task.MasterTask.run(MasterTask.java:31) ~[doris-fe.jar:1.2-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?]
Related PR: #55498 Problem Summary: When importing a file via S3 load, if the file does not exist on S3, the load will report an error: label has already been used. 2025-09-22 23:00:57,177 WARN (pending-load-task-scheduler-pool-0|483) [LoadTask.exec():110] LOAD_JOB=1758553246283, error_msg={Unexpected failed to execute load task} java.lang.IllegalStateException: null at com.google.common.base.Preconditions.checkState(Preconditions.java:499) ~[guava-33.2.1-jre.jar:?] at org.apache.doris.nereids.load.NereidsLoadingTaskPlanner.plan(NereidsLoadingTaskPlanner.java:146) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadLoadingTask.init(LoadLoadingTask.java:138) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createTask(BrokerLoadJob.java:274) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createLoadingTask(BrokerLoadJob.java:311) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onPendingTaskFinished(BrokerLoadJob.java:204) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onTaskFinished(BrokerLoadJob.java:163) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadTask.exec(LoadTask.java:102) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.task.MasterTask.run(MasterTask.java:31) ~[doris-fe.jar:1.2-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?]
Related PR: #55498 Problem Summary: When importing a file via S3 load, if the file does not exist on S3, the load will report an error: label has already been used. 2025-09-22 23:00:57,177 WARN (pending-load-task-scheduler-pool-0|483) [LoadTask.exec():110] LOAD_JOB=1758553246283, error_msg={Unexpected failed to execute load task} java.lang.IllegalStateException: null at com.google.common.base.Preconditions.checkState(Preconditions.java:499) ~[guava-33.2.1-jre.jar:?] at org.apache.doris.nereids.load.NereidsLoadingTaskPlanner.plan(NereidsLoadingTaskPlanner.java:146) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadLoadingTask.init(LoadLoadingTask.java:138) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createTask(BrokerLoadJob.java:274) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.createLoadingTask(BrokerLoadJob.java:311) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onPendingTaskFinished(BrokerLoadJob.java:204) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.BrokerLoadJob.onTaskFinished(BrokerLoadJob.java:163) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.load.loadv2.LoadTask.exec(LoadTask.java:102) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.task.MasterTask.run(MasterTask.java:31) ~[doris-fe.jar:1.2-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?]
### What problem does this PR solve? introduce by #55498 Fix execute copy task fail: ``` Caused by: org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.co mmands.info.CopyFromDesc.getFileFilterExpr()" is null ... 13 more Caused by: java.lang.NullPointerException: Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.commands.info.CopyFromDesc.getFileFilterExpr()" is null at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.doValidate(CopyIntoInfo.java:262) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.validate(CopyIntoInfo.java:195) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.CopyIntoCommand.run(CopyIntoCommand.java:57) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:668) ~[doris-fe.jar:1.2-SNAPSHOT] ... 12 more ``` ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
### What problem does this PR solve? introduce by #55498 Fix execute copy task fail: ``` Caused by: org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.co mmands.info.CopyFromDesc.getFileFilterExpr()" is null ... 13 more Caused by: java.lang.NullPointerException: Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.commands.info.CopyFromDesc.getFileFilterExpr()" is null at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.doValidate(CopyIntoInfo.java:262) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.validate(CopyIntoInfo.java:195) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.CopyIntoCommand.run(CopyIntoCommand.java:57) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:668) ~[doris-fe.jar:1.2-SNAPSHOT] ... 12 more ``` ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
### What problem does this PR solve? introduce by #55498 Fix execute copy task fail: ``` Caused by: org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.co mmands.info.CopyFromDesc.getFileFilterExpr()" is null ... 13 more Caused by: java.lang.NullPointerException: Cannot invoke "java.util.Optional.isPresent()" because the return value of "org.apache.doris.nereids.trees.plans.commands.info.CopyFromDesc.getFileFilterExpr()" is null at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.doValidate(CopyIntoInfo.java:262) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.info.CopyIntoInfo.validate(CopyIntoInfo.java:195) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.nereids.trees.plans.commands.CopyIntoCommand.run(CopyIntoCommand.java:57) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:668) ~[doris-fe.jar:1.2-SNAPSHOT] ... 12 more ``` ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
What problem does this PR solve?
User may not specify data format in broker load, so we can only infer the data format
after listing the files.
So we have to defer the initialization of file properties object
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)