@924060929 924060929 commented Oct 11, 2024

Proposed changes

Use NereidsSqlCoordinator instead of Coordinator, because the Coordinator code is too hard to maintain.

The main design approach is as follows:

  1. Divide the original flat Coordinator into multiple modules, with each module maintaining high cohesion.
  • DistributePlanner: The logic for calculating parallelism has been extracted in [refactor](nereids) New distribute planner #36531, and in the future, we will dynamically calculate parallelism based on cost.
  • CoordinatorContext: Some global parameters and states related to the Coordinator are encapsulated within CoordinatorContext.
  • PipelineExecutionTask: The entire scheduling task is encapsulated by PipelineExecutionTask, which includes the mapping relationship between each Backend and Pipeline task. PipelineExecutionTask contains two layers of tasks, each responsible for specific duties, with state maintenance handled internally rather than being centralized in the Coordinator.
    • MultiFragmentsPipelineTask: A Backend will generate multiple fragment tasks, which are bundled together and sent concurrently to the corresponding Backend.
    • SingleFragmentPipelineTask: A single fragment task for a Backend.
  • JobProcessor: Describes two types of tasks: SQL tasks and Load tasks.
    • QueryProcessor: Represents query tasks and provides a ResultReceiver to obtain query results.
    • LoadProcessor: Represents Insert into and Broker load tasks, providing a blocking function to wait for load completion.
  • ThriftPlansBuilder: Uses the DistributedPlan structure to build thrift parameters and encapsulates some intermediate temporary variables within functions, rather than placing them in the Coordinator.
  2. The overall Coordinator logic is more clearly organized. NereidsCoordinator consists of only a few functions, so the main flow can be understood quickly when reading the code:
  • Construct CoordinatorContext.
  • Enqueue the tasks.
  • Handle different sinks accordingly.
  • Register the Coordinator with QeProcessorImpl for cancellation and progress tracking.
  • Construct thrift parameters.
  • Build PipelineTask.
  • Initiate RPC calls to each Backend.
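
The two-layer task structure described above can be sketched roughly as follows. This is a minimal illustration only: the class names come from the description, but the fields, methods, and the "done" flag are assumptions made for the example, and the real Doris classes carry far more state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified shape: one fragment task on one Backend.
final class SingleFragmentPipelineTask {
    private final int fragmentId;
    private boolean done;

    SingleFragmentPipelineTask(int fragmentId) { this.fragmentId = fragmentId; }

    int fragmentId() { return fragmentId; }
    void markDone() { done = true; }
    boolean isDone() { return done; }
}

// Hypothetical: all fragment tasks bound for one Backend, bundled together.
final class MultiFragmentsPipelineTask {
    private final long backendId;
    private final Map<Integer, SingleFragmentPipelineTask> fragments = new LinkedHashMap<>();

    MultiFragmentsPipelineTask(long backendId) { this.backendId = backendId; }

    long backendId() { return backendId; }

    // Returns the task for this fragment, creating it on first use.
    SingleFragmentPipelineTask fragment(int fragmentId) {
        return fragments.computeIfAbsent(fragmentId, SingleFragmentPipelineTask::new);
    }

    // State is derived from the children rather than stored in a central coordinator.
    boolean isDone() {
        return fragments.values().stream().allMatch(SingleFragmentPipelineTask::isDone);
    }
}

// Hypothetical: the whole scheduling task, mapping Backend id -> bundled fragment tasks.
final class PipelineExecutionTask {
    private final Map<Long, MultiFragmentsPipelineTask> byBackend = new LinkedHashMap<>();

    MultiFragmentsPipelineTask backend(long backendId) {
        return byBackend.computeIfAbsent(backendId, MultiFragmentsPipelineTask::new);
    }

    boolean isDone() {
        return byBackend.values().stream().allMatch(MultiFragmentsPipelineTask::isDone);
    }
}
```

Each layer answers questions about its own state, which is the point of moving state maintenance out of the Coordinator.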

TODO:

  1. delete old Coordinator
  2. support cloud mode


@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@924060929 924060929 marked this pull request as ready for review October 12, 2024 13:09
@924060929
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.46% (9709/25916)
Line Coverage: 28.75% (80635/280504)
Region Coverage: 28.19% (41708/147970)
Branch Coverage: 24.77% (21215/85638)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b0d8dee255cb2b7c2a959ef55647929ef653200e_b0d8dee255cb2b7c2a959ef55647929ef653200e/report/index.html

github-actions bot commented Nov 7, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 7, 2024
@924060929 924060929 merged commit 46e5294 into apache:master Nov 7, 2024
924060929 added a commit that referenced this pull request Nov 12, 2024
…or (#43763)

Fix `QueryProcessor cannot be cast to class LoadProcessor`, introduced by
#41730.

Problem Summary:

SQL: any select statement.

The error occurs only when debug logging is enabled, so I cannot write a test for it.
```
2024-11-12 08:15:52,266 WARN (mysql-nio-pool-0|206) [ConnectProcessor.handleQueryException():480] Process one query failed because unknown reason: 
java.lang.ClassCastException: class org.apache.doris.qe.runtime.QueryProcessor cannot be cast to class org.apache.doris.qe.runtime.LoadProcessor (org.apache.doris.qe.runtime.QueryProcessor and org.apache.doris.qe.runtime.LoadProcessor are in unnamed module of loader 'app')
	at org.apache.doris.qe.CoordinatorContext.asLoadProcessor(CoordinatorContext.java:262) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.NereidsCoordinator.getJobId(NereidsCoordinator.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.QeProcessorImpl.registerQuery(QeProcessorImpl.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.executeAndSendResult(StmtExecutor.java:1925) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1897) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.handleQueryWithRetry(StmtExecutor.java:901) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:833) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:605) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.queryRetry(StmtExecutor.java:568) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:558) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:340) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:243) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:209) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:237) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:414) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.lang.Thread.run(Thread.java:840) ~[?:?] 
```
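
The failure mode in the stack trace is an unchecked downcast from a query processor to a load processor. The sketch below reproduces that shape with simplified stand-in classes. The class bodies, the `CoordinatorContextSketch` name, the `jobIdOrDefault` method, and the `-1` sentinel are all assumptions for illustration; the actual fix in #43763 is not shown on this page and may look different.

```java
// Hypothetical, simplified stand-ins for the classes named in the stack trace.
interface JobProcessor {}

final class QueryProcessor implements JobProcessor {}

final class LoadProcessor implements JobProcessor {
    final long jobId;

    LoadProcessor(long jobId) { this.jobId = jobId; }
}

final class CoordinatorContextSketch {
    private final JobProcessor jobProcessor;

    CoordinatorContextSketch(JobProcessor jobProcessor) { this.jobProcessor = jobProcessor; }

    // Unchecked downcast: throws ClassCastException when the job is a query.
    LoadProcessor asLoadProcessor() {
        return (LoadProcessor) jobProcessor;
    }

    // Guarded variant: only load jobs carry a job id; -1 is an assumed sentinel.
    long jobIdOrDefault() {
        if (jobProcessor instanceof LoadProcessor) {
            return ((LoadProcessor) jobProcessor).jobId;
        }
        return -1L;
    }
}
```

Checking the processor type before casting keeps a select statement from failing on a code path that only makes sense for load jobs.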
py023 pushed a commit to py023/doris that referenced this pull request Nov 13, 2024
…or (apache#43763)

924060929 added a commit that referenced this pull request Nov 13, 2024
Fix the new coordinator computing a wrong scanRangeNum, introduced by #41730.

This bug shows a wrong progress value in S3 load:
```
Progress: 0.00%(73/2147483647)
```
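
The denominator in that output, 2147483647, equals Java's `Integer.MAX_VALUE`. As a rough sketch of why the percentage stays at zero, assume the progress is rendered as finished/total; the `LoadProgressSketch` class and its format string are hypothetical, and the page does not say how the wrong total arises.

```java
import java.util.Locale;

final class LoadProgressSketch {
    // Assumed rendering: percentage with two decimals, then finished/total.
    static String format(long finished, long total) {
        double percent = total <= 0 ? 0.0 : 100.0 * finished / total;
        return String.format(Locale.ROOT, "Progress: %.2f%%(%d/%d)", percent, finished, total);
    }

    public static void main(String[] args) {
        // With a total of Integer.MAX_VALUE, 73 finished ranges round to 0.00%.
        System.out.println(format(73, Integer.MAX_VALUE));
    }
}
```

With a total that large, any realistic number of finished scan ranges rounds down to 0.00%, so the progress never appears to move.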
924060929 added a commit that referenced this pull request Nov 18, 2024
Optimize the performance of the new distribute planner on TPC-H, because #41730
caused a performance regression.

1. Fix the wrong runtime filter thrift parameters.
2. Do not print the distribute plan in the profile by default; run
`set profile_level=3` to see it.
3. For a shuffle join whose two sides have natural + execution_bucketed
distributions, support comparing the cost of the plans that shuffle to the
left and to the right.
924060929 added a commit that referenced this pull request Dec 2, 2024
…s CTE (#44753)

Fix NereidsCoordinator computing a wrong result when a CTE exists, introduced
by #41730.
924060929 added a commit that referenced this pull request Feb 17, 2025
…property (#47888)

Fix the `Illegal bucket shuffle join or colocate join in fragment` error, which
was caused by computing a wrong join output property; introduced by #41730.

The exception:
```
errCode = 2, detailMessage = Illegal bucket shuffle join or colocate join in fragment
```
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…property (apache#47888)

924060929 added a commit that referenced this pull request Mar 12, 2025
Fix colocate agg + join computing a wrong result, introduced by #41730.
github-actions bot pushed a commit that referenced this pull request Mar 12, 2025
Fix colocate agg + join computing a wrong result, introduced by #41730.
@924060929 924060929 deleted the new_scheduler5 branch March 25, 2025 10:25
github-actions bot pushed a commit that referenced this pull request May 27, 2025
Fix colocate agg + join computing a wrong result, introduced by #41730.
924060929 added a commit to 924060929/incubator-doris that referenced this pull request May 28, 2025
…8934)

Fix colocate agg + join computing a wrong result, introduced by apache#41730.

(cherry picked from commit 0e5abe8)
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…property (apache#47888)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…8934)

Fix colocate agg + join computing a wrong result, introduced by apache#41730.
morningman added a commit that referenced this pull request Jun 20, 2025
### What problem does this PR solve?

Followup #50791

Add a new FE HTTP API: `/rest/v2/manager/query/statistics/trace_id`.
This API returns the query runtime statistics corresponding to a
given trace id.
The query statistics include information such as real-time scan rows/bytes.

Internally, Doris gets the query id by trace id from all Frontends, and
then fetches the query statistics from BE.

Usage pattern:
1. The user sets a custom trace id with `set session_context="trace_id:my_trace_id"`.
2. The user executes a query in the same session.
3. Start an HTTP client to get the query statistics in real time while the
query is running.


![progress](https://github.com/user-attachments/assets/0a697c7d-d87a-4e9c-8965-c5a2d7d7836e)

Also fix a bug in `CoordinatorContext.java` so that it gets the real host;
the bug was introduced by #41730.

This PR also changes the column names of the `information_schema.processlist`
table to match the column names in `show processlist`.