Skip to content

[Feature][Task Plugin] Add jupyter task plugin#9872

Merged
zhongjiajie merged 6 commits intoapache:devfrom
EricGao888:Fix-9816
May 12, 2022
Merged

[Feature][Task Plugin] Add jupyter task plugin#9872
zhongjiajie merged 6 commits intoapache:devfrom
EricGao888:Fix-9816

Conversation

@EricGao888
Copy link
Copy Markdown
Member

@EricGao888 EricGao888 commented May 4, 2022

Purpose of the pull request

Brief change log

  • Encapsulate papermill in Dolphin jupyter task plugin.
    image

Verify this pull request

  • Verified by unit test.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Merging #9872 (e50f23c) into dev (f9f9100) will increase coverage by 0.09%.
The diff coverage is 72.22%.

@@             Coverage Diff              @@
##                dev    #9872      +/-   ##
============================================
+ Coverage     40.31%   40.40%   +0.09%     
- Complexity     4530     4556      +26     
============================================
  Files           835      840       +5     
  Lines         33742    33850     +108     
  Branches       3725     3736      +11     
============================================
+ Hits          13602    13678      +76     
- Misses        18855    18875      +20     
- Partials       1285     1297      +12     
Impacted Files Coverage Δ
...cheduler/plugin/task/jupyter/JupyterConstants.java 0.00% <0.00%> (ø)
...eduler/plugin/task/jupyter/JupyterTaskChannel.java 0.00% <0.00%> (ø)
...plugin/task/jupyter/JupyterTaskChannelFactory.java 0.00% <0.00%> (ø)
...phinscheduler/plugin/task/jupyter/JupyterTask.java 74.62% <74.62%> (ø)
...heduler/plugin/task/jupyter/JupyterParameters.java 93.33% <93.33%> (ø)
...er/master/dispatch/host/assign/RandomSelector.java 77.77% <0.00%> (-5.56%) ⬇️
...r/plugin/task/sqoop/parameter/SqoopParameters.java 53.33% <0.00%> (-1.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9f9100...e50f23c. Read the comment docs.

@SbloodyS SbloodyS added the feature new feature label May 4, 2022

import com.fasterxml.jackson.databind.ObjectMapper;

public class JupyterTask extends AbstractYarnTask {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a yarn task?

Copy link
Copy Markdown
Member Author

@EricGao888 EricGao888 May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not need to run in yarn container. The reason I extends AbstractYarnTask is that jupyter task plugin builds papermill command and then run it through shell executor, just the same as spark task. Not sure whether there is a better way?

Copy link
Copy Markdown
Member

@zhongjiajie zhongjiajie May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could ref in ptyhon task, extends from AbstractTaskExecutor and use ShellCommandExecutor

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could ref in ptyhon task, extends from AbstractTaskExecutor and use ShellCommandExecutor

@zhongjiajie That's a good idea. Will fix it in the next commit. Thx for the suggestion~

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch JupyterTask to extend AbstractTaskExecutor instead of AbstractYarnTask in the latest commit.

public class JupyterTaskChannel implements TaskChannel {

@Override
public void cancelApplication(boolean status) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does jupyter has no cancel operation?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found cancel op for jupyter. The same as spark task.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit with cancelApplication method in JupyterTask.java

@EricGao888 EricGao888 changed the title [Feature][Task Plugin] Add jupyter task plugin (#9816) [Feature][Task Plugin] Add jupyter task plugin May 6, 2022
@zhongjiajie zhongjiajie requested a review from caishunfeng May 7, 2022 01:27
@EricGao888
Copy link
Copy Markdown
Member Author

@caishunfeng I've resolved the comments above in the latest commit. Could you please take another look when you have time? Thx~

Copy link
Copy Markdown
Contributor

@caishunfeng caishunfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@caishunfeng
Copy link
Copy Markdown
Contributor

BTW, it seems miss UI and doc, another PR?

@EricGao888
Copy link
Copy Markdown
Member Author

BTW, it seems miss UI and doc, another PR?

@caishunfeng Yes. As soon as the backend PR is merged, I will work on UI and docs.

@EricGao888
Copy link
Copy Markdown
Member Author

BTW, it seems miss UI and doc, another PR?

@caishunfeng I just modified the description of this PR. Issue:#9816 will not be closed by this PR and I will submit another PR of UI and docs.

@zhongjiajie
Copy link
Copy Markdown
Member

If you want to submit UI and docs, you could combine them in one single PR(this PR), I think

@zhongjiajie zhongjiajie added miss:docs missing documents in PR miss:ui missing UI components labels May 7, 2022
@EricGao888
Copy link
Copy Markdown
Member Author

If you want to submit UI and docs, you could combine them in one single PR(this PR), I think

@zhongjiajie The reason why I planned to submit separate PRs for backend and UI is that it seems people who review the backend part and UI part are usually different and it looks more convenient to review. If there's no more changes needed for the backend part, I can continue to add the UI and docs in this PR : )

@zhongjiajie
Copy link
Copy Markdown
Member

If you want to submit UI and docs, you could combine them in one single PR(this PR), I think

@zhongjiajie The reason why I planned to submit separate PRs for backend and UI is that it seems people who review the backend part and UI part are usually different and it looks more convenient to review. If there's no more changes needed for the backend part, I can continue to add the UI and docs in this PR : )

WDYT @caishunfeng @songjianet

@EricGao888
Copy link
Copy Markdown
Member Author

FYI, I've added jupyter task plugin UI in the latest commit. Docs is WIP, will add it in the next commit.

@EricGao888
Copy link
Copy Markdown
Member Author

EricGao888 commented May 9, 2022

UI and docs have been added. Could you please take another look when you have time? Thx : ) @zhongjiajie @caishunfeng @songjianet

labbomb
labbomb previously approved these changes May 10, 2022
Copy link
Copy Markdown
Member

@labbomb labbomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jieguangzhou
Copy link
Copy Markdown
Member

jieguangzhou commented May 10, 2022

This is a good task plugin, Awesome!

if (data.taskParams?.others) {
params.others = data.taskParams.others
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not a nested structure of the taskParams, there is no need to set the value one by one.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I'm not familiar with the front-end, could you please give some suggestions on how to fix it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘ve fixed it, please check.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘ve fixed it, please check.

Thx for the help : )

@EricGao888
Copy link
Copy Markdown
Member Author

This is a good task plugin, Awesome!

@jieguangzhou Thanks! BTW, I saw the MLops issue #9725 you opened and it looks really cool. I'm quite looking forward to it.

@jieguangzhou
Copy link
Copy Markdown
Member

This is a good task plugin, Awesome!

@jieguangzhou Thanks! BTW, I saw the MLops issue #9725 you opened and it looks really cool. I'm quite looking forward to it.

I'm looking forward to adding the jupyter task to the MLops workflow.

Copy link
Copy Markdown
Member

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other lgtm, except for the front end

- Jupyter Start Timeout: Timeout set for jupyter notebook kernel.
- Others: Other command options for papermill.

## Task Example
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find you add a new config conda.path=/opt/anaconda3/etc/profile.d/conda.sh in commom.properties, could you also add some docs about what it is and how to change the config by users them self

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will update the docs. BTW, could anyone help with this? Not sure how to deal with it. I plan to update docs and fix the front end in the next commit.
image

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find you add a new config conda.path=/opt/anaconda3/etc/profile.d/conda.sh in commom.properties, could you also add some docs about what it is and how to change the config by users them self

@zhongjiajie Docs updated, PTAL, thx~

Remove useless code.
@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

71.2% 71.2% Coverage
1.2% 1.2% Duplication

@EricGao888 EricGao888 requested a review from zhongjiajie May 12, 2022 02:03
Copy link
Copy Markdown
Member

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM about the backend and docs

@zhongjiajie
Copy link
Copy Markdown
Member

zhongjiajie commented May 12, 2022

I find out @Amy0104 fix the frontend issue, will merge now, thanks @EricGao888

@zhongjiajie zhongjiajie merged commit 8036936 into apache:dev May 12, 2022
songjianet pushed a commit to songjianet/dolphinscheduler that referenced this pull request May 17, 2022
Co-authored-by: Amy0104 <97265214+Amy0104@users.noreply.github.com>
rockfang pushed a commit to rockfang/dolphinscheduler that referenced this pull request May 18, 2022
Co-authored-by: Amy0104 <97265214+Amy0104@users.noreply.github.com>
vagetablechicken added a commit to vagetablechicken/dolphinscheduler that referenced this pull request May 21, 2022
commit 8046180
Author: 王心雨 <wangxinyu@4paradigm.com>
Date:   Thu May 19 12:34:07 2022 +0800

    去掉错误提交的代码

commit 5e9e9a9
Author: 王心雨 <wangxinyu@4paradigm.com>
Date:   Thu May 19 12:23:44 2022 +0800

    加入openmldb task

commit b1885c7
Author: caishunfeng <caishunfeng2021@gmail.com>
Date:   Mon May 16 15:41:45 2022 +0800

    [Bug] fix run on docker and k8s (apache#10026)

    * fix docker-compose init schema

    * recovery depend on zk

    * update doc and dockerfile

    * fix run on k8s

    * udpate doc

    * add DOCKER flag & update doc

    * remove repeat DOCKER env

commit d643e1c
Author: Dannila <94423827+Dannila@users.noreply.github.com>
Date:   Mon May 16 15:06:21 2022 +0800

    [Fix-10039] Flink run command when perfecting Python jobs (apache#10042)

    * [fix] flink task

    * [fix] flink task

commit 3e471f8
Author: Jiajie Zhong <zhongjiajie955@hotmail.com>
Date:   Mon May 16 11:01:32 2022 +0800

    [feat] Add batch rerun and stop for process instance (apache#10013)

    * [feat] Add batch rerun and stop for process instance

    * fix execute type pass

    * error message changed

    * correct error message

    * add note to i18n ignore

    * enhance the note of process instance ids

commit f666c64
Author: chuxing <923622908@qq.com>
Date:   Mon May 16 09:53:45 2022 +0800

    [doc] Correct docs of development-environment-setup (apache#9995)

commit 359cbe2
Author: zixi0825 <sunzhaohe0825@gmail.com>
Date:   Sun May 15 17:46:31 2022 +0800

    [dataquality] Fix task commnd null bug (apache#9974)

commit 2423fa5
Author: xiangzihao <460888207@qq.com>
Date:   Sun May 15 13:04:48 2022 +0800

    [Feature-10034][CI] Add postgresql cluster test in CI (apache#10035)

    * add postgresql cluster test

    * add postgresql cluster test

    * add postgresql cluster test

    * add postgresql cluster test

commit df04c4a
Author: chuxing <923622908@qq.com>
Date:   Sun May 15 10:24:31 2022 +0800

    [fix-9991][worker]fix statement is closed before resultSet.getMetaData() (apache#10014)

commit baf654c
Author: xiangzihao <460888207@qq.com>
Date:   Sat May 14 12:30:57 2022 +0800

    [Feature-9474] [CI] Add cluster test script verify on shell script (apache#9997)

    * cluster test

    * fix init db failed

    * fix init db failed

    * fix init db failed

    * fix init db failed

    * fix init db failed

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add sudo

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * add github actions

    * remove cluster-test workflows

    * add github actions

    * add github actions

    * refactor test to docker mode

    * refactor test to docker mode

    * refactor test to docker mode

    * refactor test to docker mode

    * remove create schema logic

    * remove create schema logic

    * remove create schema logic

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * fix runnint cluster test error

    * add github actions

    * add github actions

    * add github actions

    * add cluster test timeout

    * add cluster start test loop check

    * add cluster start test loop check

    * optimize cluster start test loop check

commit ee2c516
Author: 旺阳 <wang@lqwang.net>
Date:   Fri May 13 10:39:05 2022 +0800

    [doc] Correct kubernetes (apache#9985)

    Co-authored-by: qingwli <qingwli@cisco.com>

commit d17379d
Author: rockfang <657297417@qq.com>
Date:   Fri May 13 10:16:54 2022 +0800

    [Fix][UI] Fix errorOutputPath column in dataquality page (apache#10015)

commit 8036936
Author: Eric Gao <ericgao.apache@gmail.com>
Date:   Thu May 12 19:55:49 2022 +0800

    [feat][task plugin] Add Jupyter task plugin (apache#9872)

    Co-authored-by: Amy0104 <97265214+Amy0104@users.noreply.github.com>

commit 0fe7548
Author: QuakeWang <45645138+QuakeWang@users.noreply.github.com>
Date:   Thu May 12 18:23:43 2022 +0800

    [doc] Add example and notice about task type Dependent (apache#10001)

    Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>

commit 18bfe63
Author: Amy0104 <97265214+Amy0104@users.noreply.github.com>
Date:   Thu May 12 16:54:03 2022 +0800

    [Fix][UI] Support only one file upload on the file manage page. (apache#10007)

commit dbdbfea
Author: WangJPLeo <103574007+WangJPLeo@users.noreply.github.com>
Date:   Thu May 12 16:31:53 2022 +0800

    [Fix-9975] The selected task instance was recreated when the Master service fail… (apache#9976)

    * The selected task instance was recreated when the Master service failed over.

    * Returns the expression result directly.

    * Use Recovery to determine whether to use the old task instance.

commit 00f1029
Author: Amy0104 <97265214+Amy0104@users.noreply.github.com>
Date:   Thu May 12 15:27:55 2022 +0800

    [Fix][UI] Fix the task name validator error. (apache#10008)

commit bce5a28
Author: Kerwin <37063904+zhuangchong@users.noreply.github.com>
Date:   Thu May 12 14:42:39 2022 +0800

    [doc] Add the description about execute type in SQL task (apache#9987)

commit 53cd700
Author: rockfang <657297417@qq.com>
Date:   Thu May 12 14:11:52 2022 +0800

    [Fix-10002] Fix some bugs in datasource list (apache#10005)

    * fix: fix ellipsis bug in table column

    * fix ellipsis bug in table column

    * fix: json-highlight component support nested jsonstring

    * fix: make datasource description show

commit 8d17fd2
Author: Jiajie Zhong <zhongjiajie955@hotmail.com>
Date:   Thu May 12 11:30:01 2022 +0800

    [ci] Dead link check all markdown file (apache#10004)

    there are 162 markdown files in dir `docs`
    and all file in project is 175 files, so I
    think check all file will not add too much
    load for our ci, but it could discovered
    the dead link in time to avoid some thing
    like apache#9998

commit 99b1c40
Author: songjianet <1778651752@qq.com>
Date:   Wed May 11 22:15:36 2022 +0800

    [Docs] Fix docker link. (apache#9998)

commit d4aeee1
Author: Tq <tianqitobethefirst@gmail.com>
Date:   Wed May 11 18:37:03 2022 +0800

    [Bug] [MASTER-9811]fix cmd param to overwrite global param when executing complement (apache#9952)

    * fix cmd param to overwrite global param when executing complement

    * fix cmd param to overwrite global param when executing complement

commit bc1c15b
Author: qianli2022 <97489722+qianli2022@users.noreply.github.com>
Date:   Wed May 11 16:40:30 2022 +0800

    [Feature-8252][doc] K8s and namespace manager docs and web page update (apache#9881)

    * test

    * test

    * doc

    * fix image error

    * fix

    * Update docs/docs/en/guide/security.md

    * Update docs/docs/en/guide/security.md

    * Update docs/docs/en/guide/security.md

    * Update docs/docs/zh/guide/security.md

    * fix doc

    Co-authored-by: qianl4 <qianl4@cicso.com>
    Co-authored-by: William Tong <weitong@cisco.com>
    Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
@devosend devosend added this to the 3.1.0-alpha milestone Jun 16, 2022
@EricGao888 EricGao888 removed miss:docs missing documents in PR miss:ui missing UI components labels Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature][Task Plugin] Add jupyter notebook task plugin

10 participants