Skip to content

Add Python API implementation of workflows-as-code#6269

Merged
lenboo merged 9 commits intoapache:devfrom
zhongjiajie:init-pyds
Oct 31, 2021
Merged

Add Python API implementation of workflows-as-code#6269
lenboo merged 9 commits intoapache:devfrom
zhongjiajie:init-pyds

Conversation

@zhongjiajie
Copy link
Copy Markdown
Member

Purpose of the pull request

First commit and init for pydolphinscheduler

Brief change log

Initialization pydolphinscheduler, a python sdk for DolphinScheduler which could define workflow by code

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

@zhongjiajie zhongjiajie changed the title Init DS python SDK pydolphinscheduler [WIP] Init DS python SDK pydolphinscheduler Sep 18, 2021
@zhongjiajie
Copy link
Copy Markdown
Member Author

How to reproduce

After this two steps, you could see above error and found server are null

@zhongjiajie
Copy link
Copy Markdown
Member Author

FYI, after I add commit a4d84aa
we could install python package pydolphinscheduler locally by run command python setup.py install in dir dolphinscheduler/dolphinscheduler-python/pydolphinscheduler

After command finish, we could run tutorial.py directly by

cd dolphinscheduler-python/pydolphinscheduler
python example/tutorial.py

to make a quick look how pydolphinscheduler work

@zhongjiajie
Copy link
Copy Markdown
Member Author

cc: @caishunfeng

@caishunfeng
Copy link
Copy Markdown
Contributor

The PythonGatewayServer is run by spring and it can manage beans by annotations like @Autowired,
so if you new the PythonGatewayServer by yourself, which is out of Spring control, you will get some null beans.

you can have a try like GatewayServer server = new GatewayServer(this); instead of new PythonGatewayServer.

Copy link
Copy Markdown
Member Author

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestion for you @caishunfeng

@zhongjiajie
Copy link
Copy Markdown
Member Author

@caishunfeng FYI, I add you as this PR co-author manually in commit 9da1811

@caishunfeng
Copy link
Copy Markdown
Contributor

Some suggestion for you @caishunfeng

Thanks, that's a good suggestion

@zhongjiajie zhongjiajie changed the title [WIP] Init DS python SDK pydolphinscheduler [WIP] Add Python API implementation of workflows-as-code Sep 28, 2021
@zhongjiajie
Copy link
Copy Markdown
Member Author

After commit 37bd41a add, we fix error while adding new task and change task attribute(including task name).

But I found out a bug when update workflow tasks, I create an new issue to track it #6418

@zhongjiajie
Copy link
Copy Markdown
Member Author

Could I code style check and auto fix in my local env?

@zhongjiajie
Copy link
Copy Markdown
Member Author

zhongjiajie commented Sep 29, 2021

Could I code style check and auto fix in my local env?

I found out we have check-style file and idea plugin in https://dolphinscheduler.apache.org/en-us/community/development/pull-request.html

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 30, 2021

Codecov Report

Merging #6269 (67f956a) into dev (d91801e) will decrease coverage by 0.08%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #6269      +/-   ##
============================================
- Coverage     38.55%   38.46%   -0.09%     
+ Complexity     3224     3221       -3     
============================================
  Files           646      646              
  Lines         25840    25866      +26     
  Branches       2799     2804       +5     
============================================
- Hits           9963     9950      -13     
- Misses        14961    15002      +41     
+ Partials        916      914       -2     
Impacted Files Coverage Δ
...scheduler/api/service/impl/ProjectServiceImpl.java 74.03% <0.00%> (-3.88%) ⬇️
...inscheduler/api/service/impl/QueueServiceImpl.java 64.81% <0.00%> (-7.36%) ⬇️
...nscheduler/api/service/impl/TenantServiceImpl.java 70.27% <0.00%> (-4.02%) ⬇️
.../org/apache/dolphinscheduler/common/Constants.java 75.00% <ø> (ø)
...ache/dolphinscheduler/server/log/LoggerServer.java 78.26% <0.00%> (-8.70%) ⬇️
...r/plugin/registry/zookeeper/ZookeeperRegistry.java 39.00% <0.00%> (-5.68%) ⬇️
...er/master/processor/queue/TaskResponseService.java 57.31% <0.00%> (-3.66%) ⬇️
...dolphinscheduler/remote/future/ResponseFuture.java 81.35% <0.00%> (-1.70%) ⬇️
...er/master/dispatch/host/assign/RandomSelector.java 83.33% <0.00%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d91801e...67f956a. Read the comment docs.

@zhongjiajie
Copy link
Copy Markdown
Member Author

zhongjiajie commented Sep 30, 2021

Questions: could I add py4j.jar to file known-dependencies.txt directly? for passing check-LICENSE CI? py4j license in https://github.com/bartdag/py4j/blob/master/LICENSE.txt

@kezhenxu94
Copy link
Copy Markdown
Member

Questions: could I add py4j.jar to file known-dependencies.txt directly? for passing check-LICENSE CI? py4j license in https://github.com/bartdag/py4j/blob/master/LICENSE.txt

You need to check whether the jar is license-compatible with APL2.0, and add the corresponding license info into the dolphinscheduler-dist/release-docs/licenses.

@zhongjiajie
Copy link
Copy Markdown
Member Author

You need to check whether the jar is license-compatible with APL2.0

@kezhenxu94 How could I check whether it compatible to APL2.0 or not?

@kezhenxu94
Copy link
Copy Markdown
Member

You need to check whether the jar is license-compatible with APL2.0

@kezhenxu94 How could I check whether it compatible to APL2.0 or not?

Read this https://www.apache.org/legal/resolved.html

According to your comment, py4j is BSD 2.0 so it's compatible with APL2.0

@zhongjiajie
Copy link
Copy Markdown
Member Author

Thanks, BTW your response as fast as usual

@zhongjiajie zhongjiajie marked this pull request as ready for review October 27, 2021 10:10
@zhongjiajie zhongjiajie changed the title [WIP] Add Python API implementation of workflows-as-code Add Python API implementation of workflows-as-code Oct 27, 2021
@zhongjiajie
Copy link
Copy Markdown
Member Author

zhongjiajie commented Oct 27, 2021

It's good to go now, I add example to desc how it work, and add some doc about pydolphinscheduler dev env build and core concept in readme
PTAL @dailidong @CalvinKirs @lenboo @caishunfeng

davidzollo
davidzollo previously approved these changes Oct 27, 2021
Copy link
Copy Markdown
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job, such a big feature
+1

Copy link
Copy Markdown
Member

@kezhenxu94 kezhenxu94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<dependency>
<groupId>net.sf.py4j</groupId>
<artifactId>py4j</artifactId>
<version>0.10.9</version>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dependency should put in parent pom.xml to manage version easily.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this to parent pom.xml, but it seem we should still add dependency

<dependency>
    <groupId>net.sf.py4j</groupId>
    <artifactId>py4j</artifactId>
</dependency>

to python/pom.xml. otherwise it could not find py4j dependent. Am I wrong? @caishunfeng

@zhongjiajie
Copy link
Copy Markdown
Member Author

You need to also add the new dependency license to https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-dist/release-docs/LICENSE

I add it to release-docs/LICENSE now in 2da0790, thinks

@zhongjiajie
Copy link
Copy Markdown
Member Author

zhongjiajie commented Oct 28, 2021

When I run python define code, I could see workflow was created and process instance start but failed. The screenshot as below
image

And I see some error log look like

2021-10-28 11:17:07.160  INFO 30083 --- [tp905404580-123] o.a.d.a.service.impl.LoggerServiceImpl   : log host : 192.168.1.102 , logPath :  , logServer port : 50051
2021-10-28 11:17:07.160  INFO 30083 --- [tp905404580-123] o.a.d.service.log.LogClientService       : roll view log, host : 192.168.1.102, port : 50051, path , skipLineNum 0 ,limit 1000
2021-10-28 11:17:07.193  WARN 30083 --- [tp905404580-123] o.a.d.remote.NettyRemotingClient         : connect to Host{address='192.168.1.102:50051', ip='192.168.1.102', port=50051} error

@zhongjiajie
Copy link
Copy Markdown
Member Author

When I run python define code, I could see workflow was created and process instance start but failed. The screenshot as below image

And I see some error log look like

2021-10-28 11:17:07.160  INFO 30083 --- [tp905404580-123] o.a.d.a.service.impl.LoggerServiceImpl   : log host : 192.168.1.102 , logPath :  , logServer port : 50051
2021-10-28 11:17:07.160  INFO 30083 --- [tp905404580-123] o.a.d.service.log.LogClientService       : roll view log, host : 192.168.1.102, port : 50051, path , skipLineNum 0 ,limit 1000
2021-10-28 11:17:07.193  WARN 30083 --- [tp905404580-123] o.a.d.remote.NettyRemotingClient         : connect to Host{address='192.168.1.102:50051', ip='192.168.1.102', port=50051} error

After I inspect it, I find it cause due to my laptop cpu load too high and work do not available and task mark failed. Maybe we should add some notice for this?

But for this PR, I think is ready to go now.

@zhongjiajie
Copy link
Copy Markdown
Member Author

PTAL @dailidong @lenboo @CalvinKirs @caishunfeng

@sonarqubecloud
Copy link
Copy Markdown

SonarCloud Quality Gate failed.    Quality Gate failed

Bug B 2 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 13 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@caishunfeng
Copy link
Copy Markdown
Contributor

+1

Copy link
Copy Markdown
Member

@CalvinKirs CalvinKirs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License good.

Copy link
Copy Markdown
Contributor

@lenboo lenboo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lenboo lenboo merged commit dd6ed36 into apache:dev Oct 31, 2021
@zhongjiajie zhongjiajie deleted the init-pyds branch November 1, 2021 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants