Skip to content

[Feature-2925][server] Init TaskLogger in TaskExecuteProcessor (#2925)#2965

Merged
davidzollo merged 2 commits intoapache:devfrom
yangyichao-mango:feature-2925
Jun 28, 2020
Merged

[Feature-2925][server] Init TaskLogger in TaskExecuteProcessor (#2925)#2965
davidzollo merged 2 commits intoapache:devfrom
yangyichao-mango:feature-2925

Conversation

@yangyichao-mango
Copy link
Copy Markdown
Contributor

@yangyichao-mango yangyichao-mango commented Jun 13, 2020

What is the purpose of the pull request

Init task logger before init TaskExecuteThread and pass the taskLogger in to TaskExecuteThread init method as parameter, so we can record the user and work dir creating log to task instance log, and user can read this log in front-end WebUI. #3014
Use common ShellExecutor in FileUtils. #2925

Brief change log

  • Add taskLogger parameter to TaskExecuteThread.
  • Add three log utils to LoggerUtils.
  • Init task logger before create initing TaskExecuteThread, and init task logger process to TaskExecuteProcessor.
  • Log all detail when create user and working dir in FileUtils.

Verify this pull request

This pull request is code cleanup without any test coverage.

@yangyichao-mango yangyichao-mango force-pushed the feature-2925 branch 3 times, most recently from 5482a5e to f36af53 Compare June 14, 2020 08:00
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

Merging #2965 into dev will increase coverage by 0.05%.
The diff coverage is 33.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #2965      +/-   ##
============================================
+ Coverage     39.11%   39.17%   +0.05%     
- Complexity     2678     2682       +4     
============================================
  Files           435      435              
  Lines         20119    20161      +42     
  Branches       2454     2459       +5     
============================================
+ Hits           7870     7898      +28     
- Misses        11514    11525      +11     
- Partials        735      738       +3     
Impacted Files Coverage Δ Complexity Δ
...che/dolphinscheduler/server/log/TaskLogFilter.java 25.00% <0.00%> (-8.34%) 2.00 <1.00> (ø)
.../server/worker/processor/TaskExecuteProcessor.java 8.95% <0.00%> (-1.22%) 1.00 <0.00> (ø)
...eduler/server/worker/runner/TaskExecuteThread.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
.../apache/dolphinscheduler/common/utils/OSUtils.java 47.36% <33.33%> (-0.96%) 23.00 <3.00> (ø)
...pache/dolphinscheduler/common/utils/FileUtils.java 30.57% <47.36%> (+1.60%) 9.00 <0.00> (ø)
...che/dolphinscheduler/common/utils/LoggerUtils.java 66.66% <50.00%> (-20.00%) 6.00 <2.00> (+2.00) ⬇️
.../apache/dolphinscheduler/common/utils/IOUtils.java 0.00% <0.00%> (-50.00%) 0.00% <0.00%> (-1.00%)
...he/dolphinscheduler/api/service/LoggerService.java 82.60% <0.00%> (-8.70%) 9.00% <0.00%> (-1.00%)
...e/dolphinscheduler/remote/NettyRemotingClient.java 51.07% <0.00%> (-2.88%) 9.00% <0.00%> (-2.00%)
...inscheduler/common/task/sqoop/SqoopParameters.java 74.00% <0.00%> (ø) 25.00% <0.00%> (ø%)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3a7fd72...f36af53. Read the comment docs.

@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

Hi @gabrywu @dailidong , Can you help to review this PR? Thx a lot.

@gabry-lab
Copy link
Copy Markdown
Member

@yangyichao-mango What's purpose of this PR? Why are there so many irrelevant files to change ?

@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

yangyichao-mango commented Jun 17, 2020

@yangyichao-mango What's purpose of this PR? Why are there so many irrelevant files to change ?

Thx a lot for your review @gabrywu .

This PR is essentially to get the abnormal exit value caused by the error (exceptions caused by permissions, etc.) when the worker starts to create the working directory, to judge if the user info and work dir is created successfully.

But after my change, I found that if the worker failed to create the working directory, the exception it threw would be caught, only recorded in the log of the worker node, not in the log of a single task instance. Although the task will fail at the end of execution, the exception log is very few and it is difficult to locate the exception (permission exception, etc.) directly.

In my opinion, I understand that this error should fail fast, or at least recorded by logs. So in order to match the change of my exitval judgment work, I put the initialization change of tasklogger to the beginning of TaskExecuteProcessor. And I understand that the task logger should record all the logs related to the task, not only the task runtime logs.

If I have any misunderstanding, please give me suggestions.


非常感谢你的review。

这个pr本质上是为了获取在worker执行task时创建工作目录时的错误(由于权限导致的问题等)值,来判断工作目录,用户权限是否创建成功。

但是在改动完成后,我发现如果worker创作工作目录失败其抛出的异常会被catch住,异常信息只会在worker节点的日志中被记录下来,不会在单个task的日志中记录下来,这相当于创建工作目录相关的所有错误信息完全不会被直接外显给用户。虽然最后task执行的时候会失败,但是异常日志非常少,很难直接定位到工作目录中的异常(权限异常等)。

我理解这种错误应该fail-fast,或者至少有日志来记录,所以为了配合我这次针对exitval判断的改动,我就把taskLogger的初始化改动到了最前面,用来记录涉及到这个task的所有日志。并且我理解taskLogger的原本目的就是用来记录所有涉及到这个task的日志,不仅仅是task运行时的日志。

如果我有理解不正确的地方,敬请指出。

@gabry-lab
Copy link
Copy Markdown
Member

gabry-lab commented Jun 18, 2020

@yangyichao-mango What's purpose of this PR? Why are there so many irrelevant files to change ?

Thx a lot for your review @gabrywu .

This PR is essentially to get the abnormal exit value caused by the error (exceptions caused by permissions, etc.) when the worker starts to create the working directory, to judge if the user info and work dir is created successfully.

But after my change, I found that if the worker failed to create the working directory, the exception it threw would be caught, only recorded in the log of the worker node, not in the log of a single task instance. Although the task will fail at the end of execution, the exception log is very few and it is difficult to locate the exception (permission exception, etc.) directly.

In my opinion, I understand that this error should fail fast, or at least recorded by logs. So in order to match the change of my exitval judgment work, I put the initialization change of tasklogger to the beginning of TaskExecuteProcessor. And I understand that the task logger should record all the logs related to the task, not only the task runtime logs.

If I have any misunderstanding, please give me suggestions.

非常感谢你的review。

这个pr本质上是为了获取在worker执行task时创建工作目录时的错误(由于权限导致的问题等)值,来判断工作目录,用户权限是否创建成功。

但是在改动完成后,我发现如果worker创作工作目录失败其抛出的异常会被catch住,异常信息只会在worker节点的日志中被记录下来,不会在单个task的日志中记录下来,这相当于创建工作目录相关的所有错误信息完全不会被直接外显给用户。虽然最后task执行的时候会失败,但是异常日志非常少,很难直接定位到工作目录中的异常(权限异常等)。

我理解这种错误应该fail-fast,或者至少有日志来记录,所以为了配合我这次针对exitval判断的改动,我就把taskLogger的初始化改动到了最前面,用来记录涉及到这个task的所有日志。并且我理解taskLogger的原本目的就是用来记录所有涉及到这个task的日志,不仅仅是task运行时的日志。

如果我有理解不正确的地方,敬请指出。

You'd better give a suitable title to this issue, otherwise others may have a misunderstanding. If you want to fix different issue, different PRs should be created

@yangyichao-mango yangyichao-mango changed the title [Feature-2925][common] Add exitVal judge in OSUtils.exeCmd (#2925) [Feature-2925][server] Init TaskLogger in TaskExecuteProcessor (#2925) Jun 18, 2020
@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

@yangyichao-mango What's purpose of this PR? Why are there so many irrelevant files to change ?

Thx a lot for your review @gabrywu .
This PR is essentially to get the abnormal exit value caused by the error (exceptions caused by permissions, etc.) when the worker starts to create the working directory, to judge if the user info and work dir is created successfully.
But after my change, I found that if the worker failed to create the working directory, the exception it threw would be caught, only recorded in the log of the worker node, not in the log of a single task instance. Although the task will fail at the end of execution, the exception log is very few and it is difficult to locate the exception (permission exception, etc.) directly.
In my opinion, I understand that this error should fail fast, or at least recorded by logs. So in order to match the change of my exitval judgment work, I put the initialization change of tasklogger to the beginning of TaskExecuteProcessor. And I understand that the task logger should record all the logs related to the task, not only the task runtime logs.
If I have any misunderstanding, please give me suggestions.
非常感谢你的review。
这个pr本质上是为了获取在worker执行task时创建工作目录时的错误(由于权限导致的问题等)值,来判断工作目录,用户权限是否创建成功。
但是在改动完成后,我发现如果worker创作工作目录失败其抛出的异常会被catch住,异常信息只会在worker节点的日志中被记录下来,不会在单个task的日志中记录下来,这相当于创建工作目录相关的所有错误信息完全不会被直接外显给用户。虽然最后task执行的时候会失败,但是异常日志非常少,很难直接定位到工作目录中的异常(权限异常等)。
我理解这种错误应该fail-fast,或者至少有日志来记录,所以为了配合我这次针对exitval判断的改动,我就把taskLogger的初始化改动到了最前面,用来记录涉及到这个task的所有日志。并且我理解taskLogger的原本目的就是用来记录所有涉及到这个task的日志,不仅仅是task运行时的日志。
如果我有理解不正确的地方,敬请指出。

You'd better give a suitable title to this issue, otherwise others may have a misunderstanding. If you want to fix different issue, different PRs should be created

Thx a lot for your review @gabrywu .
I've change the description of this issue and PR and connect the [FEATURE] #3014 and [BUG] #2925.

Comment on lines 186 to 189
} catch (Throwable ignored) {
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why swallow exception?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why swallow exception?

Thx a lot for your review @dailidong . I've removed the unused exception try catch clause.

Copy link
Copy Markdown
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@davidzollo
Copy link
Copy Markdown
Contributor

What is the function of this ThreadLocal? Is it necessary? I think we changed our log format and implemented it with loggerutils

@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

What is the function of this ThreadLocal? Is it necessary? I think we changed our log format and implemented it with loggerutils

Thx a lot for your review @dailidong .
In my opinion, it's better to set the tasklogger threadlocal value to log the task detail but not to pass the tasklogger through the function's input params in the utils class of FileUtils and OSUtils.

@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

yangyichao-mango commented Jun 21, 2020

What is the function of this ThreadLocal? Is it necessary? I think we changed our log format and implemented it with loggerutils

Thx a lot for your review and work of merging dev branch to current branch @dailidong .
In my opinion, although my branch is out-of-date with the base branch, maybe it's better to rebase the current branch and it will make the dev branch cleaner. If it's needed, I will rebase my branch after your approval this PR and reply me.

@davidzollo
Copy link
Copy Markdown
Contributor

I think no need to rebase if no confilct files

@yangyichao-mango
Copy link
Copy Markdown
Contributor Author

I think no need to rebase if no confilct files

Thx a lot for your review. @dailidong

@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 2 Code Smells

49.5% 49.5% Coverage
0.0% 0.0% Duplication

@Lucaszlei
Copy link
Copy Markdown
Contributor

Lucaszlei commented Jan 14, 2021

Hello, there is still a problem under the modified logic:
If you copy a resource file and there is FileNotFoundException (the file is in the resource list, but not in HDFS), you will still not see the relevant error message in the task log of the Web UI, making it difficult to troubleshoot the problem

@yangyichao-mango


您好,在现在修改后的逻辑下,还存在一个问题是:
如果在copy资源文件时发生java.io.FileNotFoundException(文件在资源列表中,但是在hdfs上不存在的特殊情况下),那么在web UI的task日志中还是看不到相关错误信息,不便于排查问题
图片

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Init TaskLogger in TaskExecuteProcessor [BUG] There is no exitVal judge in OSUtils.exeCmd function

6 participants