[Bug] [WorkerServer] Too many open files error #6829
Closed
Labels: Waiting for reply, bug
Description
Search before asking
- I had searched in the issues and found no similar issues.
What happened
version: 2.0
[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.common.utils.OSUtils:[175] - /etc/passwd (Too many open files)
java.io.FileNotFoundException: /etc/passwd (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at org.apache.dolphinscheduler.common.utils.OSUtils.getUserListFromLinux(OSUtils.java:189)
at org.apache.dolphinscheduler.common.utils.OSUtils.getUserList(OSUtils.java:172)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - tenantCode: root does not exist
[INFO] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[232] - develop mode is: false
[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[252] - delete exec dir failed : Failed to list contents of /tmp/dolphinscheduler/exec/process/851650632065024/851651851886592_1/3524626/4979296
java.io.IOException: Failed to list contents of /tmp/dolphinscheduler/exec/process/851650632065024/851651851886592_1/3524626/4979296
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1647)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.clearTaskExecPath(TaskExecuteThread.java:249)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:220)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# ulimit -n
65535
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# jps
3767833 Jps
3767487 WorkerServer
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# lsof -p 3767487 | wc -l
66016
When I ran more than 60,000 tasks in dryRun mode, the worker logged many "Too many open files" errors.
It looks like the worker is not closing files: as shown above, the process holds 66016 open files against a ulimit of 65535, and the count keeps growing even after tasks fail or finish.
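To watch the growth from inside the JVM rather than with `lsof`, something like the following could be logged periodically. This is only a minimal sketch: the class name `FdProbe` is hypothetical and not part of DolphinScheduler, and it assumes a HotSpot/OpenJDK JVM on a Unix-like system where `com.sun.management.UnixOperatingSystemMXBean` is available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not DolphinScheduler code): reports how many file
// descriptors the current JVM process has open.
public class FdProbe {

    /** Returns the open file descriptor count, or -1 if the JVM cannot report it. */
    public static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1L; // not a Unix-like JVM, or the bean is unavailable
    }

    /** Returns the per-process descriptor limit, or -1 if the JVM cannot report it. */
    public static long maxFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFdCount() + ", limit: " + maxFdCount());
    }
}
```

If the value reported by `openFdCount()` keeps rising while no tasks are running, that would support the suspicion of a descriptor leak rather than a limit that is simply too low.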
What you expected to happen
The worker should close files properly once a task fails or finishes.
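For illustration, here is one way a passwd-style file can be read so that the descriptor is released on every path, including when an exception is thrown. It is a sketch under assumptions, not the project's actual `OSUtils.getUserListFromLinux` and not a confirmed fix: the class `PasswdReader` and method `readUserNames` are hypothetical names, and the stack trace alone does not prove that this method is where the leak originates.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example (not DolphinScheduler code): reads user names from a
// passwd-format file using try-with-resources.
public class PasswdReader {

    /** Returns the first colon-separated field of every non-comment line. */
    public static List<String> readUserNames(Path passwdFile) throws IOException {
        List<String> users = new ArrayList<>();
        // The reader, and the underlying file descriptor, is closed when this
        // block exits, whether normally or because of an exception.
        try (BufferedReader reader =
                     Files.newBufferedReader(passwdFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("#")) {
                    continue; // skip comment lines
                }
                int colon = line.indexOf(':');
                if (colon > 0) {
                    users.add(line.substring(0, colon));
                }
            }
        }
        return users;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readUserNames(Paths.get("/etc/passwd")));
    }
}
```

The same pattern applies to any stream, process handle, or directory listing opened per task: if it is opened outside a try-with-resources block (or a `finally` that closes it), each of the 60,000 tasks can leave a descriptor behind.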
How to reproduce
Run more than 60,000 tasks in dryRun mode.
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct