[Bug] [WorkerServer] Too many open files error #6829

@caishunfeng

Search before asking

  • I have searched the issues and found no similar issues.

What happened

version: 2.0

[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.common.utils.OSUtils:[175] - /etc/passwd (Too many open files)
java.io.FileNotFoundException: /etc/passwd (Too many open files)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.apache.dolphinscheduler.common.utils.OSUtils.getUserListFromLinux(OSUtils.java:189)
        at org.apache.dolphinscheduler.common.utils.OSUtils.getUserList(OSUtils.java:172)
        at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - tenantCode: root does not exist
[INFO] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[232] - develop mode is: false
[ERROR] 2021-11-12 14:13:35.611 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[252] - delete exec dir failed : Failed to list contents of /tmp/dolphinscheduler/exec/process/851650632065024/851651851886592_1/3524626/4979296
java.io.IOException: Failed to list contents of /tmp/dolphinscheduler/exec/process/851650632065024/851651851886592_1/3524626/4979296
        at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1647)
        at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
        at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.clearTaskExecPath(TaskExecuteThread.java:249)
        at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:220)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# ulimit -n
65535
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# jps
3767833 Jps
3767487 WorkerServer
[root@ds8 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# lsof -p 3767487 | wc -l
66016

When I ran more than 60,000 (6w+) tasks in dryRun mode, the worker logged many "Too many open files" errors.
It looks like the worker is not closing files: the number of open files keeps growing even after tasks have failed or finished.
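To confirm from inside the JVM that the descriptor count really keeps climbing, a small probe along these lines could be logged periodically. This is only a sketch: `OpenFdProbe` is a hypothetical helper, not something that exists in DolphinScheduler, and it relies on the JDK's `com.sun.management.UnixOperatingSystemMXBean`, which is only available on Unix-like JVMs. Note also that `lsof -p` lists entries that are not file descriptors (for example memory-mapped files), so its line count can be higher than the `ulimit -n` value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic helper (an assumption for illustration, not DolphinScheduler code).
class OpenFdProbe {

    /** Returns the number of file descriptors this JVM has open, or -1 if unavailable. */
    static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Returns the per-process descriptor limit seen by this JVM, or -1 if unavailable. */
    static long maxFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }
}
```

Logging `OpenFdProbe.openFdCount()` before and after a batch of tasks would show whether the count returns to its baseline once the tasks have finished.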

What you expected to happen

The worker should close files properly once it has finished with them.
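For reference, this is the kind of pattern I would expect for reading the user list, with the stream closed on every path. It is a minimal sketch under assumptions: `PasswdUserReader` and `readUserNames` are hypothetical names, and I have not confirmed that the stream opened in `OSUtils.getUserListFromLinux` is the one being leaked; the stack trace only shows that opening `/etc/passwd` failed after the limit was reached.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not the actual OSUtils implementation).
class PasswdUserReader {

    /** Reads user names (the first colon-separated field) from a passwd-style file. */
    static List<String> readUserNames(Path passwdFile) throws IOException {
        List<String> users = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws,
        // so no descriptor is left open after the method returns.
        try (BufferedReader reader = Files.newBufferedReader(passwdFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int colon = line.indexOf(':');
                // Skip comment lines and lines without a user-name field.
                if (colon > 0 && !line.startsWith("#")) {
                    users.add(line.substring(0, colon));
                }
            }
        }
        return users;
    }
}
```

A call such as `PasswdUserReader.readUserNames(Paths.get("/etc/passwd"))` would return the user names without leaving a descriptor open. The same pattern applies to any other stream or process handle the worker opens per task.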

How to reproduce

Run more than 60,000 (6w+) tasks in dryRun mode.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


    Labels

    Waiting for reply, bug
