
FileLock does not work on GPFS #389

@xiaosu-zhu


FileLock from filelock does not provide mutual exclusion across nodes on GPFS.

Specifically, the problem occurs when huggingface datasets creates file locks on a multi-node cluster with a GPFS filesystem.

More details can be found in huggingface/transformers#30859


Reproduction code (from @thinkahead):

I ran into this problem when using datasets/builder.py for multi-node fine-tuning. By default, FileLock resolves to UnixFileLock because "import fcntl" succeeds. On GPFS, UnixFileLock did not work, and I had to use SoftFileLock instead. You can try a simple test on the /scrip_continual_pretraining/ from 2 nodes:

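import time
from filelock import FileLock
# Uncomment the next line to use the soft lock instead of the platform default.
# from filelock import SoftFileLock as FileLock

file_path = "/gpfs/text.txt"
lock_path = "/gpfs/test.lock"
lock = FileLock(lock_path, timeout=30)
with lock:
    print("Inside")
    time.sleep(15)
    # Append while holding the lock; the inner "with" closes the file.
    with open(file_path, "a") as f:
        f.write("Hello there!")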

If you run this on multiple nodes, "Inside" is printed on all nodes immediately, so the lock is not being honored across nodes. This is the problem.
If you run it on a single node (multiple separate Python processes), only one process prints "Inside" and holds the lock for 15 seconds before the next one prints "Inside".
With the SoftFileLock line uncommented, the remaining nodes wait, which shows that it locks correctly across multiple nodes.
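To check which backend a given node would pick, something like the following may help. It is only a sketch: expected_backend() is a hypothetical helper that approximates filelock's selection (Windows lock on win32, UnixFileLock when fcntl is importable, SoftFileLock otherwise), and actual_backend() simply reports the class name that filelock.FileLock resolves to when the package is installed.

```python
import importlib.util
import sys


def expected_backend() -> str:
    """Approximate which class filelock.FileLock is expected to alias here."""
    if sys.platform == "win32":
        return "WindowsFileLock"
    # fcntl is only importable on Unix-like platforms.
    if importlib.util.find_spec("fcntl") is not None:
        return "UnixFileLock"
    return "SoftFileLock"


def actual_backend():
    """Return the name of the class behind filelock.FileLock, or None if not installed."""
    try:
        import filelock
    except ImportError:
        return None
    return filelock.FileLock.__name__


if __name__ == "__main__":
    print("expected:", expected_backend())
    print("actual:  ", actual_backend())
```

On a Linux node with GPFS mounted, this would be expected to report UnixFileLock, which matches the behavior described above.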

Originally posted by @thinkahead in huggingface/transformers#30859
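For context on why the two lock types can behave differently, here is a minimal standard-library sketch of the soft-lock idea: the lock is considered held while the lock file exists, and it is acquired by creating that file exclusively. SimpleSoftLock is a hypothetical name for illustration, not filelock's implementation, and it assumes the filesystem makes exclusive creation (O_CREAT | O_EXCL) visible to all nodes.

```python
import os
import time


class SimpleSoftLock:
    """Hypothetical minimal soft lock: held for as long as the lock file exists."""

    def __init__(self, path, timeout=30.0, poll_interval=0.05):
        self.path = path
        self.timeout = timeout
        self.poll_interval = poll_interval

    def __enter__(self):
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # O_EXCL makes the call fail if the file already exists,
                # so only one creator can succeed.
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except FileExistsError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not acquire {self.path}")
                time.sleep(self.poll_interval)

    def __exit__(self, exc_type, exc, tb):
        # Releasing the lock means deleting the lock file.
        os.remove(self.path)
        return False
```

One trade-off of this approach is that a process that crashes while holding the lock leaves the file behind, so later runs block until it is removed by hand.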
