KeyError in default_cache_dir() when user account doesn't exist #140765

@jhostetler

Description

🐛 Describe the bug

The torch._inductor package creates a cache directory. If the TORCHINDUCTOR_CACHE_DIR environment variable is not set, the directory defaults to /tmp/torchinductor_{username}, where username comes from the Python standard library function getpass.getuser().

This function raises a KeyError if the user account does not exist, that is, if the current uid has no passwd entry and none of the LOGNAME, USER, LNAME, or USERNAME environment variables is set. This is a common situation in production deployments, where containers are often forced to run as an ordinary user for security reasons but the user account is never created inside the container with useradd or similar.
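The failure can be simulated on a Unix machine without building a container. The sketch below is only an illustration: the helper name getuser_without_account is hypothetical, and the missing account is faked by patching pwd.getpwuid and hiding the environment variables that getpass.getuser() consults first. On Python 3.13 and later, getpass.getuser() wraps the failure in an OSError, so both exception types are caught.

```python
import getpass
import os
import pwd  # Unix-only module
from unittest import mock

# Environment variables that getpass.getuser() checks before falling back to pwd.
USER_ENV_VARS = ("LOGNAME", "USER", "LNAME", "USERNAME")


def getuser_without_account():
    """Call getpass.getuser() as if the current uid had no passwd entry.

    Hypothetical helper for illustration; returns the exception that was raised.
    """
    env = {k: v for k, v in os.environ.items() if k not in USER_ENV_VARS}
    fake_error = KeyError("getpwuid(): uid not found: 1001")
    with mock.patch.dict(os.environ, env, clear=True), \
            mock.patch.object(pwd, "getpwuid", side_effect=fake_error):
        try:
            getpass.getuser()
        except (KeyError, OSError) as exc:
            return exc
    return None


if __name__ == "__main__":
    print(repr(getuser_without_account()))
```

Because the environment variables are checked first, exporting USER (or LOGNAME) in the container is another way to avoid the lookup entirely.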

It would be helpful to fall back to /tmp/torchinductor or a similar fixed path if getting the user name fails. The username part doesn't seem especially necessary, since I don't imagine that "multiple users on a shared machine" is the most common usage context.
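A minimal sketch of such a fallback, assuming the shape of the default_cache_dir() function quoted in this issue. The name default_cache_dir_with_fallback and the choice of a plain "torchinductor" directory are assumptions made for illustration, not the actual PyTorch fix.

```python
import getpass
import os
import re
import tempfile


def default_cache_dir_with_fallback() -> str:
    """Hypothetical variant of default_cache_dir() that tolerates a missing account."""
    try:
        username = getpass.getuser()
    except (KeyError, OSError, ImportError):
        # KeyError: the uid has no passwd entry (Python before 3.13).
        # OSError: raised by getpass.getuser() from Python 3.13 on.
        # ImportError: the pwd module is unavailable on this platform.
        return os.path.join(tempfile.gettempdir(), "torchinductor")
    # Same sanitization as the original function.
    sanitized_username = re.sub(r'[\\/:*?"<>|]', "_", username)
    return os.path.join(
        tempfile.gettempdir(),
        "torchinductor_" + sanitized_username,
    )
```

One trade-off of a shared fallback path is that two different uids without accounts would write to the same directory; appending the numeric uid instead would keep them apart, at the cost of relying on os.getuid(), which is Unix-only.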

Setting TORCHINDUCTOR_CACHE_DIR does work around the problem.
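For reference, the workaround can be applied from Python as well as from the container configuration. This is a sketch only: the directory name is an arbitrary choice, and setting the variable before importing torch is the conservative ordering rather than a documented requirement.

```python
import os
import tempfile

# Hypothetical fixed location; any directory writable by the container user works.
cache_dir = os.path.join(tempfile.gettempdir(), "torchinductor")

# setdefault keeps a value that the deployment has already provided.
os.environ.setdefault("TORCHINDUCTOR_CACHE_DIR", cache_dir)

# Import torch and call torch.compile(...) only after this point.
```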

Error logs

  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/runtime_utils.py", line 137, in cache_dir
    sanitized_username = re.sub(r'[\\/:*?"<>|]', "_", getpass.getuser())
                                                      ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/getpass.py", line 169, in getuser
    return pwd.getpwuid(os.getuid())[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
KeyError: 'getpwuid(): uid not found: 1001'

Versions

Unable to run in the production config, but the problematic code is currently present in main:

def default_cache_dir() -> str:
    sanitized_username = re.sub(r'[\\/:*?"<>|]', "_", getpass.getuser())
    return os.path.join(
        tempfile.gettempdir(),
        "torchinductor_" + sanitized_username,
    )

cc @ezyang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov

Labels: module: inductor, oncall: pt2, triaged
