
Trainer class on Mac uses accelerate to incorrectly set MPS device #24697

@alex2awesome

System Info

transformers==4.30.2
Mac 2019, Ventura 13.4

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

ISSUE: I am running a generic model training job locally on my Mac using Trainer. My model is being moved to MPS, but my tensors are staying on the CPU.

I can provide more details about my script, but I expect this is a general library problem. Here are the lines of code I found:

When the accelerator is instantiated in the Trainer class, it isn't passed any user-specified arguments, for example this one from TrainingArguments, which would give the user control over which device to use. As a result, when running locally on a Mac, Accelerate infers on its own which device we want to use and, in the non-distributed setting, moves the model to self.device. I'm not yet sure how self.device is set in Accelerate; however, Trainer doesn't move my data to MPS, so my script crashes.
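A minimal standalone sketch of the mismatch and of one generic workaround, assuming PyTorch 1.12 or later (where torch.backends.mps.is_available() exists). It does not use Trainer or Accelerate; it only shows that a model placed on MPS needs its inputs on the same device, and that reading the device from the model's own parameters keeps the two aligned.

```python
import torch
import torch.nn as nn

# Assumption: choose MPS when present, standing in for the automatic
# device choice described in the report; otherwise use the CPU.
model_device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = nn.Linear(4, 2).to(model_device)

# New tensors are allocated on the CPU by default, like the batches in the report.
batch = torch.randn(3, 4)

# Workaround sketch: send inputs to wherever the model's parameters actually live,
# instead of to a separately configured device.
param_device = next(model.parameters()).device
with torch.no_grad():
    out = model(batch.to(param_device))

print(out.shape, out.device)
```

Calling model(batch) without the .to(param_device) step is what fails on a machine where MPS is available, because the weights and the input end up on different devices.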

Expected behavior

Ideally, there would be a flag I can pass to Trainer to avoid using MPS when I don't want it, and just stick with the CPU.
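The requested behaviour can be sketched as a device-resolution rule in which an explicit user choice always wins over auto-detection. The resolve_device helper and its requested argument below are hypothetical illustrations; they are not part of the Trainer or Accelerate API, and the auto-detection order shown is an assumption.

```python
from typing import Optional

VALID_DEVICES = {"cpu", "cuda", "mps"}


def resolve_device(requested: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: an explicit user choice wins; otherwise auto-detect.

    `requested` stands in for the kind of user-facing flag asked for in this
    issue; it is NOT a real Trainer or Accelerate parameter.
    """
    if requested is not None:
        if requested not in VALID_DEVICES:
            raise ValueError(f"unknown device: {requested!r}")
        if requested == "cuda" and not cuda_available:
            raise RuntimeError("CUDA requested but not available")
        if requested == "mps" and not mps_available:
            raise RuntimeError("MPS requested but not available")
        return requested
    # No explicit choice (assumed order): prefer an accelerator, fall back to CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# On a Mac with MPS available, the user can still opt out of MPS:
print(resolve_device("cpu", cuda_available=False, mps_available=True))  # cpu
print(resolve_device(None, cuda_available=False, mps_available=True))   # mps
```

The design point is that a single resolved value would be used both when placing the model and when moving each batch, so the two cannot diverge the way they do in this report.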
