[Torch2 CPU] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum #93495
🐛 Describe the bug
I'm trying to compile a UniXcoder model (a BERT variant) from Hugging Face Transformers on CPU.
I'm using Python 3.8.16 and torch 2.0.0.dev20221222+cpu.
When calling model = torch.compile(model), both with the default mode and with mode="reduce-overhead", on a machine with 8 GB of RAM, I hit the error below.
Any idea how to work around it?
Thank you!
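For reference, a minimal sketch of the setup that triggers this (the real script wraps the encoder in a custom unixcoder.py module, so the checkpoint name and calling code here are illustrative assumptions, not the exact repro):

```python
# Hedged sketch of the failing setup. Assumes the standard
# microsoft/unixcoder-base checkpoint; the actual script (compile.py in the
# traceback below) uses a custom UniXcoder wrapper around this encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

# Same failure with the default mode and with mode="reduce-overhead".
model = torch.compile(model)

inputs = tokenizer("def f(x): return x + 1", return_tensors="pt")
with torch.no_grad():
    # Compilation is triggered lazily on the first forward call; this is
    # where the inductor C++ compile step runs out of memory.
    outputs = model(**inputs)
```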
Error logs
[2022-12-22 14:41:22,477] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:41:35,196] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:41:45,469] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:41:56,081] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:42:06,499] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:42:19,296] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
[2022-12-22 14:42:30,450] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.cumsum
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 676, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/debug_utils.py", line 945, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 1151, in __call__
return self.compile_fn(model_, inputs_)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 398, in compile_fx
return aot_autograd(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/optimizations/training.py", line 78, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 2353, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 90, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 2050, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_tensor_args, aot_config)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 1305, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config)
File "/usr/local/lib/python3.8/dist-packages/torch/_functorch/aot_autograd.py", line 955, in aot_dispatch_base
compiled_fw = aot_config.fw_compiler(fw_module, flat_args)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 90, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 373, in fw_compiler
return inner_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/debug_utils.py", line 507, in debug_wrapper
compiled_fn = compiler_fn(gm, example_inputs, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/debug.py", line 223, in inner
return fn(*args, **kwargs)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/compile_fx.py", line 140, in compile_fx_inner
compiled_fn = graph.compile_to_fn()
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 538, in compile_to_fn
return self.compile_to_module().call
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 90, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/graph.py", line 527, in compile_to_module
mod = PyCodeCache.load(code)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 461, in load
exec(code, mod.__dict__, mod.__dict__)
File "/tmp/torchinductor_root/ih/cihuzmkrufm4dzdsf7l5l6b7nhtybr7fexjtnk72btsrlnrnbtew.py", line 6242, in <module>
async_compile.wait(globals())
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 656, in wait
scope[key] = result.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 633, in task
return CppCodeCache.load(source_code).kernel
File "/usr/local/lib/python3.8/dist-packages/torch/_inductor/codecache.py", line 438, in load
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1639, in _execute_child
self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "compile.py", line 596, in <module>
embeddings = get_embeddings(model, code_segments)
File "compile.py", line 582, in get_embeddings
_,code_embedding = model(source_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/unixcoder.py", line 83, in forward
token_embeddings = self.model(source_ids,attention_mask = mask.unsqueeze(1) * mask.unsqueeze(2))[0]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 83, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 212, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/eval_frame.py", line 333, in catch_errors
return callback(frame, cache_size, hooks)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 480, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 103, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/utils.py", line 90, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 339, in _convert_frame_assert
return _compile(
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _compile
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/convert_frame.py", line 387, in transform
tracer.run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 1684, in run
super().run()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 538, in run
and self.step()
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 501, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/symbolic_convert.py", line 1750, in RETURN_VALUE
self.output.compile_subgraph(self)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 553, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 600, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.8/dist-packages/torch/_dynamo/output_graph.py", line 681, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised OSError: [Errno 12] Cannot allocate memory
Set torch._dynamo.config.verbose=True for more information
You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True
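The error message above suggests a fallback I could use in the meantime; a sketch of that, plus one assumption on my part: if the OSError comes from inductor spawning many parallel C++ compile jobs, lowering `torch._inductor.config.compile_threads` might keep peak memory under 8 GB.

```python
import torch._dynamo
import torch._inductor.config

# Suppress compiler errors and fall back to eager execution,
# as the error message itself suggests.
torch._dynamo.config.suppress_errors = True

# Assumption: the fork/exec OSError is caused by several concurrent g++
# subprocesses; a single compile worker should reduce peak memory usage
# at the cost of slower compilation.
torch._inductor.config.compile_threads = 1
```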
Minified repro
No response