
Add support for dynamic shape in dynamo #7676

Merged
JackCaoG merged 12 commits into master from wonjoo/dynamo-dynamic-shape
Jul 23, 2024

Conversation

@wonjoo-wj
Collaborator

Fixes #7614


TODO

  • Remove debugging code and add comments
  • Add unit tests
  • Handle error case when TorchDynamo passes us int types

@wonjoo-wj
Collaborator Author

With the current changes, the following code generates correct results without recompiling the graph:

    ###
    # torch.compile dynamic shape ON
    # (`fn` and `device` are assumed to be defined earlier in the test)
    import torch
    import torch_xla.core.xla_model as xm

    torch._dynamo.config.automatic_dynamic_shapes = True
    compiled_fn = torch.compile(fn, backend='openxla', dynamic=True)
    a = torch.randn(3, 4, device=device)
    b = torch.ones(4, device=device)
    ret = compiled_fn(a, b)
    xm.mark_step()
    print(f'[Testing] {ret=}')
    print(f'--------------------')

    c = torch.randn(4, 5, device=device)
    d = torch.ones(5, device=device)
    ret2 = compiled_fn(c, d)
    xm.mark_step()
    print(f'[Testing] {ret2=}')
    print(f'--------------------')

As for next steps, I'll clean up some code and add some unit tests.

@wonjoo-wj wonjoo-wj force-pushed the wonjoo/dynamo-dynamic-shape branch from 630bba3 to 0ef3768 Compare July 13, 2024 00:24
@wonjoo-wj wonjoo-wj changed the title [WIP] Add support for dynamic shape in dynamo Add support for dynamic shape in dynamo Jul 15, 2024
Comment thread test/dynamo/test_dynamo.py Outdated
@wonjoo-wj wonjoo-wj marked this pull request as ready for review July 15, 2024 21:23
@wonjoo-wj wonjoo-wj requested a review from JackCaoG July 16, 2024 17:07
@JackCaoG
Collaborator

Seems like a bunch of tests failed, and a lot of them are real failures. @wonjoolee95, let me know if you need help debugging them.

Comment thread test/dynamo/test_dynamo.py Outdated
Comment thread torch_xla/core/dynamo_bridge.py Outdated
Comment thread torch_xla/core/dynamo_bridge.py Outdated
Comment thread test/dynamo/test_dynamo.py Outdated
Comment on lines +415 to +416
    # self.assertTrue(
    #     torch.allclose(output_cpu_new_shape, output_new_shape.cpu(), rtol=1e-05, atol=1e-05))
Collaborator Author

This part is odd. When I run these tests, the allclose check fails because, in some iterations of the data loader with this new_shape, the differences are as large as 0.2.

Collaborator Author

This is fixed with the explicit mark_step call within the else statement under torch._dynamo.config.assume_static_by_default.

Comment thread torch_xla/core/dynamo_bridge.py
@wonjoo-wj wonjoo-wj force-pushed the wonjoo/dynamo-dynamic-shape branch from e8334d4 to fcd08bb Compare July 19, 2024 03:39
@wonjoo-wj wonjoo-wj force-pushed the wonjoo/dynamo-dynamic-shape branch from 858359e to 8b8897c Compare July 19, 2024 21:52
Comment thread test/dynamo/test_dynamo.py Outdated
    for data, _ in loader_new_shape:
      output_new_shape = dynamo_resnet18(data)
      output_cpu_new_shape = resnet18(data.cpu())
      # # TPU has some precision issues, skipping allclose check
Collaborator

remove one #

Collaborator Author

Updated

Comment thread test/dynamo/test_dynamo.py Outdated
    output_new_shape.cpu(),
    rtol=1e-05,
    atol=1e-05))

Collaborator

maybe also check the CompileTime and ExecuteTime here

Collaborator

also can you make another test to test the case of

    fn(shape_a)
    fn(shape_b)
    fn(shape_c)
    fn(shape_a)

Want to make sure we don't forget the old shapes that are cached.
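The shape-revisit behavior this test would check can be sketched in plain Python. All names below (compile_graph, shape_cache, run) are illustrative stand-ins, not the bridge's actual API: the key point is that revisiting shape_a must hit the cache rather than recompile.

```python
compile_count = 0
shape_cache = {}

def compile_graph(shape):
    # Stand-in for the expensive trace-and-compile step.
    global compile_count
    compile_count += 1
    return f"compiled_graph_for_{shape}"

def run(shape):
    if shape not in shape_cache:          # cache miss: (re)compile
        shape_cache[shape] = compile_graph(shape)
    return shape_cache[shape]             # cache hit: no recompilation

# shape_a, shape_b, shape_c, then shape_a again
for s in [(3, 4), (4, 5), (5, 6), (3, 4)]:
    run(s)

assert compile_count == 3  # shape_a was remembered; only 3 compiles
```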

Comment thread torch_xla/core/dynamo_bridge.py Outdated
    # Values: tuple of (xla_args_sharding_spec, args_and_out, graph_hash,
    # arg_index_to_need_update_index, none_remover, graph_input_matcher,
    # dumb_return_handler, xla_args_need_update).
    input_shape_mappings: dict[tuple[int, ...], tuple[object, ...]] = {}
Collaborator

Use typing.Dict and typing.Tuple, otherwise the Python 3.8 CI in upstream will fail.
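For illustration, a Python 3.8-compatible version of the annotation might look like the following; the key and value shown are placeholders, not the real cached tuple:

```python
from typing import Any, Dict, Tuple

# Python 3.8-compatible annotation: subscripting the dict / tuple builtins
# (dict[...], tuple[...]) in annotations requires Python 3.9+ (PEP 585).
input_shape_mappings: Dict[Tuple[int, ...], Tuple[Any, ...]] = {}

# Keyed by the concrete input shapes of a traced graph (placeholder value):
input_shape_mappings[(3, 4)] = ("graph_hash", "args_and_out")
```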

Collaborator Author

Updated

Comment thread torch_xla/core/dynamo_bridge.py
Comment on lines +499 to +502
    input_shape_mappings[arg_input_shapes] = (
        xla_args_sharding_spec, args_and_out, graph_hash,
        arg_index_to_need_update_index, none_remover, graph_input_matcher,
        dumb_return_handler, xla_args_need_update)
Collaborator

I think you don't need this here

Collaborator Author

IIUC, we actually need this here, and we actually don't need this same logic in extract_internal above (removed in the newest commit). The reason is that when dynamic=True, only optimized_mod is called; other functions (including extract_internal) are not called.

Collaborator

OK, then you will run into the same old problem, right?
On the first call:

extract_graph_helper -> optimized_mod

In this case you do the compile, but you do not cache into input_shape_mappings.

When optimized_mod is called the first time, you will need to call extract_graph_helper again, which is wasteful.

Collaborator

You should just do the caching (input_shape_mappings[arg_input_shapes] = ...) inside extract_graph_helper.
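The suggested move can be sketched in plain Python. These are hypothetical stand-ins for the bridge's functions, not the actual implementation: the cache write lives inside the helper, so whichever path triggers compilation also populates input_shape_mappings, and the first optimized_mod call doesn't recompile.

```python
input_shape_mappings = {}
helper_calls = 0

def extract_graph_helper(arg_input_shapes):
    # Stand-in for tracing/compiling; caches its own result before returning.
    global helper_calls
    helper_calls += 1
    compiled = ("graph_hash", "args_and_out")  # stand-in for the real tuple
    input_shape_mappings[arg_input_shapes] = compiled  # cache here, once
    return compiled

def optimized_mod(arg_input_shapes):
    cached = input_shape_mappings.get(arg_input_shapes)
    if cached is None:
        cached = extract_graph_helper(arg_input_shapes)  # populates the cache
    return cached

optimized_mod((3, 4))  # first call compiles and caches
optimized_mod((3, 4))  # second call is a pure cache hit
assert helper_calls == 1
```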

Collaborator

Let me fix this too.

Comment on lines +587 to +588
    dynamo_extract_graph_helper_metric_count = metrics.counter_value(
        'DynamoExtractCompiledGraph')
Collaborator

Will run_node call extract_compiled_graph too?

Collaborator Author

It's hard to tell from the documentation. However, when I compare metrics before/after run_node, from what I can see it's not calling extract_compiled_graph.

Collaborator

OK, then I am confused about what this dynamo_extract_graph_helper_metric_count is doing here.

Collaborator Author

This code (run_node) is executed when we're fetching the fallback ops. In the code below, we clear our metric counters via metrics.clear_counters(), so we need a way to restore this counter to verify that extract_compiled_graph only gets called once in our unit tests.

Collaborator

Ah I see, I can fix it later. I think the right thing to do is to define a region where the counter is not incremented.
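One way to define such a region, sketched here with a plain dict standing in for the metric store (the helper name preserve_counter is hypothetical, not a torch_xla API): snapshot the counter before running code that clears or bumps metrics, then restore it afterwards.

```python
from contextlib import contextmanager

# `counters` stands in for the real metric store.
counters = {"DynamoExtractCompiledGraph": 1}

@contextmanager
def preserve_counter(name):
    saved = counters.get(name, 0)
    try:
        yield
    finally:
        counters[name] = saved  # restore regardless of what ran inside

with preserve_counter("DynamoExtractCompiledGraph"):
    counters.clear()  # e.g. the run_node path clearing all counters

assert counters["DynamoExtractCompiledGraph"] == 1
```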

@JackCaoG
Collaborator

@ysiraichi FYI

@wonjoo-wj
Collaborator Author

The PR should be in a reasonable state; now just seeing 2 failures in the GPU tests that require torch CUDA:

#1: DynamoInferenceBasicTest.test_dynamic_shape_resnet180 (True):
Input tensor is not an XLA tensor: CUDAFloatType

#2: DynamoInferenceBasicTest.test_resnet180 (True)
  File "/__w/xla/xla/pytorch/xla/test/dynamo/test_dynamo.py", line 370, in test_resnet18
    self.assertEqual(met.metric_data('CompileTime')[0], 1)
TypeError: 'NoneType' object is not subscriptable

For the first error, the stack trace points to:

    pytree.tree_map_only(
        torch.Tensor,
        lambda xla_arg: torch_xla._XLAC._xla_get_tensor_id(xla_arg),
        xla_args))

It seems like we may want to do an additional isinstance(arg, torch.Tensor) check here.
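The proposed guard can be illustrated with a minimal stand-in (FakeTensor and get_tensor_id are hypothetical, not torch_xla APIs): only query tensor ids for arguments that really are tensors, letting the plain ints dynamo passes for dynamic dimensions fall through.

```python
class FakeTensor:
    """Stand-in for a tensor type that carries an id."""
    def __init__(self, tid):
        self.tid = tid

def get_tensor_id(t):
    return t.tid

def collect_tensor_ids(xla_args):
    # isinstance guard: skip non-tensor args such as symbolic dim sizes.
    return [get_tensor_id(a) for a in xla_args if isinstance(a, FakeTensor)]

ids = collect_tensor_ids([FakeTensor(7), 4, FakeTensor(9)])  # 4 is a dim size
assert ids == [7, 9]
```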

@JackCaoG
Collaborator

I will pick this up and try to fix the errors today.

@JackCaoG JackCaoG added the tpuci label Jul 22, 2024
@JackCaoG JackCaoG requested a review from alanwaketan July 22, 2024 23:20
@JackCaoG
Collaborator

@alanwaketan There are a few places I want to fix, but maybe we should just merge this PR to unblock Woosuk now. I am also running some benchmarks.

Collaborator

@alanwaketan alanwaketan left a comment

Approved to unblock.

@JackCaoG JackCaoG merged commit 2b6b461 into master Jul 23, 2024
@JackCaoG JackCaoG deleted the wonjoo/dynamo-dynamic-shape branch July 23, 2024 01:18
yitongh pushed a commit to AlibabaPAI/xla that referenced this pull request Oct 11, 2024
Co-authored-by: JackCaoG <jackcao@google.com>
yitongh pushed a commit to AlibabaPAI/xla that referenced this pull request Dec 11, 2024
Co-authored-by: JackCaoG <jackcao@google.com>

Labels

dynamism Dynamic Shape Features

Successfully merging this pull request may close these issues.

Dynamo persistent cache real-time look-up
