Conversation
Summary: We are standardizing on `Float8Linear` as the only float8 linear object:

1. The stack ending with #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. This PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Test Plan:

```
// all tests pass
./test_everything.sh
// also run all benchmarks and verify correctness
```

ghstack-source-id: 8ab4833
Pull Request resolved: #304
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
```diff
 import torch.nn as nn
 import torch.utils.benchmark as benchmark
-from float8_experimental.float8_linear import Float8Linear
+from float8_experimental.float8_linear import Float8Linear, TensorScalingType
```
How useful is this benchmark in general?
I haven't used it recently
```diff
-# example: "x:del,w:del,dldy:dyn"
-return f"x:{self.scaling_type_x.short_str()},w:{self.scaling_type_w.short_str()},dldy:{self.scaling_type_dL_dY.short_str()}"
+# example: "x_del_w_del_dldy_dyn"
+return f"x_{self.scaling_type_x.short_str()}_w_{self.scaling_type_w.short_str()}_dldy_{self.scaling_type_dL_dY.short_str()}"
```
Why the change, out of curiosity? I think the prior version might be a little more readable.
I should have reverted this. Will follow up in a future PR if that's OK, to make landing this PR easier.
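For context, the two repr formats under discussion can be sketched with a minimal stand-in for the scaling-type enum. The class and member names below mirror the diff, but this is an illustrative mock, not the library's actual code:

```python
from enum import Enum


class TensorScalingType(Enum):
    # Mirrors the enum referenced in the diff; member names are assumptions.
    DELAYED = "delayed"
    DYNAMIC = "dynamic"

    def short_str(self) -> str:
        return "del" if self is TensorScalingType.DELAYED else "dyn"


class ScalingConfig:
    """Illustrative holder for the three per-tensor scaling types."""

    def __init__(self, x, w, dl_dy):
        self.scaling_type_x = x
        self.scaling_type_w = w
        self.scaling_type_dL_dY = dl_dy

    def repr_colon(self) -> str:
        # prior format, e.g. "x:del,w:del,dldy:dyn"
        return (f"x:{self.scaling_type_x.short_str()},"
                f"w:{self.scaling_type_w.short_str()},"
                f"dldy:{self.scaling_type_dL_dY.short_str()}")

    def repr_underscore(self) -> str:
        # new format, e.g. "x_del_w_del_dldy_dyn"
        return (f"x_{self.scaling_type_x.short_str()}_"
                f"w_{self.scaling_type_w.short_str()}_"
                f"dldy_{self.scaling_type_dL_dY.short_str()}")


cfg = ScalingConfig(TensorScalingType.DELAYED,
                    TensorScalingType.DELAYED,
                    TensorScalingType.DYNAMIC)
print(cfg.repr_colon())       # x:del,w:del,dldy:dyn
print(cfg.repr_underscore())  # x_del_w_del_dldy_dyn
```

The colon/comma form keys each tensor visually, while the underscore form is safer in contexts like filenames or experiment names where `:` and `,` are awkward.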
```diff
-m_fp8 = get_float8_linear(
-    linear_type, m_ref, emulate, scaling_type_x, scaling_type_w, scaling_type_dL_dY
+m_fp8 = Float8Linear.from_float(
```
Calling `swap_..` on an `nn.Linear` module returns a model out of place. I think it's fine either way.
I agree, we can make the tests use that if we want in a future PR.
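The out-of-place swap pattern being discussed can be sketched library-free. The tiny classes below are stand-ins for `nn.Module`/`nn.Linear`, and this `Float8Linear.from_float` is a mock of the real constructor, kept only detailed enough to show the recursion:

```python
class Module:
    """Minimal stand-in for nn.Module, just enough to show the swap pattern."""

    def __init__(self):
        self._children = {}

    def add_child(self, name, child):
        self._children[name] = child

    def named_children(self):
        return list(self._children.items())


class Linear(Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features


class Float8Linear(Linear):
    @classmethod
    def from_float(cls, mod):
        # Build a new module from the float one, out of place:
        # the original `mod` is left untouched.
        return cls(mod.in_features, mod.out_features)


def swap_linear_with_float8_linear(module):
    # Recursively replace every plain Linear with a Float8Linear.
    if isinstance(module, Linear) and not isinstance(module, Float8Linear):
        return Float8Linear.from_float(module)
    for name, child in module.named_children():
        module.add_child(name, swap_linear_with_float8_linear(child))
    return module


model = Module()
model.add_child("fc1", Linear(16, 32))
model.add_child("fc2", Linear(32, 8))
model = swap_linear_with_float8_linear(model)
print(type(model._children["fc1"]).__name__)  # Float8Linear
```

The check at the top handles the case where the root module itself is a `Linear`, which is why the function returns the (possibly new) module rather than mutating in place.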
```python
    "scaling_type_dL_dY": TensorScalingType.DYNAMIC,
}
# For now, just use Float8Linear with dynamic scaling, which is the
# same behavior as Float8Linear.
```
Float8Dynamic? But it's probably also better to just say this only supports dynamic scaling for all 3 tensors: x, w, dL_dY.
Agreed, let me fix in a future PR to speed up landing this, since this is a minor point.
```diff
     param.grad.div_(dist.get_world_size())
-if module_cls is Float8Linear:
-    sync_float8_amax_and_scale_history(model)
+# TODO(future): add amax syncing once delayed scaling is supported
```
was this just an unused code path?
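For context on the `param.grad.div_(dist.get_world_size())` line in the diff: in this style of manual data parallelism, gradients are all-reduced as a sum across ranks and then divided by the world size to get the mean. A library-free sketch of that reduction (the rank count and gradient values here are made up for illustration):

```python
def allreduce_mean(per_rank_grads):
    """Simulate a sum all-reduce across ranks, then divide by world size,
    mirroring the param.grad.div_(dist.get_world_size()) pattern."""
    world_size = len(per_rank_grads)
    # element-wise sum across ranks (the "all-reduce" step)
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    # divide by world size to turn the sum into a mean
    return [s / world_size for s in summed]


# two simulated ranks, each holding gradients for two parameters
grads = allreduce_mean([[1.0, 2.0], [3.0, 4.0]])
print(grads)  # [2.0, 3.0]
```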
```diff
-        return swap_linear_with_float8_linear(module, Float8Linear, **kwargs)
-    else:
-        return swap_linear_with_float8_linear(module, Float8DynamicLinear, **kwargs)
+def swap_linear_with_dynamic(self, module: nn.Module, **kwargs: Any) -> nn.Module:
```
can we just remove this since this is the default?
agreed in principle, but ideally that would be a separate PR since it's only tangentially related
This pull request has been merged in 8e9623a.
Summary: Addressing a couple of nits that slipped in #304:

* more defaults to dynamic
* undo repr change
* fix comment

Test Plan:

```
./test/test_everything.sh
```

[ghstack-poisoned]
Stack from ghstack (oldest at bottom):

Summary:

We are standardizing on `Float8Linear` as the only float8 linear object:

1. [9/x]: make dynamic scaling default in Float8Linear #300 moved all of the functionality of `Float8DynamicLinear` to `Float8Linear`. The default settings of `Float8Linear` are to use dynamic scaling.
2. This PR deletes `Float8DynamicLinear` from the codebase and patches the relevant callsites in fbsource.

Differential Revision: D59342767