[1/x]: Make Float8Linear support dynamic scaling #290

vkuzo wants to merge 5 commits into gh/vkuzo/11/base
Conversation
Summary: At a high level, we need to make dynamic vs delayed scaling configurable separately for activations, weights, and gradients. The way I am approaching this is as follows:

* PR 1 (this PR): add basic support for dynamic scaling, configurable by tensor, to `Float8Linear`
* PRs 2..n: one by one, add features implemented in `Float8DynamicLinear` to `Float8Linear`, as necessary
* last PR: delete `Float8DynamicLinear`

Test Plan:

```
./test/test_everything.sh
```
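For context on the knob being made configurable: dynamic scaling derives a tensor's fp8 scale from that tensor's current absolute max at cast time, while delayed scaling reuses a scale computed from amax values recorded in earlier iterations, which is cheaper per cast but requires keeping history buffers maintained and synced. A minimal sketch of the difference; `E4M3_MAX` and both helper names are illustrative, not taken from this codebase:

```
import torch

E4M3_MAX = 448.0  # largest representable value in float8_e4m3fn

def dynamic_scale(t: torch.Tensor) -> torch.Tensor:
    # Dynamic: reduce over the current tensor on every cast.
    amax = t.abs().max()
    return E4M3_MAX / torch.clamp(amax, min=1e-12)

def delayed_scale(amax_history: torch.Tensor) -> torch.Tensor:
    # Delayed: reuse amax values observed in previous iterations; the cast
    # itself does no reduction over the current tensor, but the history
    # buffer must be kept up to date (the "sync" that linear_requires_sync
    # gates).
    return E4M3_MAX / torch.clamp(amax_history.max(), min=1e-12)
```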
```
 ):
     """Returns whether the given linear_type requires sync before forward."""
-    return linear_type in REQUIRES_SYNC
+    return linear_type is LinearType.DELAYED and any(
```
Since `Float8DynamicLinear` does not support `TensorScalingType`, I think `and` is right?
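To make the short-circuit concrete, a hypothetical call using the types visible in this diff: a dynamic module reports no sync requirement no matter what the per-tensor arguments say.

```
# Float8DynamicLinear holds no delayed-scaling state, so with `and` the
# predicate is False based on linear_type alone (hypothetical usage):
assert not linear_requires_sync(
    LinearType.DYNAMIC,
    scaling_type_x=TensorScalingType.DELAYED,  # ignored for dynamic modules
)
```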
```
        emulate=emulate,
    )
if linear_type is LinearType.DYNAMIC:
    return Float8DynamicLinear.from_float(
```
Should we assert that none of the scaling types are True?
Technically yes; I was just lazy and unmotivated, since this stack is trying to delete `Float8DynamicLinear`.
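For completeness, a sketch of the guard being discussed. The `scaling_type_w` and `scaling_type_dL_dY` names are guesses extrapolated from the `scaling_type_x` parameter visible elsewhere in this diff, and `mod` stands in for the module being converted; with a two-value enum, "none of them delayed" is the same as "all of them dynamic".

```
if linear_type is LinearType.DYNAMIC:
    # Sketch of the review suggestion: Float8DynamicLinear does not support
    # delayed scaling, so fail loudly if the config requests it for any tensor.
    assert all(
        t is TensorScalingType.DYNAMIC
        for t in (scaling_type_x, scaling_type_w, scaling_type_dL_dY)
    ), "Float8DynamicLinear only supports dynamic scaling"
    return Float8DynamicLinear.from_float(mod, emulate=emulate)
```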
```
-def linear_requires_sync(linear_type: LinearType):
+def linear_requires_sync(
+    linear_type: LinearType,
+    scaling_type_x: TensorScalingType = TensorScalingType.DELAYED,
```
Nit: we should probably remove the defaults, right?
Hmm, long term probably yes. I'm taking the approach of tackling the "what's the default" recipe question separately and not changing default behavior for now.
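Putting this signature hunk together with the return-statement hunk quoted earlier, the updated predicate plausibly reads as below. Only `linear_type` and `scaling_type_x` appear verbatim in the diff, so the other two parameter names are assumptions following the same pattern.

```
def linear_requires_sync(
    linear_type: LinearType,
    scaling_type_x: TensorScalingType = TensorScalingType.DELAYED,
    scaling_type_w: TensorScalingType = TensorScalingType.DELAYED,
    scaling_type_dL_dY: TensorScalingType = TensorScalingType.DELAYED,
) -> bool:
    """Returns whether the given linear_type requires sync before forward."""
    # Only Float8Linear (LinearType.DELAYED) keeps amax/scale buffers, and a
    # sync is needed only while at least one tensor still uses delayed scaling.
    return linear_type is LinearType.DELAYED and any(
        t is TensorScalingType.DELAYED
        for t in (scaling_type_x, scaling_type_w, scaling_type_dL_dY)
    )
```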
@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request has been merged in 3cb42e1.
Stack from ghstack (oldest at bottom):
Summary:
At a high level, we need to make dynamic vs delayed scaling configurable
separately for activations, weights and gradients. The way I am
approaching this is as follows:
* PR 1 (this PR): add basic support for dynamic scaling, configurable by tensor, to `Float8Linear`
* PRs 2..n: one by one, add features implemented in `Float8DynamicLinear` to `Float8Linear`, as necessary
* last PR: delete `Float8DynamicLinear`

Test Plan:

```
./test/test_everything.sh
```
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D59305792
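As a closing illustration of what PR 1 enables, here is a hypothetical conversion call that mixes scaling types per tensor. The exact `from_float` signature is not shown in this thread, so the argument names beyond `scaling_type_x` are assumptions following the same pattern.

```
import torch

m = torch.nn.Linear(32, 64)
# Hypothetical: keep delayed scaling for the weight, go dynamic elsewhere.
m_fp8 = Float8Linear.from_float(
    m,
    scaling_type_x=TensorScalingType.DYNAMIC,      # activations
    scaling_type_w=TensorScalingType.DELAYED,      # weights
    scaling_type_dL_dY=TensorScalingType.DYNAMIC,  # gradients
)
# Any tensor left on delayed scaling still needs the periodic amax sync:
assert linear_requires_sync(
    LinearType.DELAYED,
    scaling_type_x=TensorScalingType.DYNAMIC,
    scaling_type_w=TensorScalingType.DELAYED,
    scaling_type_dL_dY=TensorScalingType.DYNAMIC,
)
```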