
[Training] Adam Optimizer #1970

Merged
wschin merged 29 commits into onnx:master from wschin:adam
Apr 13, 2020

Conversation

@wschin
Collaborator

@wschin wschin commented Apr 27, 2019

PR #2314 is a single place for reviewing the whole training story.

This PR defines a common signature shared by PyTorch's and TF's Adam.

Design verification script (it shows that the proposed Adam signature covers both TF's and PyTorch's behavior):

import itertools
import numpy as np
import torch
from torch import optim
from torch import nn
import torch.random as rnd

iteration_count = 3
n = 10 # number of features
l = 5 # number of data points (aka batch size)

X_ = torch.randn(l, n)
Y_ = torch.randn(l, 1)

def apply_adam(t, r, x, g, v, h, norm_coefficient, alpha, beta, epsilon):  # type: ignore
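    # t: number of updates already applied (0 on the first call); r: learning rate.
    # x: optimized tensor; g: its gradient; v/h: first- and second-order momentums.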
    # Add gradient of regularization term.
    g_regularized = norm_coefficient * x + g
    # Update momentum.
    v_new = alpha * v + (1 - alpha) * g_regularized
    # Update second-order momentum.
    h_new = beta * h + (1 - beta) * (g_regularized * g_regularized)
    # Compute element-wise square root, shifted by epsilon to avoid division by zero.
    h_sqrt = np.sqrt(h_new) + epsilon
    # Fold both bias corrections into the learning rate. The caller's t starts
    # at 0, so t + 1 is the 1-based update count used in the correction terms.
    t = t + 1
    r_adjusted = r * np.sqrt(1 - beta**t) / (1 - alpha**t)
    # Apply Adam update rule.
    x_new = x - r_adjusted * (v_new / h_sqrt)
    return x_new, v_new, h_new

def show_pytorch(lr, alpha, beta, weight_decay, epsilon):
    X = X_.clone()
    Y = Y_.clone()

    rnd.manual_seed(0)
    model = nn.Sequential(
        nn.Linear(n, 1, bias=False)
    )

    loss_fn = nn.MSELoss(reduction='sum')

    solver = optim.Adam(model.parameters(), lr=lr, betas=[alpha, beta],
            weight_decay=weight_decay, eps=epsilon)

    results = []
    for t in range(iteration_count):
        Y_pred = model(X)
        loss = loss_fn(Y_pred, Y)
        results.append(float(loss))
        model.zero_grad()
        loss.backward()
        solver.step()
    return results

def show_tensorflow(lr, alpha, beta, epsilon):
    import tensorflow as tf
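    # Seed torch and reuse a torch-initialized Linear layer so the TF variable
    # starts from exactly the same weights as the PyTorch runs.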
    rnd.manual_seed(0)
    layer = nn.Linear(n, 1, bias=False)

    X = tf.placeholder('float', shape=[l, n])
    W = tf.Variable(layer.weight.detach().numpy())
    Y = tf.placeholder('float', shape=[l, 1])
    Y_pred = tf.matmul(X, W, transpose_b=True)
    loss = tf.reduce_sum(tf.square(Y - Y_pred))
    optimizer = tf.train.AdamOptimizer(learning_rate=lr, beta1=alpha, beta2=beta, epsilon=epsilon)
    minimizer = optimizer.minimize(loss)

    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)

    results = []
    for t in range(iteration_count):
        loss_value = sess.run(loss, {X: X_, Y: Y_})
        results.append(float(loss_value))
        sess.run([minimizer], {X: X_, Y: Y_})

    return results

def show_mine(lr, alpha, beta, weight_decay, epsilon):
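    # Same training loop as show_pytorch, but parameters are updated by
    # apply_adam (the proposed ONNX semantics) instead of solver.step(); the
    # optim.Adam instance below only serves as storage for the momentum buffers.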
    X = X_.clone()
    Y = Y_.clone()

    rnd.manual_seed(0)
    model = nn.Sequential(
        nn.Linear(n, 1, bias=False)
    )

    loss_fn = nn.MSELoss(reduction='sum')

    solver = optim.Adam(model.parameters(), lr=lr, betas=[alpha, beta],
            weight_decay=weight_decay, eps=epsilon)

    results = []
    for t in range(iteration_count):
        Y_pred = model(X)
        loss = loss_fn(Y_pred, Y)
        results.append(float(loss))
        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            for param in model.parameters():
                if 'exp_avg' not in solver.state[param]:
                    solver.state[param]['exp_avg'] = torch.zeros_like(param)
                if 'exp_avg_sq' not in solver.state[param]:
                    solver.state[param]['exp_avg_sq'] = torch.zeros_like(param)

                new_tensor, new_v, new_h = apply_adam(t=t,
                        r=lr, x=param.data, g=param.grad.data,
                        v=solver.state[param]['exp_avg'].data,
                        h=solver.state[param]['exp_avg_sq'].data,
                        norm_coefficient=weight_decay,
                        alpha=alpha, beta=beta, epsilon=epsilon)

                solver.state[param]['exp_avg'].data = new_v.data
                solver.state[param]['exp_avg_sq'].data = new_h.data
                param.data = new_tensor.data

    return results


# Compare TF Adam and its ONNX counterpart.
lr_pool = [0.1, 0.001]
alpha_pool = [0.89, 0.8, 0.01]
beta_pool = [0.99]
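# tf.train.AdamOptimizer has no weight-decay term, so norm_coefficient stays 0 here.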
weight_decay_pool = [0]
epsilon_pool = [1e-7, 1e-1]
for lr_, alpha_, beta_, weight_decay_, epsilon_ in itertools.product(lr_pool, alpha_pool, beta_pool, weight_decay_pool, epsilon_pool):
    tf_result = show_tensorflow(lr=lr_, alpha=alpha_, beta=beta_, epsilon=epsilon_)
    mine_result = show_mine(lr=lr_, alpha=alpha_, beta=beta_, weight_decay=weight_decay_, epsilon=epsilon_)
    assert np.allclose(tf_result, mine_result)

# Compare Pytorch Adam and its ONNX counterpart.
lr_pool = [0.1, 0.001]
alpha_pool = [0.89, 0.8, 0.01]
beta_pool = [0.99, 0.2, 0.5]
weight_decay_pool = [0, 0.01, 0.1, 1]
epsilon_pool = [1e-7, 1e-1]
for lr_, alpha_, beta_, weight_decay_, epsilon_ in itertools.product(lr_pool, alpha_pool, beta_pool, weight_decay_pool, epsilon_pool):
    torch_result = show_pytorch(lr=lr_, alpha=alpha_, beta=beta_, weight_decay=weight_decay_, epsilon=epsilon_)
    mine_result = show_mine(lr=lr_, alpha=alpha_, beta=beta_, weight_decay=weight_decay_, epsilon=epsilon_)
    assert np.allclose(torch_result, mine_result)
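
For reference, a minimal sketch of how the proposed operator could be instantiated for a single optimized tensor. It assumes the ai.onnx.preview.training domain and a flat input layout [R, T, X, G, V, H] matching the update rule implemented by apply_adam above; all tensor names here are illustrative, not part of the spec.

from onnx import helper

adam_node = helper.make_node(
    'Adam',
    inputs=['R', 'T', 'X', 'G', 'V', 'H'],  # learning rate, update count, weights, gradient, 1st/2nd-order momentums
    outputs=['X_new', 'V_new', 'H_new'],    # updated weights and momentums
    domain='ai.onnx.preview.training',
    alpha=0.89,             # decay rate of the first-order momentum
    beta=0.99,              # decay rate of the second-order momentum
    epsilon=1e-7,           # stabilizer added to the square root of the second-order momentum
    norm_coefficient=0.01,  # L2-regularization strength (weight decay)
)

The attribute values mirror one configuration from the pools above, so evaluating this node once corresponds to a single apply_adam call with that setting.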

@wschin wschin requested a review from a team as a code owner April 27, 2019 21:57
@wschin wschin requested a review from a team as a code owner June 4, 2019 22:16
@wschin wschin changed the title [WIP] Adam Optimizer [Training] Adam Optimizer Jun 5, 2019
@CLAassistant

CLAassistant commented Jul 24, 2019

CLA assistant check
All committers have signed the CLA.

@prasanthpul prasanthpul added this to the 1.7 milestone Aug 20, 2019
@postrational postrational added the topic: operator and topic: training labels Aug 23, 2019
Comment thread onnx/defs/operator_sets.h Outdated
@wschin wschin removed this from the 1.7 milestone Feb 26, 2020
Contributor

@postrational postrational left a comment


Looks good, but some artifacts made it in. Please clean them up so the PR can be approved.

Comment thread onnx/backend/test/data/node/test_add/model.onnx Outdated
Comment thread onnx/backend/test/data/node/test_add_bcast/model.onnx
@sveta-levitan
Contributor

sveta-levitan commented Mar 3, 2020

@wschin Wei-Sheng, please respond to Michal's comments above, plus his comments on Gitter in the Operators chat room. Thank you!

@chinhuang007
Contributor

@wschin Can you please take a look and move this forward? During the TSC meeting today, the members suggested having at least one optimizer PR merged for release 1.7. Please let us know if more time is needed.

@wschin
Collaborator Author

wschin commented Mar 6, 2020

@sveta-levitan, @postrational, @chinhuang007, the PR is up to date again. Please take a look. Thank you.

Comment thread docs/Changelog.md
Comment thread onnx/defs/tensor/defs.cc
@wschin wschin merged commit f89f387 into onnx:master Apr 13, 2020
@chinhuang007 chinhuang007 added this to the 1.7 milestone May 8, 2020
jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020
* Adam draft

Update docs

* Empty shape inference

* Add shape inference

* Add tests

* Fix a test

* Sync with recent master changes

* Sync with master

* Make flake8 happy

* Sync docs

* sync doc

* Revert side-effect changes

* sync with master

* Add 1 back

* Polish doc

* Finish the design of Adam spec

* formatting