[GSoC] Adding optimization features not related with my GSoC project by sidorov-ks · Pull Request #1070 · mlpack/mlpack

sidorov-ks · 2017-07-21T16:50:39Z

This PR is part of my GSoC project "Augmented RNNs".

Implemented:

- CrossEntropyLayer for evaluating the performance of the model on binary vector targets.
- Initial version of the gradient clipping (albeit very dirty, as @zoq and @rcurtin have already mentioned in [GSoC] Augmented RNN models - benchmarking framework #1005).

As far as I understand, the conversation about these two points (including, but not limited to, the reusable update API for gradient clipping) moves here.

zoq · 2017-07-23T13:43:57Z

src/mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp

+   */
+  GradientClipping(const double minGradient,
+                   const double maxGradient,
+                   UpdatePolicy updatePolicy) :


Should we pass the update policy by reference? Also, I think we should add:

//! Get the update policy.
const UpdatePolicyType& UpdatePolicy() const { return updatePolicy; }
//! Modify the update policy.
UpdatePolicyType& UpdatePolicy() { return updatePolicy; }

to modify the wrapped policy.

Likewise, it looks reasonable to add MinGradient and MaxGradient methods (done in the last commit)
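A minimal sketch of what such accessors could look like on a wrapper that stores its inner policy by value. The names `ClippingWrapper` and `ToyPolicy` are hypothetical stand-ins for illustration and are not the classes from this PR; only the accessor pattern (a const getter plus a mutable overload) mirrors the suggestion above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a wrapped update policy; not mlpack's class.
struct ToyPolicy
{
  double scale = 1.0;
};

// Sketch of a clipping wrapper that owns its inner policy and exposes both
// read-only and mutable access to it and to the clipping bounds.
template<typename UpdatePolicyType>
class ClippingWrapper
{
 public:
  ClippingWrapper(const double minGradient,
                  const double maxGradient,
                  UpdatePolicyType updatePolicy) :
      minGradient(minGradient),
      maxGradient(maxGradient),
      updatePolicy(std::move(updatePolicy))
  {
    // The bounds only make sense if the interval is non-empty.
    assert(minGradient <= maxGradient);
  }

  //! Get the update policy.
  const UpdatePolicyType& UpdatePolicy() const { return updatePolicy; }
  //! Modify the update policy.
  UpdatePolicyType& UpdatePolicy() { return updatePolicy; }

  //! Get the minimum gradient value.
  double MinGradient() const { return minGradient; }
  //! Modify the minimum gradient value.
  double& MinGradient() { return minGradient; }

  //! Get the maximum gradient value.
  double MaxGradient() const { return maxGradient; }
  //! Modify the maximum gradient value.
  double& MaxGradient() { return maxGradient; }

 private:
  double minGradient;
  double maxGradient;
  UpdatePolicyType updatePolicy;
};
```

Because the getter on a const object returns a const reference, callers holding a const wrapper can inspect the inner policy without copying it, while non-const callers can still reconfigure it in place.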

zoq · 2017-07-23T13:48:11Z

src/mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp

+              const arma::mat& gradient)
+  {
+    // First, clip the gradient.
+    gradient.transform(


Haven't tested it, but I guess using transform is faster than arma::clamp.

Didn't know about clamp, implementing. By the way, it also resolves the issue with const reference.

Speaking of performance, they should perform comparably, as they're both element-wise operations (if transform is parallelized, which I think is true).

Benchmarked that one. Here's what I've got: https://gist.github.com/partobs-mdp/02b3bb93be0496ad6866528359d5ba3d

Thanks for taking a look into the issue; looks like transform is slightly faster, but I think it's negligible.
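To make the two options concrete, here is a rough sketch using only the standard library, with `std::vector<double>` standing in for `arma::mat` (an assumption made so the example has no external dependencies). `ClipInPlace` plays the role of the transform-style approach, which needs a mutable buffer, and `Clipped` plays the role of the clamp-style approach, which accepts a const input and returns a clipped copy. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// In-place variant: mutates the buffer, so it needs non-const access.
void ClipInPlace(std::vector<double>& gradient,
                 const double lo,
                 const double hi)
{
  assert(lo <= hi);
  std::transform(gradient.begin(), gradient.end(), gradient.begin(),
      [lo, hi](const double g) { return std::min(std::max(g, lo), hi); });
}

// Copying variant: leaves the input untouched and returns a clipped copy,
// which is why it works with a const reference.
std::vector<double> Clipped(const std::vector<double>& gradient,
                            const double lo,
                            const double hi)
{
  std::vector<double> result(gradient);
  ClipInPlace(result, lo, hi);
  return result;
}
```

Both variants do the same element-wise work; the copying one pays for one extra allocation and copy per call.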

zoq · 2017-07-23T13:55:53Z

src/mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp

+   */
+  void Update(arma::mat& iterate,
+              const double stepSize,
+              const arma::mat& gradient)


gradient.transform should fail if we pass the gradient as const.

Another option is to relax the const restriction and allow the UpdatePolicy to modify the gradient when Update() is called. I don't think it would be a problem to do that, as long as we document that the UpdatePolicy is allowed to do so. And it would avoid the copy too. :)

Agreed, that is another good option, @partobs-mdp what do you think?

Sure, but should we do this? As we have seen, the clamp performance is not much of an issue, and (imho) we shouldn't break the natural assumption that the update policy doesn't break the variable which stores the gradient.

Right, the performance difference is insignificant, we could use arma::clamp here, no need to change it if you don't think that's a good idea. We just wanted to point out there is another solution.

zoq · 2017-07-23T13:57:21Z

src/mlpack/tests/sgd_test.cpp

+  SGDTestFunction f;
+  VanillaUpdate vanillaUpdate;
+  GradientClipping<VanillaUpdate> update(-3., +3., vanillaUpdate);
+  StandardSGD s(0.0003, 5000000, 1e-9, true);


We should use SGD<GradientClipping<VanillaUpdate> > s(..., update); here, right now we use the standard update policy, without gradient clipping.

zoq · 2017-07-23T13:59:18Z

src/mlpack/tests/sgd_test.cpp

+{
+  SGDTestFunction f;
+  VanillaUpdate vanillaUpdate;
+  GradientClipping<VanillaUpdate> update(-3., +3., vanillaUpdate);


Pedantic style issue; I think using 3.0 instead of 3. would slightly improve the readability.

zoq · 2017-07-23T14:03:05Z

src/mlpack/tests/sgd_test.cpp

@@ -39,6 +41,23 @@ BOOST_AUTO_TEST_CASE(SimpleSGDTestFunction)
  BOOST_REQUIRE_SMALL(coordinates[2], 1e-7);
 }



Might be a good idea to test the policy independently from the optimizer, we could simply test if the gradient is clipped correctly.

I also thought about this, but how could we extract the gradient history from SGD optimizer object?
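One way to check the policy without going through the optimizer is to call its `Update()` directly: starting from a zero iterate with a step size of 1, a single plain SGD step leaves the negated clipped gradient in the iterate. The sketch below illustrates that idea with hypothetical toy types (`ToyVanillaUpdate`, `ToyGradientClipping`) and `std::vector<double>` in place of `arma::mat`; it is not the test code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // Stand-in for arma::mat (assumption).

// Plain SGD step: iterate -= stepSize * gradient.
struct ToyVanillaUpdate
{
  void Update(Vec& iterate, const double stepSize, const Vec& gradient)
  {
    assert(iterate.size() == gradient.size());
    for (std::size_t i = 0; i < iterate.size(); ++i)
      iterate[i] -= stepSize * gradient[i];
  }
};

// Clips a copy of the gradient, then forwards it to the wrapped policy.
template<typename UpdatePolicyType>
struct ToyGradientClipping
{
  double minGradient;
  double maxGradient;
  UpdatePolicyType updatePolicy;

  void Update(Vec& iterate, const double stepSize, const Vec& gradient)
  {
    Vec clipped(gradient);
    for (double& g : clipped)
      g = std::min(std::max(g, minGradient), maxGradient);
    updatePolicy.Update(iterate, stepSize, clipped);
  }
};

// Starting from zeros with stepSize = 1, one step leaves -clip(gradient)
// in the iterate, so negating it recovers the gradient the inner policy saw.
Vec ClippedGradientSeenByPolicy(const Vec& gradient,
                                const double lo,
                                const double hi)
{
  ToyGradientClipping<ToyVanillaUpdate> update{lo, hi, ToyVanillaUpdate()};
  Vec coordinates(gradient.size(), 0.0);
  update.Update(coordinates, 1.0, gradient);
  for (double& c : coordinates)
    c = -c;
  return coordinates;
}
```

With this setup no gradient history has to be extracted from the optimizer; the expected iterate can be written down by hand.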

zoq · 2017-07-23T14:04:44Z

src/mlpack/methods/ann/layer/cross_entropy_error_impl.hpp

+    const arma::Mat<eT>&& target,
+    arma::Mat<eT>&& output)
+{
+  output = (1. - target) / (1. - input + 1e-2) - target / (input + 1e-2);


1e-2 is somewhat high; I would use something like 1e-10.

Introduced the eps parameter for handling this kind of trade-off without breaking the code.
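The trade-off can be seen on a single element. The sketch below evaluates the gradient expression from the diff with eps as a parameter; the function names are hypothetical and scalars are used instead of Armadillo matrices. For a saturated prediction (input 0, target 1) the gradient magnitude is roughly 1 / eps, so a larger eps bounds the gradient more tightly but distorts the loss more, while a smaller eps stays closer to the exact cross-entropy and allows very large gradients.

```cpp
#include <cassert>
#include <cmath>

// Per-element gradient of the binary cross-entropy with respect to the
// input, matching the expression in the diff with eps made a parameter.
double CrossEntropyGradient(const double input,
                            const double target,
                            const double eps)
{
  assert(eps >= 0.0);
  return (1.0 - target) / (1.0 - input + eps) - target / (input + eps);
}

// Gradient magnitude for a saturated prediction (input = 0, target = 1).
// Only meaningful for eps > 0, where it is roughly 1 / eps.
double SaturatedGradientMagnitude(const double eps)
{
  return std::abs(CrossEntropyGradient(0.0, 1.0, eps));
}
```

With eps = 1e-2 the saturated gradient is about 100 in magnitude; with eps = 1e-10 it is about 1e10.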

zoq · 2017-07-23T14:06:35Z

src/mlpack/methods/ann/layer/cross_entropy_error.hpp

+    typename InputDataType = arma::mat,
+    typename OutputDataType = arma::mat
+>
+class CrossEntropyError


Can we add a simple test for the cross entropy error function?

zoq · 2017-07-24T14:54:54Z

src/mlpack/tests/gradient_clipping_test.cpp

+  arma::mat coordinates = arma::zeros(3, 3);
+  // Setting step = 1 to make math easy.
+  double stepSize = 1.0;
+  arma::mat dummyGradient = { {-6, +6, 0}, {1, 2, 3}, {-3, 0, +4} };


To make the Windows build happy, you could use:

mat input;
input << 1 << 2 << 3 << endr << 4 << 5 << 6 << endr << 7 << 8 << 9 << ...;

or

mat input("1 2 3; \
           4 5 6 \
           ...");

rcurtin

Looks good to me so far, just some minor comments to address from my end. Overall I think the design is fine, but we should definitely add a test for the CrossEntropyLayer like Marcus suggested.

rcurtin · 2017-07-25T03:49:03Z

src/mlpack/core/optimizers/sgd/sgd.hpp

+   * @param minGradient Minimum gradient value
+   *                    (affects optimization iff clipGradient flag is on).
+   * @param maxGradient Maximum gradient value
+   *                    (affects optimization iff clipGradient flag is on).


I think the last three comments here are not needed anymore. :)

rcurtin · 2017-07-25T03:51:17Z

src/mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp

+   */
+  void Update(arma::mat& iterate,
+              const double stepSize,
+              const arma::mat& gradient)


Another option is to relax the const restriction and allow the UpdatePolicy to modify the gradient when Update() is called. I don't think it would be a problem to do that, as long as we document that the UpdatePolicy is allowed to do so. And it would avoid the copy too. :)

rcurtin · 2017-07-25T03:52:13Z

src/mlpack/methods/ann/layer/cross_entropy_error.hpp

+  double& Eps() { return eps; }
+
+  /**
+   * Serialize the layer


Very pedantic comment: can you add a period at the end of the sentence? :)

rcurtin · 2017-07-25T04:09:02Z

src/mlpack/methods/ann/layer/cross_entropy_error_impl.hpp

+    const arma::Mat<eT>&& input, const arma::Mat<eT>&& target)
+{
+  return -arma::accu(target % arma::log(input + eps) +
+                     (1. - target) % arma::log(1. - input + eps));


It's a little late so I'm not sure I'm thinking 100% clearly, but it appears that this will work in the multiclass setting as long as the input and target matrices are one-hot encoded. So labels like [0 2 1] will not work but labels like [[1 0 0] [0 0 1] [0 1 0]] will. Correct me if I am wrong. (i.e., I think this is right and works the way I would expect it to.)

Well, it won't work as it stands, but the computations in the case you've mentioned would be easier: -arma::accu(target % arma::log(input + eps)) (the formula gets easier due to data representation redundancy).

Fair enough; in this case, can you clarify the limitation on how the labels should be in the documentation for the class? We should definitely support multiclass cross-entropy at some point, so if you don't want to do that here that's ok, but in that case could I ask you to open a new issue for it, detailing basically what needs to be done and where someone could look to get started with it?
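For the documentation, the difference between the two label conventions can be illustrated with a small sketch. It uses `std::vector<double>` instead of Armadillo matrices, and both function names are hypothetical; the formulas are the two expressions quoted in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Binary form from the diff: every element is an independent 0/1 target.
double BinaryCrossEntropy(const std::vector<double>& input,
                          const std::vector<double>& target,
                          const double eps = 1e-10)
{
  assert(input.size() == target.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    loss -= target[i] * std::log(input[i] + eps) +
        (1.0 - target[i]) * std::log(1.0 - input[i] + eps);
  }
  return loss;
}

// One-hot form from the reply: only the entry of the true class contributes.
double OneHotCrossEntropy(const std::vector<double>& input,
                          const std::vector<double>& target,
                          const double eps = 1e-10)
{
  assert(input.size() == target.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < input.size(); ++i)
    loss -= target[i] * std::log(input[i] + eps);
  return loss;
}
```

In both cases the target must be a 0/1 vector of the same shape as the input; integer class labels such as [0 2 1] would have to be expanded to one-hot rows first.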

zoq · 2017-07-25T12:48:48Z

src/mlpack/core/optimizers/sgd/sgd.hpp

+
  //! Get the update policy.
-  const UpdatePolicyType& UpdatePolicy() const { return updatePolicy; }
+  UpdatePolicyType UpdatePolicy() const { return updatePolicy; }


We should return a reference instead of a copy here.

zoq · 2017-07-25T12:50:09Z

src/mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp

+   */
+  void Update(arma::mat& iterate,
+              const double stepSize,
+              const arma::mat& gradient)


Agreed, that is another good option, @partobs-mdp what do you think?

zoq · 2017-07-25T12:51:36Z

src/mlpack/methods/ann/layer/cross_entropy_error_impl.hpp

+    Archive& /* ar */,
+    const unsigned int /* version */)
+{
+  // Nothing to do here.


We should serialize eps here, since it's now a parameter.

zoq · 2017-07-25T12:57:30Z

src/mlpack/methods/ann/layer/cross_entropy_error_impl.hpp

+    const arma::Mat<eT>&& target,
+    arma::Mat<eT>&& output)
+{
+  output = (1. - target) / (1. - input + eps) - target / (input + eps);


Not sure if the compiler would optimize that away, but we could rewrite the expression as:

output = (target - input) / ((x - 1) % x); (with x = input) and save one extra division and the eps addition.

Checked it on a piece of paper and got the precise expression we should get after simplifying the original expression so that it would still be the gradient of the loss function with log(x + eps):

output = (input - target + eps * (1. - 2 * target)) / ((1. - input + eps) % (input + eps))

This one, however, doesn't really optimize much (or, at least, I think so), because even though it runs only one (element-wise) division, it runs three multiplications, which are also slow (as compared with additions - that's why I didn't count them).

Right, I guess the only benefit I see is that we could avoid adding eps, since ((x - 1) % x) should be stable. Anyway, I don't mind leaving it as it is.
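For reference, the algebra behind the combined expression can be written out per element. This is a sketch assuming the loss uses log(x + eps) as in the forward pass quoted in this PR, with x the input, t the target, and ε standing for eps:

```latex
\begin{aligned}
\ell(x) &= -\bigl[\, t \log(x + \varepsilon)
            + (1 - t)\log(1 - x + \varepsilon) \,\bigr] \\
\frac{\partial \ell}{\partial x}
  &= \frac{1 - t}{1 - x + \varepsilon} - \frac{t}{x + \varepsilon} \\
  &= \frac{(1 - t)(x + \varepsilon) - t\,(1 - x + \varepsilon)}
          {(1 - x + \varepsilon)(x + \varepsilon)} \\
  &= \frac{x - t + \varepsilon\,(1 - 2t)}
          {(1 - x + \varepsilon)(x + \varepsilon)}
\end{aligned}
```

Setting ε = 0 reduces the last line to (x - t) / ((1 - x) x), which equals (t - x) / ((x - 1) x), the shorter form proposed earlier in this thread.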

sidorov-ks · 2017-07-25T13:45:41Z

What's wrong with the Internet connection on Travis CI? It didn't even manage to install boost from apt :'(

zoq · 2017-07-25T14:25:00Z

Let me restart the build.

sidorov-ks · 2017-07-26T04:29:54Z

Is there anything else that should be done on this PR on my side?

zoq

Looks ready to me; I'll wait 3 days before merging, in case anyone has any more comments.

zoq · 2017-07-26T11:39:20Z

src/mlpack/tests/gradient_clipping_test.cpp

+  // the gradient from the momentum, which gives 2 * gradient value
+  // for the momentum on that step. Adding that to the gradient which
+  // was subtracted earlier yields the 3 * gradient in the following check.
+  BOOST_REQUIRE_SMALL(arma::abs(coordinates - 3 * targetCoordinates).max(),


Really pedantic style issue; I would probably write:

BOOST_REQUIRE_SMALL(
    arma::abs(coordinates - targetCoordinates).max(), 1e-7);

rcurtin · 2017-07-27T22:31:47Z

src/mlpack/methods/ann/layer/cross_entropy_error.hpp

+/**
+ * The cross-entropy performance function measures the network's
+ * performance according to the cross-entropy
+ * between the input and target distributions.


I think you can reduce the number of lines here, it seems like they are nowhere near as long as 80 characters. :) (I think this also applies elsewhere)

rcurtin · 2017-07-27T22:34:09Z

src/mlpack/tests/sgd_test.cpp

 #include <mlpack/core.hpp>
 #include <mlpack/core/optimizers/sgd/sgd.hpp>
+#include <mlpack/core/optimizers/sgd/update_policies/gradient_clipping.hpp>
+#include <mlpack/core/optimizers/sgd/update_policies/vanilla_update.hpp>


Is there a need to add these here if they aren't being used by the tests?

rcurtin

Code looks good, nothing more from my side. Thanks for splitting this out from the other PR so we can merge it more quickly. :)

zoq · 2017-07-29T10:50:52Z

Thanks for the contributions!

sidorov-ks added 5 commits July 21, 2017 19:46

Adding all optimization features not directly related to the Augmente…

baca5db

…d RNNs GSoC project

Implemented the GradientClipping API

b318bcc

Added unit test for gradient clipping

8b216fb

Added GradientClipping documentation + more unit tests

b0588a8

Fixed cppcheck issues + removing legacy code

336855c

zoq mentioned this pull request Jul 23, 2017

Implementation of async one step q-learning #1064

Merged

zoq reviewed Jul 23, 2017

Added proper unit tests for gradient clipping + some other minor fixes

ed7a486

zoq reviewed Jul 24, 2017

rcurtin reviewed Jul 25, 2017

sidorov-ks added 2 commits July 25, 2017 10:46

Trying to make Windows build happy

e62e0a8

Fixing issues from @rcurtin's review + adding unit test for cross ent…

b81ca59

…ropy performance function

zoq reviewed Jul 25, 2017

Fixing issues from @zoq's review

3db7905

zoq approved these changes Jul 26, 2017

rcurtin reviewed Jul 27, 2017

rcurtin approved these changes Jul 27, 2017

Fixed minor style issues

e310c98

zoq merged commit 5c68061 into mlpack:master Jul 29, 2017

iamshnoo mentioned this pull request Jun 6, 2020

Some of the loss functions probably don't work correctly #2444

Closed

8 tasks
