
[GSoC] Augmented RNN models - benchmarking framework#1005

Merged
zoq merged 66 commits into mlpack:master from sidorov-ks:master
Aug 7, 2017

Conversation

@sidorov-ks
Contributor

This PR is part of my GSoC project "Augmented RNNs".
Implemented:

  • class CopyTask for evaluating models on the sequence copy problem, showcasing the benchmarking framework;
  • a unit test for it (a simple non-ML model, hardcoded to copy the sequence the required number of times, is expected to ace the CopyTask).
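The behavior that hardcoded baseline has to reproduce can be sketched as follows (a minimal stand-in using std::vector rather than the Armadillo types the real code uses; CopyTaskAnswer is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the hardcoded non-ML baseline: the correct
// answer for CopyTask is simply the input bit sequence repeated nRepeats
// times, so a model that aces the task must emit exactly this output.
std::vector<int> CopyTaskAnswer(const std::vector<int>& input,
                                const std::size_t nRepeats)
{
  std::vector<int> output;
  output.reserve(input.size() * nRepeats);
  for (std::size_t r = 0; r < nRepeats; ++r)
    output.insert(output.end(), input.begin(), input.end());
  return output;
}
```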

@mlpack-jenkins

Can one of the admins verify this patch?

}

template<typename ModelType>
double CopyTask::Evaluate(ModelType& model) {
Member

Now that I see the interface, I see a couple of problems that we have to discuss:

Right now, the way we train a model, e.g. RNN or LogisticRegression, is to define all necessary model parameters (layers, regularization constant, etc.) and call the Train function. Train takes the predictors and responses as input, followed by an optional optimizer instance where we define the number of iterations/epochs, the threshold to terminate the optimization before we reach the maximum number of iterations, and so on.

arma::mat predictor, responses;
Model model(...);

Optimizer optimizer(...);

model.Train(predictor, responses, optimizer);

The Evaluate function somehow replicates the functionality of the optimizer, like the number of epochs. I'm not sure we should replicate that behavior, since the way we train a model might differ from class to class. For instance, none of the existing classes can handle arma::field as input, there is no option to specify optimizer parameters, and what if we'd like to use the upcoming cross-validation feature as part of the evaluation pipeline?

So instead of providing an evaluation function that takes a model and trains it, I think we should see the Tasks as generators that produce input/output samples for our model to train on. This lets us train the model independently from the task. Besides the code replication, another problem is that I can't think of an easy way to incorporate curriculum learning (which might be necessary to get stable results) into the class without providing a bunch of parameters for each task. So I propose the following:

class CopyTask
{
 public:
  CopyTask(...) { }

  void Sample(arma::mat& predictor, arma::mat& responses)
  {
    // Generate a predictor/responses pair (or pairs) with the given parameters.
  }
};

CopyTask copyTask;
copyTask.Sample(predictor, responses);

model.Train(predictor, responses, optimizer);

Let me know what you think.

Contributor Author

This sounds like a good idea to me, because I've already run into a similar issue when trying to implement SortTask - it was a huge copy-paste from CopyTask, which failed to work nonetheless 😢.
However, I would also propose creating one more header file with evaluation functions (sklearn-ish), like this:

#include <score.hpp>
auto score_fn = scorers::sequence_precision; // number of sequences the model got right.
// As an option:
// auto precision = scorers::bit_precision;
// = number of dataset bits the model got right.

// ... CopyTask defined as in the previous comment

copyTask.Sample(predictor_train, responses_train);
copyTask.Sample(predictor_test, responses_test);
arma::mat responses_test_pred; // or whatever type is required;

model.Train(predictor_train, responses_train, optimizer);
model.Predict(predictor_test, responses_test_pred);
auto score = score_fn(responses_test, responses_test_pred);

IMHO, this would also be more in accord with the separation-of-concerns principle.

At the moment, though, I'm doing an internship at a local web development company, so I hope to try out the ideas in this thread tonight or in the next few days.

Member

It's just an idea, but since the score function depends on the task, do you think we should integrate the function into the Generator/Task class?

Contributor Author

Maybe, but I am not sure that the score function is part of the task itself. IMHO the task should specify only inputs and their respective outputs. The reason is that there are usually several viable ways to evaluate the distance between predictions and ground truth (e.g., in regression, you can use Euclidean or Manhattan metrics, or an L1/L2 regularizer with different degrees of regularization).

Even here, in the case of CopyTask, there are two plausible ways to evaluate the performance: bit precision and sequence precision.
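As a concrete illustration of the two scores, here is a minimal sketch over plain std::vector bit sequences (the real implementation works on Armadillo field types):

```cpp
#include <cstddef>
#include <vector>

// Fraction of sequences the model reproduced exactly.
double SequencePrecision(const std::vector<std::vector<int>>& truth,
                         const std::vector<std::vector<int>>& pred)
{
  std::size_t hits = 0;
  for (std::size_t i = 0; i < truth.size(); ++i)
    if (truth[i] == pred[i])
      ++hits;
  return double(hits) / truth.size();
}

// Fraction of individual bits the model got right.
double BitPrecision(const std::vector<std::vector<int>>& truth,
                    const std::vector<std::vector<int>>& pred)
{
  std::size_t hits = 0, total = 0;
  for (std::size_t i = 0; i < truth.size(); ++i)
  {
    for (std::size_t j = 0; j < truth[i].size(); ++j)
    {
      ++total;
      if (truth[i][j] == pred[i][j])
        ++hits;
    }
  }
  return double(hits) / total;
}
```

For example, getting one of two sequences exactly right while matching three of four bits gives sequence precision 0.5 and bit precision 0.75.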

Member

That's true, there are often multiple ways to calculate the performance, and there are situations where you'd like to use the same functions during training, so I agree that writing a new class (or classes) for the scoring functions is a good idea.

@sidorov-ks
Contributor Author

Wow, that was unreasonably buggy coding on my side, but it looks like I've finally got it right :)

So now, if I remember the schedule correctly, all it takes for Week 1 is to implement the data generators for SortTask and AddTask. I hope this won't be ridiculously hard, but it still requires overloading the evaluation methods to support sequences of binary numbers - in contrast to sequences of bits.

* @param maxLength Maximum length of sequence that has to be repeated by model.
* @param nRepeats Number of repeats required to solve the task.
*/
CopyTask(int maxLength, int nRepeats);
Member

I would use const size_t maxLength and const size_t nRepeats here; we mostly use size_t instead of int when support for negative values isn't needed.

* @param labels The variable to store output sequences.
* @param batchSize The dataset size.
*/
void GenerateData(
Member

Minor style issue, please take a look at the Method Declarations sections in the style guide: https://github.com/mlpack/mlpack/wiki/DesignGuidelines#method-declarations.

private:
// Maximum length of a sequence.
int maxLength;
// Nomber of repeats the model has to perform to complete the task.
Member

Minor spelling issue: Number instead of Nomber.

int nRepeats;
};
}
}
Member

Can you add a comment for the closing namespace here, to improve the readability?

maxLength(maxLength),
nRepeats(nRepeats)
{
// Just storing task-specific paramters.
Member

Minor spelling issue: parameters instead of paramters.

) {
input = arma::field<arma::irowvec>(batchSize);
labels = arma::field<arma::irowvec>(batchSize);
for (int i = 0; i < batchSize; ++i) {
Member

Can you take a look at the Brace Placement section: https://github.com/mlpack/mlpack/wiki/DesignGuidelines#brace-placement. I wonder why the Style Check doesn't complain about this.

input = arma::field<arma::irowvec>(batchSize);
labels = arma::field<arma::irowvec>(batchSize);
for (int i = 0; i < batchSize; ++i) {
size_t size = maxLength;
Member

I think we can remove size and just use maxLength, what do you think?

Contributor Author

Maybe we can use a random value for size, so that we can feed variable-length data to our models?

Member

Interesting idea, but I'm not sure we should do that inside the Generator class, since the required distribution depends on the user. If someone wants inputs with another length, they could just call with another length, right? Also, maybe it's a good idea to use arma::mat instead of arma::field: that way, someone who wants to train only with a static length can use the output directly, since the model takes arma::mat as input.

labels = arma::field<arma::irowvec>(batchSize);
for (int i = 0; i < batchSize; ++i) {
size_t size = maxLength;
arma::irowvec item = arma::randi<arma::irowvec>(size, arma::distr_param(0, 1));
Member

I think we can avoid the extra copy here and use input(i) instead of item - maybe I missed something?

@zoq
Member

zoq commented Jun 5, 2017

I hope this won't be ridiculously hard, but it still requires overloading the evaluation methods to support sequences of binary numbers - in contrast to sequences of bits.

Not sure what evaluation method you mean here, can you explain further?

@sidorov-ks
Contributor Author

Not sure what evaluation method you mean here, can you explain further?

I mean that even though the other tasks are still evaluated by sequence precision, AddTask and SortTask require processing/outputting sequences of numbers (ain't interesting to sort 0s and 1s 😃), in contrast to CopyTask, which requires copying sequences of 0s and 1s. The current implementation of SequencePrecision takes two field<irowvec>s as arguments - to support the two other tasks, it needs to be able to take two field<imat>s as arguments.

@zoq
Member

zoq commented Jun 5, 2017

Thanks for the clarification, now I get what you mean 👍

In a previous comment you said you are currently at an internship, does the internship end this week or next week or something else? Just wanted to make sure you are able to work full-time on the project.

@sidorov-ks
Contributor Author

In a previous comment you said you are currently at an internship, does the internship end this week or next week or something else? Just wanted to make sure you are able to work full-time on the project.

Yes, both the internship and the entrance exam to the Yandex School of Data Analysis have recently wrapped up (the exam, the last event so far, was held on Sunday). So from here on, I am able to work full-time.

@zoq
Member

zoq commented Jun 5, 2017

Thanks again for the clarification. The course looks interesting: https://yandexdataschool.com/edu-process/program/data-analysis - not sure which one you did, but I guess every course is interesting in its own way; at least the description looks promising. Hopefully, your exam went well.

Just a reminder, would be great to see the first weekly report today or tomorrow.

@sidorov-ks
Contributor Author

Fixed the style issues you've mentioned in the review; now switching to the other two tasks. By the way, what's up with Jenkins? It's being much more severe than it used to be.

Thanks again for the clarification. The course looks interesting: https://yandexdataschool.com/edu-process/program/data-analysis - not sure which one you did, but I guess every course is interesting in its own way; at least the description looks promising. Hopefully, your exam went well.

That (Data Analysis) is precisely the program I've applied to :) However, I'm also going to take some courses from the Big Data program (e.g., the parallel computation course - when you know that LAPACK is superior to just writing a for-loop, but don't know precisely why 😸)

@zoq
Member

zoq commented Jun 6, 2017

The Jenkins Style Checks used to check only files in mlpack/core; now it also checks files in methods and tests. I fixed almost all issues that are not related to any PR here: #1019, so realistically you should only look at the files you changed and try to fix the issues pointed out by cpplint.

That (Data Analysis) is precisely the program I've applied to :) However, I'm also going to take some courses from the Big Data program (e.g., the parallel computation course - when you know that LAPACK is superior to just writing a for-loop, but don't know precisely why 😸)

Right, I think in mlpack's context swapping OpenBLAS for MKL would also result in some noticeable performance improvements, at least in some situations.

@sidorov-ks
Contributor Author

Travis CI has also gone crazy - now VanillaNetworkTest from convolutional_network_test.cpp also fails, getting 50% classification error.

@zoq
Member

zoq commented Jun 6, 2017

Sometimes tests fail because they e.g. didn't reach the expected error in the specified number of iterations from the given starting point; for more information take a look at #922.

Realistically, you don't have to worry about failing tests that are not related to your code.

@sidorov-ks
Contributor Author

Now I have a problem I can't easily solve. I need to generate a random N*5 matrix (N 5-bit numbers). However, when I use randi(N, 5, distr_param(0, 1)), it returns a sequence of 00000 and 11111 rows (in other words, only rows get randomized - not columns).

I also tried setting all values manually to numbers from the RNG (the current implementation), but it has the same problem. How can I fix it?

@zoq
Member

zoq commented Jun 6, 2017

using:

arma::Mat<size_t> output = arma::randi<arma::Mat<size_t>>(10, 5, arma::distr_param(0, 1));

I get:

        0        0        0        1        0
        0        1        1        1        1
        1        1        1        0        1
        0        0        1        0        1
        1        0        1        1        1
        0        1        1        0        0
        0        1        0        1        1
        1        0        1        1        1
        1        0        0        1        0
        1        0        1        0        0

Isn't that what you expected?

@sidorov-ks
Contributor Author

Isn't that what you expected?

Yes, but this one didn't work either :( Speaking of this, can I use vector<vector<int>> to create an imat? So far I can't find any other viable way to fix this, but tomorrow I'll try again to debug it.

Speaking about the weekly report, I think it would be better to finish the third task tomorrow, fix the mentioned bug, and then (once again, tomorrow) write the complete Week 1 report. What do you think?

@zoq
Member

zoq commented Jun 6, 2017

So, do you mean my output is what you'd like, but it's not working for you? If that's the case, can you reduce the problem to a simple case? That would make it easier to debug.

Sending the weekly report tomorrow sounds fine to me.

@sidorov-ks
Contributor Author

So, do you mean my output is what you'd like, but it's not working for you? If that's the case, can you reduce the problem to a simple case? That would make it easier to debug.

Yes, I was trying to achieve your output. I have resolved the problem - it was due to a really silly bug of mine in producing the debug output.

I have also implemented AddTask, so now I have only one question - how do I merge the style fixes from the master branch into my fork?

After that, I'm planning to write the Week 1 report. Then (according to our schedule) we are switching to some real (but not augmented) models - I hope to implement an LSTM baseline and record its scores.

@sidorov-ks
Contributor Author

Whoops, my bad, there were only my errors. Trying to fix them.

@sidorov-ks
Contributor Author

Finally a clear style-check run :)

@zoq
Member

zoq commented Jun 7, 2017

After that, I'm planning to write the Week 1 report. Then (according to our schedule) we are switching to some real (but not augmented) models - I hope to implement an LSTM baseline and record its scores.

Sounds good, I think we could also compare against GRU (once it's merged), without changing a lot of code.

@sidorov-ks
Contributor Author

sidorov-ks commented Jun 7, 2017

AppVeyor is running for a really long time. How can I dequeue my previous commits from here: https://ci.appveyor.com/project/mlpack/mlpack/history

UPDATE: They have just been dequeued. Thanks ;)

@zoq
Member

zoq commented Jun 7, 2017

I'm not sure if you are allowed to cancel your own job at the moment - you could log in to AppVeyor and check - but in the meantime let me do it for you.

class AddTask
{
public:
AddTask(size_t bitLen);
Member

Can you comment each parameter as we did for the other methods? Also, I think it would be a good idea to improve the general task description; e.g., we could add a simple example of how to use the code and show example output. What do you think?

public:
AddTask(size_t bitLen);

void GenerateData(arma::field<arma::irowvec>& input,
Member

Do you think it would be cleaner to rename the function to Generate?

* @param maxLength Maximum length of sequence that has to be repeated by model.
* @param nRepeats Number of repeats required to solve the task.
*/
CopyTask(size_t maxLength, size_t nRepeats);
Member

Can you use the const qualifier here: const size_t maxLength and const size_t nRepeats?

* @param labels The variable to store output sequences.
* @param batchSize The dataset size.
*/
void Generate(arma::field<arma::irowvec>& input,
Member

The data should be column major. Also, I was thinking that even if the data is binary, it might be good to use arma::col instead of arma::icol here, since most of the models take arma::mat as input, and that way we could avoid a cast. What do you think?

*/
void Generate(arma::field<arma::irowvec>& input,
arma::field<arma::irowvec>& labels,
size_t batchSize);
Member

Can we use const size_t batchSize here, to be consistent with the rest of the codebase?

// Just storing task-specific parameters.
}

void CopyTask::Generate(arma::field<arma::irowvec>& input,
Member

Minor style issue, can you align the labels and batchSize parameter with the input parameter.

*
* @param input The variable to store input sequences.
* @param labels The variable to store output sequences.
* @param batchSize The dataset size.
Member

Perhaps batchSize isn't the best parameter name here? Do you think size could work?

Contributor Author

I would like to keep the long name, because with a variable called size it would not be as clear which size is meant (sample size? maximum sequence length? size of the sequence elements?). The long name, in contrast, eliminates any chance of slipping up later.

Member

Okay, sounds reasonable to me.

{
input = arma::field<arma::irowvec>(batchSize);
labels = arma::field<arma::irowvec>(batchSize);
std::srand(unsigned(std::time(0)));
Member

I don't think it's a good idea to set a seed here - what if someone wants to generate the same output, e.g. to debug some problem? If someone needs different results in a single run, they can set math::RandomSeed(std::time(NULL)); before the run.
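The debuggability point can be sketched with the C++ standard library (std::mt19937 stands in for mlpack's RNG; ReproducibleDraws is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// With a fixed seed the generator reproduces the same draws on every run,
// which is exactly what you want when debugging a failing test case;
// re-seeding from std::time inside Generate() would destroy that property.
std::vector<std::uint32_t> ReproducibleDraws(const std::uint32_t seed,
                                             const std::size_t count)
{
  std::mt19937 rng(seed);
  std::vector<std::uint32_t> draws(count);
  for (std::uint32_t& d : draws)
    d = static_cast<std::uint32_t>(rng());
  return draws;
}
```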

std::srand(unsigned(std::time(0)));
for (size_t i = 0; i < batchSize; ++i) {
// Random uniform length from [2..maxLength]
size_t size = 2 + std::rand() % (maxLength - 1);
Member

We could use RandInt from core/math/random.hpp here.
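For illustration, a stdlib analogue of drawing the sequence length uniformly from [2, maxLength] (mlpack's math::RandInt is the in-tree way to do this; RandomLength is a made-up name):

```cpp
#include <cstddef>
#include <random>

// Draw a sequence length uniformly from the inclusive range [2, maxLength],
// replacing the 2 + std::rand() % (maxLength - 1) pattern shown above.
std::size_t RandomLength(std::mt19937& rng, const std::size_t maxLength)
{
  std::uniform_int_distribution<std::size_t> dist(2, maxLength);
  return dist(rng);
}
```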

// Random uniform length from [2..maxLength]
size_t size = 2 + std::rand() % (maxLength - 1);
input(i) = arma::randi<arma::irowvec>(size, arma::distr_param(0, 1));
arma::irowvec item_ans = arma::irowvec(nRepeats * size);
Member

Do you think we can use repmat here, to simplify the code?

// In case it hasn't been included yet.
#include "copy.hpp"

#include <cstdlib>
Member

Looks like we can remove #include <cstdlib> and #include <ctime>.

// Random uniform length from [2..maxLength]
size_t size = RandInt(2, maxLength+1);
input(i) = arma::randi<arma::colvec>(size, arma::distr_param(0, 1));
arma::colvec item_ans = arma::conv_to<arma::colvec>::from(
Member

Maybe I missed something, but it looks like we could avoid an extra copy here if we directly use labels(i) for the output of arma::conv_to<arma::colvec>::from(...).

arma::field<arma::colvec> predOutputs)
{
double score = 0;
auto testSize = trueOutputs.n_elem;
Member

We should try to avoid auto: I think auto doesn't help general code understanding when people read through the code, since you have to look further to figure out the actual types. Also, here is a quote from the Armadillo page:

Can I use the C++11 auto keyword with Armadillo objects and/or expressions?
Use of C++11 auto is not recommended with Armadillo objects and expressions. Armadillo has a template meta-programming framework which creates lots of short lived temporaries that are not handled by auto.


for (size_t i = 0; i < testSize; i++)
{
auto prediction = trueOutputs.at(i);
Member

Can we work with trueOutputs directly instead of a copy? I guess the compiler would optimize that away, but it would be easy for us to do it explicitly.

}
else
{
for (size_t j = 0; j < prediction.n_elem; ++j) {
Member

Maybe you could simplify the expression here by doing something like arma::accu(output.at(j) == prediction.at(j)) == prediction.n_elem - not sure if that expression is actually faster or slower.

Contributor Author

I've looked through armadillo's docs and found an even better way: http://arma.sourceforge.net/docs.html#approx_equal

This method resolves all our problems in this place:

  • It should be fast - it's a native Armadillo method (and should be parallelized), and it should stop after the first mismatch (an equality check, nothing more).
  • No more code duplication - it accepts all Armadillo objects, whether it's a matrix, vector or whatever else.

Member

The only problem I see is that approx_equal was introduced in version 6.700, but we support >= 4.200.0, so I guess what we could do here is to backport the function, as we did for e.g. ind2sub in core/arma_extend.

return score;
}

double SequencePrecision(arma::field<arma::mat> trueOutputs,
Member

We could combine the functions for arma::field<arma::mat> and arma::field<arma::colvec> and avoid some code duplication by using a template:

template<typename eT>
double SequencePrecision(arma::field<eT> trueOutputs, arma::field<eT> predOutputs)
{
...
}

// Random uniform length from [2..maxLength]
size_t size = RandInt(2, maxLength+1);
input(i) = arma::randi<arma::mat>(bitLen, size, arma::distr_param(0, 1));
arma::mat item_ans = arma::mat(bitLen, size);
Member

Please use camel casing for all names, for more information take a look at: https://github.com/mlpack/mlpack/wiki/DesignGuidelines#naming-conventions

size_t size = RandInt(2, maxLength+1);
input(i) = arma::randi<arma::mat>(bitLen, size, arma::distr_param(0, 1));
arma::mat item_ans = arma::mat(bitLen, size);
vector<pair<int, int>> vals(size);
Member

Really neat code, I was wondering if we could use arma::sort_index to replace the key of std::pair.
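What arma::sort_index computes can be sketched with the standard library - an argsort over indices, which removes the need to carry each value's position in a std::pair:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the permutation of indices that puts values into ascending order,
// analogous to arma::sort_index.
std::vector<std::size_t> SortIndex(const std::vector<double>& values)
{
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), 0);
  std::sort(indices.begin(), indices.end(),
      [&values](const std::size_t a, const std::size_t b)
      { return values[a] < values[b]; });
  return indices;
}
```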

#include <vector>
#include <algorithm>
#include <utility>
#include <cstdlib>
Member

I think we can remove #include <cstdlib> and #include <ctime> here.

@sidorov-ks
Contributor Author

I keep trying to create an LSTM baseline :) Right now, it kind of works, but there are two big problems:

  • I have no idea how to change the model's rho (for the LSTM layer, I just set its rho to the maximum required value). For this reason, the LSTM model handles sequences of length 2, but is unable to emit a third symbol.
  • There is some weird bug when setting nRepeats to some value > 1; specifically, the model.Train() instruction crashes. I wonder what the difference between these cases is.

Can you help me resolve these issues, please?

The current unit test code is in the last commit.

@zoq
Member

zoq commented Jun 13, 2017

I looked into the issue of setting the correct sequence length, and it turns out we have to do a little bit more than setting the rho value, since the input size is only set once (https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/rnn_impl.hpp#L206). Also, I think it would be convenient if the model propagated the rho value through the model; that way we would only have to change it once. Meaning we need another routine that does this for us.

@zoq
Member

zoq commented Jun 13, 2017

Here is a quick patch

https://gist.github.com/zoq/24f8b2e4826d837d604f9613615763bb
https://gist.github.com/zoq/01952c2be67bd9fbeeeac5d6c69b39cd
https://gist.github.com/zoq/c7ffc1f16e654c94097fd79d1687fef3

Let me know if I should push the changes to my fork if that's easier to read for you.

and here is my output for nRepeat = 2

Input:
        0   1.0000   1.0000

Model output:
        0   1.0000   1.0000

True output:
        0   1.0000   1.0000

Input:
   1.0000        0   1.0000

Model output:
   1.0000        0   1.0000

True output:
   1.0000        0   1.0000

=======================================
Input:
   1.0000   1.0000

Model output:
   1.0000   1.0000

True output:
   1.0000   1.0000

=======================================
Input:
        0        0

Model output:
        0        0

True output:
        0        0

=======================================
Final score: 1

and the output for nRepeat = 3

Input:
        0   1.0000   1.0000

Model output:
        0        0        0

True output:
        0   1.0000   1.0000        0   1.0000   1.0000

=======================================
Input:
   1.0000   1.0000   1.0000

Model output:
   1.0000   1.0000   1.0000

True output:
   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000

=======================================
Final score: 0

As for the repeat issue, without any further information, I can't see how the model should predict the right output.

@sidorov-ks
Contributor Author

Implemented the changes you described; now it works nicely for nRepeats=1. Interestingly, on my machine nRepeats=3 crashes (though your output suggests that the model has learned to copy but failed to learn to repeat) with the same message:

error: subtraction: incompatible matrix dimensions: 1x1 and 2x1
unknown location(0): fatal error: in "AugmentedRNNsTasks/LSTMBaselineTest": std::logic_error: subtraction: incompatible matrix dimensions: 1x1 and 2x1

Right now I'm implementing similar tests for AddTask and SortTask, after which I'll proceed to record the scores and write the Week 2 report.

However, I would like to know whether a unit test is the appropriate place for running a baseline solution. Maybe I should implement it somewhere else, e.g. in some kind of examples binary?

radical_test.cpp
random_forest_test.cpp
random_test.cpp
random_forest_test.cpp
Member

Looks like there is a merge artifact; if you remove the duplication, Travis should build the PR again.

#include <mlpack/methods/ann/layer/parametric_relu.hpp>
#include <mlpack/methods/ann/layer/reinforce_normal.hpp>
#include <mlpack/methods/ann/layer/select.hpp>
#include <mlpack/methods/ann/layer/cross_entropy_error.hpp>
Member

Not sure what happened here, but we should not remove the recently introduced cross entropy error function.

weights[0] = 2;
// Increasing length by 1 double the number of valid numbers.
for (size_t i = 1; i < bitLen - 1; ++i)
weights[i] = 2 * weights[i - 1];
Member

What do you think, should we use linspace in combination with exp2 here?
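The suggested closed form replaces the doubling loop: arma::exp2(arma::linspace(1, bitLen - 1, bitLen - 1)) yields {2, 4, ..., 2^(bitLen - 1)}. A dependency-free sketch of the equivalence (plain C++ with illustrative function names; the PR itself uses Armadillo):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Build the weight vector with the doubling loop from the PR.
std::vector<double> weightsByLoop(std::size_t bitLen)
{
  std::vector<double> w(bitLen - 1);
  // We have two binary numbers with exactly two digits (10 and 11).
  w[0] = 2;
  // Increasing the length by 1 doubles the number of valid numbers.
  for (std::size_t i = 1; i < bitLen - 1; ++i)
    w[i] = 2 * w[i - 1];
  return w;
}

// Closed form mirroring arma::exp2(arma::linspace(1, bitLen - 1, bitLen - 1)):
// element i is 2^(i + 1).
std::vector<double> weightsByExp2(std::size_t bitLen)
{
  std::vector<double> w(bitLen - 1);
  for (std::size_t i = 0; i < bitLen - 1; ++i)
    w[i] = std::exp2(static_cast<double>(i + 1));
  return w;
}
```

Since small powers of two are exact in double precision, the two constructions agree elementwise.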

* accepts unnormalized probabilties as long as they are
* non-negative and sum to a positive number.
* @return A random integer sampled from the specified distribution.
*/
Member

I would go with arma::vec for the weights instead of std::vector; it feels more consistent with the rest of the codebase. What do you think?

Member

@zoq zoq left a comment

Mainly minor style issues.

runningSum += el;
slots[i] = runningSum;
}
}
Member

I think we can remove the check here; we already throw an exception if Probabilities < 0.

Contributor Author

That's not the case: this check is needed for correctly processing [0 0 0 ... 0] weights.

Member

Ah, right, thanks for the clarification!

*/
inline int RandInt(const arma::vec& weights)
{
// Build cumulative probabilities from event probabilities.
Member

What about using arma::vec here? That way it's parallelised, and we could write slots /= runningSum; instead of iterating over the vector.

Contributor Author

@sidorov-ks sidorov-ks Jul 31, 2017

True, but is there any "arma-native" implementation of binary search? Since we compute cumulative sums, we won't really get benefits from parallelization - the cumsum is "inherently" sequential.

UPD: Read your next comment and understood that here we can use an Armadillo vector as if it were an STL vector.

Member

True, cumsum is sequential, I guess the only operation that we could parallelise here is the accumulation. The performance boost should be negligible, we mainly save some lines of code.

{
// Build cumulative probabilities from event probabilities.
std::vector<double> slots(weights.size());
double runningSum = 0;
Member

I was wondering whether that check is necessary; since the constraint was made clear in the documentation, we could simplify the expression:

arma::vec slots = cumsum(weights);
slots /= arma::accu(weights);
return std::lower_bound(slots.begin(), slots.end(), Random()) - slots.begin();

Let me know what you think.

Contributor Author

Well, we still need to run that check, just to alert the user if they try to use it in the wrong way (intentionally or otherwise).

Member

You mean the probabilities check? I'm not sure we have to, but I don't mind it here, we could use arma::any if you like or just leave the for loop.

Contributor Author

Reimplemented with arma::any. I think we do need those checks; negative elements in the weight vector would lead to unpredictable consequences. The binary search part is not robust to this case, and it is nonsensical anyway, so I like the idea of cutting it out before the lower_bound run.

Member

Sounds fine to me, thanks for the input.
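The sampler that emerged from this thread (non-negativity check, cumulative sums, binary search) can be sketched in dependency-free C++. This uses std::vector and an explicit uniform draw u in place of arma::vec and mlpack's Random(), so the name and signature are illustrative:

```cpp
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Deterministic core of the weighted sampler: given unnormalized
// non-negative weights and a uniform draw u in [0, 1), return the
// index of the sampled event.
int weightedSample(const std::vector<double>& weights, double u)
{
  // Reject negative weights up front; the binary search below is not
  // robust to them (this mirrors the arma::any check discussed above).
  for (double w : weights)
    if (w < 0)
      throw std::invalid_argument("weights must be non-negative");

  // Build cumulative sums, then normalize so the last slot equals 1.
  std::vector<double> slots(weights.size());
  std::partial_sum(weights.begin(), weights.end(), slots.begin());
  const double total = slots.back();
  if (total <= 0)
    throw std::invalid_argument("weights must sum to a positive number");
  for (double& s : slots)
    s /= total;

  // Binary search for the first slot whose cumulative mass reaches u.
  return static_cast<int>(
      std::lower_bound(slots.begin(), slots.end(), u) - slots.begin());
}
```

With weights {1, 1, 2} the normalized slots are {0.25, 0.5, 1.0}, so u = 0.3 lands in slot 1 and u = 0.9 in slot 2.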


AddTask::AddTask(const size_t bitLen) : bitLen(bitLen)
{
if (bitLen <= 0) {
Member

Can you go through the code and put the { on a new line?

// We have two binary numbers with exactly two digits (10 and 11).
// weights[0] = 2;
// Increasing length by 1 double the number of valid numbers.
/*for (size_t i = 1; i < bitLen - 1; ++i)
Member

Can we remove the code here?

using namespace mlpack::ann::augmented::tasks;
using namespace mlpack::ann::augmented::scorers;

using namespace mlpack::ann;
Member

I think we can remove the namespaces here.

void Predict(arma::field<arma::mat>& predictors,
arma::field<arma::mat>& labels)
{
auto sz = predictors.n_elem;
Member

I think we can directly use predictors.n_elem instead of sz here. In general we should avoid using auto with Armadillo:

Use of C++11 auto is not recommended with Armadillo objects and expressions. Armadillo has a template meta-programming framework which creates lots of short lived temporaries that are not handled by auto.

In this case it would be just fine, but I think using it directly without an alias is also fine, and we can save another line.

predictors.reshape(3, predictors.n_elem / 3);
assert(predictors.n_rows == 3);
int num_A = 0, num_B = 0;
bool num = false; // true iff we have already seen the separating symbol
Member

I guess you mean if; also, can you use the correct punctuation here?

Contributor Author

No, I mean "iff" in the sense of "if and only if", just as it is normally used in math texts.

Member

Ah, I see, I didn't think about that in the current context.

assert(predictors.n_rows == 3);
int num_A = 0, num_B = 0;
bool num = false; // true iff we have already seen the separating symbol
size_t len = predictors.n_cols;
Member

I think there is no need to create an alias here, what do you think?

BOOST_AUTO_TEST_SUITE(RandomTest);

// Test for RandInt() sampler from discrete uniform distribution.
BOOST_AUTO_TEST_CASE(DiscreteUniformRandomTest)
Member

Thanks for writing a test for the method!

if (el < 0) {
std::ostringstream oss;
// Check that constraints on weights parameter are satisfied.
if (arma::min(weights) < 0) {
Member

I know I'm really picky about the style, can you go through the code and adjust the lines with the following pattern: ) {.

Contributor Author

Ran pcregrep -Mnr --color "\) \{\n" src/mlpack/ - is it okay if I fix this pattern elsewhere?

Member

Sure, nice idea!

@zoq
Member

zoq commented Aug 1, 2017

Temporarily fixed the disk space issue; can you go through the issues listed here: http://masterblaster.mlpack.org/job/pull-requests%20mlpack%20style%20checks/656/cppcheckResult/source.all/?before=5&after=5

Member

@zoq zoq left a comment

I think it looks good and we should merge it. We should let this sit for 3 days before merging, to give anyone else time to comment.

Member

@rcurtin rcurtin left a comment

Thanks so much for the hard work with this PR. I agree that it is ready, if you can take care of the comment about RandInt().

* @param weights Probabilities of individual numbers. Note that the function
* accepts unnormalized probabilties as long as they are
* non-negative and sum to a positive number.
* @return A random integer sampled from the specified distribution.
Member

I think that instead of adding this overload of RandInt() we could use the functionality already available with DiscreteDistribution. Instead of

x = RandInt(weights);

we can create a DiscreteDistribution and use it:

DiscreteDistribution d(weights);
x = d.Random();

What do you think?

Contributor Author

(As I mentioned in the IRC chat) I don't see a concise way to use it in the 1-dim setting, although I admit I didn't fully understand the n-dim API.

Member

Ah sorry, the code I gave above was wrong. You could do:

DiscreteDistribution d(1); // One-dimensional discrete distribution.
d.Probabilities(0) = std::move(weights);
x = d.Random();

I think this will do what you need, let me know if not.
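As a side note, the C++ standard library's std::discrete_distribution provides the same one-dimensional weighted sampling outside mlpack; std::discrete_distribution normalizes unnormalized weights internally, just like DiscreteDistribution. A small dependency-free sketch (function name and seed are illustrative, not part of the PR):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Draw n samples from unnormalized weights and return the index drawn
// most often; with heavily skewed weights this is the heaviest index.
std::size_t mostFrequentSample(const std::vector<double>& weights,
                               int n, unsigned seed)
{
  std::mt19937 gen(seed);
  // discrete_distribution normalizes the weights internally, so
  // {1, 1, 100} behaves like probabilities {1/102, 1/102, 100/102}.
  std::discrete_distribution<std::size_t> d(weights.begin(), weights.end());

  std::vector<int> counts(weights.size(), 0);
  for (int i = 0; i < n; ++i)
    ++counts[d(gen)];

  return static_cast<std::size_t>(
      std::max_element(counts.begin(), counts.end()) - counts.begin());
}
```

Over many draws the empirical frequencies track weights / sum(weights), so the heaviest index dominates the counts.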

size_t totSize = vecInput.n_elem + addSeparator + vecLabel.n_elem;
input(i) = arma::zeros(totSize, 2);
input(i).col(0).rows(0, vecInput.n_elem-1) =
vecInput;
Member

Some small style issues here---the wrapped lines should be indented 4 spaces, not 2, and vecInput.n_elem-1 should be vecInput.n_elem - 1. :)

input(i).rows(origPtr, origPtr + bitLen - 1);
ptr += bitLen;
origPtr += bitLen;
sepInput.at(ptr, 0) = 0.5;
Member

Ok, I see---thanks for the clarification.

Member

@rcurtin rcurtin left a comment

Thanks for the refactoring. I'm glad we can keep the tests too. :) I only have a couple minor comments that you can address if you like (it'll make the code just a bit cleaner), otherwise I agree this is ready for merge.

weights = arma::exp2(arma::linspace(1, bitLen - 1, bitLen - 1));

mlpack::distribution::DiscreteDistribution d(1);
d.Probabilities(0) = std::move(weights);
Member

You could compress this a bit, I think:

d.Probabilities(0) = arma::exp2(arma::linspace(1, bitLen - 1, bitLen - 1));

then you don't need the weights object. :)

weights = arma::exp2(arma::linspace(1, maxLength - 1, maxLength - 1));

mlpack::distribution::DiscreteDistribution d(1);
d.Probabilities(0) = std::move(weights);
Member

Same here, this can be compressed too.

{
arma::vec armaWeights(weightSet);
mlpack::distribution::DiscreteDistribution d(1); // One-dimensional discrete distribution.
d.Probabilities(0) = std::move(armaWeights);
Member

You might be able to compress this a little bit too---d.Probabilities(0) = arma::vec(weightSet).

{
arma::vec weights(maxLength - 1);

mlpack::distribution::DiscreteDistribution d(1);
Member

We have to include #include <mlpack/core/dists/discrete_distribution.hpp> to avoid an undefined method issue. The same for the Add task.

@zoq zoq merged commit 94907ce into mlpack:master Aug 7, 2017
@zoq
Member

zoq commented Aug 7, 2017

Thanks for the great contribution!

@zoq zoq mentioned this pull request Aug 8, 2017
@coveralls

Coverage Status

Coverage decreased (-43.8%) to 38.366% when pulling 8f4af1e on 17minutes:master into 5c68061 on mlpack:master.


5 participants