
Copy layer inputs for CNN layers #2234

Closed

rcurtin wants to merge 3 commits into mlpack:master from rcurtin:ann-move

Conversation

@rcurtin
Member

@rcurtin rcurtin commented Feb 24, 2020

This PR is an attempt to fix #2146, although not in the way that I would have hoped. Unfortunately, it seems that for now, for the sake of memory safety, we will need to copy the input matrix.

Specifically, the problem is this: layers like Convolution and AtrousConvolution and TransposedConvolution (and some others) create an alias of the given input matrix when Forward() is called; this is the member inputTemp. This alias uses the same memory as the given input matrix; it does not make a copy, and it is not responsible for freeing the memory when the alias's destructor is called. That member inputTemp may then be used in Backward() and Gradient().

However, the problem is that there exist layer types that will cause the input matrix to other layers to be freed during the forward pass; this means that any layer that took an alias during the forward pass is now holding a pointer to invalid memory. (I can't remember if it is individual layers that cause the freeing, or if that happens as part of the FFN and RNN classes. In either case, the result is the same.)

I thought that perhaps we could simply use std::move() instead of creating an alias, but this turns out to not work because other parts of the FFN and RNN forward/backward passes rely on having the layer input still available even after Forward() is called on the layer. So, for now, I've simply copied the input. That's not the fastest solution, but I think it's better to have a slow solution that works instead of a fast one that sometimes segfaults. :)

Some points for discussion in this PR:

  • If the layer input is still expected to be valid even after Forward() is called by the FFN class, then should Forward() really take arma::mat&&s for both input and output? Or should the input be, e.g., const arma::mat&?

  • Have we documented anywhere what can and can't be done with the inputs to Forward() implementations? (I guess that's kind of related to the first question.)

  • Is it possible to refactor FFN and RNN such that the std::move() approach (instead of the copy approach I used here) can work?

I ask these questions because I'm not particularly familiar with the internal design of this part of the code, so maybe there is just something I'm not aware of, I don't know. :) Relevant folks for this discussion: @zoq, @ShikharJ, @walragatver, @sreenikSS, @saksham189. (Sorry for so many tags. It just seems to me like there are a couple of fundamental issues here that might be worth discussing. 👍)

@saksham189
Member

saksham189 commented Feb 24, 2020

I came to the same conclusion about the inputTemp variable here while testing and I could not understand why it was happening.

@rcurtin rcurtin closed this Feb 24, 2020
@rcurtin rcurtin reopened this Feb 24, 2020
@rcurtin
Member Author

rcurtin commented Feb 26, 2020

Looks like Azure forgot to update the build status for the Windows VS14 Plain job... now it says that it both failed and passed. :) I guess the failed status just didn't get removed.

@zoq
Member

zoq commented Feb 26, 2020

If the layer input is still expected even after Forward() is called by the FFN class, then should forward really take arma::mat&&s for both input and output? Or should the input be, e.g., const arma::mat&?

Thanks for looking into the issue; my first impression is that switching to const arma::mat& is probably the best solution.

@rcurtin
Member Author

rcurtin commented Feb 28, 2020

@zoq agreed; do you think that that's something we should open a separate issue for? (Also I guess we should change output to arma::mat& instead of arma::mat&& too?)

@rcurtin
Member Author

rcurtin commented Mar 3, 2020

Looks like Azure forgot to update the build status for the Windows VS14 Plain job... now it says that it both failed and passed. :) I guess the failed status just didn't get removed.

@zoq
Copy link
Copy Markdown
Member

zoq commented Mar 3, 2020

@zoq agreed; do you think that that's something we should open a separate issue for? (Also I guess we should change output to arma::mat& instead of arma::mat&& too?)

Right, change both input and output, and yes, we should open a separate issue for that, or maybe I'll just do it myself. I guess the question is: do we want to include the change from arma::mat&& to arma::mat& in the next release?

@rcurtin
Member Author

rcurtin commented Mar 12, 2020

This code touches the ANN code, and there was recently a big refactoring of the ANN code in #2259, so be sure to merge the master branch into your branch here to make sure that nothing will fail if this PR is merged. 👍 (I'm pasting this message into all possibly relevant PRs. Even ones that I personally opened. :))

Member

@zoq zoq left a comment


Looks good to me, thanks!

@rcurtin
Member Author

rcurtin commented Mar 20, 2020

I don't think I'm going to merge this yet. I'm taking a closer look at the convolution code, and I'm wondering whether inputTemp is even needed. I should have some result today or tomorrow, and depending on that, we can merge this.

@zoq
Member

zoq commented Mar 20, 2020

Sounds good.


@mlpack-bot mlpack-bot bot left a comment


Second approval provided automatically after 24 hours. 👍

rcurtin added a commit to rcurtin/mlpack that referenced this pull request Mar 22, 2020
@rcurtin
Member Author

rcurtin commented Mar 23, 2020

Closing in favor of #2326.

@rcurtin rcurtin closed this Mar 23, 2020
@rcurtin rcurtin deleted the ann-move branch March 23, 2020 00:33


Development

Successfully merging this pull request may close this issue: Can't Train a Normal CNN.