
Copy layer inputs for CNN layers #2234

Closed

rcurtin wants to merge 3 commits into mlpack:master from rcurtin:ann-move

Conversation

@rcurtin
Member

@rcurtin rcurtin commented Feb 24, 2020

This PR is an attempt to fix #2146, although not in the way that I would have hoped. Unfortunately, it seems that for now, for the sake of memory safety, we will need to copy the input matrix.

Specifically, the problem is this: layers like Convolution and AtrousConvolution and TransposedConvolution (and some others) create an alias of the given input matrix when Forward() is called; this is the member inputTemp. This alias uses the same memory as the given input matrix; it does not make a copy, and it is not responsible for freeing the memory when the alias's destructor is called. That member inputTemp may then be used in Backward() and Gradient().

However, the problem is that there exist layer types that will cause the input matrix to other layers to be freed during the forward pass; this means that any layer that took an alias during the forward pass is now holding a pointer to invalid memory. (I can't remember if it is individual layers that cause the freeing, or if that happens as part of the FFN and RNN classes. In either case, the result is the same.)

I thought that perhaps we could simply use std::move() instead of creating an alias, but this turns out to not work because other parts of the FFN and RNN forward/backward passes rely on having the layer input still available even after Forward() is called on the layer. So, for now, I've simply copied the input. That's not the fastest solution, but I think it's better to have a slow solution that works instead of a fast one that sometimes segfaults. :)

Some points for discussion in this PR:

  • If the layer input is still expected to be valid even after Forward() is called by the FFN class, then should Forward() really take arma::mat&&s for both input and output? Or should the input be, e.g., const arma::mat&?

  • Have we documented anywhere what can and can't be done with the inputs to Forward() implementations? (I guess that's kind of related to the first question.)

  • Is it possible to refactor FFN and RNN such that the std::move() approach (instead of the copy approach I used here) can work?

I ask these questions because I'm not particularly familiar with the internal design of this part of the code, so maybe there is just something I'm not aware of, I don't know. :) Relevant folks for this discussion: @zoq, @ShikharJ, @walragatver, @sreenikSS, @saksham189. (Sorry for so many tags. It just seems to me like there are a couple of fundamental issues here that might be worth discussing. 👍)

@saksham189
Member

saksham189 commented Feb 24, 2020

I came to the same conclusion about the inputTemp variable here while testing and I could not understand why it was happening.

@rcurtin rcurtin closed this Feb 24, 2020
@rcurtin rcurtin reopened this Feb 24, 2020
@rcurtin
Member Author

rcurtin commented Feb 26, 2020

Looks like Azure forgot to update the build status for the Windows VS14 Plain job... now it says that it both failed and passed. :) I guess the failed status just didn't get removed.

@zoq
Member

zoq commented Feb 26, 2020

If the layer input is still expected even after Forward() is called by the FFN class, then should forward really take arma::mat&&s for both input and output? Or should the input be, e.g., const arma::mat&?

Thanks for looking into the issue; my first impression is that switching to const arma::mat& is probably the best solution.

@rcurtin
Member Author

rcurtin commented Feb 28, 2020

@zoq agreed; do you think that that's something we should open a separate issue for? (Also I guess we should change output to arma::mat& instead of arma::mat&& too?)

@rcurtin
Member Author

rcurtin commented Mar 3, 2020

Looks like Azure forgot to update the build status for the Windows VS14 Plain job... now it says that it both failed and passed. :) I guess the failed status just didn't get removed.

@zoq
Copy link
Copy Markdown
Member

zoq commented Mar 3, 2020

@zoq agreed; do you think that that's something we should open a separate issue for? (Also I guess we should change output to arma::mat& instead of arma::mat&& too?)

Right, change both input and output, and yes, we should open a separate issue for that, or maybe I'll just do it myself. I guess the question is: do we want to include the change from arma::mat&& to arma::mat& in the next release?

@rcurtin
Member Author

rcurtin commented Mar 12, 2020

This code touches the ANN code, and there was recently a big refactoring of the ANN code in #2259, so be sure to merge the master branch into your branch here to make sure that nothing will fail if this PR is merged. 👍 (I'm pasting this message into all possibly relevant PRs. Even ones that I personally opened. :))

Member

@zoq zoq left a comment


Looks good to me, thanks!

@rcurtin
Member Author

rcurtin commented Mar 20, 2020

I don't think I'm going to merge this yet. I'm taking a closer look at the convolution code, and I'm wondering whether inputTemp is even needed. I should have some result today or tomorrow, and depending on that, we can merge this.

@zoq
Member

zoq commented Mar 20, 2020

Sounds good.


@mlpack-bot mlpack-bot bot left a comment


Second approval provided automatically after 24 hours. 👍

rcurtin added a commit to rcurtin/mlpack that referenced this pull request Mar 22, 2020
@rcurtin
Member Author

rcurtin commented Mar 23, 2020

Closing in favor of #2326.

@rcurtin rcurtin closed this Mar 23, 2020
@rcurtin rcurtin deleted the ann-move branch March 23, 2020 00:33


Development

Successfully merging this pull request may close this issue: Can't Train a Normal CNN.