
Memory leak and invalid use fixes#2326

Merged
favre49 merged 22 commits into mlpack:master from rcurtin:memory-fixes
Mar 23, 2020

Conversation

@rcurtin
Member

@rcurtin rcurtin commented Mar 22, 2020

Inspired by #2314, #2310, #2300, and others, I built mlpack with -fsanitize=address and ran mlpack_test. The results were... noisy. But I took some time and I fixed everything (except for one, detailed later).

  • All of the ANN layers are properly freed now; it took a while to get this right, and DeleteVisitor will now delete internally-held layers when the layer has a Model() method. This may solve some issues people were having.

  • CNN layers no longer make an alias of the input from Forward(). This should fix #2146 (Can't Train a Normal CNN).

  • Test code properly frees things that it allocates. There were some tricky places where deallocation had been overlooked.

  • Bindings handle exceptions properly and deallocate any allocated memory when an exception is thrown. It makes the code a bit uglier, but it's safer.

There is one bug I didn't solve, which surfaces in the reinforcement learning code, in q_learning_impl.hpp:

     // Update target network
     if (totalSteps % config.TargetNetworkSyncInterval() == 0)
       targetNetwork = learningNetwork;

This code actually copies an FFN, and fixing that memory leak will require a copy constructor for each individual layer. That's the same issue as in #2314, and it isn't solved here; but everything else in the tests is.

I do think it's important that we merge this before a release, otherwise we'll probably get lots of segfault and invalid memory access reports. :)
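For anyone wanting to reproduce the run described above, something along these lines should work; this is a sketch assuming a standard out-of-source CMake build, and the exact option names may differ from what was actually used:

```shell
# Build mlpack_test with AddressSanitizer instrumentation.
mkdir build && cd build
cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" \
      -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" ..
make mlpack_test
bin/mlpack_test
```

Any leak or invalid access then shows up in ASAN's report on stderr when the test binary exits.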

Member

@birm birm left a comment


I'm still not sure what's up with the HDF5 file test, but this looks good.
I'm wondering what it would take to automate the test you ran... (I mean, of course we could just add that flag, but is there something quantifiable we could have Azure spit out?)

Member

@favre49 favre49 left a comment


This is great, looks like hours of work. Everything LGTM, besides some nitpicks.

Comment on lines +69 to +72
// Remove mergeLayer from the forward and backward RNNs so it doesn't get
// deleted. This assumes that mergeLayer is the last layer!
forwardRNN.network.pop_back();
backwardRNN.network.pop_back();
Member


This doesn't look right. We never add the mergeLayer to the forward and backward RNNs; rather, the last layers of the RNNs are added to the mergeLayer, if you look here.
I guess we only need to change the comment, though. If reset is true, we should pop the last layers from the RNNs to avoid deleting them twice; otherwise we should simply delete the mergeLayer.

Member Author


Oh, you're right! I read it backwards. In any case, yeah, removing the last layer from the forward and backward RNNs is sufficient to prevent a memory leak. I've updated the comment in c42b9f7. 👍

@rcurtin
Member Author

rcurtin commented Mar 23, 2020

Thanks @favre49, most of the time was actually just waiting on the build and seeing the results. :) I removed all the unnecessary braces in b1e5146, now let's see if it still builds clean. 👍

@birm yeah, there is some more info on the HDF5 test in #2320.

@zoq
Member

zoq commented Mar 23, 2020

Thanks for spending so many hours on the issues, I'll take a look at the changes later today.

Member

@zoq zoq left a comment


Nice work, this looks great to me.

{
// Remove the last layers from the forward and backward RNNs, as they are held
// in mergeLayer. So, when we use DeleteVisitor with mergeLayer, those two
// layers will be properly (and not doubly) freed.
Member


Ohh, nice!

@favre49
Member

favre49 commented Mar 23, 2020

I'll go ahead and merge this then.

@favre49 favre49 merged commit 4b4b4f6 into mlpack:master Mar 23, 2020
@rcurtin rcurtin deleted the memory-fixes branch March 3, 2026 20:49

Successfully merging this pull request may close these issues.

Can't Train a Normal CNN.
