Skip to content

Fix memory leak in atom.c#362

Closed
stephanvantienen-keen wants to merge 1 commit intolldpd:masterfrom
stephanvantienen-keen:origin/fix/memory-leak-in-atom
Closed

Fix memory leak in atom.c#362
stephanvantienen-keen wants to merge 1 commit intolldpd:masterfrom
stephanvantienen-keen:origin/fix/memory-leak-in-atom

Conversation

@stephanvantienen-keen
Copy link
Copy Markdown

Fix memory leak in _lldpctl_do_something when error occurs after state_data has been duplicated. The function has multiple return statements after the strdup, the conn->state_data was only freed in the happy flow. Fixed by tracking the duplication and jumping to an end label when needing to return, freeing the duplication whenever needed.

Fix memory leak in _lldpctl_do_something when error occurs after state_data has been duplicated.
@vincentbernat
Copy link
Copy Markdown
Member

I am unsure if this is the right fix. With your modification, the data is freed in all cases. So, on next call, it's not here anymore. We store it in the connection to ensure we don't request for several stuff in parallel (the strcmp(conn->state_data, state_data) is the important part). We should not free the data if for example we don't have enough bytes.

In which case did you hit a memory leak? It may happen if there is a serialization error or if you run into an invalid state. But this also means you have some programming error elsewhere.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

I ran into the memory leak when invoking lldpctl_get_port from a client while the daemon no longer has the port since the port has gone down. In that case ctl_msg_recv_unserialized fails since the expected_type is not GET_INTERFACE but NONE. At that point _lldpctl_do_something returns without freeing conn->state_data. I do not see how I could prevent this from the client side.

I understand that the strcmp is important, I do not understand the part 'We store it in the connection to ensure we don't request for several stuff in parallel'. The strdup always executes when state_data is not NULL, right? With your explanation I would expect that the free does not execute at all in _lldpctl_do_something, it would be handled at some other part altogether to prevent a leak.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

Let me rephrase the latter part: with your explanation I would expect that the strdup only executes if conn->state_data is not already equal to the passed state_data, so the strcmp also is used before the strdup. In that case I see why a free would not be needed in all cases, since the conn instance stores it for later use as well (in that case the free can be added before doing a strdup and the end label is not needed). That would work for me.

@vincentbernat
Copy link
Copy Markdown
Member

The state_data is here to prevent you from querying interface A, then interface B while all the data for interface A has not been retrieved (in async mode). Maybe this protection is not very useful. Let me think a bit about it.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

Understood. It sounds a bit like needing something in conn to know that a free is needed for state_data. But I for sure see how my solution will not be nice for async mode.

@vincentbernat
Copy link
Copy Markdown
Member

Another question: how do you recover from the error? The state machine is in a bad state, I think you need to tear down the connection?

vincentbernat added a commit that referenced this pull request Nov 9, 2019
The state data is used to ensure we don't interleave requests of the
same kind (eg requesting data for eth0, then for eth1 while eth0 is
running). The data was freed only when reaching `CONN_STATE_IDLE`
again. Otherwise, there was a memory leak.

This is an attempt to rework the code to avoid the memory leak.
However, we still get a memory leak on error case because we don't
know the difference between an hard error case or simply a request to
wait for more bytes. See #362 for the discussion.
@vincentbernat
Copy link
Copy Markdown
Member

I have pushed an attempt to implement your suggestion (doing the strdup() later), but I think this doesn't fix the issue. This is 1a79cfa. In an error case, we don't reach the idle state again, so we don't know if the data is still needed or not. That's why I am interested how you recover from error when getting a serialization error.

I think a simpler solution would be to remove this safe guard since you uncover a condition that happens in valid code while the safe guard was here to catch invalid code.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

First of all: I agree, this fix will not help in my error situation, the leak will still be there. I see your point in not being able to see the difference between a hard error case or waiting for more bytes. In our error case the leak will still be there.
We indeed recover by tearing down the connection via lldpctl_release. Currently this function does not free conn->state_data, would it be an idea to add that? Combining your fix with that should cover all situations - assuming that anyone will need to tear down the connection in such a case.

@vincentbernat
Copy link
Copy Markdown
Member

Yes, we could free it when tearing down the connection. I had also other ideas:

  • removing the connection state data
  • using a fixed-size allocation (maximum size is mostly IFNAMSIZE)
  • fixing your specific case (we should be able to handle your use case without you having to tear down the whole connection)

@stephanvantienen-keen
Copy link
Copy Markdown
Author

All would work for me. I get the idea for the safeguard, at least the way I am using my client it will not be needed. But in general removing the safeguard could be a bit too much, but you are a better judge of that.

Using a fixed-size allocation would be fine, I actually like that one. It removes the whole hassle of having to keep track of the allocation. This is my preferred solution, it would mean atom.c can largely revert to what it was.

What is your preference?

vincentbernat added a commit that referenced this pull request Nov 11, 2019
The state data is used to ensure we don't interleave requests of the
same kind (eg requesting data for eth0, then for eth1 while eth0 is
running). The data was freed only when reaching `CONN_STATE_IDLE`
again. Otherwise, there was a memory leak.

To avoid the memory leak, we avoid use a static allocation instead.
See #362 for the discussion.
@vincentbernat
Copy link
Copy Markdown
Member

I am pushing a branch with static allocation. Let's see if it works with the tests. However, fixing the case where a client has a reference to a port and tries to use it while the port doesn't exist anymore would still be interesting. It doesn't seem straightforward. We could return an empty port, but this may break existing clients not checking this case.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

Thank you for the discussion so far and for the new commit, I will use that - reverting my PR. For my situation this is now covered, but I understand your broader point. Not sure what to do with this PR now though - should I close it?

vincentbernat added a commit that referenced this pull request Nov 11, 2019
The state data is used to ensure we don't interleave requests of the
same kind (eg requesting data for eth0, then for eth1 while eth0 is
running). The data was freed only when reaching `CONN_STATE_IDLE`
again. Otherwise, there was a memory leak.

To avoid the memory leak, we avoid use a static allocation instead.
See #362 for the discussion.
@vincentbernat
Copy link
Copy Markdown
Member

It will automatically close once the commit will be merged.

@stephanvantienen-keen
Copy link
Copy Markdown
Author

Thank you, good to know. Will wait for the final commit, I saw that the tests failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants