Reference counters by lminiero · Pull Request #403 · meetecho/janus-gateway

lminiero · 2015-12-03T15:14:30Z

This patch introduces a new file that provides a reference counter implementation that should hopefully be used across the whole Janus code, so that we can get rid of all the timed watchdogs/garbage collectors we have in place now (and the problems they can cause any time a race condition occurs), in both core and plugins/transports.

This PR does nothing but add this new implementation to the codebase. It's still not integrated in any part of Janus, as I first wanted to make sure this was a step in the right direction, and whether or not the mechanism itself as it is implemented right now can be improved. I didn't want to make use of GLib GObjects for several reasons, so I decided to take a lighter approach with something more "homemade". This excellent blog post provided some really nice info on how this could be done in an effective way, and that's what lead to the code I just added, which is quite obviously heavily based on that: atomic operations on counters, and a callback function that the application sets and is called when the counter gets to 0.

According to how the implementation works, now, this is how you'd use it. Assuming this is the struct we want to handle through the RC mechanism:

struct mystruct {
    int number;
    char *string;
}

that's how you'd modify it:

struct mystruct {
    int number;
    char *string;
    janus_refcount rc;
}

When allocating a new instance of that struct, you have to initialize the reference counter:

struct mystruct *instance = [..];
janus_refcount_init(&instance->rc, mystruct_free_cb);

This automatically sets the counter to 1 (as you obviously own yourself), and instructs the reference counter about which function must be invoked when the counter gets to 0 and so you can free the instance. The mystruct_free_cb method must have a signature like this:

void mystruct_free_cb(janus_refcount *counter);

and its implementation might look like this:

void mystruct_free_cb(janus_refcount *counter) {
    struct mystruct *instance = janus_refcount_containerof(counter, struct mystruct, rc);
    if(instance->string)
        free(instance->string);
    free(instance);
}

The important part here is janus_refcount_containerof, which is a macro we use to get the pointer to the instance that needs to be freed out of its reference counter. We pass it, in order:

a pointer to the reference counter itself;
the type of the instance (struct mystruct in this case);
the name of the reference counter as member of the instance struct (rc in our example).

This is needed because, as explained in the mentioned blog post, you can make use of offsetof to programmatically calculate the pointer to the container instance starting from one of its members (our reference counter, in this case). This saves us the need to have another pointer (and so more bytes) in janus_refcount itself to keep track of the object instance we're handling.

To increase the counter, you just use janus_refcount_increase:

janus_refcount_increase(&instance->rc);

Unsurprisingly, janus_refcount_decrease decreases the counter instead:

janus_refcount_decrease(&instance->rc);

Both work on the actual counter using atomic operations, and so are inherently thread-safe. As soon as a call to janus_refcount_decrease causes the counter to become 0, the callback we set when initializing is invoked, and we can free the instance.

This is basically how it's supposed to work: if you're familiar with reference counters, nothing too fancy or new, then. If you think this is already in a good shape, I'll start integrating this in the code we have now. In our case this will have the additional complexity of objects containing objects containing objects and so on, all of them reference counted, likely (e.g., just think of session → handle → streams → components → dtls stack → sctp stack), but if we do this right, by making sure that each father increases the counter of its child and that "consumers" of those instances will also increase/decrease, this should all work.

Feedback and suggestions welcome!

ploxiln · 2015-12-04T20:55:56Z

refcount.h

it's a tiny bit awkward that, in this function, free locally overrides the symbol for the libc free(). I might the argument free_fn

Good point, poor naming choice, I'll fix that.

ploxiln · 2015-12-04T20:58:30Z

This approach looks good to me. Obviously, it's going to be quite a lot of work to transition the whole code base and work out missed ref incr/decr bugs :)

lminiero · 2015-12-04T21:00:20Z

Yep, I don't expect it to be easy... especially as we're going to have cross references and the like.

ploxiln · 2015-12-16T03:20:28Z

just a humble git tip ... and feel free to tell me to shut up ... merging master into this branch every day just makes the git history more messy

Sometimes it makes sense to merge latest master into a branch, to test against latest changes in master, or merge before conflicts become too great. But if there are not yet any code-paths in common, or just documentation or other cosmetic changes on one side of the merge, it's really not needed.

lminiero · 2015-12-16T06:51:28Z

That's a fair point: it's just something I do for all pull requests to make sure someone wanting to try them has all the latest fixes available, and I agree it doesn't make much sense when changes are cosmetic.

Anyway, I'm planning to commit a first version of the refactored code in a few days, which will make mainstream changes pretty much unmergeable for quite some time, as it was for modular transports 😄

saghul · 2015-12-16T07:09:20Z

FWIW, I personally like the rebase + force push approach for feature branches. Ping me if you are willing to expore :-)

lminiero · 2015-12-16T09:09:31Z

I've heard of git rebase but never explored in much detail: in general this sounds like a good idea, but there are cases (modular-transports, this PR here) where keeping things in sync is much harder and manual fixes due to heavy differences are needed all the time. How does it handle that?

saghul · 2015-12-16T09:17:53Z

(I didn't want to hijack the thread, but since you asked 😄 )

The workflow would be as follows:

work on the branch
if master diverged, time to sync them up
git rebase origin/master
create patches which fix the merge conflicts when applying these patches on top of master
git push origin feature-branch -f
once the feature is complete, time to merge!
git rebase -i origin/master
squash all the commits into one/many
git push origin feature-branch -f
git checkout master
git merge feature-branch
git push
profit!

This would avoid merge commits, at the cost of requiring people who are tracking your work in the feature branch to do this:

git checkout feature-branch
git fetch
git reset --hard origin/feature-branch

The last command is basically a "force pull".

lminiero · 2015-12-16T09:21:42Z

Way too complicated for the incredibly tired Lorenzo of the ending year, I'll add this to my list of new resolutions for 2016! 😄
Thanks for the explaination, I'll study this a bit more in the next weeks.

saghul · 2015-12-16T09:23:14Z

Haha, no problem. It's a matter of preference, some people like it one way, some other the other way, just like tabs and spaces 😉

Either way, kudos on the good work, have some rest!

lminiero · 2015-12-18T17:08:09Z

At last, here's a first stab at the refactoring I've been working on. I wanted to make sure something would be available for you guys before the holidays, as for some days I'll try and enjoy some (hopefully) deserved holidays, and so wouldn't be able to work on this for a while.

First of all, I changed a bit the way reference counters are handled (macros instead of static inline), as that allowed me to selectively enable/disable counters debugging and tracking them. To make a simple example, the VideoRoom plugin is the last I've tested, and that's why all the _debug versions of the RC methods are still being used there: the same effect on a global way can be achieved by enabling RC debugging via admin API, but I was mostly interested in drilldown debugging now, so that's why I used the explicit methods.

That said, the current patch tries to address basically all of the structures, in core and plugins, that might have benefited from reference counters, with the exception of transports as I need to change something in the way they're handled (opaque pointers are passed right now, I'll need to do something like the janus_plugin_session for plugins).

Don't expect it to be very stable yet, and you'll immediately notice it's quite ugly here and there ATM. There are tons of LOG_WARN lines I temporarily put throughout the code, for the sole purpose of debugging and having a more or less explicit indication of when things are actually being freed. Besides, I expect it to crash now and then in some cases, as while I tried to test all plugins in several conditions, there's just so much I can do in limited experiments: so race conditions may still occur, and things may be freed before their time causing problems.

As a side note, you'll see how much faster Janus is to shutdown right now, as no plugins watchdogs are involved anymore: that said, it still crashes at shutdown sometimes, probably due to the fact that some threads are still up and running when closing things. I still have to look into this as I first want to give everything the right shape, but I guess this will be a pickle: in fact, joining threads when shutting down plugins may not be wise, especially in case due to some bug some reference counter may still be >0 and stay that way, and so plugins may be waiting forever for something that would never happen.

Finally, I took advantage of this major refactoring to also change a couple of tiny things here and there (e.g., the names of some structures in VideoRoom and the way subscribers are addressed in the API) but nothing you have to worry about right now.

Feedback more than welcome!

…port New janus_transport_session struct to bridge core and transports Fixed some bugs and typos from previous commits

lminiero · 2015-12-23T15:09:08Z

Added the same approach to the transports as well modifying how core and transport plugins talk to each other. Fixed a few bugs and typos here and there as well. It seems to be working as expected, but of course I'll have to test all of this more thoroughly. I'll do that after the holidays.

…count

MvEerd · 2018-02-13T11:22:01Z

I was wondering if there are still plans to merge this into master within the next few months?
The improvements the reference counters and lack of lazy cleanup made were noticeable but it has proven somewhat time consuming to switch and compare between this and the Master branch and we eventually reverted back to master to stay up to date and stable (despite your amazing efforts to merge this continually)

I'm currently working on a new project so I'd be happy to test the recount branch again but wonder how long it might be before it becomes the main branch?

Thanks for all your continued efforts and apologies for the PR bump

lminiero · 2018-02-13T14:08:13Z

@MvEerd that's a good question, actually. Yes, I personally believe it's a worthy effort and so the idea is indeed still to merge sooner or later, but as you noticed it's becoming harder and harder to align to the changes we keep on doing on master. Sometimes these merges introduced regression or issues, and those are hard to spot when we don't get feedback. Now we have 3-4 other PRs waiting to be merged, and it will be even more stuff we'll need to synchronize.

I'd love to say "we sync now and we merge", but that's probably not realistic or wise. If we sync now, we still have to verify everything works, and that it's as stable as master is, which will take time: even more time if we don't get feedback from others, as there's only so much we can do. Besides, we cannot just stop the engines while we wait to check if this all works, as that might take forever.

That said, one option might indeed be to just merge: we sync, do a basic check via the existing demos, bump the version, tag the previous one, merge and then just work on that from now on. As explained above, there will be an initial stage where things might (will) break, but we might try and converge ASAP via issues and bugfixing. I expect some will complain (mostly because the pre-refcount version would not be updated anymore, so no fixes), but there isn't a single thing that makes everybody happy, and our job is to move the project forward, not freeze it in time. Another option is to close this PR (which I guess has become a bit of a mess) and create a new one where we replicate the same exact mechanisms, but starting from the current master: this might be easier than aligning to master and hope we didn't break anything (it would definitely have a clearer history), but it may be just wishful thinking.

Open to suggestions!

MvEerd · 2018-02-13T21:09:26Z

+1 👍
I agree that it's a worthy effort and would like to help it get merged
Time has unfortunately proven that seemingly only a handful of people try this branch until it becomes master.

To lessen the impact for people not wanting to use the refcount branch or relying on things that change; tagging a stable release which can be used until the new master becomes stable seems like it might be sufficient but I can't speak for everyone ofcourse

I see and appreciate why you'd want to close this PR for commit history and to redo it with the things you've learned but I'm somewhat hesitant for it because the amount of segfaults that popped up initially.
Most of those seem to have been resolved in this branch by the many commits you and @andreasg123 have made over time

I wish I could help with a proper commit to help align the C code but would most likely just introduce more bugs 😄
I would however be more than happy to test and fix any of the demos I can to help merge this if you decide to go with that option

lminiero · 2018-04-04T16:25:19Z

Finally merging! Thanks all for the contributions and tremendous feedback!

Added reference counter implementation (integration in code still WIP)

08a839d

This was referenced Dec 4, 2015

Atomic reference counter for data structures #141

Closed

AddressSanitizer: heap-use-after-free janus_plugin_push_event #351

Closed

ploxiln reviewed Dec 4, 2015
View reviewed changes

lminiero added 3 commits December 7, 2015 11:41

Rename the free refcount init argument

d59902b

Merge branch 'master' into refcount

5a73c38

Merge branch 'master' into refcount

79df65a

lminiero mentioned this pull request Dec 9, 2015

Floating Point Exception when Stopping Janus #378

Closed

lminiero added 5 commits December 10, 2015 12:44

Merge branch 'master' into refcount

da3b40a

Merge branch 'master' into refcount

bf0c4e7

Merge branch 'master' into refcount

7773ebc

Merge branch 'master' into refcount

ff7b858

Merge branch 'master' into refcount

ef6e15a

lminiero added 4 commits December 18, 2015 17:29

Merge branch 'master' into refcount

4f72141

Merge branch 'master' into refcount

bc17aee

First take at refactoring Janus to use reference counters

6d1edc9

A couple of web demos changes to reflect the big RC commit

21c86a4

Changes to the modular transports to integrate reference counters sup…

b5577ab

…port New janus_transport_session struct to bridge core and transports Fixed some bugs and typos from previous commits

Merge branch 'master' into refcount

162db94

Alessandro Toppi and others added 4 commits January 17, 2018 16:03

Fix unreferences on participant and videoroom in message handler.

5ba3a77

Merge branch 'master' into refcount

79b9882

Merge branch 'refcount' of github.com:meetecho/janus-gateway into ref…

002c056

…count

Remove (again) useless NULL pointer checks.

68fa603

lminiero mentioned this pull request Jan 24, 2018

Time delay after receiving destroy mountpoint response and actually freeing the port. #1139

Closed

Aligned to several recent changes in master

8b78bae

lminiero mentioned this pull request Feb 15, 2018

Janus 0.3 crash when leaving videoroom #1136

Closed

lminiero and others added 16 commits February 21, 2018 15:20

Merge branch 'master' into refcount

33f7267

Merge branch 'master' into refcount

5ebf5fa

Merge branch 'master' into refcount

52d4c64

Merge branch 'master' into refcount

58a4550

Fixed wrong logs when debugging refcount initializations

26e620a

Removed unneeded checks

0788b0e

Removed extra verbose lines in TextRoom plugin

176e6af

Fixed leak in AudioBridge (missing refcount for participants)

638ab30

Merge branch 'master' into refcount

da89e66

Fixed crash when creating AudioBridge room

a2fbdcb

Merge branch 'master' into refcount

fceaa9f

Fixed deadlock when destroying AudioBridge rooms

92fe538

Merge branch 'master' into refcount

f1981a0

Merge branch 'master' into refcount

47da54e

Merge branch 'master' into refcount

c09d86a

Bumped version number(s), and disabled refcount debugging by default

6c5f13e

lminiero merged commit 7f59c27 into master Apr 4, 2018

lminiero deleted the refcount branch April 4, 2018 16:28

mqp mentioned this pull request Apr 25, 2018

dtls: memcpy with negative size #1213

Closed

Conversation

lminiero commented Dec 3, 2015

Uh oh!

ploxiln Dec 4, 2015

Choose a reason for hiding this comment

Uh oh!

lminiero Dec 4, 2015

Choose a reason for hiding this comment

Uh oh!

ploxiln commented Dec 4, 2015

Uh oh!

lminiero commented Dec 4, 2015

Uh oh!

ploxiln commented Dec 16, 2015

Uh oh!

lminiero commented Dec 16, 2015

Uh oh!

saghul commented Dec 16, 2015

Uh oh!

lminiero commented Dec 16, 2015

Uh oh!

saghul commented Dec 16, 2015

Uh oh!

lminiero commented Dec 16, 2015

Uh oh!

saghul commented Dec 16, 2015

Uh oh!

lminiero commented Dec 18, 2015

Uh oh!

lminiero commented Dec 23, 2015

Uh oh!

MvEerd commented Feb 13, 2018

Uh oh!

lminiero commented Feb 13, 2018

Uh oh!

MvEerd commented Feb 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lminiero commented Apr 4, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

MvEerd commented Feb 13, 2018 •

edited

Loading