Skip to content

Reference counters#403

Merged
lminiero merged 238 commits intomasterfrom
refcount
Apr 4, 2018
Merged

Reference counters#403
lminiero merged 238 commits intomasterfrom
refcount

Conversation

@lminiero
Copy link
Copy Markdown
Member

@lminiero lminiero commented Dec 3, 2015

This patch introduces a new file that provides a reference counter implementation that should hopefully be used across the whole Janus code, so that we can get rid of all the timed watchdogs/garbage collectors we have in place now (and the problems they can cause any time a race condition occurs), in both core and plugins/transports.

This PR does nothing but add this new implementation to the codebase. It's still not integrated in any part of Janus, as I first wanted to make sure this was a step in the right direction, and whether or not the mechanism itself as it is implemented right now can be improved. I didn't want to make use of GLib GObjects for several reasons, so I decided to take a lighter approach with something more "homemade". This excellent blog post provided some really nice info on how this could be done in an effective way, and that's what lead to the code I just added, which is quite obviously heavily based on that: atomic operations on counters, and a callback function that the application sets and is called when the counter gets to 0.

According to how the implementation works, now, this is how you'd use it. Assuming this is the struct we want to handle through the RC mechanism:

struct mystruct {
    int number;
    char *string;
}

that's how you'd modify it:

struct mystruct {
    int number;
    char *string;
    janus_refcount rc;
}

When allocating a new instance of that struct, you have to initialize the reference counter:

struct mystruct *instance = [..];
janus_refcount_init(&instance->rc, mystruct_free_cb);

This automatically sets the counter to 1 (as you obviously own yourself), and instructs the reference counter about which function must be invoked when the counter gets to 0 and so you can free the instance. The mystruct_free_cb method must have a signature like this:

void mystruct_free_cb(janus_refcount *counter);

and its implementation might look like this:

void mystruct_free_cb(janus_refcount *counter) {
    struct mystruct *instance = janus_refcount_containerof(counter, struct mystruct, rc);
    if(instance->string)
        free(instance->string);
    free(instance);
}

The important part here is janus_refcount_containerof, which is a macro we use to get the pointer to the instance that needs to be freed out of its reference counter. We pass it, in order:

  1. a pointer to the reference counter itself;
  2. the type of the instance (struct mystruct in this case);
  3. the name of the reference counter as member of the instance struct (rc in our example).

This is needed because, as explained in the mentioned blog post, you can make use of offsetof to programmatically calculate the pointer to the container instance starting from one of its members (our reference counter, in this case). This saves us the need to have another pointer (and so more bytes) in janus_refcount itself to keep track of the object instance we're handling.

To increase the counter, you just use janus_refcount_increase:

janus_refcount_increase(&instance->rc);

Unsurprisingly, janus_refcount_decrease decreases the counter instead:

janus_refcount_decrease(&instance->rc);

Both work on the actual counter using atomic operations, and so are inherently thread-safe. As soon as a call to janus_refcount_decrease causes the counter to become 0, the callback we set when initializing is invoked, and we can free the instance.

This is basically how it's supposed to work: if you're familiar with reference counters, nothing too fancy or new, then. If you think this is already in a good shape, I'll start integrating this in the code we have now. In our case this will have the additional complexity of objects containing objects containing objects and so on, all of them reference counted, likely (e.g., just think of session → handle → streams → components → dtls stack → sctp stack), but if we do this right, by making sure that each father increases the counter of its child and that "consumers" of those instances will also increase/decrease, this should all work.

Feedback and suggestions welcome!

refcount.h Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a tiny bit awkward that, in this function, free locally overrides the symbol for the libc free(). I might the argument free_fn

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, poor naming choice, I'll fix that.

@ploxiln
Copy link
Copy Markdown
Contributor

ploxiln commented Dec 4, 2015

This approach looks good to me. Obviously, it's going to be quite a lot of work to transition the whole code base and work out missed ref incr/decr bugs :)

@lminiero
Copy link
Copy Markdown
Member Author

lminiero commented Dec 4, 2015

Yep, I don't expect it to be easy... especially as we're going to have cross references and the like.

@ploxiln
Copy link
Copy Markdown
Contributor

ploxiln commented Dec 16, 2015

just a humble git tip ... and feel free to tell me to shut up ... merging master into this branch every day just makes the git history more messy

Sometimes it makes sense to merge latest master into a branch, to test against latest changes in master, or merge before conflicts become too great. But if there are not yet any code-paths in common, or just documentation or other cosmetic changes on one side of the merge, it's really not needed.

@lminiero
Copy link
Copy Markdown
Member Author

That's a fair point: it's just something I do for all pull requests to make sure someone wanting to try them has all the latest fixes available, and I agree it doesn't make much sense when changes are cosmetic.

Anyway, I'm planning to commit a first version of the refactored code in a few days, which will make mainstream changes pretty much unmergeable for quite some time, as it was for modular transports 😄

@saghul
Copy link
Copy Markdown
Contributor

saghul commented Dec 16, 2015

FWIW, I personally like the rebase + force push approach for feature branches. Ping me if you are willing to expore :-)

@lminiero
Copy link
Copy Markdown
Member Author

I've heard of git rebase but never explored in much detail: in general this sounds like a good idea, but there are cases (modular-transports, this PR here) where keeping things in sync is much harder and manual fixes due to heavy differences are needed all the time. How does it handle that?

@saghul
Copy link
Copy Markdown
Contributor

saghul commented Dec 16, 2015

(I didn't want to hijack the thread, but since you asked 😄 )

The workflow would be as follows:

  • work on the branch
  • if master diverged, time to sync them up
  • git rebase origin/master
  • create patches which fix the merge conflicts when applying these patches on top of master
  • git push origin feature-branch -f
  • once the feature is complete, time to merge!
  • git rebase -i origin/master
  • squash all the commits into one/many
  • git push origin feature-branch -f
  • git checkout master
  • git merge feature-branch
  • git push
  • profit!

This would avoid merge commits, at the cost of requiring people who are tracking your work in the feature branch to do this:

  • git checkout feature-branch
  • git fetch
  • git reset --hard origin/feature-branch

The last command is basically a "force pull".

@lminiero
Copy link
Copy Markdown
Member Author

Way too complicated for the incredibly tired Lorenzo of the ending year, I'll add this to my list of new resolutions for 2016! 😄
Thanks for the explaination, I'll study this a bit more in the next weeks.

@saghul
Copy link
Copy Markdown
Contributor

saghul commented Dec 16, 2015

Haha, no problem. It's a matter of preference, some people like it one way, some other the other way, just like tabs and spaces 😉

Either way, kudos on the good work, have some rest!

@lminiero
Copy link
Copy Markdown
Member Author

At last, here's a first stab at the refactoring I've been working on. I wanted to make sure something would be available for you guys before the holidays, as for some days I'll try and enjoy some (hopefully) deserved holidays, and so wouldn't be able to work on this for a while.

First of all, I changed a bit the way reference counters are handled (macros instead of static inline), as that allowed me to selectively enable/disable counters debugging and tracking them. To make a simple example, the VideoRoom plugin is the last I've tested, and that's why all the _debug versions of the RC methods are still being used there: the same effect on a global way can be achieved by enabling RC debugging via admin API, but I was mostly interested in drilldown debugging now, so that's why I used the explicit methods.

That said, the current patch tries to address basically all of the structures, in core and plugins, that might have benefited from reference counters, with the exception of transports as I need to change something in the way they're handled (opaque pointers are passed right now, I'll need to do something like the janus_plugin_session for plugins).

Don't expect it to be very stable yet, and you'll immediately notice it's quite ugly here and there ATM. There are tons of LOG_WARN lines I temporarily put throughout the code, for the sole purpose of debugging and having a more or less explicit indication of when things are actually being freed. Besides, I expect it to crash now and then in some cases, as while I tried to test all plugins in several conditions, there's just so much I can do in limited experiments: so race conditions may still occur, and things may be freed before their time causing problems.

As a side note, you'll see how much faster Janus is to shutdown right now, as no plugins watchdogs are involved anymore: that said, it still crashes at shutdown sometimes, probably due to the fact that some threads are still up and running when closing things. I still have to look into this as I first want to give everything the right shape, but I guess this will be a pickle: in fact, joining threads when shutting down plugins may not be wise, especially in case due to some bug some reference counter may still be >0 and stay that way, and so plugins may be waiting forever for something that would never happen.

Finally, I took advantage of this major refactoring to also change a couple of tiny things here and there (e.g., the names of some structures in VideoRoom and the way subscribers are addressed in the API) but nothing you have to worry about right now.

Feedback more than welcome!

…port

New janus_transport_session struct to bridge core and transports
Fixed some bugs and typos from previous commits
@lminiero
Copy link
Copy Markdown
Member Author

Added the same approach to the transports as well modifying how core and transport plugins talk to each other. Fixed a few bugs and typos here and there as well. It seems to be working as expected, but of course I'll have to test all of this more thoroughly. I'll do that after the holidays.

@MvEerd
Copy link
Copy Markdown
Contributor

MvEerd commented Feb 13, 2018

I was wondering if there are still plans to merge this into master within the next few months?
The improvements the reference counters and lack of lazy cleanup made were noticeable but it has proven somewhat time consuming to switch and compare between this and the Master branch and we eventually reverted back to master to stay up to date and stable (despite your amazing efforts to merge this continually)

I'm currently working on a new project so I'd be happy to test the recount branch again but wonder how long it might be before it becomes the main branch?

Thanks for all your continued efforts and apologies for the PR bump

@lminiero
Copy link
Copy Markdown
Member Author

@MvEerd that's a good question, actually. Yes, I personally believe it's a worthy effort and so the idea is indeed still to merge sooner or later, but as you noticed it's becoming harder and harder to align to the changes we keep on doing on master. Sometimes these merges introduced regression or issues, and those are hard to spot when we don't get feedback. Now we have 3-4 other PRs waiting to be merged, and it will be even more stuff we'll need to synchronize.

I'd love to say "we sync now and we merge", but that's probably not realistic or wise. If we sync now, we still have to verify everything works, and that it's as stable as master is, which will take time: even more time if we don't get feedback from others, as there's only so much we can do. Besides, we cannot just stop the engines while we wait to check if this all works, as that might take forever.

That said, one option might indeed be to just merge: we sync, do a basic check via the existing demos, bump the version, tag the previous one, merge and then just work on that from now on. As explained above, there will be an initial stage where things might (will) break, but we might try and converge ASAP via issues and bugfixing. I expect some will complain (mostly because the pre-refcount version would not be updated anymore, so no fixes), but there isn't a single thing that makes everybody happy, and our job is to move the project forward, not freeze it in time. Another option is to close this PR (which I guess has become a bit of a mess) and create a new one where we replicate the same exact mechanisms, but starting from the current master: this might be easier than aligning to master and hope we didn't break anything (it would definitely have a clearer history), but it may be just wishful thinking.

Open to suggestions!

@MvEerd
Copy link
Copy Markdown
Contributor

MvEerd commented Feb 13, 2018

+1 👍
I agree that it's a worthy effort and would like to help it get merged
Time has unfortunately proven that seemingly only a handful of people try this branch until it becomes master.

To lessen the impact for people not wanting to use the refcount branch or relying on things that change; tagging a stable release which can be used until the new master becomes stable seems like it might be sufficient but I can't speak for everyone ofcourse

I see and appreciate why you'd want to close this PR for commit history and to redo it with the things you've learned but I'm somewhat hesitant for it because the amount of segfaults that popped up initially.
Most of those seem to have been resolved in this branch by the many commits you and @andreasg123 have made over time

I wish I could help with a proper commit to help align the C code but would most likely just introduce more bugs 😄
I would however be more than happy to test and fix any of the demos I can to help merge this if you decide to go with that option

@lminiero
Copy link
Copy Markdown
Member Author

lminiero commented Apr 4, 2018

Finally merging! Thanks all for the contributions and tremendous feedback!

@lminiero lminiero merged commit 7f59c27 into master Apr 4, 2018
@lminiero lminiero deleted the refcount branch April 4, 2018 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.