
Allow controlling the random number generator for RTrees training#16251

Merged

opencv-pushbot merged 1 commit into opencv:3.4 from pwuertz:rtrees_set_rng on Dec 29, 2019

Conversation

@pwuertz (Contributor) commented on Dec 28, 2019

Currently, the random number generator used for training cv::ml::RTrees is seeded with a hardcoded value, which ensures that every forest trained on the same data yields identical results.

This PR adds a method for modifying the training seed per instance, allowing new or additional solutions to be trained if desired.

Ultimately, this enables parallelized training/evaluation via forest subdivision, i.e. using N threads with M trees each instead of training a single N*M-tree forest.
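
The subdivision idea can be sketched generically in plain Python (this is not the OpenCV API; `train_subforest` and the per-thread seeding scheme are illustrative assumptions):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_subforest(seed, n_trees):
    # Stand-in for training M trees on one thread; a per-thread seed
    # keeps each sub-forest's randomness independent and reproducible.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_trees)]

N, M = 4, 8  # N threads, M trees per thread
with ThreadPoolExecutor(max_workers=N) as pool:
    subforests = list(pool.map(lambda s: train_subforest(s, M), range(N)))

# The merged ensemble has the same N*M trees a single wide forest would.
forest = [tree for sub in subforests for tree in sub]
assert len(forest) == N * M
```

Because each sub-forest is seeded independently, the merged ensemble is reproducible even though the sub-forests are trained concurrently.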

There are other benefits to this as well, like determining the variance of a model for a given set of hyperparameters.

Update: the PR was modified to use the global theRNG() for RTrees training, making random forests random by default. The previous deterministic behavior can be restored by calling cv::setRNGSeed before training.
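
The resulting default/override behavior can be illustrated with Python's standard random module standing in for cv::theRNG(); `train_forest` is a made-up stand-in for RTrees::train(), not OpenCV API:

```python
import random

def train_forest(n_trees):
    # Hypothetical stand-in for RTrees::train(); like the patched
    # OpenCV code, it draws from the *global* RNG.
    return [random.random() for _ in range(n_trees)]

# Random by default: two training runs yield different forests.
first = train_forest(5)
second = train_forest(5)
assert first != second

# Deterministic on demand: seed the global RNG before each run,
# analogous to calling cv::setRNGSeed(42) before train().
random.seed(42)
third = train_forest(5)
random.seed(42)
fourth = train_forest(5)
assert third == fourth
```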

@pwuertz (Contributor, Author) commented on Dec 28, 2019

OK, so the way RTrees and RTreesImpl are designed, there is no way of adding anything without breaking the ABI. That is acceptable when switching from, say, 4.2 to 4.3, right?

@alalek (Member) commented on Dec 28, 2019

uint64_t seed

Please follow the existing API: pass an RNG instead of a seed. See setAnnealEnergyRNG().

Also, there is theRNG() (a thread-local object). Perhaps it makes sense to use it instead of a custom per-algorithm RNG.

ABI checker seems broken / misconfigured ...

vector<float> varImportance;
vector<int> allVars, activeVars;
RNG rng;
RNG initRng = RNG((uint64)-1);
@alalek (Member) commented:
What about this: alalek@pr16251_r ?
Let's re-use the global theRNG() (it also allows saving/restoring the RNG state).

cv::theRNG() = RNG(your train seed);
... call train() ...

@pwuertz (Contributor, Author) replied:

Personally, I'd be OK with this, but it would break the default behavior of RTrees.
People might be relying on deterministic RTrees results.
Honestly, I was actually surprised that OpenCV does this ^^.

@pwuertz (Contributor, Author) commented on Dec 28, 2019

Pass an RNG instead of a seed. See setAnnealEnergyRNG().

Applied the suggested change; the method now takes RNG instances instead of RNG states.

This makes the method inaccessible from Python, though :/.

Also, I think this somewhat hides the true intent of the function: setting a fixed, specific seed for deterministic training.

Please follow the existing API.

cv::setRNGSeed seems to be a prominent API example that uses integer seed values, and it is closely related to the use case of the function I suggested. I'd also argue that setting integer seeds for quasi-random processes is the more common API pattern.

Also, there is theRNG() (a thread-local object). Perhaps it makes sense to use it instead of a custom per-algorithm RNG.

I think the current idea of having a custom, per-algorithm RNG is very handy: it makes training reproducible by default and independent from other parts of your application.
Sharing a global or thread-local RNG for training would break this behavior.

With a per-algorithm seed setter the current behavior is maintained, yet it enables use cases where (deterministic) randomness is required (batched or continuous training).
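
The trade-off can be sketched in plain Python (the `Forest` class is a made-up illustration of the setter-based proposal, not the OpenCV API that was ultimately merged):

```python
import random

class Forest:
    # Hypothetical per-algorithm RNG with a fixed default seed,
    # mirroring the original seed-setter proposal in this PR.
    def __init__(self, seed=0):
        self._rng = random.Random(seed)  # private state, unaffected by random.seed()

    def train(self, n_trees):
        return [self._rng.random() for _ in range(n_trees)]

# Reproducible by default, regardless of global RNG usage elsewhere.
random.seed()  # perturb the global RNG; has no effect on Forest
assert Forest().train(3) == Forest().train(3)

# Callers can still opt into different solutions via the seed.
assert Forest(seed=1).train(3) != Forest(seed=2).train(3)
```

A global thread-local RNG (the approach the PR finally adopted) trades this isolation for a single, familiar control point.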

@pwuertz (Contributor, Author) commented on Dec 28, 2019

I think your solution (a shared RNG) looks way better. You just have to decide whether dropping the deterministic default behavior is acceptable.

It shouldn't be a massive shock to anyone; random forests are called "random" for a reason ;).

@alalek (Member) commented on Dec 29, 2019

Thank you! Looks good to me.


This patch should go into the 3.4 branch first. We merge changes from 3.4 into master regularly (weekly/bi-weekly).

So, please:

  • change the "base" branch of this PR from master to 3.4 (use the "Edit" button near the PR title)
  • rebase your commits from master onto the 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check the list of your commits, then save and quit: Esc + "wq" + Enter),
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream)
  • push the rebased commits to the source branch of your fork (with the --force option)

Note: there is no need to re-open the PR; apply the changes in place.
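
The rebase step above can be exercised safely in a throwaway repository; here local branches named 3.4 and master stand in for the upstream remote, and all file names and commit messages are illustrative:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b master repo && cd repo   # -b needs git >= 2.28
git config user.email you@example.com
git config user.name you

echo base > file && git add file && git commit -qm "base"
git branch 3.4                                      # 3.4 diverges here
echo m1 >> file && git commit -qam "master-only"    # master moves ahead

git checkout -q -b rtrees_set_rng                   # feature branch off master
echo rng > rng.txt && git add rng.txt && git commit -qm "feature"

# Replay only the commits unique to the feature branch onto 3.4,
# dropping the master-only history; this is the non-interactive
# equivalent of "git rebase -i --onto upstream/3.4 upstream/master".
git rebase -q --onto 3.4 master rtrees_set_rng

git log --format=%s   # now shows "feature" then "base"; "master-only" is gone
```

After verifying locally, the real feature branch is force-pushed to the fork as described above.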

@pwuertz pwuertz changed the base branch from master to 3.4 December 29, 2019 21:01
@alalek (Member) left a review comment

Well done 👍

@pwuertz (Contributor, Author) commented on Dec 29, 2019

Thanks for your help!

@pwuertz pwuertz changed the title Allow setting RNG seed for RTrees training. Allow controlling the random number generator for RTrees training Dec 29, 2019
@opencv-pushbot opencv-pushbot merged commit 8aebef2 into opencv:3.4 Dec 29, 2019
@pwuertz pwuertz deleted the rtrees_set_rng branch December 29, 2019 22:49
@alalek alalek mentioned this pull request Dec 31, 2019
3 participants