Description
While porting an application from OpenCV 2.4.11 to 3.2.0 on Windows 10/x64, I found that the machine learning library had been redesigned. One of the changes is the unification of training data into a TrainData class. While porting the affected code, I hit a minor snag. The OpenCV 2.4.x code is:
svm.train(descriptor, cv::Mat(), cv::Mat(), cv::Mat(), svm_params);
I rewrote this as:
auto trainData = cv::ml::TrainData::create(descriptor, cv::ml::ROW_SAMPLE, cv::Mat());
svm->train(trainData);
This compiles, but at runtime it crashes with an out-of-range access in TrainDataImpl::setData() at the line:
if( varType.at<uchar>(ninputvars) == VAR_CATEGORICAL )
After some investigation, it turns out the code assumes that responses are present, even if the classifier does not need them (in my case, a one-class linear SVM classifier was used). Further digging revealed that loadCSV() works around this by constructing a matrix of just zeroes, and indeed, changing my code to the following prevents the problem:
cv::Mat responses(descriptor.rows, 1, CV_32F, cv::Scalar(0));
auto trainData = cv::ml::TrainData::create(descriptor, cv::ml::ROW_SAMPLE, responses);
svm->train(trainData);
My feeling is that my initial cv::ml::TrainData::create() approach should work and that it should take care of the responses on its own. The code in loadCSV() feels more like a workaround to me for the underlying problem.
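To make the failure mode concrete, here is a simplified model of it that does not use OpenCV at all. It assumes the type table holds one entry per input variable plus one per response variable, so that index ninputvars only exists when there is at least one response; that layout is my reading of the crash, not a copy of TrainDataImpl::setData(). All names below (VarKind, makeVarTypes, responseIsCategoricalUnchecked, responseIsCategoricalChecked) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the variable type tags; the values are illustrative.
enum VarKind : unsigned char { kOrdered = 0, kCategorical = 1 };

// Assumed layout: one entry per input variable, followed by one per response variable.
std::vector<unsigned char> makeVarTypes(std::size_t ninputvars, std::size_t noutputvars) {
    return std::vector<unsigned char>(ninputvars + noutputvars, kOrdered);
}

// Mirrors the unguarded lookup: it always reads the slot at index ninputvars.
// With zero responses that slot is one past the end; std::vector::at() turns
// the bad access into a std::out_of_range exception.
bool responseIsCategoricalUnchecked(const std::vector<unsigned char>& varTypes,
                                    std::size_t ninputvars) {
    return varTypes.at(ninputvars) == kCategorical;
}

// One possible guard: only inspect the response type when a response exists.
bool responseIsCategoricalChecked(const std::vector<unsigned char>& varTypes,
                                  std::size_t ninputvars,
                                  std::size_t noutputvars) {
    if (noutputvars == 0) {
        return false;  // no responses, so there is nothing to classify as categorical
    }
    assert(ninputvars < varTypes.size());
    return varTypes[ninputvars] == kCategorical;
}
```

With three input variables and no responses, the unchecked lookup throws while the checked one simply reports false. A patch along these lines would let the empty-responses call to cv::ml::TrainData::create() work without the caller supplying a dummy matrix, though the actual fix may need to touch other places that assume responses exist.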
I'm not an expert on machine learning, but I'm willing to write a patch to remedy this problem (or to review one if someone else feels up for it!).