cv::ml::TrainData cannot be instantiated without responses #8927

@zhmu

While porting an application from OpenCV 2.4.11 to 3.2.0 on Windows 10/x64, I found that the machine learning library had been redesigned. One of the changes is the unification of training data into a TrainData class. While porting the affected code, I hit a minor snag. The OpenCV 2.4.x code is:

svm.train(descriptor, cv::Mat(), cv::Mat(), cv::Mat(), svm_params);

I rewrote this as:

auto trainData = cv::ml::TrainData::create(descriptor, cv::ml::ROW_SAMPLE, cv::Mat());
svm->train(trainData);

This compiles, but it crashes at run time with an out-of-range access in TrainDataImpl::setData() at the line:

if( varType.at<uchar>(ninputvars) == VAR_CATEGORICAL )

After some investigation, it turns out the code assumes that responses are present, even if the classifier does not need them (in my case, a one-class linear SVM classifier was used). Further digging revealed that loadCSV() works around this by constructing a matrix of just zeroes, and indeed, changing my code to the following prevents the problem:

cv::Mat responses(descriptor.rows, 1, CV_32F, cv::Scalar(0));
auto trainData = cv::ml::TrainData::create(descriptor, cv::ml::ROW_SAMPLE, responses);
svm->train(trainData);

My feeling is that my initial cv::ml::TrainData::create() approach should work and that create() should take care of the responses on its own. To me, the code in loadCSV() feels more like a workaround for the underlying problem.
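
For illustration, here is a minimal stand-alone sketch of the failure mode and one possible guard. It does not use OpenCV and is not the actual TrainDataImpl code. It assumes that the variable-type array holds one entry per input variable plus one per response column, so that with no responses the index ninputvars is one past the end. The names makeVarTypes, responseIsCategoricalUnchecked and responseIsCategoricalGuarded are hypothetical, and the enum values are stand-ins for the OpenCV constants.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-ins for the variable-type tags mentioned in the issue (assumed values).
enum VarType : unsigned char { VAR_ORDERED = 0, VAR_CATEGORICAL = 1 };

// Hypothetical model of the type array: one tag per input variable,
// followed by one tag per response column (zero columns if no responses).
std::vector<unsigned char> makeVarTypes(std::size_t ninputvars,
                                        std::size_t noutputvars) {
    return std::vector<unsigned char>(ninputvars + noutputvars, VAR_ORDERED);
}

// Models the unguarded check quoted in the issue. std::vector::at() is used
// so that a bad index throws std::out_of_range instead of being undefined.
bool responseIsCategoricalUnchecked(const std::vector<unsigned char>& varType,
                                    std::size_t ninputvars) {
    return varType.at(ninputvars) == VAR_CATEGORICAL;
}

// One possible guard: only inspect the response tag when it exists.
bool responseIsCategoricalGuarded(const std::vector<unsigned char>& varType,
                                  std::size_t ninputvars) {
    return ninputvars < varType.size() &&
           varType[ninputvars] == VAR_CATEGORICAL;
}
```

Under these assumptions, makeVarTypes(3, 0) models training data without responses: the unchecked variant throws for index 3, while the guarded variant returns false. Whether a guard like this or an automatically created dummy response is the right fix inside OpenCV is an open question.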

I'm not an expert on machine learning, but I'm willing to write a patch to remedy this problem (or to review one if someone else feels up for it!).
