Emit predicted category using an appropriate JSON type.#877
Emit predicted category using an appropriate JSON type.#877przemekwitek merged 7 commits intoelastic:masterfrom
Conversation
0825066 to
1eb3b44
Compare
droberts195
left a comment
There was a problem hiding this comment.
If you'd rather use standard library functions to do the string to number conversions instead of my core::CStringUtils suggestions then that's fine, but please make sure exceptions won't fail the entire analysis, all possible values a 64 bit signed integer can hold are covered on all platforms and we log when there are unexpected conversion errors.
tveasey
left a comment
There was a problem hiding this comment.
LGTM (modulo adhering to naming conventions).
lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc
Outdated
Show resolved
Hide resolved
… the field is really used in C++ code
przemekwitek
left a comment
There was a problem hiding this comment.
If you'd rather use standard library functions to do the string to number conversions instead of my
core::CStringUtilssuggestions then that's fine, but please make sure exceptions won't fail the entire analysis, all possible values a 64 bit signed integer can hold are covered on all platforms and we log when there are unexpected conversion errors.
You might think that problem 1 can be solved by using stol instead of stoi, but this is not the case, because on Windows a Java long is 64 bits but a C++ long is 32 bits. The C++ type that reliably corresponds to Java's long is int64_t.
Sounds like you've gone through this kind of issues before. Thanks for sharing. I'll gladly use the library method for doing conversions.
Additionally, I've renamed dependent_variable_type to prediction_field_type as that's how the field is really used in the code
lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc
Outdated
Show resolved
Hide resolved
|
run elasticsearch-ci |
|
@przemekwitek to get the tests to pass you need to run clang-format: |
Thanks for the hint. I'm wondering if it was possible to make this message stand out more in the console output. I was looking for the failure reason yesterday before you hint but couldn't find it. |
It gets printed from here: ml-cpp/dev-tools/check-style.sh Lines 68 to 72 in 881e5d0 You are welcome to open a PR to change that so that it stands out more. If you want to use something like escape sequences to change colours you can use the PR build to check the escape sequences are interpreted correctly by Jenkins - make a PR that messes up the formatting in a source file and modifies |
Currently, classification analysis allows dependent variable of integer or boolean type but in the results field, the prediction field is always emitted as JSON string (so
truebecomes"true",1becomes"1"etc.).A solution to that problem is to pass desired
prediction_field_typefrom Java to C++ and make C++ emitbool,intorstringJSON field depending on theprediction_field_typepassed.This PR implements the C++ part of this solution.
Relates elastic/elasticsearch#49796