Testing: Add config option to accept numeric keywords #193
adriansr merged 10 commits into elastic:master from adriansr:accept_numeric_keyword
ycombinator
left a comment
Ah, nice enhancement for even stricter validation! Thanks and LGTM.
andrewkroh
left a comment
I prefer seeing JSON strings in the _source for fields mapped as keyword types, but since it's completely safe to map numbers to keywords (the inverse not being true) I'm ok loosening the restrictions to make developing packages a little simpler.
In the end I think it makes more sense for this to be a configuration option to be set on a per-test-case basis, so I've modified the pipeline test runner to accept this new option.
Let me explain the rationale behind these changes. Across the Beats code base, there are some instances of fields that are defined as Examples of such fields are:
There are probably many others, but they are difficult to spot. I worry especially about cases where we are receiving an arbitrary set of keys (e.g. via JSON) from an external source and storing those verbatim in the event with keyword type. I see three possibilities for handling this:
Thank you for the detailed explanation and possible options. I would go for the first or third option:
Actually I like the third one when the validation is configurable (strictness level). Keep in mind that this option requires updating the package-spec: https://github.com/elastic/package-spec/
@mtojek can you have another look?
Keyword fields are not necessarily ingested as JSON strings. This updates the fields validator to accept numeric values too.
mtojek
left a comment
just a short one about the clean code
mtojek
left a comment
LGTM
nit: it would be nice to describe this feature in the howto docs (not blocking this PR)
* Add system test for Office 365 audit

  This changes mappings for a few fields to boolean. And it changes client.port and source.port to be numbers in the JSON source to match their mappings. The remaining issues will be handled by elastic/elastic-package#193.

  This is the test output after fixing the booleans.

  FAILURE DETAILS:
  o365/audit :
  [0] parsing field value failed: field "client.port"'s Go type, string, does not match the expected field type: long
  [1] parsing field value failed: field "event.code"'s Go type, float64, does not match the expected field type: keyword
  [2] parsing field value failed: field "o365.audit.ActorYammerUserId"'s Go type, float64, does not match the expected field type: keyword
  [3] parsing field value failed: field "o365.audit.AzureActiveDirectoryEventType"'s Go type, float64, does not match the expected field type: keyword
  [4] parsing field value failed: field "o365.audit.InternalLogonType"'s Go type, float64, does not match the expected field type: keyword
  [5] parsing field value failed: field "o365.audit.LogonType"'s Go type, float64, does not match the expected field type: keyword
  [6] parsing field value failed: field "o365.audit.RecordType"'s Go type, float64, does not match the expected field type: keyword
  [7] parsing field value failed: field "o365.audit.UserType"'s Go type, float64, does not match the expected field type: keyword
  [8] parsing field value failed: field "o365.audit.Version"'s Go type, float64, does not match the expected field type: keyword
  [9] parsing field value failed: field "o365.audit.YammerNetworkId"'s Go type, float64, does not match the expected field type: keyword
  [10] parsing field value failed: field "source.port"'s Go type, string, does not match the expected field type: long
  --- Test results for package: o365 - END ---

* Configure numeric_keyword_fields
* Sync pipeline JS to get elastic/beats#22939
Keyword fields are not necessarily ingested as JSON strings. It's common to ingest numbers (integer or floating point) as keyword type.
This updates the fields validator to accept numeric values. They are converted to string so that any defined patterns can be checked.

This adds a new configuration option, numeric_keyword_fields, to pipeline test cases and system tests so that selected fields can have numeric type and be ingested into a text-like field.

Example configuration:
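A minimal sketch of what such a test configuration might look like, assuming the option takes a plain list of field names. The field names are borrowed from the o365 failure output quoted earlier in this conversation; the exact file layout is an assumption and is not copied from the PR.

```yaml
# Hypothetical pipeline/system test config fragment (layout assumed).
# Fields listed here are mapped as keyword but may carry numeric JSON values.
numeric_keyword_fields:
  - event.code
  - o365.audit.RecordType
  - o365.audit.Version
```

With a setting along these lines, the validator would presumably accept a float64 value for each listed field instead of reporting a Go type mismatch against keyword.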