Skip to content

Accept strings that parse as numbers as numeric types#269

Closed
jsoriano wants to merge 2 commits intoelastic:masterfrom
jsoriano:accept-numeric-strings-as-float
Closed

Accept strings that parse as numbers as numeric types#269
jsoriano wants to merge 2 commits intoelastic:masterfrom
jsoriano:accept-numeric-strings-as-float

Conversation

@jsoriano
Copy link
Copy Markdown
Member

@jsoriano jsoriano commented Mar 2, 2021

elastic-package test fails if a document contains a string value with a number in a field with numeric type, such as:

[5] parsing field value failed: field "process.pid"'s Go type, string, does not match the expected field type: long (field value: 87)

But these values are accepted by Elasticsearch.

Modify the check to accept also strings that can be parsed as numbers.

@jsoriano jsoriano added the Team:Integrations Label for the Integrations team label Mar 2, 2021
@jsoriano jsoriano self-assigned this Mar 2, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@mtojek
Copy link
Copy Markdown
Contributor

mtojek commented Mar 2, 2021

We already have this feature, but the strict mode is enabled by default. If you prefer to enable conversion, please use test config (numeric_keyword_fields): https://github.com/elastic/integrations/blob/2b70a5437d506109aa9bc93516aebcb839999e01/packages/zoom/data_stream/webhook/_dev/test/pipeline/test-user.json-config.yml

cc @ycombinator

@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 2, 2021

@mtojek oh ok, thanks!

@jsoriano jsoriano closed this Mar 2, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

elasticmachine commented Mar 2, 2021

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: mtojek commented: /test

  • Start Time: 2021-03-03T09:01:40.758+0000

  • Duration: 5 min 13 sec

  • Commit: 4ba90b6

Trends 🧪

Image of Build Times

Steps errors 1

Expand to view the steps failures

Check
  • Took 3 min 1 sec . View more details on here
  • Description: make check

Log output

Expand to view the last 100 lines of log output

[2021-03-03T09:06:52.309Z] Clean used resources
[2021-03-03T09:06:52.309Z] 2021/03/03 09:06:52 DEBUG Clean build resources
[2021-03-03T09:06:52.309Z] 2021/03/03 09:06:52 DEBUG Build directory doesn't exist
[2021-03-03T09:06:52.309Z] 2021/03/03 09:06:52 DEBUG Clean built packages from the development stack
[2021-03-03T09:06:52.309Z] 2021/03/03 09:06:52 DEBUG Stack package is not part of the development stack (missing path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/stack/development/kubernetes)
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean all service logs
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Remove folder content (path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs)
[2021-03-03T09:06:52.310Z] Temporary service logs removed: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs
[2021-03-03T09:06:52.310Z] Done
[2021-03-03T09:06:52.310Z] + for d in test/packages/*/
[2021-03-03T09:06:52.310Z] + cd test/packages/log/
[2021-03-03T09:06:52.310Z] + elastic-package clean -v
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Enable verbose logging
[2021-03-03T09:06:52.310Z] Clean used resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean build resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Build directory doesn't exist
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean built packages from the development stack
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Stack package is not part of the development stack (missing path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/stack/development/log)
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean all service logs
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Remove folder content (path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs)
[2021-03-03T09:06:52.310Z] Temporary service logs removed: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs
[2021-03-03T09:06:52.310Z] Done
[2021-03-03T09:06:52.310Z] + for d in test/packages/*/
[2021-03-03T09:06:52.310Z] + cd test/packages/multiinput/
[2021-03-03T09:06:52.310Z] + elastic-package clean -v
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Enable verbose logging
[2021-03-03T09:06:52.310Z] Clean used resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean build resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Build directory doesn't exist
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean built packages from the development stack
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Stack package is not part of the development stack (missing path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/stack/development/multiinput)
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean all service logs
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Remove folder content (path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs)
[2021-03-03T09:06:52.310Z] Temporary service logs removed: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs
[2021-03-03T09:06:52.310Z] Done
[2021-03-03T09:06:52.310Z] + for d in test/packages/*/
[2021-03-03T09:06:52.310Z] + cd test/packages/nginx/
[2021-03-03T09:06:52.310Z] + elastic-package clean -v
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Enable verbose logging
[2021-03-03T09:06:52.310Z] Clean used resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean build resources
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Build directory doesn't exist
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean built packages from the development stack
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Stack package is not part of the development stack (missing path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/stack/development/nginx)
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Clean all service logs
[2021-03-03T09:06:52.310Z] 2021/03/03 09:06:52 DEBUG Remove folder content (path: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs)
[2021-03-03T09:06:52.310Z] Temporary service logs removed: /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/.elastic-package/tmp/service_logs
[2021-03-03T09:06:52.310Z] Done
[2021-03-03T09:06:52.310Z] + exit 1
[2021-03-03T09:06:52.310Z] Makefile:32: recipe for target 'test-stack-command' failed
[2021-03-03T09:06:52.310Z] make: *** [test-stack-command] Error 1
[2021-03-03T09:06:52.350Z] Post stage
[2021-03-03T09:06:52.363Z] Running in /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269/src/github.com/elastic/elastic-package
[2021-03-03T09:06:52.378Z] Archiving artifacts
[2021-03-03T09:06:52.393Z] Archiving artifacts
[2021-03-03T09:06:52.408Z] Archiving artifacts
[2021-03-03T09:06:52.439Z] Archiving artifacts
[2021-03-03T09:06:52.452Z] Recording test results
[2021-03-03T09:06:52.556Z] No test report files were found. Configuration error?
[2021-03-03T09:06:52.566Z] Error when executing always post condition:
[2021-03-03T09:06:52.572Z] Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from beats-ci-immutable-ubuntu-1804-1614762111374853585.c.elastic-ci-prod.internal/10.224.0.114:57164
[2021-03-03T09:06:52.572Z] 		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1800)
[2021-03-03T09:06:52.572Z] 		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
[2021-03-03T09:06:52.572Z] 		at hudson.remoting.Channel.call(Channel.java:1001)
[2021-03-03T09:06:52.572Z] 		at hudson.FilePath.act(FilePath.java:1158)
[2021-03-03T09:06:52.572Z] 		at hudson.FilePath.act(FilePath.java:1147)
[2021-03-03T09:06:52.572Z] 		at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:107)
[2021-03-03T09:06:52.572Z] 		at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:149)
[2021-03-03T09:06:52.572Z] 		at hudson.tasks.junit.JUnitResultArchiver.parseAndSummarize(JUnitResultArchiver.java:243)
[2021-03-03T09:06:52.572Z] 		at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:63)
[2021-03-03T09:06:52.572Z] 		at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:29)
[2021-03-03T09:06:52.572Z] 		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
[2021-03-03T09:06:52.572Z] 		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[2021-03-03T09:06:52.572Z] 		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[2021-03-03T09:06:52.572Z] 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-03-03T09:06:52.572Z] 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-03-03T09:06:52.572Z] hudson.AbortException: No test report files were found. Configuration error?
[2021-03-03T09:06:52.572Z] 	at hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:152)
[2021-03-03T09:06:52.572Z] 	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3314)
[2021-03-03T09:06:52.572Z] 	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
[2021-03-03T09:06:52.572Z] 	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
[2021-03-03T09:06:52.572Z] 	at hudson.remoting.Request$2.run(Request.java:375)
[2021-03-03T09:06:52.572Z] 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:73)
[2021-03-03T09:06:52.572Z] 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[2021-03-03T09:06:52.572Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-03-03T09:06:52.572Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-03-03T09:06:52.572Z] 	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:118)
[2021-03-03T09:06:52.572Z] 	at java.lang.Thread.run(Thread.java:748)
[2021-03-03T09:06:52.572Z] 
[2021-03-03T09:06:53.474Z] Running on worker-395930 in /var/lib/jenkins/workspace/t-manager_elastic-package_PR-269
[2021-03-03T09:06:53.500Z] [INFO] getVaultSecret: Getting secrets
[2021-03-03T09:06:53.573Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-03-03T09:06:55.400Z] + chmod 755 generate-build-data.sh
[2021-03-03T09:06:55.400Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/elastic-package/PR-269/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/elastic-package/PR-269/runs/4 FAILURE 313250
[2021-03-03T09:06:55.400Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/elastic-package/PR-269/runs/4/steps/?limit=10000 -o steps-info.json
[2021-03-03T09:06:56.853Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/elastic-package/PR-269/runs/4/tests/?status=FAILED -o tests-errors.json
[2021-03-03T09:06:57.554Z] Retry 1/3 exited 22, retrying in 1 seconds...
[2021-03-03T09:06:59.816Z] Retry 2/3 exited 22, retrying in 2 seconds...
[2021-03-03T09:07:02.078Z] Retry 3/3 exited 22, no more retries left.
[2021-03-03T09:07:02.078Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Ingest-manager/elastic-package/PR-269/runs/4/log/ -o pipeline-log.txt

@ycombinator
Copy link
Copy Markdown
Contributor

@mtojek Maybe we should add documentation for the test config options somewhere under https://github.com/elastic/elastic-package/tree/master/docs/howto?

@mtojek
Copy link
Copy Markdown
Contributor

mtojek commented Mar 2, 2021

It's a good idea to do. Frankly speaking I though we have already documented it somewhere, but it looks like it's missing.

@ycombinator
Copy link
Copy Markdown
Contributor

I'll make a PR for the docs.

@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 2, 2021

Umm, after doing some tests with numeric_keyword_fields, I have found that this is intended for the opposite of what I need 🤔

numeric_keyword_fields helps when the field is defined as keyword, but it is present as number in the document.

I need something for fields that are defined as numbers, but are present as strings in the documents, for example:

            "process": {
                "pid": "34"
            },

For process.pid defined as long, as is in ECS.

Should we add another option for this? Or should I reopen this PR? Or should I be doing something different? 🙂

For context, I am needing this to migrate the test files for PostgreSQL (elastic/integrations#747).

@ycombinator
Copy link
Copy Markdown
Contributor

I would be in favor of reopening this PR. From what I can tell the code in this PR is only converting string values to numbers if the field is defined as a number. So there is enough strictness already IMO. In fact, now I'm wondering if we could apply the same logic for numeric values that are intended to be keywords and not need the numeric_keyword_fields setting too. 🤔

The other option would be to introduce another setting, similar to numeric_keyword_fields, but I think it's not necessary given that the field definition already tells us that we want a field to be treated like a number.

@mtojek WDYT?

@mtojek mtojek reopened this Mar 2, 2021
@mtojek
Copy link
Copy Markdown
Contributor

mtojek commented Mar 2, 2021

Ok, I see the conversion direction now, missed that earlier (sorry about that).

Actually I would be in favor of preserving strict checks, but I understand this is a behavior that is enabled by default (dynamic field mapping)? We can merge this PR, but then it won't be consistent with numeric_keyword_fields concept, so maybe @ycombinator's advise is correct and we should get rid of it.

Let's review this PR to unblock @jsoriano and think what would be the solution for our cases.

/cc @andrewkroh as the Security Team uses numeric_keyword_fields in few places.

/cc @ruflin do we have any guidance in terms of fields strictness for agent/packages?

@mtojek mtojek requested review from mtojek and ycombinator March 2, 2021 19:34
Copy link
Copy Markdown
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor suggestion for adding a comment. Otherwise LGTM.

@ycombinator
Copy link
Copy Markdown
Contributor

It's a good idea to do. Frankly speaking I though we have already documented it somewhere, but it looks like it's missing.

I'll make a PR for the docs.

#270

@mtojek
Copy link
Copy Markdown
Contributor

mtojek commented Mar 3, 2021

/test

@andrewkroh
Copy link
Copy Markdown
Member

andrewkroh commented Mar 3, 2021

I think the processing pipeline should be using a convert processor (or similar feature) to ensure that the JSON types match the Elasticsearch types. The idea being that if we declare a value as a number in our fields.yml then we ought be delivering the value as a number. Relying on type coercion at indexing could result in ingest failures and then event loss, but if you handle this earlier you at least have an option on how to handle the data type exception (like you could use on_failure to drop the offending value or allow convert to fail, but always remove the source value).

If we do choose to go away from the strict type checking for strings that become numbers then it makes no sense to keep the numeric_keyword_fields option. This option allows for a validation exception when the _source contains a number but the mapping is as a keyword. This type of conversion carries no risk of an ES mapping exception. So if we kept this type of check by default but allowed the more risky string to number coercion then we'd have an "imbalance"

@mtojek
Copy link
Copy Markdown
Contributor

mtojek commented Mar 3, 2021

I think the processing pipeline should be using a convert processor (or similar feature) to ensure that the JSON types match the Elasticsearch types

I would prefer that option as well, but maybe I'm not familiar with historical reasons/customs/etc and it's natural to depend on dynamic mappings in Elasticsearch. @jsoriano did you consider just using convert?

As I said before I prefer strictness in this area, but I'm ok to merge this PR and then deprecate the numeric_keyword_fields option. Most likely it's something that needs be planned.

@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 3, 2021

Sounds good to me. I didn't try to modify the pipeline, I just imported it from beats, but I agree that it would be better to make it more strict, I'll give a try to add explicit conversions.

@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 3, 2021

elastic/integrations#747 updated to don't need this. Thanks for the suggestion!

@jsoriano jsoriano closed this Mar 3, 2021
@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 4, 2021

@mtojek @andrewkroh I am finding this issue on a metricset, I have tried to add an ingest pipeline and a processor in the stream, but none of them seemed to work. Should any of these work for a metrics data stream? or the only way is to fix this in Metricbeat?

postgresql/statement default:
[0] parsing field value failed: field "postgresql.statement.query.id"'s Go type, string, does not match the expected field type: long (field value: 1691311383)

@andrewkroh
Copy link
Copy Markdown
Member

I'm not aware of any reasons that a data stream for metrics would not allow an ingest pipeline to be used. So that's interesting and curious.

But I think Metricbeat should be sending a number for this value if that is its intent.

@jsoriano
Copy link
Copy Markdown
Member Author

jsoriano commented Mar 4, 2021

Open PR for Metricbeat: elastic/beats#24359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Team:Integrations Label for the Integrations team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants