Skip to content

Change text fields to keyword for Metricbeat#10318

Merged
ruflin merged 6 commits intoelastic:masterfrom
ruflin:text-metricbeat
Jan 29, 2019
Merged

Change text fields to keyword for Metricbeat#10318
ruflin merged 6 commits intoelastic:masterfrom
ruflin:text-metricbeat

Conversation

@ruflin
Copy link
Copy Markdown
Contributor

@ruflin ruflin commented Jan 24, 2019

Searching for type: text revealed a few fields which were text but I think should be keyword.

@ruflin ruflin added module review Metricbeat Metricbeat Team:Integrations Label for the Integrations team labels Jan 24, 2019
@ruflin ruflin self-assigned this Jan 24, 2019
@ruflin ruflin requested a review from webmat January 24, 2019 14:03
@ruflin ruflin requested review from a team as code owners January 24, 2019 14:03
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso objections?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If text indexing is needed, it should be added as a multi-field at .meta.text

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator Ok? Don't see why this should be text.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense as keyword to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Contributor

@ycombinator ycombinator Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I screwed up. This should remain text IMHO. See my more-enlightened comment here: https://github.com/elastic/beats/pull/10318/files#r250673640

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat should we map this to url.original?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely

Copy link
Copy Markdown
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if Ceph's children is made into a keyword as well.

Is this search limited to Metricbeat? Or did you search everywhere, and only Metricbeat had text field left?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If text indexing is needed, it should be added as a multi-field at .meta.text

@webmat
Copy link
Copy Markdown
Contributor

webmat commented Jan 24, 2019

Got an answer to my question via the tag. We should do the same for all beats. Adding a note to #8655, in the section "Fields changes"

@ruflin ruflin mentioned this pull request Jan 24, 2019
@webmat webmat changed the title Change text fields to keyword Change text fields to keyword for Metricbeat Jan 24, 2019
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be text. This value comes from the server.name setting in kibana.yml. The setting is meant for display purposes, according to it's documentation:

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

So it could be a string like "Marketing's Fantastic Analytics UI". In that case I can see the benefit of letting ES analyze the string. Thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If searching for specific words (e.g. "marketing", to follow your example), then we simply need server.name.text for these searches.

I agree it's really useful in situations where these resources are not always well tagged & so on.

But the canonical field -- server.name -- should really be keyword. Defaulting to keyword all the time, then only adding text as a multi-field when needed ensures this progression over time (adding text) is never a breaking change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me but I have a tangential question:

IIRC if the server.name field is mapped as keyword in ES, it wouldn't also get a server.name.text multi-field automatically. Wouldn't we need to explicitly specify that somehow in the fields.yml so ES knows to create the multi-field? Or do all keyword fields in fields.yml automatically get a multi-field mapping added to them in the template via dynamic templates or something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point: having server.name as keyword will enable users to aggregate per kibana server. It's the reason in ECS, we've decided to push for keyword across the board, by default

Copy link
Copy Markdown
Contributor

@webmat webmat Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I missed your answer when posting my previous message.

The template will have to create the multi-field explicitly for the .text.

Going for keyword across the board is also a small performance gain, so the default is kw only. Then we add text strategically, where it makes sense.

I'm not sure the exact incantation to make the multi-field in Beats fields.yml.

Copy link
Copy Markdown
Contributor

@ycombinator ycombinator Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I understand now. Thanks. I'm good with changing this to keyword for now and adding a text multi-field later if we need.

@ruflin
Copy link
Copy Markdown
Contributor Author

ruflin commented Jan 25, 2019

@webmat For the text fields I focused on Metricbeat, but we should probably to a broader search.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat Sounds like a field that should be mapped to ECS?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. url.original?

I noticed a few more things in this field defs:

  • user => user.name
  • script => file.path
  • content_length => http.request.body.size
  • request_method => http.request.method
  • request_duration => event.duration, with an adjustment to nanos
  • pid => process.pid

No longer sure if any of those were discussed in #10218

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we should review php_fpm in detail again. Can you comment the above on #10218 so we can track it there?

@ruflin
Copy link
Copy Markdown
Contributor Author

ruflin commented Jan 25, 2019

This should be ready for review.

Copy link
Copy Markdown
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ceph children field still looks like it's text. Other than that, looking good :-)

Note that I trust your grepping skill. I haven't double checked whether you had missed any.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Searching for `type: text` revealed a few fields which were text but I think should be keyword.
@ruflin ruflin merged commit 0f3ea0d into elastic:master Jan 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Metricbeat Metricbeat module review Team:Integrations Label for the Integrations team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants