[DOCS] Add field extraction use cases to scripting docs #71596
lockewritesdocs merged 6 commits into elastic:master
Pinging @elastic/es-docs (Team:Docs)
run elasticsearch-ci/docs
Pinging @elastic/es-core-infra (Team:Core/Infra)
| [discrete] | ||
| [[field-extraction-split]] | ||
| ==== Split values in a field by a separator (Dissect) |
| [[scripting-field-extraction]] | ||
| ==== Field extraction | ||
| The goal of this use case is simple; you have fields in your data with a bunch of |
I tend to avoid the word "use case". I think folks don't think of themselves as having a use case. They've got a thing they want to do, but they don't call it a "use case".
True -- I think we can just say, "The goal of field extraction is simple..."
| There are two options at your disposal: | ||
| * <<grok-basics,Grok>> uses a pattern like a regular expression that supports |
It is a regular expression, just a kind of unexpected dialect. May be more correct to say "is a regular expression dialect that supports aliased expression reuse." - no need to explain that it sits "on top", I guess, if you say it that way.
I'm +1 on that revision, but I still think it's valuable to call out the 1-to-1 mapping of regex in grok. Something like:
Grok is a regular expression dialect that supports aliased expressions that you can reuse. Because Grok sits on top of regular expressions, any regular expressions are valid in grok as well.
| aliased expressions that you can reuse. Grok sits on top of regular expressions, so | ||
| any regular expressions are valid in grok as well. | ||
| * <<dissect-processor,Dissect>> extracts structured fields out of a single text | ||
| field within a document, but doesn't use regular expressions. Instead, dissect uses |
"extracts structured fields out of text using a pattern that defines delimiters"?
How about:
Dissect extracts structured fields out of text, using delimiters to define the matching pattern. Unlike grok, dissect doesn't use regular expressions.
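To make the distinction in this thread concrete, here is a minimal Python sketch. It is not Elasticsearch's implementation. `grok_to_regex` expands `%{ALIAS:field}` references into named regex groups and passes any plain regular expression through untouched, while `dissect` walks a pattern of literal delimiters and never runs a regular expression against the text (it uses one only to parse the pattern itself). The `ALIASES` table, both function names, the field names in `DISSECT_PATTERN`, and the sample grok input are assumptions made for illustration; the log line is the one quoted later in this review.

```python
import re

# Hypothetical alias table; real grok ships a much larger pattern library.
ALIASES = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}


def grok_to_regex(pattern):
    """Expand %{ALIAS} / %{ALIAS:field} into plain regex; other regex is kept as-is."""
    def expand(match):
        alias, field = match.group(1), match.group(2)
        body = ALIASES[alias]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern))


def dissect(pattern, text):
    """Extract %{key} fields using only the literal delimiters between them."""
    # parts alternates: literal, key, literal, key, ..., literal
    parts = re.split(r"%\{(\w*)\}", pattern)
    literals, keys = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delimiter in zip(keys, literals[1:]):
        if delimiter:
            end = text.find(delimiter, pos)
            if end == -1:
                return None  # delimiter missing: no match, nothing extracted
        else:
            end = len(text)  # no delimiter after this key: take the rest
        if key:
            fields[key] = text[pos:end]
        pos = end + len(delimiter)
    return fields


# Grok: aliases and ordinary regex syntax mix freely in one pattern.
grok_match = grok_to_regex(r"%{IP:client} \[%{INT:status}\] (?:GET|POST)").match(
    "10.0.0.1 [200] GET"
)

# Dissect: the same kind of extraction driven purely by delimiters.
LOG_LINE = (
    "[2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] class space "
    "used 266K, capacity 384K, committed 384K, reserved 1048576K"
)
DISSECT_PATTERN = (
    "[%{timestamp}][%{code}][%{desc}] %{ident} "
    "used %{usize}, capacity %{csize}, committed %{comsize}, reserved %{rsize}"
)
gc = dissect(DISSECT_PATTERN, LOG_LINE)
```

When the delimiters don't line up, `dissect` returns `None` instead of raising, which is in the spirit of the "doesn't emit anything" wording suggested for the script's guard condition.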
| } | ||
| ---- | ||
| // TEST[continued] | ||
| <1> This condition ensures that the script doesn't crash even if the pattern of |
s/doesn't crash/doesn't emit anything/?
++ I'll make that change.
| ---- | ||
| [2021-04-27T16:16:34.699+0000][82460][gc,heap,exit] class space used 266K, capacity 384K, committed 384K, reserved 1048576K | ||
| ---- | ||
| // NOTCONSOLE |
You only need to declare something "NOTCONSOLE" when the paranoid "is this json?" detector fails the build without the tag. Try removing this - if the build succeeds then you don't need it. I don't think you need it here.
You're right! I removed both mentions of `// NOTCONSOLE` and local checks all passed.
| [source,txt] | ||
| ---- | ||
| emit("used" + ' ' + gc.usize + ', ' + "capacity" + ' ' + gc.csize + ', ' + "committed" + ' ' + gc.comsize) | ||
| ---- |
This is a good thing for debugging but not super useful in production. In production you'll want to just emit(gc.usize) or something, right? Worth pointing out.
Why is this not useful in production? Is it too slow or just not something that people would typically want to do?
Yeah, it's somewhat slower than returning just the number you need, but I think the bigger thing is that folks will typically want to run range queries on the numbers, do math with them, or group them with aggs - basically, if you get numbers, I think you typically want to extract them as long or double. But what you've got is useful to look at, especially because we don't yet have a way to emit all of the extracted values at once. When we have that, I think it'd be easier to have folks do that and fetch them all in the fields, even when they are just looking at things.
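The point about numeric types can be illustrated with a small Python sketch. This is an illustration only, not Painless, and `size_to_number` is a hypothetical helper. Sizes kept as strings compare character by character, so a range comparison gives the wrong answer; stripping the `K` suffix and working with an integer avoids that.

```python
def size_to_number(value: str) -> int:
    """Strip a trailing 'K' and return the numeric part as an int."""
    if not value.endswith("K"):
        raise ValueError(f"unexpected size format: {value!r}")
    return int(value[:-1])


# Values taken from the sample log line in this review.
committed, reserved = "384K", "1048576K"

# Lexicographic comparison: '3' sorts after '1', so this is misleading.
string_order = committed < reserved

# Numeric comparison after extraction behaves as a range query would expect.
numeric_order = size_to_number(committed) < size_to_number(reserved)
```

The same reasoning is why a concatenated debug string is handy for eyeballing results but a single numeric value is the better thing to emit when the field will be queried or aggregated.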
| the value from `gc.usize` and a comma. This pattern repeats for the other data that you | ||
| want to retrieve: | ||
| [source,txt] |
This is [source,painless]. I don't know that it makes a difference, but it is painless code.
Good eye -- I'll change that to [source,painless].
* [DOCS] Add field extraction use cases to scripting docs
* Adding file
* Remove extra space
* Add dissect pattern to split and retrieve data
* Fix list spacing
* Incorporating review feedback
… (#72648) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
… (#72647) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
#72646) * [DOCS] Add field extraction use cases to scripting docs (#71596) * [DOCS] Add field extraction use cases to scripting docs * Adding file * Remove extra space * Add dissect pattern to split and retrieve data * Fix list spacing * Incorporating review feedback * Adding type to console results
This PR adds a new page for common scripting use cases as part of a larger effort in #71576. This PR adds field extraction use cases for writing scripts that:
Preview link: https://elasticsearch_71596.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/common-script-uses.html
Relates to #71576