[WIP] Introduce the GrokProcessor #14132
Stream is unfortunately Java 8 only. Since we are likely to backport ingest to 2.x too, we should try to avoid Java 8-only constructs and APIs.
But instead of fixing this, I think we can just remove this method, as it is not used for now?
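If the method were kept rather than removed, a plain loop would avoid the Java 8 dependency. The snippet below is only a sketch of that idea: the method under review is not shown in this thread, so `nonEmpty` and the class name are hypothetical stand-ins, not code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example, not taken from the PR.
class Java7Filter {

    // Keeps the non-empty names using a plain loop, so it compiles on
    // Java 7 (no java.util.stream, no lambdas).
    static List<String> nonEmpty(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name != null && !name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nonEmpty(Arrays.asList("NUMBER", "", "WORD")));
    }
}
```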
This looks great! I know it is WIP, but I left a couple of comments.
the grok field can be final too
Debugging why this is flaky.
Value of doc.get("val") sometimes equals 123.42 and sometimes equals 123.41999816894531.
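One possible explanation, not confirmed anywhere in this thread, is that the value is sometimes parsed or stored as a `float` and then widened to `double`. The sketch below only demonstrates that general effect; it is an assumption about the cause, and the class and method names are made up for illustration.

```java
// Hypothetical demonstration, not code from the PR.
class FloatWidening {

    // Widening a float to a double keeps the float's rounding error.
    static double widen(float value) {
        return (double) value;
    }

    public static void main(String[] args) {
        double parsedAsDouble = Double.parseDouble("123.42");
        double parsedAsFloat = widen(Float.parseFloat("123.42"));

        System.out.println(parsedAsDouble);
        // The widened value is close to, but not equal to, 123.42.
        System.out.println(parsedAsFloat);
        System.out.println(parsedAsDouble == parsedAsFloat);
    }
}
```

If this is what happens, comparing with a tolerance, or making sure the value always takes the same numeric path, would make the test deterministic.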
s/HashMap<String, Object> fields/Map<String, Object> fields
Left a couple more comments. I think we should also add docs for the grok processor to the
What if we load the grok expressions only from the config dir (ES_HOME/config/ingest/grok)? We just make sure that when we create the distribution we also package the ingest config into the zip file. Instead of loading stuff from the class path, we can get it via the Environment class (check the geoip PR for how it is injected there).
This has as a nice side effect that users can also define their own named grok expressions without us adding extra code.
(we just load all files in the ES_HOME/config/ingest/grok directory)
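A rough sketch of what "load all files in the directory" could look like follows. It assumes the usual grok pattern-file layout (one `NAME regex` definition per line, with `#` comment lines), and the class and method names are hypothetical; the directory `Path` would come from wherever the config dir is resolved, which is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the PR's implementation.
class PatternDirLoader {

    // Parses "NAME regex" lines; blank lines and '#' comments are skipped.
    static Map<String, String> parsePatternLines(List<String> lines) {
        Map<String, String> bank = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Split on the first run of whitespace: name, then the regex.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                bank.put(parts[0], parts[1]);
            }
        }
        return bank;
    }

    // Reads every regular file in grokDir into one pattern bank.
    // Later files overwrite earlier definitions with the same name.
    static Map<String, String> loadPatternBank(Path grokDir) throws IOException {
        Map<String, String> bank = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(grokDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    bank.putAll(parsePatternLines(
                            Files.readAllLines(file, StandardCharsets.UTF_8)));
                }
            }
        }
        return bank;
    }
}
```

Because every file in the directory is read, a user-supplied file with extra named expressions would be picked up without additional code, which is the side effect mentioned above.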
I am confused... currently the geoip stores the database under resources/config and falls back on using the classpath to fetch the file.
ok, I think I got it. I followed what jvm-example does to manage its config files. An additional assemblies task is executed in maven to load the config files into config/ingest during runtime so that this is the correct relative path from the environment: https://github.com/elastic/elasticsearch/pull/14132/files#diff-77d47a95d3d1f49700f95a7daff92e13R42
@martijnvg Similar to your comment about reusing the same GeoIP instance across GeoProcessors: do you think the same should be done with Grok, i.e. use the same Grok instance across all grok processors to avoid reloading the same configs each time a processor is created? UPDATE: NEVERMIND
I don't think we should make the patterns dir configurable. Outside the ES_HOME directory, ES has insufficient permissions to read files, so I think the patterns dir should always be $ES_HOME/config/ingest/grok/patterns.
plugins/ingest/build.gradle
I don't think the custom task is needed if all resources are placed under src/main/packaging/config.
@rjernst added logic that bundles custom files placed in src/main/packaging into the plugin zip file: https://github.com/elastic/elasticsearch/blob/master/buildSrc/src/main/groovy/org/elasticsearch/gradle/plugin/PluginBuildPlugin.groovy#L97
Yes, please use src/main/packaging!
@talevy left two minor comments, other than that LGTM.
Also moved all processor classes into a subdirectory and introduced a ConfigException class to be a catch-all for things that can go wrong when constructing new processors with configurations that possibly throw exceptions. The GrokProcessor loads patterns from the resources directory.
fix resource path issue, and add rest-api-spec test for grok
fix rest-spec tests
changes: license, remove configexception, throw IOException
add more tests and fix iso8601-hour pattern
move grok patterns from resources to config
fix tests with pom changes, updated IngestClientIT with grok processor
update gradle build script for grok deps and test configuration
move config files to src/main/packaging
move Env out of Processor, fix test for src/main/packaging change
add docs
clean up test resources task
update Grok to be immutable
- Updated the Grok class to be immutable. This means that all the pattern bank loading is handled by an external utility class called PatternUtils.
- fixed tabs in the nagios patterns file's comments
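To illustrate what an immutable Grok with an externally loaded pattern bank might look like, here is a minimal sketch. It is not the PR's `Grok` class: `MiniGrok`, its constructor, and `captures` are hypothetical names, it only handles `%{NAME}` and `%{NAME:field}` references, and it assumes field names are plain alphanumeric so they can be used as Java named groups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Grok class from this PR.
final class MiniGrok {
    // Matches %{NAME} or %{NAME:field}.
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");
    private static final int MAX_DEPTH = 20; // guards against cyclic definitions

    private final Pattern compiled;        // set once, never changed
    private final List<String> fieldNames; // unmodifiable after construction

    // The pattern bank is supplied by the caller (e.g. a loader utility);
    // it is only read during construction and never stored or modified.
    MiniGrok(Map<String, String> patternBank, String expression) {
        List<String> fields = new ArrayList<>();
        this.compiled = Pattern.compile(expand(patternBank, expression, fields, 0));
        this.fieldNames = Collections.unmodifiableList(fields);
    }

    // Recursively replaces pattern references with their definitions.
    private static String expand(Map<String, String> bank, String expression,
                                 List<String> fields, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("pattern references nested too deeply");
        }
        Matcher m = REFERENCE.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String definition = bank.get(m.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("unknown pattern: " + m.group(1));
            }
            String inner = expand(bank, definition, fields, depth + 1);
            String field = m.group(2);
            String replacement;
            if (field != null) {
                fields.add(field);
                replacement = "(?<" + field + ">" + inner + ")";
            } else {
                replacement = "(?:" + inner + ")";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the named captures, or null if the text does not match.
    Map<String, String> captures(String text) {
        Matcher m = compiled.matcher(text);
        if (!m.find()) {
            return null;
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : fieldNames) {
            result.put(field, m.group(field));
        }
        return result;
    }
}
```

With a bank containing `NUMBER` and `WORD` definitions, an expression such as `%{WORD:name} took %{NUMBER:val} ms` would yield `name` and `val` entries for a matching line. Since all state is fixed at construction, one instance can be shared safely.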
Running your first Grok Pipeline
1. Pull changes from this PR
2. Launch a single node with ingest plugin using the IngestRunner
3. Find a log file you wish to parse
4. Create your desired pipeline in Elasticsearch
5. Use the elasticsearch python client to ingest your logs
6. Your documents should be parsed and ready for searching within Elasticsearch