URI parts ingest processor #65150
Pinging @elastic/es-core-features (Team:Core/Features) |
|
@leehinman and @andrewkroh, I would be interested in your feedback (and/or others, as appropriate) on whether this provides the functionality you described in #57481. |
andrewkroh
left a comment
Nice, this will make writing pipelines simpler for us.
URL url;
try {
url = new URL(value);
Are there limitations on which schemes this can parse? Would these parse?
- ftp://ftp.is.co.za/rfc/rfc1808.txt
- ldap://[2001:db8::7]/c=GB?objectClass?one
- telnet://192.0.2.16:80/
Perhaps using java.net.URI would be more forgiving and not require a URLStreamHandler to be loaded.
Similarly, which parts of the URL are required for parsing to work?
@andrewkroh, thanks for looking it over and commenting. I switched over to java.net.URI which does support more schemes including all three of the examples you list above.
Right now, no parts of a URI are required beyond what java.net.URI needs to construct an instance. Is that what you would prefer?
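For anyone following along, here is a minimal sketch of how the three example URIs behave with java.net.URI. The class and method names (UriSchemeCheck, describe) are made up for illustration and are not taken from the PR; only the java.net.URI calls are real.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

// Hypothetical helper, not the processor's code: parses a value with
// java.net.URI and reports the scheme, host, port and path it found.
final class UriSchemeCheck {

    static String describe(String value) {
        try {
            URI uri = new URI(value);
            return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()   // -1 when the URI has no explicit port
                + " path=" + uri.getPath();
        } catch (URISyntaxException e) {
            // Mirrors the error style quoted in the diff above.
            throw new IllegalArgumentException("unable to parse URI [" + value + "]", e);
        }
    }

    public static void main(String[] args) {
        List<String> examples = List.of(
            "ftp://ftp.is.co.za/rfc/rfc1808.txt",
            "ldap://[2001:db8::7]/c=GB?objectClass?one",
            "telnet://192.0.2.16:80/"
        );
        for (String example : examples) {
            System.out.println(describe(example));
        }
    }
}
```

java.net.URI only checks syntax, so none of these schemes needs a registered URLStreamHandler.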
throw new IllegalArgumentException("unable to parse URL [" + value + "]");
}
var urlParts = new HashMap<String, Object>();
urlParts.put("domain", url.getHost());
ECS isn't clear on what's correct here, but what does getHost return for bracketed IPv6 addresses?
@elastic/ecs Should url.domain include the brackets that are required when using IPv6 addresses in URLs? https://www.ietf.org/rfc/rfc2732.txt
Per this source, getHost() will return the IPv6 address enclosed in the brackets.
Since brackets are required with a literal IPv6 address, url.domain should include the brackets. We can improve the description of url.domain in the ECS docs to clarify.
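A quick way to confirm the bracket behaviour is a sketch like the one below. The class name Ipv6HostCheck and the sample URI are illustrative assumptions; the only library calls used are URI.create, getHost and getPort.

```java
import java.net.URI;

// Hypothetical check, not part of the PR: shows what java.net.URI reports
// as the host for a literal IPv6 address.
final class Ipv6HostCheck {

    static String hostOf(String value) {
        return URI.create(value).getHost();
    }

    static int portOf(String value) {
        return URI.create(value).getPort();
    }

    public static void main(String[] args) {
        String value = "http://[2001:db8::7]:8080/index.html";
        // The host keeps its square brackets; the port is reported separately.
        System.out.println(hostOf(value));
        System.out.println(portOf(value));
    }
}
```

Since the brackets are part of the returned host, a field populated directly from getHost() carries them with no extra handling.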
|
This is great @danhermann 👍 Note that I think it's fine if this processor doesn't populate the domain breakdown fields |
|
There's a request for a registered domain processor (#57476). Perhaps if that gets implemented this one could internally use it. |
I can add that on this processor as an option once the registered domain processor is completed. |
|
Last functional question on this one -- should the processor be renamed to Edit: Though I do see that the ECS field names are all prefixed with |
|
This is awesome. Thank You. |
I don't really mind either way, but I think it'll indeed be clearer to users that they can use it on all sorts of URIs if it's named |
|
@elasticmachine run elasticsearch-ci/2 |
|
cc: @elastic/es-ui in case Kibana auto-complete needs to be updated with this new processor. |
andreidan
left a comment
LGTM, thanks for adding this processor @danhermann
Left a minor comment
if (userInfo.contains(":")) {
int colonIndex = userInfo.indexOf(":");
uriParts.put("username", userInfo.substring(0, colonIndex));
uriParts.put("password", colonIndex < userInfo.length() ? userInfo.substring(colonIndex + 1) : "");
would this fail with IndexOutOfBounds for http://user:@www.google.com:80/blarg.gif#ref ? (no password)
Shall we add a test for this?
In that case, the password is set to an empty string. I'll add another test case to make that clear.
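To make the empty-password case concrete, here is a small standalone sketch of the same split. UserInfoSplit and split are hypothetical names and this is not the processor's actual code; it assumes the user info comes from URI.getUserInfo().

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical re-creation of the user-info split discussed above.
final class UserInfoSplit {

    static Map<String, Object> split(String value) {
        Map<String, Object> parts = new HashMap<>();
        String userInfo = URI.create(value).getUserInfo();
        if (userInfo == null) {
            return parts; // no user info in the URI at all
        }
        parts.put("user_info", userInfo);
        int colonIndex = userInfo.indexOf(':');
        if (colonIndex >= 0) {
            parts.put("username", userInfo.substring(0, colonIndex));
            // substring(length) returns "" rather than throwing, so a
            // trailing colon yields an empty password.
            parts.put("password", userInfo.substring(colonIndex + 1));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("http://user:@www.google.com:80/blarg.gif#ref"));
    }
}
```

The substring call is safe here because colonIndex + 1 can never exceed userInfo.length(), and String.substring accepts a begin index equal to the length.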
Thanks for the heads up @danhermann! I've opened elastic/kibana#83915 to add support for this in the UI. |
Adds a new uri_parts processor that decomposes a URI into its constituent parts. E.g.:
results in:
The processor relies on the java.net.URI class to parse the URI and attempts to map the parts into ECS fields. Some ECS fields are not part of the URI spec, so see the table below for how those are handled:
- … . exists in the path
- … username or password though they are commonly presented with the username:password convention. The username and password fields are parsed out of the user_info field on a best-effort basis if a : exists.
- … password above
Also introduces a new module for ingest processors.
Closes #57481