Fingerprint processor#14205
Fingerprint processor#14205ycombinator merged 28 commits intoelastic:masterfrom ycombinator:lb-processor-fingerprint
Conversation
There was a problem hiding this comment.
exported const MethodSHA1 should have comment (or a comment on this block) or be unexported
There was a problem hiding this comment.
exported type Method should have comment or be unexported
|
This reminds me of https://github.com/andrewkroh/beats-processor-fingerprint. As well as #1872. |
Thanks for flagging these, @andrewkroh; I wasn't aware. @urso Looks like there's prior art (the processor in Andrew's repo) as well as discussions going on in the ES team (see elastic/elasticsearch#34085, which is eventually linked off #1872 and the related PR: elastic/elasticsearch#47047). Does it still make sense for me to continue working on this PR? |
I think yes, it makes sense. The one by Andrew is a private one. Its not to be found in Beats. Maybe we can take some of it. there is also the Logstash fingerprint processor. I hope the ES one will be similar to the Logstash one. But I didn't check yet. |
|
Per the discussion in elastic/elasticsearch#47047 (comment), we are going to resume working on a @andrewkroh Since you've already built a fingerprint processor in your repo, did you want to put up a PR with that code to the |
No, but please copy anything you want from my repo. 👍 |
|
jenkins, test this |
There was a problem hiding this comment.
Do we want to have an option to ignore missing fields in case we have at least one field present?
There was a problem hiding this comment.
Sorry, but is this the same as your suggestion in #14205 (comment) or something different?
There was a problem hiding this comment.
almost. My suggestion originally was to add support to ignore missing fields. But apparently we can have other error types as well. Would it make sense to treat those other types as 'missing' as well?
There was a problem hiding this comment.
I see. So if we can't "get" a field for whatever reason, we treat it as missing and then, if the missing_fields option is set, ignore it. Hmm, I think this makes sense but let me just look into what other types of errors (besides common.ErrKeyNotFound) might be returned here.
There was a problem hiding this comment.
The only other error that can be returned here is if we try to get a value for a nested field (e.g. a.b.c), but the ancestor path to the field (a.b) does not resolve to a map. To me this also feels like a missing field as, even in this case, we still could not find a.b.c for the user. So I'm okay with collapsing the two error cases into one and adding ignore_missing handling to it.
There was a problem hiding this comment.
Test names can be filtered using regexes. For this I like to give it some structure instead of having full names (assuming that the parent test its name is expressive enough). e.g. just pass encoding as name to t.Run. (In case I have multiple parameters or want a name I use fmt.Sprintf("field=%v" , value)).
In this case the test name could just be TestEncoding/base64. I can run the test using go test -run Encoding/base64 (the / acts as a delimiter).
urso
left a comment
There was a problem hiding this comment.
The tests are not well isolated. A many tests have the same test event as input. It is okay to have a similar event, but the processor modifies the original map. A deep copy of the fields is required to guarantee some isolation.
dedemorton
left a comment
There was a problem hiding this comment.
All the deets look good. I just have a few minor comments.
There was a problem hiding this comment.
Tables with more than 3 columns don't look very good in our HTML output (see screen capture in previous comment). Docbook swallows all table attributes, so we can't do anything about that right now. You could remove the example row and provide the example as part of the description. Or wait for some shift in the universe that will make this right.
There was a problem hiding this comment.
I replaced the table with a definition list, like the one used in the Extract Array processor, for instance. Let me know what you think. Thanks!
|
@urso @dedemorton Thanks for your reviews. I believe I've addressed all your feedback now. This PR is ready for re-review, when you get a chance. Thanks again! |
dedemorton
left a comment
There was a problem hiding this comment.
LGTM! In this case, I think the list is actually easier to scan.
|
jenkins, test this |
There was a problem hiding this comment.
would it make sense to apply 'IgnoreMissing' here as well?
There was a problem hiding this comment.
I'm not sure about this one. This case would be reached if the user specified a field that resolved to a non-scalar value. So the field would not actually be missing in this case.
I see this one as more of an error on the user's part with maybe making a mistake in specifying the field so I think we should tell the user about it via an error.
WDYT?
|
jenkins, test this |
1 similar comment
|
jenkins, test this |
|
Travis CI is green. Jenkins CI is red because of |
* WIP: fingerprint processor * Implementing SHA256 fingerprinter * Sort source fields * Refactoring * Add TODO * Convert time fields to UTC * Removing unnecessary function * Adding SHA1 * WIP: add encoding * Cleanup * Running mage fmt * More methods + consolidating tests * Fleshing out tests * Adding test for target field * Adding documentation * Adding CHANGELOG entry * Fixing test * Converting tests to map * Isolating tests * Use io.Writer to stream in fields * Implement ignore_missing setting * Replace table with definition list * Adding `ignore_missing` to doc * using io.Fprintf * Use common.StringSet * Adding typed errors * Adding more typed errors * Adding license header
|
I am not sure where to move this issue forward, but it should be noted that the fingerprint processor for ingest processor creates inconsistent values when compared to using a hashing technique like md5, sha1, sha256, sha512 in any other software - includ Elastic software like logstash and filebeat. |

Resolves #11173.
This PR implements a
fingerprintprocessor, similar to Logstash'sfingerprintfilter plugin.The processor will take the following configuration options:
fieldsignore_missingfalsetarget_fieldfingerprintmethodsha256md5,sha1,sha256(default),sha384,sha512encodinghexhex(default),base32, orbase64