[7.x] Extract device type from user agent info (#69322) #71952
Merged
danhermann merged 2 commits into elastic:7.x on Apr 20, 2021
Pinging @elastic/es-core-features (Team:Core/Features)
Adds a device type to the user agent processor.
Matching Algorithm
The process is simple: based on the OS and browser names extracted by the UA parser library, this PR defines a few simple patterns against which the device type is matched.
For example, this pattern matches desktop devices:
- regex: '^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)'
If the extracted OS name is one of these, there is a high chance that the device is a desktop. The same goes for mobile OSes. In addition, it tries to match browser names and correlates both results.
For bots, it looks for the following words anywhere:
- regex: 'Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google'
The same approach applies to tablets, etc.
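The matching idea can be sketched roughly as follows. This is a minimal Python illustration, not the processor's actual implementation: only the two regexes are taken from the description above, while the function name `guess_device_type`, the returned labels, and the choice to test the bot pattern against the parsed OS and browser names are assumptions made for the example. The mobile and tablet patterns are not shown in the description, so they are left out.

```python
import re

# The two patterns quoted in the PR description.
DESKTOP_OS = re.compile(r"^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)")
BOT_WORDS = re.compile(r"Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google")


def guess_device_type(os_name: str, browser_name: str) -> str:
    """Hypothetical helper: classify a device from parsed OS and browser names.

    The labels "Bot", "Desktop" and "Other" are illustrative assumptions.
    """
    # Bot words may appear anywhere, so use search() rather than match().
    if BOT_WORDS.search(browser_name) or BOT_WORDS.search(os_name):
        return "Bot"
    # The desktop pattern is anchored at the start of the OS name.
    if DESKTOP_OS.match(os_name):
        return "Desktop"
    # Mobile and tablet patterns are not given in the description.
    return "Other"


if __name__ == "__main__":
    print(guess_device_type("Mac OS X", "Chrome"))
    print(guess_device_type("Other", "Googlebot"))
    print(guess_device_type("Windows Phone", "IE Mobile"))
```

Note that the `$` anchors inside the desktop pattern keep names such as "Windows Phone" from being classified as desktop, while "Mac OS X" and "Chrome OS" match as prefixes.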
Example:
In dev tools:
result:
Real user data analysis
Ran an analysis on data from elastic.co collected with the rum-agent, which is deployed on observability clusters.
There were 60019 unique user agent strings in the data. Those strings were extracted and pushed into ES using the user agent ingest pipeline from this PR,
and this PR was able to match more than 99% of them successfully. Here is the analysis in Lens:
Note: This doesn't represent traffic; it represents the ratio of extracted categories among unique UA strings.
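To make the distinction in the note concrete, here is a small Python sketch of computing category ratios over unique UA strings rather than over raw traffic. The helper name `category_ratios` and the sample records are made up for illustration; they are not the data or the Lens query used in the analysis.

```python
from collections import Counter


def category_ratios(records):
    """Return the share of each device type among unique UA strings.

    `records` is an iterable of (ua_string, device_type) pairs. Repeated
    UA strings are collapsed first, so the result does not reflect traffic.
    """
    unique = dict(records)  # one entry per distinct UA string
    counts = Counter(unique.values())
    total = len(unique)
    return {device_type: n / total for device_type, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample: "ua-a" appears twice but is counted once.
    sample = [
        ("ua-a", "Desktop"),
        ("ua-a", "Desktop"),
        ("ua-b", "Phone"),
        ("ua-c", "Desktop"),
        ("ua-d", "Other"),
    ]
    print(category_ratios(sample))
```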
Testing
Tested by building it via Kibana with
yarn es source
and tested via dev tools as described above in the example and screenshot.
Backport of #69322