
[7.x] Extract device type from user agent info (#69322) #71952

Merged
danhermann merged 2 commits into elastic:7.x from danhermann:backport_69322_ua_device_type
Apr 20, 2021

Conversation

@danhermann
Contributor

Adds a device type to the user agent processor.

Matching Algorithm

The process is simple: based on the OS and browser names extracted by the UA parser library, this PR defines a few simple patterns, and the device type is determined from whichever patterns match.

For example, this pattern matches desktop devices:

- regex: '^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)'
If the extracted OS name matches one of these, the device is very likely a desktop. The same approach applies to mobile OS names; in addition, the browser name is matched and the two results are correlated.

For bots, it looks for any of the following words at any position:

- regex: 'Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google'
The same approach applies to tablets and other device types.

Example:
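
As a rough illustration of the matching idea described above, here is a minimal Python sketch. It is not the code in this PR (the processor itself is part of Elasticsearch). The two regexes are copied from the description; the function name `guess_device_type`, the returned labels other than "Desktop", and the choice to check the bot pattern first are all assumptions made for this example.

```python
import re

# Patterns copied verbatim from the PR description.
DESKTOP_OS = re.compile(
    r"^(Windows$|Windows NT$|Mac OS X|Linux$|Chrome OS|Fedora$|Ubuntu$)"
)
BOT = re.compile(r"Bot|bot|spider|Spider|Crawler|crawler|AppEngine-Google")


def guess_device_type(os_name, ua_string):
    """Hypothetical helper: classify a UA as "Robot", "Desktop", or "Other".

    Assumption: the bot words are searched for in the raw UA string and
    take precedence over the OS match. The real processor may differ.
    """
    if BOT.search(ua_string):
        return "Robot"
    if os_name and DESKTOP_OS.search(os_name):
        return "Desktop"
    return "Other"


chrome_on_mac = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
)
print(guess_device_type("Mac OS X", chrome_on_mac))
```

Note that some alternatives in the desktop pattern end in `$` (for example `Linux$`), so they only match when the OS name is exactly that word, while `Mac OS X` and `Chrome OS` match as prefixes.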

In dev tools:

PUT _ingest/pipeline/user_agent
{
  "description" : "Add user agent information",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=user_agent
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
GET my-index-000001/_doc/my_id

Result (the `user_agent` object from the response):

    "user_agent" : {
      "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
      "os" : {
        "name" : "Mac OS X",
        "version" : "10.10.5",
        "full" : "Mac OS X 10.10.5"
      },
      "name" : "Chrome",
      "device" : {
        "name" : "Mac",
        "type" : "Desktop"
      },
      "version" : "51.0.2704.103"
    }

image

Real user data analysis

Ran an analysis on elastic.co data collected by rum-agent, which is deployed on observability clusters.

The data contained 60019 unique user agent strings. Those strings were extracted and indexed into Elasticsearch through the user agent ingest pipeline from this PR.

This PR matched more than 99% of them successfully. Here is the analysis in Lens:

Note: This doesn't represent traffic; it represents the ratio of extracted categories across unique UA strings.

image
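
To make the distinction in the note concrete, the following Python sketch computes category ratios over unique UA strings rather than over requests. The function name `category_ratios` and the sample records are hypothetical and are not taken from the analysis above.

```python
from collections import Counter


def category_ratios(records):
    """Return device type -> share of *unique* UA strings.

    `records` is an iterable of (ua_string, device_type) pairs. Repeated
    UA strings are collapsed first, so heavy traffic from one browser
    does not inflate its category.
    """
    unique = dict(records)  # one device type per distinct UA string
    counts = Counter(unique.values())
    total = len(unique)
    return {device_type: n / total for device_type, n in counts.items()}


# Hypothetical sample: "ua-a" appears twice but is counted once.
sample = [
    ("ua-a", "Desktop"),
    ("ua-a", "Desktop"),
    ("ua-b", "Phone"),
    ("ua-c", "Desktop"),
    ("ua-d", "Other"),
]
print(category_ratios(sample))
```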

Testing

Tested by building it via Kibana:

yarn es source
and tested via Dev Tools as described above in the example and screenshot.

Backport of #69322

@danhermann danhermann added >enhancement :Distributed/Ingest Node Execution or management of Ingest Pipelines backport v7.13.0 labels Apr 20, 2021
@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Apr 20, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann danhermann merged commit 0d87318 into elastic:7.x Apr 20, 2021
@danhermann danhermann deleted the backport_69322_ua_device_type branch April 20, 2021 17:01
