Adds ML supervised model Problem child package#2115
Adds ML supervised model Problem child package#2115alvarezmelissa87 merged 28 commits intoelastic:mainfrom
Conversation
6a60abe to
fc68b33
Compare
fc68b33 to
b83c6dd
Compare
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
8074733 to
d6d356e
Compare
3876651 to
0cac4a9
Compare
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
| }, | ||
| { | ||
| "by_field_name": "host.hostname", | ||
| "detector_description": "high sum by host", |
There was a problem hiding this comment.
As above, could the description here be edited to indicate the field it is running on blocklist_label? I don't have enough context on what information is held in blocklist_label to advise here.
There was a problem hiding this comment.
@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.
| }, | ||
| { | ||
| "by_field_name": "user.name", | ||
| "detector_description": "high sum by host", |
There was a problem hiding this comment.
Could we include the field name in the description of this detector - what is it a high sum of?
| "detectors": [ | ||
| { | ||
| "by_field_name": "user.name", | ||
| "detector_description": "high sum by user", |
There was a problem hiding this comment.
Could we include the field name in the description of this detector - what is it a high sum of?
There was a problem hiding this comment.
As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.
There was a problem hiding this comment.
I agree with Apoorva, field names aren't always the most user-centric and don't add that much value (imo) for the complexity they could add
| "detectors": [ | ||
| { | ||
| "by_field_name": "process.parent.name", | ||
| "detector_description": "high sum by parent process", |
There was a problem hiding this comment.
Could we include the field name in the description of this detector - what is it a high sum of?
| }, | ||
| { | ||
| "by_field_name": "process.parent.name", | ||
| "detector_description": "high sum by host", |
There was a problem hiding this comment.
Could we include the field name in the description of this detector - what is it a high sum of?
ajosh0504
left a comment
There was a problem hiding this comment.
Thanks for getting this done! Super excited to see these models in packages. Just one question. So once a user installs a package, what's the process for them to update/change artifacts produced by the package?
| @@ -0,0 +1,6 @@ | |||
| # ML Problem Child | |||
There was a problem hiding this comment.
Can we keep the name consistent everywhere? I'm seeing ProblemChild, problem child and ProblemChild being used. We've been going with ProblemChild so far.
There was a problem hiding this comment.
Updated to use ProblemChild everywhere in 7d33b9d
| @@ -0,0 +1,6 @@ | |||
| # ML Problem Child | |||
|
|
|||
| The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. | |||
There was a problem hiding this comment.
| The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. | |
| The Problem child package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity. |
packages/ml_problem_child/kibana/ml_module/problem-child-ml.json
Outdated
Show resolved
Hide resolved
| }, | ||
| { | ||
| "by_field_name": "host.hostname", | ||
| "detector_description": "high sum by host", |
There was a problem hiding this comment.
@peteharverson So the way we classify processes as malicious is using either a supervised model or a blocklist for things the model may have missed. I think adding this in the description will introduce complexity.
| "detectors": [ | ||
| { | ||
| "by_field_name": "user.name", | ||
| "detector_description": "high sum by user", |
There was a problem hiding this comment.
As above, I think introducing field names in the descriptions might introduce complexity, since users may not know what the fields exactly mean.
|
As with the DGA model, should the model file name here be something more understandable? It's currently @ajosh0504 - this file name is the name of the model id since the file name is used to determine the model id to send up to the put trained models api to install it to ES. Filenames being the asset id is consistent with the way fleet installs things. |
| @@ -0,0 +1,10 @@ | |||
| --- | |||
| description: "A pipeline of pipelines for ProblemChild detection" | |||
There was a problem hiding this comment.
Is there a way to make this description a bit clearer? I feel like "pipeline of pipelines" at first read could be a bit confusing
There was a problem hiding this comment.
How about just Pipelines for ProblemChild detection? Is this label actually shown anywhere in the UI?
|
|
||
| The Problem child model package contains the [Problem child model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LOtl) activity. | ||
|
|
||
| To download the assets, click **Settings** > **Install ML Problem child assets**. |
There was a problem hiding this comment.
I'm assuming this is within a cluster? Are there any steps needed before this? i.e. ProblemChild requires Platinum level capabilities, is that denoted anywhere (unless I missed it)? Maybe even a quick note in the README would be helpful
There was a problem hiding this comment.
The package manifest file has the license type that is required - that is enforced in Fleet and the installation will fail due to failing to meet the required license type.
|
cc @lcawl for package/model description text in the readme and such. |
brokensound77
left a comment
There was a problem hiding this comment.
So cool to see this finally going into the product (after being in an experimental state for so long). I left a few suggestions.
@spong FYSA, if I am not mistaken, this and #2352 and the first integrations outside of the prebuilt rules to add rules destined for the detection engine. Not sure if there were any assumptions or considerations to be made.
| "author": [ | ||
| "Elastic" | ||
| ], | ||
| "description": "A supervised machine learning model (ProblemChild) or its blocklist has identified\na suspicious Windows process event to be malicious activity.\n", |
There was a problem hiding this comment.
is the line break in the middle intentional?
There was a problem hiding this comment.
Shouldn't need a trailing one either
There was a problem hiding this comment.
These lines breaks were part of the release and haven't been changed. I'd need confirmation that it's okay to remove them.
| "license": "Elastic License", | ||
| "max_signals": 10000, | ||
| "name": "Machine Learning Detected a Suspicious Windows Event with a High Malicious Probability Score", | ||
| "query": "(problemchild.prediction:1 and problemchild.prediction_probability \u003e 0.98) or blocklist_label:1", |
There was a problem hiding this comment.
was the \u003e a unicode conversion error?
There was a problem hiding this comment.
Hmmm, I'm still seeing this in the latest. The other changes from 7d0d452 appear to have made it though? (i.e. packages/problem_child/manifest.yml title has been updated to ProblemChild instead ML ProblemChild)
There was a problem hiding this comment.
Thanks for catching this - this should now be a > instead of the encoded version 👍 Fixed in c3692a17d515ebdd31e95e03652f735d04db5748
There was a problem hiding this comment.
@spong - looks like running the elastic-package format command is what's causing the change from > to the encoded version. Is this expected?
cc @mtojek - also - running elastic-package check locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.
There was a problem hiding this comment.
Oh wow, looks like you are right
I guess it is ok to leave it converted then 🤷
There was a problem hiding this comment.
What we can see in the CI output is:
[2022-03-18T15:59:14.998Z] Error: checking package failed: formatting the integration failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child, failFast: true): walking through the integration files failed: formatting file failed (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json): file is not formatted (path: /var/lib/jenkins/workspace/est-manager_integrations_PR-2115/src/github.com/elastic/integrations/packages/problem_child/kibana/security_rule/9a2e372a-cbeb-4ad6-a288-017ef086324c.json)
so it confirms that you have to post the formatted code. Sometimes the formatted code might be harder to read, but we don't have a choice here if we want to depend on standard JSON libraries. Please run elastic-package format and post the formatted content.
There was a problem hiding this comment.
cc @mtojek - also - running elastic-package check locally is failing with a reference to the ml_model file name - though the update was made to package-spec to match that file name pattern and I've tested with the package-spec validator test so curious if this could be something going on locally? Might need to chat with you quickly on that one.
It depends on which elastic-package release you have locally and which one is in the go.mod in the Integrations.
BTW we're struggling a bit with updating the dependency - here.
| The ProblemChild package contains the [ProblemChild model and associated assets](https://www.elastic.co/blog/problemchild-generate-alerts-to-detect-living-off-the-land-attacks), which are used to detect living off the land (LotL) activity. | ||
|
|
||
| To download the assets, click **Settings** > **Install ML ProblemChild assets**. | ||
|
|
There was a problem hiding this comment.
is it worth elaborating on this to explain the relationship between the model, jobs, and rules. I believe the job and rules will be installed but in a disabled state. Also if the rule is enabled before the job, it will throw an error.
The schema for rules accepts defining enabled = true, though I am not totally sure on the ml jobs
There was a problem hiding this comment.
How about adding something to the Configuration section of the readme - after the to download the assets bit
Something like Ingest data with the installed ingest pipeline to enrich your indices with inference data and run the provided anomaly detection jobs.
If rules are enabled in the same way as described in https://www.elastic.co/guide/en/security/master/rules-ui-management.html, worst case we can link there. For example, add a sentence like, "For more information about activating the detection rules, refer to .
There was a problem hiding this comment.
The goal is to ultimately have a page in the docs that walks through the installation and use of this integration's assets, but since that's not available to link to yet, I agree with your suggestion for a couple of lines in the Configuration section of the readme as a stop-gap.
There was a problem hiding this comment.
Added in 11eb4a152ddadf13c75d6e469e97f01d6db9dd06
cc @lcawl, @peteharverson
There was a problem hiding this comment.
@lcawl, @peteharverson, @Winterflower
The latest screenshots and README file have been updated with more detailed instructions on how to use the package - as discussed - in c3692a17d515ebdd31e95e03652f735d04db5748
Would appreciate confirmation that it's what we want when you get a chance! 🙏
peteharverson
left a comment
There was a problem hiding this comment.
Latest changes LGTM
4fe4ad2 to
22cc5f4
Compare
| } | ||
| ] | ||
| }, | ||
| "id": "problem_child", |
There was a problem hiding this comment.
@alvarezmelissa87 What about this "id"? CI mentioned that field.
There was a problem hiding this comment.
Ah, thanks - missed this the first time! Updated in 38fd11b
|
Just commenting to update that we've come to consensus on how we'll handle the aforementioned issue around |
|
/test |


What does this PR do?
Adds the ML supervised model package for Problem child model.
Package includes:
Checklist
changelog.ymlfile.manifest.ymlfile to point to the latest Elastic stack release (e.g.^7.13.0).Author's Checklist
How to test this PR locally
Related issues
Screenshots