[Security Solution] PoC of the rule upgrade and installation workflows#144060
[Security Solution] PoC of the rule upgrade and installation workflows#144060banderror wants to merge 1 commit intoelastic:mainfrom
Conversation
829b248 to
65dd3ac
Compare
65dd3ac to
0f95ae4
Compare
ffe26ab to
311d70f
Compare
b868dec to
6a31992
Compare
7ca12fe to
d6f263f
Compare
d6f263f to
ee5b361
Compare
3a70856 to
1b03974
Compare
b42199b to
30dbd3f
Compare
xcrzx
left a comment
There was a problem hiding this comment.
Excellent and very thorough work on designing and testing those data models and core algorithms 🎉 Thank you for doing it, @banderror 👍 The inclusion of three different data models allows us to make an informed decision on which one will be the most effective for our needs.
I've followed the described steps and tested all three models locally together with their data retrieval queries, plus inspected their performance aspects with APM traces. Also looked through the diff algorithm implementation, API, and interfaces. Everything looks good to me, just left a couple of minor suggestions/things to consider for the future.
In terms of performance, the flat model shows better results. However, the difference between flat and composite v2 is not that big. The composite v1 model shows the worst results, and I recommend not using it as it doesn't allow effective data extraction, and some operations must be performed in memory.
Speaking of the simplicity of use, the flat structure is a clear winner. It allows us to request data in many different ways and to use arbitrary aggregations. Having that flexibility will allow us to extend the model easily in the future if needed. The composite data model could constrain us in the future as some queries will be impossible or hard to implement.
Another thing is backward compatibility. Our current code seems to be compatible with the flat structure, meaning we wouldn't need to write complex migrations if we choose it as opposed to the composite structures.
So overall, it would be better to proceed with the flat structure if we can overcome current limitations on the Fleet side mentioned here and here.
..._engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite2_saved_objects_client.ts
Outdated
Show resolved
Hide resolved
..._engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite2_saved_objects_client.ts
Outdated
Show resolved
Hide resolved
..._engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite2_saved_objects_client.ts
Outdated
Show resolved
Hide resolved
..._engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite2_saved_objects_client.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
We could further speed up and simplify this query and get rid of aggregation if the saved object client supported collapse:
GET .kibana/_search
{
"size": 10000,
"query": {
"term": {
"type": "security-rule-flat"
}
},
"sort": {
"security-rule-flat.rule_content_version": {
"order": "desc"
}
},
"collapse": {
"field": "security-rule-flat.rule_id"
},
"_source": {
"includes": [
"security-rule-flat.rule_id",
"security-rule-flat.rule_content_version",
"security-rule-flat.stack_version_min",
"security-rule-flat.stack_version_max"
]
}
}Maybe it is worth checking with the Core folks to see if that's not too hard to add.
There was a problem hiding this comment.
@xcrzx I'd definitely postpone this until later.
There was a problem hiding this comment.
This _search query could be replaced with a _mget query, which should be more efficient:
const findResult = await savedObjectsClient.bulkGet<RuleAssetFlatAttributes>(
rules.map((rule) => ({
type: RULE_ASSET_FLAT_SO_TYPE,
id: `${rule.id}:${rule.version}`,
}))
);There was a problem hiding this comment.
@xcrzx This will couple the implementation with the format of asset ids, which I'm not sure is something we'd want to do. Open to discussing this!
...ty_solution/common/detection_engine/prebuilt_rules/poc/diff_algorithm/calculate_rule_diff.ts
Outdated
Show resolved
Hide resolved
💔 Build FailedFailed CI Steps
Test Failures
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @banderror |
|
@xcrzx I completely agree with your conclusions. Let's proceed with the flat data model and try to fix the issues on the Fleet side that are currently preventing us from picking the flat model. If it doesn't work out we can return back and use the composite model v2. I updated the Summary in the PR description with some notes and links. At this point, we consider this PoC as completed and I'm closing this PR. Follow-up implementation work will be done in #148392. |
Resolves: #137446
NOTE: You can comment this PR description line-by-line in
security_solution/common/detection_engine/prebuilt_rules/poc/README.md.Summary
PrebuiltRuleToInstallschemastatusand two_reviewendpoints. I think we could skip implementing the_performendpoints within this POC because they seem to be less risky (given the_reviewendpoints).TODO: https://github.com/elastic/kibana/pull/144060comments.Open questions to clarify in parallel or later:
server/fleet_integration/handlers/install_prepackaged_rules.tsto install only the promotion "Endpoint Security" rule if it's not installed. Don't install all the rules and don't upgrade rules.Workflows
Stage 1 (both workflows):
GET /internal/detection_engine/prebuilt_rules/status.Stage 2 (upgrade workflow):
POST /internal/detection_engine/prebuilt_rules/upgrade/_review.POST /internal/detection_engine/prebuilt_rules/upgrade/_perform.Stage 2 (installation workflow):
POST /internal/detection_engine/prebuilt_rules/installation/_review.POST /internal/detection_engine/prebuilt_rules/installation/_perform.Data models
This POC implements and compares 3 new data models for historical versioned rule asset saved objects.
See the implementation in
server/lib/detection_engine/prebuilt_rules/logic/poc/saved_objects.We will need to choose the one we will proceed with. Criteria considered so far:
Flat model
Every object is a historical rule version that contains the rule id, the content version and the content itself.
server/lib/detection_engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_flat_saved_objects_type.ts:Composite model v1
Every object is a rule, all historical content is stored in its nested field (an array).
server/lib/detection_engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite_saved_objects_type.ts:Composite model v2
Every object is a rule. Historical version information is stored as an array of small objects
which is mapped as a nested field. Historical content is stored in a map where keys are formed
in a special way so that we can fetch individual content versions for many rules in bulk.
server/lib/detection_engine/prebuilt_rules/logic/poc/saved_objects/rule_asset_composite2_saved_objects_type.ts:API endpoints
See the implementation in
x-pack/plugins/security_solution/server/lib/detection_engine/prebuilt_rules/api.Get status of prebuilt rules
Response body:
Implementation:
server/lib/detection_engine/prebuilt_rules/api/get_prebuilt_rules_status/route.ts.Review rules that can be upgraded
Response body:
Implementation:
server/lib/detection_engine/prebuilt_rules/api/review_rule_upgrade/route.ts.Perform rule upgrade
Request body:
Response body:
Implementation: not implemented in this POC.
Review rules that can be installed
Response body:
Implementation:
server/lib/detection_engine/prebuilt_rules/api/review_rule_installation/route.ts.Perform rule installation
Request body:
Response body:
Implementation: not implemented in this POC.
API performance considerations
Should be fast and lightweight (< 1 second). Should:
POST /internal/detection_engine/prebuilt_rules/*/_reviewCould be slightly slow (1 to 5 seconds). Can:
POST /internal/detection_engine/prebuilt_rules/*/_performCould be moderately slow (< 1 minute). Can:
API testing
All the 3 added endpoints accept a
data_modelparameter so we could test their work and performancein different conditions and with different data models.
Generate test prebuilt rule assets (the endpoint will do it for all 3 data models).
Pick whatever number of versions you want to be generated per each rule.
POST /internal/detection_engine/prebuilt_rules/_install_test_assets { "num_versions_per_rule": 10 }Test get status endpoint
Test review installation endpoint
POST /internal/detection_engine/prebuilt_rules/installation/_review { "data_model": "flat" } POST /internal/detection_engine/prebuilt_rules/installation/_review { "data_model": "composite" } POST /internal/detection_engine/prebuilt_rules/installation/_review { "data_model": "composite2" }Test review upgrade endpoint
POST /internal/detection_engine/prebuilt_rules/upgrade/_review { "data_model": "flat" } POST /internal/detection_engine/prebuilt_rules/upgrade/_review { "data_model": "composite" } POST /internal/detection_engine/prebuilt_rules/upgrade/_review { "data_model": "composite2" }Rule fields
I did some research on rule fields to be able to determine which rule fields will be "customizable" and which will be "technical".
Please find the result and follow-up work to do in a dedicated ticket:
#147239
Diff algorithm
This section describes an algorithm that returns a 3-way diff between 3 rule versions: base, current, target.
Definition: diffable rule
We have two data structures that represent a prebuilt rule:
PrebuiltRuleToInstall: schema for a prebuilt rule asset (filesystem or fleet package based).RuleResponse: schema for an Alerting Framework's rule.These data structures are similar but different. In order to be able to run a diff between
an already installed prebuilt rule (
RuleResponse) and its next version shipped by Elastic(
PrebuiltRuleToInstall) we would first need to normalize both of them to a common interfacethat would be suitable for passing to the diff algorithm. This common interface is
DiffableRule.common/detection_engine/prebuilt_rules/poc/diffable_rule_model/diffable_rule.ts:Details
Definition: 3-way diff
We will use a 3-way diff algorithm for two things:
DiffableRule).In order to calculate a 3-way diff result for a field, we will need 3 input values:
And, our goal will be:
Potential reasons for a conflict:
rule's behavior or introducing unintended side-effects in the behavior from the user's point of view
Below is a
ThreeWayDiffresult's interface.common/detection_engine/prebuilt_rules/poc/diff_model/three_way_diff.ts:Details
The algorithm itself
GIVEN a list of prebuilt rules that can be upgraded (
currentVersion[])AND a list of the corresponding base asset saved objects (
baseVersion[])AND a list of the corresponding target asset saved objects (
targetVersion[])DO run the diff algorithm for every match of these 3 versions.
common/detection_engine/prebuilt_rules/poc/diff_algorithm/calculate_rule_diff.ts:Details
The algorithm's overall structure is fully implemented and works, but it uses a simple diff algorithm
for calculating field diffs. This algorithm is kinda nasty: it doesn't try to merge any values
and marks a diff as conflict if base version != current version != target version.
common/detection_engine/prebuilt_rules/poc/diff_algorithm/calculation/algorithms/simple_diff_algorithm.ts.The idea is to write more flexible algorithms for different rule fields that would generate fewer
conflicts and would try to automatically merge changes when it can be technically done and it won't
result in inintended changes in the rule from the user standpoint.