[Security Solution] Smart limits for the package with prebuilt rules #187645

@banderror

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168

Summary

Recently we had an incident in Serverless where Kibana instances crashed with an out-of-memory (OOM) error while installing the security_detection_engine Fleet package, which Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident was mitigated by temporarily reducing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for long: with the current limit of 2 versions per rule in the package, we won't be able to release Milestone 3 of the prebuilt rule customization feature.

Before we can release Milestone 3, we need to raise the number of versions per rule we ship in the package again. In general, the more versions we ship, the better the UX for upgrading prebuilt rules; the fewer versions we ship, the lighter the package, which also improves the UX and increases reliability.

Our goal is to find a balance between reliability and good UX and achieve both. For that, we need to come up with smart and efficient limits for the package with prebuilt rules.

Ideas

Total limits for the package as a whole:

  • Total number of package assets (in our case, rule versions). Currently set to 15000 in Kibana for all Fleet packages. We might want to enforce this limit on the package side and set it to a lower value.
  • Total size of the package in megabytes.
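The two package-wide limits above could be enforced at build time with a check along these lines. This is only a minimal sketch: the function name, the `LimitCheck` type, and the 50 MB size budget are hypothetical, and only the 15000 asset limit comes from the text above.

```python
from dataclasses import dataclass

MAX_ASSETS = 15000   # current Kibana-side limit for all Fleet packages (from the issue)
MAX_SIZE_MB = 50.0   # hypothetical size budget, not a real setting


@dataclass(frozen=True)
class LimitCheck:
    asset_count: int
    size_mb: float
    within_limits: bool


def check_package_limits(assets, max_assets=MAX_ASSETS, max_size_mb=MAX_SIZE_MB):
    """Check a list of serialized assets (bytes) against both total limits."""
    asset_count = len(assets)
    size_mb = sum(len(asset) for asset in assets) / (1024 * 1024)
    within_limits = asset_count <= max_assets and size_mb <= max_size_mb
    return LimitCheck(asset_count, size_mb, within_limits)
```

A package build could fail fast when `within_limits` is false, or fall back to stricter per rule limits until the package fits.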

Per rule limits:

  • Hard cap: <= X number of versions no matter what. Exclude older versions and keep newer ones.
  • Hard time cap: exclude versions older than now - X days.
  • "Exponential" time caps: keep <= X versions created within the last 3 months, <= Y within the last 6 months, <= Z within the last 12 months, etc.; X < Y < Z < ..., e.g. 4 < 6 < 7. Time ranges grow exponentially while limits grow slower than that: logarithmically or linearly.
  • Time window cap: exclude versions if Elastic published more than X of them within a Y time window, e.g. more than 2 per 3 months. This could help prevent some "noisy" rules (in terms of how frequently they are updated) from "eating" too much space in the package, and prevent newer versions of a noisy rule from evicting older versions of the same rule.
  • We could include more versions for more popular rules (rules that users install more often) and fewer versions for less popular rules.
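To make the time-based ideas above concrete, here is a rough sketch of the hard cap, the hard time cap, the "exponential" (tiered) caps, and the time window cap as filters over the versions of a single rule. The `RuleVersion` type, the function names, and the choice of fixed windows counted back from `now` are all assumptions made for illustration; none of this reflects how the package is actually built.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RuleVersion:  # hypothetical shape of one rule version asset
    rule_id: str
    version: int
    published_at: datetime


def newest_first(versions):
    return sorted(versions, key=lambda v: v.version, reverse=True)


def hard_cap(versions, max_versions):
    """Keep at most max_versions, preferring newer ones."""
    return newest_first(versions)[:max_versions]


def hard_time_cap(versions, now, max_age_days):
    """Drop versions published earlier than now - max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [v for v in newest_first(versions) if v.published_at >= cutoff]


def tiered_time_caps(versions, now, tiers):
    """tiers is a list of (max_age_days, max_versions), e.g.
    [(90, 4), (180, 6), (365, 7)]. Caps are cumulative: a version is kept
    only if every tier window containing it still has room. Versions older
    than the widest window are dropped."""
    kept = []
    for v in newest_first(versions):
        age = now - v.published_at
        windows = [(timedelta(days=d), cap) for d, cap in tiers
                   if age <= timedelta(days=d)]
        if not windows:
            continue
        if all(sum(1 for k in kept if now - k.published_at <= w) < cap
               for w, cap in windows):
            kept.append(v)
    return kept


def time_window_cap(versions, now, window_days, max_per_window):
    """Keep at most max_per_window versions in each consecutive window
    of window_days, counted back from now (assumes no future dates)."""
    kept, per_window = [], {}
    for v in newest_first(versions):
        index = (now - v.published_at).days // window_days
        if per_window.get(index, 0) < max_per_window:
            per_window[index] = per_window.get(index, 0) + 1
            kept.append(v)
    return kept
```

These filters can be composed, for example by applying `time_window_cap` first to tame noisy rules and then `hard_cap` as a final safety net. A popularity-based limit would need install statistics as an extra input and is not sketched here.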

Todo
