Spike: Reduce disk usage of screenshots / network data #266

@andrewvc

Description

We currently use more disk space than desired to store synthetics screenshots. This issue tracks our efforts to reduce the total amount of disk space used.

We have a number of thoughts on how to improve this:

Lifecycle by status

We currently store the following datasets: browser, browser_network, and browser_screenshot. We could migrate to browser, browser_network_{up|down}, and browser_screenshot_{up|down}. By segmenting indices by result status, users could establish a separate data lifecycle for failures vs. successes. Most users we've talked to would prefer to keep failures around longer than successes. Note that in the example above the core browser dataset is not namespaced by up/down status. The rationale is that it is useful to be able to depend on the lightweight core metadata always existing, and to focus lifecycle management on the much more resource-intensive network data and screenshot use cases.
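The routing described above could be sketched roughly as follows. The dataset names come from the proposal; the function name and document shape are illustrative assumptions, not a settled design:

```python
# Sketch: route a synthetics document to a status-segmented dataset.
# Dataset names follow the proposal above; target_dataset is hypothetical.

def target_dataset(doc_type: str, monitor_up: bool) -> str:
    """Return the dataset a document should be indexed into."""
    if doc_type == "browser":
        # Core metadata stays in a single dataset regardless of status,
        # so consumers can always rely on it existing.
        return "browser"
    status = "up" if monitor_up else "down"
    return f"{doc_type}_{status}"  # e.g. browser_screenshot_down
```

Each resulting dataset could then be attached to its own lifecycle policy, e.g. a short retention for the _up variants and a longer one for _down.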

This approach would require buffering the Elasticsearch documents from the run on the box running the synthetics tests, because we don't know whether the run is a success or a failure until it is complete. We could simply do this with temp files on the system.
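A minimal sketch of that buffering, using a temp file as suggested above (the RunBuffer name and document shape are hypothetical):

```python
# Sketch: buffer a run's documents in a temp file until the run finishes
# and its up/down status is known.
import json
import tempfile

class RunBuffer:
    def __init__(self):
        # Newline-delimited JSON in an anonymous temp file; the OS
        # reclaims it automatically when closed.
        self._file = tempfile.TemporaryFile(mode="w+")

    def add(self, doc: dict) -> None:
        self._file.write(json.dumps(doc) + "\n")

    def flush(self) -> list[dict]:
        """Replay the buffered docs once the run is complete; the caller
        can then route each one to an _up or _down index."""
        self._file.seek(0)
        docs = [json.loads(line) for line in self._file]
        self._file.close()
        return docs
```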

Sampled Screenshots

We could add an option to index successful runs only a certain percentage of the time, in a way somewhat reminiscent of APM sampling. Since users mostly care about failures, on most synthetics executions we would store only the core metadata for each run, keeping full data for, say, one in ten runs. Since most of the storage space is taken up by screenshots and network data, this would yield roughly a tenfold reduction in storage cost for monitors that mostly succeed. We would make sure to always store data for the run in full on any state change; in other words, if a monitor had gone from a success to a failure we would store that first failure event in full.
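The sampling decision above can be sketched as a small predicate. The one-in-ten rate and the names here are illustrative assumptions:

```python
# Sketch of the sampling decision: always keep full data for failures
# and state changes, otherwise sample successes at a fixed rate.
SAMPLE_EVERY = 10  # hypothetical rate; would be user-configurable

def store_full_data(run_index: int, is_failure: bool, status_changed: bool) -> bool:
    """Decide whether to keep screenshots/network data for this run."""
    if is_failure or status_changed:
        # Failures and transitions (e.g. up -> down) are always stored in full.
        return True
    # Successful runs keep full data only one time in SAMPLE_EVERY;
    # the rest store core metadata only.
    return run_index % SAMPLE_EVERY == 0
```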

One challenge with this approach is that it would need UI support to make sense: users would need a different experience when viewing a sampled run versus a fully hydrated run with all screenshots. We could point them at the most recent run with full screenshots if they wanted to see the screenshots.

Screenshot diffs

Another approach would be to steal from the world of video encoding. Most video codecs work by encoding a mixture of keyframes (intra frames, or I-frames) and delta frames. A keyframe is a full image; a delta frame is a much smaller set of pixels that changed since the last keyframe. We would not need an actual video codec for this: we could simply encode a full keyframe image for a given step every once in a while and, on subsequent runs, use a simple diff algorithm to store only the changed pixels as a PNG. A downside of this approach is that we would need custom logic to layer the diff pixels on top of the keyframe. Additionally, the synthetics runner or heartbeat would need to retain the last keyframe and compute the diff. It would make more sense for heartbeat to manage retention of the last keyframe, potentially querying Elasticsearch to retrieve it.
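The keyframe-plus-diff idea can be sketched with flat lists of pixel values standing in for decoded image data; a real implementation would diff decoded PNG buffers, and these function names are hypothetical:

```python
# Sketch: keyframe + diff storage for screenshots. Pixels are modeled as
# flat integer lists; a real version would operate on decoded image data.

def diff_frame(keyframe: list[int], frame: list[int]) -> dict[int, int]:
    """Store only the pixels that differ from the keyframe, keyed by offset."""
    return {i: p for i, (k, p) in enumerate(zip(keyframe, frame)) if k != p}

def apply_diff(keyframe: list[int], diff: dict[int, int]) -> list[int]:
    """Layer the diff pixels back on top of the keyframe to rebuild the frame."""
    frame = list(keyframe)
    for i, p in diff.items():
        frame[i] = p
    return frame
```

When consecutive screenshots of a step are mostly identical, the diff dict stays tiny compared to the keyframe, which is the storage win this section is after.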

Labels

discussion (Discuss about API changes, enhancements)
