Spike: Reduce disk usage of screenshots / network data #266

@andrewvc

Description

We currently use more disk space than desired to store synthetics screenshots. This issue tracks our efforts to reduce the total amount of disk space used.

We have a number of thoughts on how to improve this:

Lifecycle by status

We currently store the following datasets: browser, browser_network, and browser_screenshot. We could migrate to browser, browser_network_{up|down}, and browser_screenshot_{up|down}. By segmenting indices by result status, users could establish a separate data lifecycle for failures vs. successes. Most users we've talked to would prefer to keep failures around longer than successes. Note that in the example above the core browser dataset is not namespaced by up/down status. The rationale is that it is useful to be able to depend on the lightweight core metadata always existing, and to focus lifecycle management on the much more resource-intensive network data and screenshot use cases.
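The routing described above could be sketched roughly as follows. The dataset names come from the proposal; the function name and document shape are illustrative assumptions, not a settled design:

```python
# Sketch: route a synthetics document to a status-segmented dataset.
# Dataset names follow the proposal above; target_dataset is hypothetical.

def target_dataset(doc_type: str, monitor_up: bool) -> str:
    """Return the dataset a document should be indexed into."""
    if doc_type == "browser":
        # Core metadata stays in a single dataset regardless of status,
        # so consumers can always rely on it existing.
        return "browser"
    status = "up" if monitor_up else "down"
    return f"{doc_type}_{status}"  # e.g. browser_screenshot_down
```

Each resulting dataset could then be attached to its own lifecycle policy, e.g. a short retention for the _up variants and a longer one for _down.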

This approach would require buffering the Elasticsearch documents from the run on the box running the synthetics tests, because we don't know whether the run is a success or a failure until it is complete. We could simply do this with temp files on the system.
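A minimal sketch of that buffering, using a temp file as suggested above (the RunBuffer name and document shape are hypothetical):

```python
# Sketch: buffer a run's documents in a temp file until the run finishes
# and its up/down status is known.
import json
import tempfile

class RunBuffer:
    def __init__(self):
        # Newline-delimited JSON in an anonymous temp file; the OS
        # reclaims it automatically when closed.
        self._file = tempfile.TemporaryFile(mode="w+")

    def add(self, doc: dict) -> None:
        self._file.write(json.dumps(doc) + "\n")

    def flush(self) -> list[dict]:
        """Replay the buffered docs once the run is complete; the caller
        can then route each one to an _up or _down index."""
        self._file.seek(0)
        docs = [json.loads(line) for line in self._file]
        self._file.close()
        return docs
```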

Sampled Screenshots

We could add an option to index successful runs only a certain percentage of the time, in a way somewhat reminiscent of APM sampling. Since users mostly care about failures, on most synthetics executions we would store only the core metadata for each run, keeping full data for, say, one in ten runs. Since most of the storage space is taken up by screenshots and network data, this would yield roughly a tenfold reduction in storage cost for monitors that mostly succeed. We would make sure to always store data for the run in full on any state change; in other words, if a monitor had gone from a success to a failure we would store that first failure event in full.
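The sampling decision above can be sketched as a small predicate. The one-in-ten rate and the names here are illustrative assumptions:

```python
# Sketch of the sampling decision: always keep full data for failures
# and state changes, otherwise sample successes at a fixed rate.
SAMPLE_EVERY = 10  # hypothetical rate; would be user-configurable

def store_full_data(run_index: int, is_failure: bool, status_changed: bool) -> bool:
    """Decide whether to keep screenshots/network data for this run."""
    if is_failure or status_changed:
        # Failures and transitions (e.g. up -> down) are always stored in full.
        return True
    # Successful runs keep full data only one time in SAMPLE_EVERY;
    # the rest store core metadata only.
    return run_index % SAMPLE_EVERY == 0
```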

One challenge with this approach is that it would need UI support to make sense: users would need a different experience when viewing a sampled run versus a fully hydrated run with all screenshots. We could point them at the most recent run with full screenshots if they wanted to see the screenshots.

Screenshot diffs

Another approach would be to steal from the world of video encoding. Most video codecs work by encoding a mixture of keyframes (intra frames, or I-frames) and delta frames. A keyframe is a full image; a delta frame is a much smaller set of pixels that changed since the last keyframe. We would not need an actual video codec for this: we could simply encode a full keyframe image for a given step every once in a while and, on subsequent runs, use a simple diff algorithm to store only the changed pixels as a PNG. A downside of this approach is that we would need custom logic to layer the diff pixels on top of the keyframe. Additionally, the synthetics runner or heartbeat would need to retain the last keyframe and compute the diff. It would make more sense for heartbeat to manage retention of the last keyframe, potentially querying Elasticsearch to retrieve it.
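The keyframe-plus-diff idea can be sketched with flat lists of pixel values standing in for decoded image data; a real implementation would diff decoded PNG buffers, and these function names are hypothetical:

```python
# Sketch: keyframe + diff storage for screenshots. Pixels are modeled as
# flat integer lists; a real version would operate on decoded image data.

def diff_frame(keyframe: list[int], frame: list[int]) -> dict[int, int]:
    """Store only the pixels that differ from the keyframe, keyed by offset."""
    return {i: p for i, (k, p) in enumerate(zip(keyframe, frame)) if k != p}

def apply_diff(keyframe: list[int], diff: dict[int, int]) -> list[int]:
    """Layer the diff pixels back on top of the keyframe to rebuild the frame."""
    frame = list(keyframe)
    for i, p in diff.items():
        frame[i] = p
    return frame
```

When consecutive screenshots of a step are mostly identical, the diff dict stays tiny compared to the keyframe, which is the storage win this section is after.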

Labels

discussion (Discuss about API changes, enhancements)
