Context:
The shared-ux team is re-architecting the way sample data works inside Kibana. We want to support larger datasets (1GB+ in size), and bundling them with the Kibana distributable does not scale. Our immediate goal is to support a large Observability data set. We have considered a few options so far:
- host the data externally, either as a full-blown remote service (similar to what Maps does today) or by simply loading it from a remote endpoint (like an S3 bucket)
- dynamically generate as much of the data as we can on install
- load the sample data from a side-loaded plugin (e.g. not in the distro, available only on Cloud)
- ship it as part of EPR (Elastic Package Registry)
Is EPR a viable solution here?
From chatting with @joshdover, it seems that EPR has already solved some of the problems we'd encounter with hosting data externally (scalability, latency, monitoring). Downloading a zip file with all the assets also seems to be exactly what we need here. However, it was pointed out that adding a new package of 1GB+ would raise serious performance concerns with the current Docker image. One thing to keep in mind is that we'd likely want to add more datasets in the future (though perhaps not every one will be as large).
Naming scheme
One dataset we'd like to support immediately is the Observability data set, which is a combination of filebeat + metricbeat indices and data views. Its naming scheme does not comply with the recommended Elastic data stream naming scheme. Also, I am not sure whether every sample data set we'd like to add in the future would need to follow this naming scheme.
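To make the mismatch concrete, here is a rough sketch that checks a name against the `{type}-{dataset}-{namespace}` pattern used by the Elastic data stream naming scheme. The function name, the set of accepted types, and the example index names are illustrative assumptions, not something EPR or Kibana exposes, and the real validation rules are stricter than this.

```python
# Assumption: a subset of commonly used data stream types; the real list may differ.
KNOWN_TYPES = {"logs", "metrics", "traces", "synthetics"}


def follows_data_stream_scheme(name: str) -> bool:
    """Return True if `name` looks like {type}-{dataset}-{namespace}.

    Hypothetical helper for illustration only. Dataset and namespace
    must be non-empty and, because we split on "-", cannot contain a hyphen.
    """
    parts = name.split("-")
    if len(parts) != 3:
        return False
    stream_type, dataset, namespace = parts
    return stream_type in KNOWN_TYPES and bool(dataset) and bool(namespace)


if __name__ == "__main__":
    # Example names are made up for illustration.
    for candidate in ("logs-nginx.access-default", "filebeat-8.0.0", "metricbeat-8.0.0"):
        print(candidate, follows_data_stream_scheme(candidate))
```

Beats-style names such as the hypothetical `filebeat-8.0.0` fail this check because they lead with the shipper name rather than a data stream type, which is the kind of incompatibility a package for this data set would have to work around.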
Opening this issue so we can discuss whether leveraging EPR is an option here in the first place, and how much effort it would take to work around those problems.
Implementation plan