[discuss] Support (fairly large) sample data set package #346

### Context:
The `shared-ux` team is re-architecting the way sample data works inside Kibana. We want to support larger datasets (1GB+ in size), and bundling them with the Kibana distributable is not scalable. Our immediate goal is to support a large Observability data set. We have considered a few options so far:

* host the data externally, either as a full-blown remote service (similar to what Maps does today) or by simply loading it from a remote endpoint (like an S3 bucket)
* dynamically generate as much of the data as we can on install
* load the sample data from a side-loaded plugin (i.e. not in the distro, available only on Cloud)
* make this part of EPR
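
To make the "generate on install" option more concrete, here is a minimal sketch of what install-time generation could look like. Everything in it is an assumption for illustration: the function name `generate_cpu_docs`, the field names, and the fixed interval are hypothetical and not part of any existing Kibana or Beats code. The idea is only that timestamps are computed relative to the install time, so nothing but the generator itself needs to ship.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_cpu_docs(count, end, interval_seconds=10, seed=0):
    """Yield `count` synthetic metric documents ending at `end`.

    Hypothetical document shape; a real data set would mirror the
    fields of the actual metricbeat/filebeat indices.
    """
    rng = random.Random(seed)  # seeded so repeated installs produce the same data
    for i in range(count):
        # Oldest document first, newest document exactly at `end`.
        ts = end - timedelta(seconds=interval_seconds * (count - 1 - i))
        yield {
            "@timestamp": ts.isoformat(),
            "host": {"name": "sample-host-1"},
            "cpu_pct": round(rng.uniform(0.0, 1.0), 3),
        }


if __name__ == "__main__":
    install_time = datetime.now(timezone.utc)
    for doc in generate_cpu_docs(3, install_time):
        print(doc)
```

Because it is a generator, documents can be streamed to a bulk indexer without holding the whole data set in memory. The trade-off is that purely synthetic data may look less realistic than a recorded data set.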

### Is EPR a viable solution here?
After chatting with @joshdover, it seems that EPR has already solved some of the problems we'd encounter with hosting data externally (scalability, latency, monitoring). Also, downloading a zip file with all the assets seems to be exactly what we need here. However, it was pointed out that adding a new package of 1GB+ would cause serious performance concerns with the current Docker image. One thing to keep in mind is that we'd likely want to add more datasets in the future (though perhaps not every one will be as large).
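
For the "download a zip with all the assets" flow, a rough sketch of the consuming side might look like the following. It assumes a hypothetical archive layout in which documents are stored as `.ndjson` files inside the zip; this is not the actual EPR package format, and `iter_doc_batches` is a made-up name. The point is that a 1GB+ archive can be read in fixed-size batches rather than loaded in full.

```python
import io
import json
import zipfile


def iter_doc_batches(archive, batch_size=500):
    """Yield lists of documents from every .ndjson member of a zip archive.

    `archive` is a path or a seekable file-like object. The .ndjson layout
    is an assumption made for this sketch.
    """
    batch = []
    with zipfile.ZipFile(archive) as zf:
        for member in sorted(zf.namelist()):
            if not member.endswith(".ndjson"):
                continue  # skip non-data assets
            with zf.open(member) as raw:
                # Decode line by line so only one batch is held in memory.
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if not line:
                        continue
                    batch.append(json.loads(line))
                    if len(batch) >= batch_size:
                        yield batch
                        batch = []
    if batch:
        yield batch  # final partial batch
```

Each yielded batch could then be handed to a bulk-indexing call. Note that this only addresses memory use on the client side; it does not help with the Docker image size concern raised above.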

### Naming scheme
One dataset we'd like to support immediately is the Observability data set, which is a combination of filebeat + metricbeat indices and data views. Its naming scheme does not comply with the recommended Elastic data stream naming scheme. Also, I am not sure whether every sample data set we'd like to add in the future would need to follow this naming scheme.
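
As an illustration of the mismatch, the data stream naming scheme has the shape `{type}-{dataset}-{namespace}`, where the dataset and namespace parts may not contain `-`. Below is a small sketch of a conformance check. The set of allowed types and the example index names are assumptions for illustration (the exact names used by the Observability data set are not listed here), and the check ignores other restrictions such as length limits.

```python
# Assumed set of types; logs and metrics are the ones most relevant here.
ALLOWED_TYPES = {"logs", "metrics", "traces", "synthetics"}


def parse_data_stream_name(name):
    """Return the parts of a `{type}-{dataset}-{namespace}` name, or None.

    Simplified check: exactly three non-empty parts separated by `-`
    and a recognized type.
    """
    parts = name.split("-")
    if len(parts) != 3:
        return None
    type_, dataset, namespace = parts
    if type_ not in ALLOWED_TYPES or not dataset or not namespace:
        return None
    return {"type": type_, "dataset": dataset, "namespace": namespace}


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for name in ["logs-nginx.access-default", "filebeat-8.0.0-2022.01.01", "metricbeat-8.0.0"]:
        print(name, "->", parse_data_stream_name(name))
```

Beats-style index names fail this check because they lead with the shipper name rather than a data stream type, which is why the data would need to be renamed or reindexed to conform.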

I'm opening this issue so we can discuss whether leveraging EPR is an option in the first place, and how much effort it would take to work around those problems.

### Implementation plan

- [ ] Make data conform with https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme.
- [ ] Define a new dataset package type to include datasets, with optional dependencies to other packages (see https://github.com/elastic/package-spec/issues/351)
- [ ] Add support for this new package type (publication of packages will be blocked until storage v2 is ready).
  - [ ] In elastic-package.
  - [ ] In package-registry.
  - [ ] In Fleet. 
    - [ ] https://github.com/elastic/package-spec/issues/406 
- [ ] Start filtering out packages in the Docker image of the package-registry distribution (see https://github.com/elastic/package-registry/issues/724).
