On this ticket, we could discuss the need for, and feasibility of, dimensional lookup, to aid i18n, responsive annotation text lengths, and efficient, ad-hoc data exploration.
(image: Wikipedia)
Currently, textual data for display as legends, categorical axis tick labels etc. is expected to be in the data array, ie. we currently rely on denormalization. This is a good approach with Cartesian charts, especially when elastic-charts can expect a newly built denormalized array on every operation. But these items might motivate allowing a dimensional lookup:
- the user, or their clients or audience, needs internationalization, and texts for dimension codes (eg. 'United Kingdom' for 'UK') are represented in several languages, uploaded into an index
- the report/visualization generally has constraints on text sizes; many systems solve this by using progressively shorter text for table headers, pie/treemap labels, categorical tick labels (also depending on horizontal vs vertical orientation) etc., for example 'United Kingdom of Great Britain and Northern Ireland' -> 'United Kingdom' -> 'UK'. Storing just 5 languages and 3 text lengths would require 15 new fields on the document; the fields would need naming conventions to encode the language, text length etc., so it would get tedious very fast, and any change (eg. 'Swaziland' -> 'eSwatini') would require reindexing
- there may be other types of lookup, eg. deriving the country from the city code, or deriving the region from the country code; with a maximum of one hop it's a star schema, with more it's a snowflake schema
- while denormalization works in theory, it fails once we consider ad-hoc reporting, eg. the user develops a sudden interest in slicing/dicing, aggregating, annotating etc. on something that was not designed to be in the document
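To make the i18n and text-length points above concrete, here is a minimal sketch of what a one-hop (star schema) dimension lookup could look like. All names (`countryDim`, `lookupLabel`, `LabelVariants`) are hypothetical illustrations, not an existing elastic-charts API; the dimension table holds per-language label variants in decreasing length, so switching language or shrinking the available space never touches the facts:

```typescript
// Hypothetical dimension table keyed by country code; each entry holds
// per-language label variants ordered longest-first. Changing a label
// (eg. 'Swaziland' -> 'eSwatini') means editing one row here, not reindexing.
type LabelVariants = Record<string, string[]>; // language -> labels, longest first

const countryDim: Record<string, LabelVariants> = {
  UK: {
    en: ['United Kingdom of Great Britain and Northern Ireland', 'United Kingdom', 'UK'],
    de: ['Vereinigtes Königreich', 'UK'],
  },
};

// Pick the longest variant that fits the available width (character count
// stands in for real text measurement); fall back to the raw code.
function lookupLabel(code: string, language: string, maxChars: number): string {
  const variants = countryDim[code]?.[language] ?? [];
  return variants.find((label) => label.length <= maxChars) ?? code;
}

console.log(lookupLabel('UK', 'en', 20)); // 'United Kingdom'
console.log(lookupLabel('UK', 'en', 3)); // 'UK'
```

On responsive resize, only `maxChars` changes, so relabeling is a pure client-side lookup with no re-query.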
The discussion is needed because:
- such master data joins may eventually be needed or implemented in underlying ES query languages such as ESSQL, which might lessen the need to do this in elastic-charts
- even if a SQL JOIN were provided, there'd be technical merit in avoiding re-querying a large dataset just because the user wants to switch to another language, or because responsive resizing leads to cramped space and less room for the table headers, tick labels or other annotations
Describe a solution
A star or snowflake schema for elastic-charts: besides accepting a facts table (documents/aggregates, as now), it could accept an arbitrary, typically low, number of dimension tables, while also providing the means for reaching into those dimension tables, most importantly via equality checking of dimension values against document/aggregate values (star schema). It could even allow multi-hop, eg. hierarchical lookup, as in the case of a product nomenclature or any other breakdown often needed for treemap, sunburst, table etc. visualizations.
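A rough sketch of the multi-hop idea, under the assumption that each dimension table declares the equality key it joins on (the names `DimensionTable`, `resolve` and the sample tables are illustrative only, not a proposed API). A star schema is simply the one-hop special case:

```typescript
// Hypothetical shape for a snowflake lookup: each dimension table joins on
// an equality key, and each hop chains into the next table.
interface DimensionTable {
  key: string; // field in the fact (or in the previous hop's row) to match on
  rows: Record<string, Record<string, string>>;
}

const cityDim: DimensionTable = {
  key: 'cityCode',
  rows: { LDN: { name: 'London', countryCode: 'UK' } },
};

const countryDim: DimensionTable = {
  key: 'countryCode',
  rows: { UK: { name: 'United Kingdom', region: 'Europe' } },
};

// Resolve a field by walking a fact through a chain of dimension tables.
function resolve(
  fact: Record<string, string>,
  hops: DimensionTable[],
  field: string,
): string | undefined {
  let current: Record<string, string> = fact;
  for (const dim of hops) {
    const next = dim.rows[current[dim.key]];
    if (!next) return undefined; // no matching dimension row
    current = next;
  }
  return current[field];
}

const fact = { cityCode: 'LDN', sales: '42' };
console.log(resolve(fact, [cityDim, countryDim], 'region')); // 'Europe'
```

The same chaining covers the city -> country -> region example above, and hierarchical breakdowns for treemaps or sunbursts.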
While denormalization technically works (at potentially large to infeasible runtime cost), it's at odds with our evolving visualization algebra, because it hides how facts (dimensions or granular aggregates) relate to dimensional data. Yet this very information is key for the user to navigate the chart design space.
Alternatives considered
- Continuing as now: keep everything denormalized
- ES schema is not denormalized, but ESSQL provides a JOIN: elastic-charts wouldn't need to change, but basic things, eg. responsive resizing, would require re-querying ES
- Handling dimensional lookup inside elastic-charts (this ticket is biased toward this option)
- A comprehensive approach in which Kibana expressions and relational joins (including client-side joins where they're preferable, and server-side joins, once ES or ESSQL supports them) are considered at the Kibana level (ie. so it's equally available to Lens, Vega, 3rd-party charts etc.)
- Combining either or both of the latter two with crossfiltering, which also benefits from the efficiencies of not having to denormalize data
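For the client-side join alternative, the core operation is small: enrich facts against an in-memory dimension table without re-querying ES. A minimal sketch (the `leftJoin` helper is hypothetical, not an existing Kibana or elastic-charts function):

```typescript
// Illustrative client-side left join: each fact is merged with the matching
// dimension row (if any), so label/language changes never hit the server.
function leftJoin<F extends Record<string, unknown>>(
  facts: F[],
  dim: Map<unknown, Record<string, unknown>>,
  key: keyof F,
): Array<F & Record<string, unknown>> {
  return facts.map((fact) => ({ ...fact, ...(dim.get(fact[key]) ?? {}) }));
}

const facts = [
  { country: 'UK', value: 10 },
  { country: 'FR', value: 7 },
];
const dim = new Map<unknown, Record<string, unknown>>([
  ['UK', { label: 'United Kingdom' }],
  ['FR', { label: 'France' }],
]);

const joined = leftJoin(facts, dim, 'country'); // each fact now carries its label
```

Keeping facts and dimensions separate like this is also what makes crossfiltering cheap: filters recompute over the compact facts array, and labels are joined in only at render time.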
There's no rush on this, as option 1 works right now, but it's good to foresee it: the need often comes up in tools, and it's hard to refactor toward such a general approach later, because by that time overly specialized, one-off solutions may have evolved or become entrenched (eg. "let's add a PR for text in multiple languages", then "let's add a PR for differing text lengths" etc.). Even if we don't do anything now, we can keep the approach in mind or refer to it in other discussions.
Checklist
- [ ] Kibana Cross Issues list: the `kibana cross issue` tag is associated to the issue if any Kibana cross issue is present