[ML] DF analytics: improve handling of text fields #51273

@LucaWintergerst

When viewing the results of an outlier detection job, the following exception is thrown if the source index has text fields:

An error occurred loading the index data.
[illegal_argument_exception] Fielddata is disabled on text fields by default. Set fielddata=true on [foo] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.

We should improve this by doing one of the following:

  • only display non-text fields
  • only allow sorting on non-text fields
  • try to access the corresponding .keyword multi field that exists in most cases
  • when running the outlier detection job, convert all text fields to type keyword (this would not be a Kibana problem then)
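As a rough illustration of the first three options, here is a minimal Python sketch (not the Kibana implementation) that takes the `properties` object of an index mapping and decides which field name, if any, is safe to sort on. The helper name `sortable_field` and the `EXAMPLE_PROPERTIES` dict are hypothetical; the example mapping assumes the default dynamic mapping Elasticsearch creates for a string value (a `text` field with a `keyword` multi-field). Only top-level fields are handled.

```python
from typing import Optional

# Hypothetical example: the mapping Elasticsearch's default dynamic mapping
# would produce for a doc like {"foo": "bar", "number": 42}.
EXAMPLE_PROPERTIES = {
    "foo": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "number": {"type": "long"},
}


def sortable_field(name: str, properties: dict) -> Optional[str]:
    """Return a field name that is safe to sort on, or None if there is none.

    `properties` is the "properties" object of an index mapping.
    Object fields (no "type" key) and nested paths are not handled here.
    """
    field = properties.get(name)
    if field is None or "type" not in field:
        return None
    if field["type"] != "text":
        # Non-text fields can be sorted on directly.
        return name
    # Text field: fall back to a keyword multi-field such as "foo.keyword".
    for sub_name, sub in field.get("fields", {}).items():
        if sub.get("type") == "keyword":
            return f"{name}.{sub_name}"
    # Text field without a keyword multi-field: sorting would need fielddata.
    return None
```

A UI could call this once per column and disable sorting (or hide the column) whenever it returns `None`, and otherwise sort on the returned name instead of the raw text field.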

This problem can occur either when a user first clicks View or when they try to sort on a text field.

To reproduce, index the following document:

PUT test/_doc/1
{
  "foo": "bar",
  "number": 42
}

Create the job (this could also be done in the UI):

{
  "id": "test5",
  "source": {
    "index": [
      "test"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "test5",
    "results_field": "ml"
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  },
  "analyzed_fields": {
    "includes": [],
    "excludes": []
  },
  "model_memory_limit": "50mb",
  "version": "7.5.0"
} 

Then go to the UI, start the job, and click View.

Labels

:ml, Feature:Data Frame Analytics, bug, v7.6.0
