[ML] DFA job gets stuck when no field except the dependent variable is included in the analysis #55593

@blookot

Elasticsearch version (bin/elasticsearch --version): 7.6.2

JVM version (java -version): running on ESS

Description of the problem including expected versus actual behavior:

I'm running a regression data frame analytics job and it stops at 50% overall progress (loading data is at 100% and analyzing is at 0%).
I can't work out why.

Steps to reproduce:

  1. Load the attached file (rename it so it has a .csv extension).
  2. Create an ML regression job on it (data frame analytics with this index as the source, everything else left at the defaults).
  3. Start the job.

Here is the configuration of one such ML job:

{
  "id": "test8",
  "description": "",
  "source": {
    "index": [
      "disk_usage"
    ],
    "query": {
      "match_all": {}
    },
    "_source": {
      "includes": [],
      "excludes": []
    }
  },
  "dest": {
    "index": "test8",
    "results_field": "ml"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "disk_percent",
      "prediction_field_name": "disk_percent_prediction",
      "training_percent": 80,
      "randomize_seed": -2904501521181443000
    }
  },
  "analyzed_fields": {
    "includes": [],
    "excludes": []
  },
  "model_memory_limit": "100mb",
  "create_time": 1587562995031,
  "version": "7.6.2",
  "allow_lazy_start": false
}
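
The issue title points at the case where no field other than the dependent variable ends up in the analysis. A minimal sketch of a client-side check for that condition is below. It assumes that empty `includes` means "all fields" and that patterns are simple wildcards; `feature_fields` and the field name `hypothetical_feature` are made up for illustration and are not part of any Elasticsearch API. The real field selection in Elasticsearch also depends on field types, which this sketch ignores.

```python
from fnmatch import fnmatchcase


def feature_fields(config, index_fields):
    """Return the fields left as features after applying analyzed_fields.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    analyzed = config.get("analyzed_fields", {})
    # Assumption: an empty "includes" list means "consider every field".
    includes = analyzed.get("includes") or ["*"]
    excludes = analyzed.get("excludes") or []
    dependent = config["analysis"]["regression"]["dependent_variable"]

    selected = [
        field
        for field in index_fields
        if any(fnmatchcase(field, pattern) for pattern in includes)
        and not any(fnmatchcase(field, pattern) for pattern in excludes)
    ]
    # The dependent variable is the target, not a feature.
    return [field for field in selected if field != dependent]


# Only the parts of the job config above that matter for field selection.
config = {
    "analysis": {"regression": {"dependent_variable": "disk_percent"}},
    "analyzed_fields": {"includes": [], "excludes": []},
}

# Case from the issue title: nothing but the dependent variable is usable.
only_target = feature_fields(config, ["disk_percent"])
# Hypothetical index that also has one usable feature field.
with_feature = feature_fields(config, ["disk_percent", "hypothetical_feature"])

if not only_target:
    print("No feature fields besides the dependent variable")
```

An empty result for the first call corresponds to the situation described in the title; the second call shows what a usable configuration would return under the same assumptions.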

The logs don't reveal anything:

| Time | Node | Message |
| -- | -- | -- |
| 2020-04-22 15:43:15 | instance-0000000000 | Created analytics with analysis type [regression] |
| 2020-04-22 15:43:17 | instance-0000000000 | Estimated memory usage for this analytics to be [18.2mb] |
| 2020-04-22 15:43:17 | instance-0000000000 | Starting analytics on node [{instance-0000000002}{3pArSzZmQpiVzw8sQqmcQA}{FUHdB0WDRU-gNIz1SpthHQ}{10.43.1.93}{10.43.1.93:19669}{l}{logical_availability_zone=zone-0, server_name=instance-0000000002.4e4d9d9dbfd3428da12363c78f9aa352, availability_zone=europe-west1-b, ml.machine_memory=1073741824, xpack.installed=true, instance_configuration=gcp.ml.1, ml.max_open_jobs=20, region=unknown-region}] |
| 2020-04-22 15:43:17 | instance-0000000000 | Started analytics |
| 2020-04-22 15:43:17 | instance-0000000002 | Creating destination index [test8] |
| 2020-04-22 15:43:18 | instance-0000000002 | Finished reindexing to destination index [test8] |
| 2020-04-22 15:59:06 | instance-0000000002 | Finished analysis |
| 2020-04-22 15:59:06 | instance-0000000000 | Stopped analytics |
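
To put a number on the gap in these messages, here is a small sketch that computes the time between "Finished reindexing" and "Finished analysis" from the two timestamps above. It only does the arithmetic and makes no claim about what the job was doing in that interval.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the job messages above.
reindexed = datetime.strptime("2020-04-22 15:43:18", FMT)
finished = datetime.strptime("2020-04-22 15:59:06", FMT)

elapsed = finished - reindexed
print(elapsed)  # 0:15:48
```

That is close to 16 minutes with no intermediate message between the two entries.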

disk_usage.txt

Metadata

Labels

:ml (Machine learning), >bug