Indices with "_source.enabled: false" same size as indices with "_source.enabled: true" #41628

@davemoore-

Elasticsearch version: 7.0.0

Plugins installed: []

JVM version: OpenJDK 1.8.0_191

OS version: Ubuntu 16.04 (or Elastic Cloud)

Description of the problem including expected versus actual behavior:

When setting _source.enabled: false in the index mapping, the _source should not be stored.

In 7.0.0, when two indices have identical data and mappings (except for one having _source.enabled: false), the indices will be almost exactly the same size. This isn't the expected behavior.

In 6.7.1, when two indices have identical data and mappings (except for one having _source.enabled: false), the index with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.

Steps to reproduce:

Overview:

  1. Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.

  2. Create two index templates with identical mappings, but let the second template use _source.enabled: false. Put these two index templates in both clusters.

  3. Load data into the two indices on both clusters.

  4. Force merge the indices to a single segment.

  5. Compare the "Storage Size" of the two indices in Kibana for each cluster: /app/kibana#/management/elasticsearch/index_management/indices

More detailed:

Create the following index templates and ingest pipeline in the 7.0.0 cluster:

PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "agent": {
        "type": "text"
      },
      "auth": {
        "type": "keyword"
      },
      "bytes": {
        "type": "long"
      },
      "clientip": {
        "type": "ip"
      },
      "httpversion": {
        "type": "double"
      },
      "ident": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "referrer": {
        "type": "keyword"
      },
      "request": {
        "type": "keyword"
      },
      "response": {
        "type": "long"
      },
      "verb": {
        "type": "keyword"
      }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "agent": {
        "type": "text"
      },
      "auth": {
        "type": "keyword"
      },
      "bytes": {
        "type": "long"
      },
      "clientip": {
        "type": "ip"
      },
      "httpversion": {
        "type": "double"
      },
      "ident": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      },
      "referrer": {
        "type": "keyword"
      },
      "request": {
        "type": "keyword"
      },
      "response": {
        "type": "long"
      },
      "verb": {
        "type": "keyword"
      }
    }
  }
}
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss XX"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}

Create the following index templates and ingest pipeline in the 6.7.1 cluster:

PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "agent": {
          "type": "text"
        },
        "auth": {
          "type": "keyword"
        },
        "bytes": {
          "type": "long"
        },
        "clientip": {
          "type": "ip"
        },
        "httpversion": {
          "type": "double"
        },
        "ident": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "referrer": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "response": {
          "type": "long"
        },
        "verb": {
          "type": "keyword"
        }
      }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "agent": {
          "type": "text"
        },
        "auth": {
          "type": "keyword"
        },
        "bytes": {
          "type": "long"
        },
        "clientip": {
          "type": "ip"
        },
        "httpversion": {
          "type": "double"
        },
        "ident": {
          "type": "keyword"
        },
        "message": {
          "type": "text"
        },
        "referrer": {
          "type": "keyword"
        },
        "request": {
          "type": "keyword"
        },
        "response": {
          "type": "long"
        },
        "verb": {
          "type": "keyword"
        }
      }
    }
  }
}
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss ZZ"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}

Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip and then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
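The issue does not say which tool was used to load the file. As one possible approach, here is a minimal Python sketch that sends each log line as a `message` field through the `logs` ingest pipeline using the `_bulk` API. The helper names, the batch size, and the use of the `/{index}/_doc/_bulk` path (chosen because it is accepted by both 6.7.1 and 7.0.0, although deprecated in 7.0.0) are assumptions for illustration, not part of the original report.

```python
import json
import urllib.request
from itertools import islice


def build_bulk_body(lines):
    """Return an NDJSON _bulk body indexing each non-empty line as {"message": line}."""
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts.append(json.dumps({"index": {}}))      # action line, index taken from the URL
        parts.append(json.dumps({"message": line}))  # the grok processor reads "message"
    return ("\n".join(parts) + "\n") if parts else ""


def bulk_path(index, pipeline="logs"):
    """URL path for a bulk request routed through the given ingest pipeline."""
    return f"/{index}/_doc/_bulk?pipeline={pipeline}"


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_file(base_url, index, path, batch_size=5000):
    """Send the file at `path` to `index` in batches (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for chunk in batches(f, batch_size):
            body = build_bulk_body(chunk)
            if not body:
                continue
            req = urllib.request.Request(
                base_url.rstrip("/") + bulk_path(index),
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/x-ndjson"},
                method="POST",
            )
            with urllib.request.urlopen(req) as resp:
                result = json.load(resp)
            if result.get("errors"):
                raise RuntimeError(f"bulk request to {index} reported item errors")


# Example, assuming an unsecured node at http://localhost:9200:
# for name in ("logs", "logs-nosource"):
#     load_file("http://localhost:9200", name, "nginx.log")
```

Running the same loader against both indices on both clusters keeps the input identical, so any size difference comes from the mapping rather than from the data.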

Force merge the indices to a single segment.
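For reference, the force merge step can be issued with the `_forcemerge` API and `max_num_segments=1`. The following is a small Python sketch; the function name and the base URL are assumptions, and the node is assumed to be reachable without authentication.

```python
import urllib.request


def forcemerge_request(base_url, indices, max_num_segments=1):
    """Build a POST request that force merges the given indices."""
    url = (
        f"{base_url.rstrip('/')}/{','.join(indices)}"
        f"/_forcemerge?max_num_segments={max_num_segments}"
    )
    return urllib.request.Request(url, method="POST")


# Example, assuming an unsecured node at http://localhost:9200:
# req = forcemerge_request("http://localhost:9200", ["logs", "logs-nosource"])
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The call blocks until the merge finishes, so a long client timeout may be needed on larger indices.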

Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as being roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index being roughly half the size of the "logs" index.
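The same comparison can be made without Kibana by reading `store.size` from the `_cat/indices` API. The sketch below assumes that this value corresponds to the "Storage Size" column in Kibana (with zero replicas the total and primary store sizes are the same); the function names and the base URL are hypothetical.

```python
import json
import urllib.request


def cat_sizes_url(base_url, indices):
    """URL returning index names and store sizes in bytes as JSON."""
    return (
        f"{base_url.rstrip('/')}/_cat/indices/{','.join(indices)}"
        "?format=json&bytes=b&h=index,store.size"
    )


def size_ratio(rows, with_source="logs", without_source="logs-nosource"):
    """Return size(without_source) / size(with_source) from _cat/indices rows."""
    sizes = {row["index"]: int(row["store.size"]) for row in rows}
    return sizes[without_source] / sizes[with_source]


def fetch_ratio(base_url, indices=("logs", "logs-nosource")):
    """Fetch the sizes from a live cluster and compute the ratio."""
    with urllib.request.urlopen(cat_sizes_url(base_url, indices)) as resp:
        return size_ratio(json.load(resp))


# Example, assuming an unsecured node at http://localhost:9200:
# print(fetch_ratio("http://localhost:9200"))
```

Based on the behavior described above, a ratio near 0.5 would match what 6.7.1 shows, while a ratio near 1.0 would match the 7.0.0 behavior reported here.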
