Elasticsearch version: 7.0.0
Plugins installed: []
JVM version: OpenJDK 1.8.0_191
OS version: Ubuntu 16.04 (or Elastic Cloud)
Description of the problem including expected versus actual behavior:
Setting _source.enabled: false in the index mapping should prevent _source from being stored.
In 7.0.0, two indices with identical data and mappings, differing only in that one has _source.enabled: false, end up almost exactly the same size. This is not the expected behavior.
In 6.7.1, given the same two indices, the one with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.
Steps to reproduce:
Overview:
- Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.
- Create two index templates with identical mappings, but let the second template use _source.enabled: false. Put these two index templates in both clusters.
- Load data into the two indices on both clusters.
- Force merge the indices to a single segment.
- Compare the "Storage Size" of the two indices in Kibana for each cluster: /app/kibana#/management/elasticsearch/index_management/indices
More detailed:
Create the following templates and pipelines in the 7.0.0 cluster:
PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "agent": { "type": "text" },
      "auth": { "type": "keyword" },
      "bytes": { "type": "long" },
      "clientip": { "type": "ip" },
      "httpversion": { "type": "double" },
      "ident": { "type": "keyword" },
      "message": { "type": "text" },
      "referrer": { "type": "keyword" },
      "request": { "type": "keyword" },
      "response": { "type": "long" },
      "verb": { "type": "keyword" }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "@timestamp": { "type": "date" },
      "agent": { "type": "text" },
      "auth": { "type": "keyword" },
      "bytes": { "type": "long" },
      "clientip": { "type": "ip" },
      "httpversion": { "type": "double" },
      "ident": { "type": "keyword" },
      "message": { "type": "text" },
      "referrer": { "type": "keyword" },
      "request": { "type": "keyword" },
      "response": { "type": "long" },
      "verb": { "type": "keyword" }
    }
  }
}
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss XX"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
Create the following templates and pipelines in the 6.7.1 cluster:
PUT _template/logs
{
  "index_patterns": ["logs"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": { "type": "text" },
        "auth": { "type": "keyword" },
        "bytes": { "type": "long" },
        "clientip": { "type": "ip" },
        "httpversion": { "type": "double" },
        "ident": { "type": "keyword" },
        "message": { "type": "text" },
        "referrer": { "type": "keyword" },
        "request": { "type": "keyword" },
        "response": { "type": "long" },
        "verb": { "type": "keyword" }
      }
    }
  }
}
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": { "type": "text" },
        "auth": { "type": "keyword" },
        "bytes": { "type": "long" },
        "clientip": { "type": "ip" },
        "httpversion": { "type": "double" },
        "ident": { "type": "keyword" },
        "message": { "type": "text" },
        "referrer": { "type": "keyword" },
        "request": { "type": "keyword" },
        "response": { "type": "long" },
        "verb": { "type": "keyword" }
      }
    }
  }
}
PUT _ingest/pipeline/logs
{
  "description": "Ingest pipeline for logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [
          "dd/MMM/yyyy:HH:mm:ss ZZ"
        ]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip and then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
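The report does not say which tool was used to load the file, so the following is only one possible sketch of the load step. It assumes a cluster reachable at a `base_url` such as `http://localhost:9200` (hypothetical), sends each log line as a `message` field through the `logs` pipeline defined above, and uses the typed `/{index}/_doc/_bulk` path on the assumption that it is accepted by both 6.7.1 and 7.0.0 (7.0.0 may emit a types deprecation warning). The helper names and batch size are illustrative.

```python
import json
import urllib.request


def bulk_body(log_lines):
    """Build an NDJSON _bulk body: one action line plus one document per log line."""
    parts = []
    for line in log_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts.append(json.dumps({"index": {}}))
        parts.append(json.dumps({"message": line}))
    # The _bulk API requires every line, including the last, to end with a newline.
    return "".join(part + "\n" for part in parts)


def post_bulk(url, body):
    """POST one NDJSON batch to the given _bulk URL."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def load_file(base_url, index, path, batch_size=5000):
    """Stream a log file into `index` through the `logs` ingest pipeline."""
    # base_url and batch_size are assumptions, not values from the report.
    url = f"{base_url.rstrip('/')}/{index}/_doc/_bulk?pipeline=logs"
    batch = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            batch.append(line)
            if len(batch) >= batch_size:
                post_bulk(url, bulk_body(batch))
                batch = []
    if batch:
        post_bulk(url, bulk_body(batch))
```

Calling `load_file(base_url, "logs", "nginx.log")` and then `load_file(base_url, "logs-nosource", "nginx.log")` on each cluster would give both indices the same documents.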
Force merge the indices to a single segment.
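For reference, the force merge step could be issued as below. This is a sketch using the `_forcemerge` API with `max_num_segments=1`; the `base_url` value and the helper names are assumptions rather than part of the original report.

```python
import urllib.request


def forcemerge_url(base_url, indices):
    """Build the _forcemerge URL that merges each index down to one segment."""
    return f"{base_url.rstrip('/')}/{','.join(indices)}/_forcemerge?max_num_segments=1"


def forcemerge(base_url, indices):
    """POST the force merge request; returns once the merge has completed."""
    req = urllib.request.Request(
        forcemerge_url(base_url, indices), data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `forcemerge("http://localhost:9200", ["logs", "logs-nosource"])` would be run once against each cluster.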
Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as being roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index being roughly half the size of the "logs" index.
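The comparison was done in Kibana, but the same numbers should also be available from the `_cat/indices` API. The sketch below is an assumption about how one might check this without Kibana: it requests `store.size` in bytes as JSON and computes the size of "logs-nosource" relative to "logs". The `base_url` and function names are hypothetical.

```python
import json
import urllib.request

# Ask for sizes in raw bytes so they can be compared numerically.
CAT_PATH = "/_cat/indices/logs,logs-nosource?format=json&bytes=b&h=index,store.size"


def store_sizes(rows):
    """Map index name to store size in bytes from a _cat/indices JSON response."""
    return {row["index"]: int(row["store.size"]) for row in rows}


def nosource_ratio(rows):
    """Size of 'logs-nosource' as a fraction of the size of 'logs'."""
    sizes = store_sizes(rows)
    return sizes["logs-nosource"] / sizes["logs"]


def fetch_rows(base_url):
    """Fetch the _cat/indices rows for the two indices from a live cluster."""
    with urllib.request.urlopen(base_url.rstrip("/") + CAT_PATH) as resp:
        return json.load(resp)
```

A ratio near 0.5 from `nosource_ratio(fetch_rows(base_url))` would correspond to the 6.7.1 behavior described here, and a ratio near 1.0 to the 7.0.0 behavior.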