Index templates give us unmatched control for optimizing Elasticsearch performance. However, with great power comes great complexity.
In this advanced 3600+ word guide, I'll cover all the nuances so you can squeeze every ounce of speed from your Elasticsearch cluster.
How Index Templates Work Under the Hood
Before diving into configurations, let's explore what exactly templates do under the hood…
Index Templates VS Component Templates
There are two separate APIs for templates in the latest Elasticsearch:
- Index Templates: Define settings, mappings, and aliases applied to matching indexes.
- Component Templates: Reusable blocks for settings, mappings, and aliases, composed together into index templates.
For example, I may have a component for @timestamp mapping, and another for handling Kubernetes metadata fields.
I can then reference these components from my main application index template. This keeps configurations DRY, while allowing mix and match reuse.
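The reuse described above can be pictured as a merge. The sketch below is a simplified Python model, not Elasticsearch's actual implementation: component bodies are merged in the order they are listed, and the index template's own body is applied last so it wins on conflicts. The component contents and the pod_name field are hypothetical.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(components, own_template):
    """Merge component bodies in order, then the index template's own body."""
    result = {}
    for body in list(components) + [own_template]:
        result = deep_merge(result, body)
    return result


# Hypothetical component bodies, for illustration only.
timestamp_component = {
    "mappings": {"properties": {"@timestamp": {"type": "date"}}}
}
kubernetes_component = {
    "settings": {"number_of_replicas": 0},
    "mappings": {"properties": {"pod_name": {"type": "keyword"}}},
}
# The index template's own body overrides the component's replica count.
app_template = {"settings": {"number_of_replicas": 1}}

effective = compose([timestamp_component, kubernetes_component], app_template)
print(effective)
```

Both mapping fragments survive the merge, while the replica count comes from the index template itself because it is applied last.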
Internally, when I create an index called app-logs-2022:
- ES checks all index templates for one whose pattern, such as app-logs-*, matches the new index name
- My designated template is found
- The template config, along with any referenced component templates, gets merged
- This merged configuration is applied to app-logs-2022 at initialization
Any subsequent changes to the base templates will only affect new indexes going forward.
Pro tip: I can reindex existing data manually to propagate template changes when needed.
Template Resolution Order
Index patterns are powerful, but a single index name can sometimes match the patterns of multiple templates. For example:
app-logs-*
logs-*
*
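In situations like this, template resolution works as follows:
- Only templates whose pattern matches the index name are considered; how specific a pattern is does not act as a tie-breaker
- Among those, the template with the highest priority wins, and the others are ignored rather than merged
- Two templates with overlapping patterns cannot share a priority, because Elasticsearch rejects the second one, so load order never decides
So I always give app-specific templates like app-logs-* a higher priority: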
PUT _index_template/app-logs-template
{
"index_patterns": ["app-logs-*"],
"priority": 100,
...
}
While using default priority 0 for generic fallbacks:
PUT _index_template/logs-template
{
"index_patterns": ["logs-*"],
...
}
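To make the priority rule concrete, here is a minimal Python sketch of how a winner could be picked for a new index name. It is a model under stated assumptions, not Elasticsearch code: patterns are matched with Python's fnmatch (which supports more wildcard syntax than Elasticsearch patterns do), and the broad-template entry is a hypothetical addition used to show an override.

```python
from fnmatch import fnmatchcase


def resolve_template(index_name, templates):
    """Return the name of the matching template with the highest priority.

    Returns None when nothing matches. Raises ValueError on a priority tie,
    mirroring the fact that overlapping templates may not share a priority.
    """
    matches = [
        (body.get("priority", 0), name)
        for name, body in templates.items()
        if any(fnmatchcase(index_name, p) for p in body["index_patterns"])
    ]
    if not matches:
        return None
    top = max(priority for priority, _ in matches)
    winners = [name for priority, name in matches if priority == top]
    if len(winners) > 1:
        raise ValueError(f"priority tie between {sorted(winners)}")
    return winners[0]


templates = {
    "app-logs-template": {"index_patterns": ["app-logs-*"], "priority": 100},
    "logs-template": {"index_patterns": ["logs-*"]},  # default priority 0
    # Hypothetical broad template added later by someone else.
    "broad-template": {"index_patterns": ["*logs*"], "priority": 50},
}

for name in ("app-logs-2022", "logs-nginx-2022", "metrics-2022"):
    print(name, "->", resolve_template(name, templates))
```

In this model app-logs-2022 still resolves to app-logs-template, but logs-nginx-2022 is captured by the broad template because 50 beats the default 0, which is exactly the kind of surprise that explicit priorities are meant to prevent.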
Now let's optimize configurations!
Optimizing Index Settings for Scale
Index settings control everything from storage to caching behavior. Configured correctly, they enable smooth scalability.
Shard Calculations
As mentioned briefly earlier, shards partition indexes across nodes for parallel processing. The number of shards has a direct impact on performance and size limitations.
Let's explore proper shard calculations more closely…
Shard Count Guide
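Total Index Size / Target Size Per Shard = Number of Primary Shards
As a rule of thumb for logs (in line with Elastic's general sizing guidance):
1 shard per 10–50 GB (depending on complexity)
So if I estimate 60 billion documents over 10 years:
- 60 billion docs
- Avg 500 bytes per doc = 30 TB total
- 30 TB / 50 GB per shard = 600 primary shards
That is far too many for a single index, which is why log data is split into time-based indexes that all match one template pattern. With monthly indexes:
- 30 TB / 120 months = 250 GB per index
- 250 GB / 50 GB per shard = 5 primary shards per index
Replicas do not reduce the number of primary shards. Each replica is a full copy, so it multiplies storage and total shard count:
- Total storage = 30 TB * (1 + # of replicas)
- With 1 replica: 30 TB * 2 = 60 TB, and each monthly index holds 10 shards in total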
Calculating expected size and growth helps right-size shards architecturally.
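The sizing arithmetic can be scripted so it is easy to rerun as estimates change. This is a sketch under assumptions that are mine rather than measured values: a target of 50 GB per shard, monthly indexes, one replica, and decimal units (1 GB = 10^9 bytes).

```python
GB = 10**9
TB = 10**12


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)


def primary_shards(index_bytes, target_shard_bytes):
    """Primary shards needed to keep each shard at or below the target size."""
    return max(1, ceil_div(index_bytes, target_shard_bytes))


docs = 60_000_000_000            # 60 billion documents over 10 years
avg_doc_bytes = 500
total_bytes = docs * avg_doc_bytes

target_shard_bytes = 50 * GB     # assumed target size per shard
months = 10 * 12                 # assumed monthly indexes
replicas = 1                     # assumed replica count

bytes_per_month = ceil_div(total_bytes, months)
shards_single_index = primary_shards(total_bytes, target_shard_bytes)
shards_per_monthly_index = primary_shards(bytes_per_month, target_shard_bytes)
total_storage_bytes = total_bytes * (1 + replicas)

print("total primary data (TB):", total_bytes / TB)
print("primary shards if one index:", shards_single_index)
print("primary shards per monthly index:", shards_per_monthly_index)
print("storage with replicas (TB):", total_storage_bytes / TB)
```

Note that the replica count only affects storage and the total number of shard copies; the primary shard count is derived from the primary data alone.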
Replicas vs Redundancy
Adding index replicas improves redundancy but reduces write performance. The right balance depends on data sensitivity:
| Use Case | Replicas |
|---|---|
| Cache / Job Logs | 0 replicas |
| Business Transactions | 1 replica |
| Financial Systems | 2 replicas |
0 replicas gives maximum write speed but no redundancy: if a node holding a primary shard fails, that shard's data becomes unavailable and may be lost. Reserve it for data you can regenerate.
For critical data, every replica repeats the indexing work on another node, so one replica roughly doubles the indexing load in exchange for surviving the loss of a node. The cost of 2+ replicas may be reasonable for robustness.
In summary, gauge risk vs performance needs, but having at least one replica strikes a good balance.
Refresh Interval
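The refresh interval controls how quickly newly indexed documents become searchable. The default is 1s; a longer interval such as 5s batches updates into fewer refreshes and lightens the indexing load.
Faster refreshes make new data searchable sooner but reduce indexing throughput. Generally, having documents searchable within a few seconds is reasonable.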
Advanced Mapping Optimization
In addition to settings, designing the right index mappings can pay huge dividends long-term.
Avoiding Dynamic Mappings
By default, Elasticsearch dynamically guesses field types as data comes in. However, these automatic guesses are rarely optimal and get baked in immediately:
PUT my_index
POST my_index/_doc
{
"timestamp": "01/01/2022 08:00:00", // added as text: not a format date detection recognizes
"views": 100 // added as long
}
GET my_index/_mapping
Notice timestamp got added as text rather than a date! Date detection only recognizes a few default formats, so an ISO 8601 value such as 2022-01-01 would have been mapped as a date, but this one was not. Once set, we can't easily change it without reindexing all data.
Avoiding dynamic mappings by defining explicit ones upfront prevents this fate:
PUT my_index
{
"mappings": {
"properties": {
"timestamp": { "type": "date" }
}
}
}
POST my_index/_doc
{
"timestamp": "2022-01-01" // correctly added as date!
}
So even if you don't know all future fields, starting with expected ones avoids nasty surprises.
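To see why explicit mappings matter, here is a rough Python model of how default dynamic mapping guesses a type from a JSON value. It is a deliberate simplification and an assumption-laden sketch: only ISO-8601-like strings are treated as dates, the other default date formats and all non-default settings are ignored, and the regular expression is my own approximation.

```python
import re

# Approximation of the ISO-8601-style strings that default date detection accepts.
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def guess_type(value):
    """Guess the field type a default dynamic mapping would likely choose."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        # Strings that fail date detection become text (with a keyword sub-field).
        return "date" if ISO_DATE.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


doc = {
    "timestamp": "01/01/2022 08:00:00",
    "created": "2022-01-01",
    "views": 100,
}
for field, value in doc.items():
    print(field, "->", guess_type(value))
```

The small integer lands in a 64-bit long and the non-ISO timestamp ends up as text, both of which an explicit mapping would have avoided.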
Multi-Field Data Types
For high cardinality string fields like usernames, emails, or tags, the keyword type is ideal: values are indexed as-is for fast exact matches, sorting, and aggregations:
"mappings": {
"properties": {
"email": {
"type": "keyword"
}
}
}
However, keyword fields are not analyzed, so they don't support full-text search on individual words within the value.
Multi-fields map the same data to multiple data types, giving you the best of both worlds:
"mappings": {
"properties": {
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Now you can aggregate on email.keyword while getting full text search on email!
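As an illustration of using both sides of the multi-field, the sketch below builds a search request body in Python that runs a full-text match on email and a terms aggregation on email.keyword. The function name, the top_emails aggregation name, and the search term are hypothetical; the body would be sent to an index's _search endpoint.

```python
import json


def email_search(term, top_n=10):
    """Build a search body: full-text match on `email`, buckets on `email.keyword`."""
    return {
        # The analyzed text field handles the full-text part of the request.
        "query": {"match": {"email": term}},
        # The keyword sub-field holds exact values, which aggregations need.
        "aggs": {
            "top_emails": {
                "terms": {"field": "email.keyword", "size": top_n}
            }
        },
    }


body = email_search("example.com", top_n=5)
print(json.dumps(body, indent=2))
```

Aggregating on the text field itself would not work by default, which is why the aggregation points at the keyword sub-field.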
Nested Fields
Nested fields are useful for semi-structured array data like sensors, product attributes, etc:
PUT products
{
"mappings": {
"properties": {
"variants": {
"type": "nested",
"properties": {
"color": { "type": "keyword" },
"price": { "type": "float" }
}
}
}
}
}
POST products/_doc
{
"name": "T-shirt",
"variants": [
{ "color": "red", "price": 19.99 },
{ "color": "blue", "price": 24.99 }
]
}
Each variant is indexed as its own hidden document, so a color and a price only match together when they belong to the same variant!
The key benefit is querying those objects directly, without a separate index and application-side joins:
GET products/_search
{
"query": {
"nested": {
"path": "variants",
"query": { "match": { "variants.color": "red" } }
}
}
}
So properly architecting index schemas unlocks speed!
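The difference between a nested field and a plain object array is easiest to see with a false positive. The Python sketch below is only a model of the two behaviors, not Elasticsearch code: a plain object array is flattened into one multi-valued field per key, so values from different variants can be combined, while the nested model evaluates each variant on its own.

```python
def flatten(variants):
    """Mimic a plain object array: one multi-valued field per key."""
    flat = {}
    for variant in variants:
        for key, value in variant.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_flattened(variants, color, price):
    """Match if the color and the price appear anywhere in the array."""
    flat = flatten(variants)
    return color in flat.get("color", []) and price in flat.get("price", [])


def matches_nested(variants, color, price):
    """Match only if a single variant has both the color and the price."""
    return any(
        v.get("color") == color and v.get("price") == price for v in variants
    )


variants = [
    {"color": "red", "price": 19.99},
    {"color": "blue", "price": 24.99},
]

# There is no red variant priced at 24.99, yet the flattened model matches.
print("flattened:", matches_flattened(variants, "red", 24.99))
print("nested:   ", matches_nested(variants, "red", 24.99))
```

The flattened model reports a match for a red variant at 24.99 even though no such variant exists, while the nested model correctly rejects it.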
Putting It All Together
Building on earlier examples, here is an expert-level index template:
PUT _index_template/high-volume-logs-template
{
"index_patterns": ["high-volume-logs-*"],
"priority": 100,
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index": {
"refresh_interval": "10s",
"translog": { "durability": "async" }
}
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"hostname": {
"type": "keyword"
},
"apps": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"version": { "type": "integer" }
}
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Let's walk through some highlights:
- 3 primary shards – An illustrative value; derive yours from the projected size of each index
- 1 replica – Redundancy, at the cost of indexing every document twice
- 10s refresh – Batching visibility updates
- Async translogs – Favoring write speed over durability; a crash can lose the last few seconds of acknowledged writes
- Nested apps – For direct app data filtering & aggregation
- Multi-field message – Enabling full text + performant keywords
With blueprints like this, I know indexes get ultra-optimized right from creation!
Common Pitfalls & Troubleshooting
While templates provide great control, there are still plenty of footguns:
Pitfall #1 – Existing Index Remapping
Updates to component and index templates apply only to newly created indexes.
The bloated legacy indexes keep running suboptimally!
This trips up many new template creators.
Solutions:
- Reindex existing data
- Create fresh indexes from templates then swap aliases
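The two solutions are often combined: create a new index that picks up the updated template, copy the data across, then repoint an alias. The Python sketch below only builds the request bodies for the _reindex and _aliases endpoints; the index and alias names are hypothetical, and the requests themselves are left to whichever client you use.

```python
import json


def reindex_body(source_index, dest_index):
    """Body for POST _reindex: copy documents from source to destination."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias, old_index, new_index):
    """Body for POST _aliases: move the alias in a single atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names. The new index still matches app-logs-*, so the
# updated template is applied when it is created.
old_index = "app-logs-2022"
new_index = "app-logs-2022-v2"
alias = "app-logs-current"

print("POST _reindex")
print(json.dumps(reindex_body(old_index, new_index), indent=2))
print("POST _aliases")
print(json.dumps(alias_swap_body(alias, old_index, new_index), indent=2))
```

Because both alias actions travel in one request, readers of the alias never see a moment where it points at no index.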
Pitfall #2 – Unintended Template Overrides
Template priorities decide which template wins.
If a new team member adds a broad wildcard template with a higher priority, it may override finer-grained app templates unexpectedly.
Solutions:
- Name templates by usage explicitly
- Add override protection via high priority numbers
Pitfall #3 – Troubleshooting Resolution Issues
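Debugging why the wrong template was applied requires walking through the resolution order:
- Retrieve all template index patterns
- Match them against the created index name
- Compare the priorities of the matching templates
- Remember that only the single highest-priority template is applied; matching index templates are not merged with each other
I've run into mismatches even as an expert!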
Solutions:
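- Use the GET APIs to view index templates, components, and index settings (GET _index_template, GET _component_template, GET <index>/_settings)
- Preview which template an index name would resolve to with the simulate index API (POST _index_template/_simulate_index/<index-name>)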
- Check index name against expected patterns
- Monitor wildcard templates carefully
- Assign unique priority numbers
Learning where templates are applied takes practice as we shape large clusters!
Key Takeaways
Getting the most from templates requires mastering both configurations and architectural practices:
🔹 Prevent default dynamic mappings with explicit properties
🔹 Model index schemas around usage patterns
🔹 Estimate shard counts based on projected sizes
🔹 Configure index replicas depending on robustness needs
🔹 Remember new templates only apply to future indexes
While indexing details can seem esoteric at first, they are truly the foundations on which large-scale Elasticsearch architectures are built!
Hopefully this guide has shed light on best practices and pitfalls alike. Optimizing indexes with templates may take some up-front effort, but pays back exponentially over the cluster lifecycle.
Now you have an extensive toolbox to make any index sing 😊! Please drop me any follow-up questions.


