Elasticsearch provides a bulk API that allows executing multiple index, create, update, and delete operations with a single API request. Using the bulk API can help reduce overhead and improve indexing performance compared to making a separate request for each operation. This guide illustrates how to leverage the bulk API to perform multiple Elasticsearch operations efficiently.
Bulk API Overview
The Elasticsearch bulk API allows batching multiple index, create, update, and delete operations into a single API call. This is more efficient than issuing a separate request for each action.
To use the bulk API, you send an HTTP POST request to the _bulk endpoint with the operations defined in the request body in newline delimited JSON format.
Here is an example bulk request:
POST _bulk
{"index":{"_index":"myindex","_id":1}}
{"title":"My First Document"}
{"update":{"_index":"myindex","_id":1}}
{"doc":{"title":"My First Updated Document"}}
{"delete":{"_index":"myindex","_id":1}}
This request performs three actions in one call:
- Indexes a document
- Updates the document
- Deletes the document
The bulk API will process each action sequentially and return a response containing the status for each operation.
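To see the mechanics end to end, here is a minimal Python sketch that assembles the newline-delimited body for the request above. The `build_bulk_body` helper is illustrative, not part of any client library; the resulting string would be POSTed to `_bulk` with a `Content-Type: application/x-ndjson` header.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, source) pairs into the newline-delimited
    JSON body that the _bulk endpoint expects."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline

body = build_bulk_body([
    ({"index": {"_index": "myindex", "_id": 1}},
     {"title": "My First Document"}),
    ({"update": {"_index": "myindex", "_id": 1}},
     {"doc": {"title": "My First Updated Document"}}),
    ({"delete": {"_index": "myindex", "_id": 1}}, None),
])
# POST this string to /_bulk with Content-Type: application/x-ndjson
```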
Bulk Request Body Format
The request body for the Elasticsearch bulk API uses the following newline delimited JSON structure:
<action_and_meta_data>\n
<optional_source>\n
<action_and_meta_data>\n
<optional_source>\n
...
Looking closer at the anatomy of a bulk request:
- Action and metadata: Each action line must include the action type (index, create, update, or delete) and metadata such as the target index and document ID.
- Source: The optional source line contains the document source for index and create actions. For update actions, it contains the partial document.
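For completeness, a create action follows the same action-plus-source pattern as index; the difference is that create is rejected with a 409 version conflict if a document with that ID already exists:

```
{"create":{"_index":"myindex","_id":2}}
{"title":"Only created if _id 2 does not already exist"}
```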
Action and Metadata Parameters
The action and metadata lines include key information like the index, id, and action type:
{
"index": {
"_index": "myindex",
"_id": 1
}
}
This indexes a document with an ID of 1 in the myindex index.
The metadata supports the following parameters:
- _index – The target index name
- _id – The document ID
- _type – The mapping type, deprecated from ES 6 onward and removed in later versions
Update Action Source
The update action type allows partial document updates by passing the doc parameter:
{ "update": { "_index": "myindex", "_id": 1 } }
{ "doc" : { "title" : "Updated" } }
This will update just the title field to "Updated" in that document.
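The update action also supports upserts. With doc_as_upsert set, the partial document is indexed as a new document when the target ID does not exist yet, instead of the update failing:

```
{ "update": { "_index": "myindex", "_id": 2 } }
{ "doc": { "title": "Updated" }, "doc_as_upsert": true }
```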
You can see the basic gist – define actions with metadata, and documents as the sources. Now let's discuss when you should use the bulk API instead of single requests.
When to Use Bulk vs. Single Requests
The bulk API improves performance by bundling requests into a single call rather than individual API hits. This reduces network round trips and minimizes serialization costs.
Benchmarks of bulk versus individual indexing consistently show significant gains in indexing speed over single requests:
Depending on the number and size of documents indexed, bulk indexing can be several times faster than issuing individual requests. These gains require properly formatted bulk requests within reasonable size limits.
In general, it is best practice to use the bulk API when:
- Indexing a batch of new documents
- Making lots of creates, updates or deletes
- Reindexing large datasets during migration
For simple queries and returning single documents, standard gets are just fine. The bulk API excels at writing, modifying and deleting documents in bulk.
Handling Bulk Request Failures
When executing a bulk request, pay close attention to the response to handle failures properly. The bulk API response will return the status for each operation.
A sample response containing a failed delete:
{
"took":107,
"errors": true,
"items":[
{
"index":{
"_index":"myindex",
"_id":"1",
"_version":1,
"result":"created",
"_shards":{"total":2,"successful":1,"failed":0},
"status":201
}
},
{
"delete":{
"_index":"myindex",
"_id":"1",
"status":404,
"error":"document not found"
}
}
]
}
Note the 404 status and error for the delete action, indicating that the document was not found.
When a failure occurs:
- The bulk request will continue executing other actions
- The entire request will be marked "errors":true however
- You must handle the failed actions appropriately in code
This highlights the need to check for errors before assuming bulk actions succeeded. Retry or recovery logic often needs to be implemented in application code.
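A small Python sketch of this error handling: it walks the items array and collects the operations that failed, so they can be retried or logged. The `failed_items` helper and the canned response dict are illustrative, not part of any client library.

```python
def failed_items(bulk_response):
    """Collect failed operations from a _bulk response.

    Each item in "items" is a dict keyed by the action type (index,
    create, update, delete); a failure carries an "error" field or a
    status of 400 or higher."""
    failures = []
    for item in bulk_response.get("items", []):
        action, result = next(iter(item.items()))
        if "error" in result or result.get("status", 200) >= 400:
            failures.append((action, result))
    return failures

# Canned response shaped like the sample above (stand-in for a live call)
response = {
    "took": 107,
    "errors": True,
    "items": [
        {"index": {"_index": "myindex", "_id": "1",
                   "result": "created", "status": 201}},
        {"delete": {"_index": "myindex", "_id": "1",
                    "status": 404, "error": "document not found"}},
    ],
}
for action, result in failed_items(response):
    print(action, result["status"])  # candidates for retry or alerting
```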
Idempotent Bulk Requests
Idempotence means that an operation can be performed multiple times without changing the result. This is an important consideration with the bulk API.
To make bulk requests idempotent:
- Uniquely identify documents through _id rather than relying on auto-generated IDs
- Check the response codes to avoid assuming success
- Design the indexing system to handle duplicate requests
This prevents documents being indexed twice if a bulk request is made multiple times.
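One common way to get stable IDs is to derive _id deterministically from document content, so replaying the same bulk request overwrites rather than duplicates. This is a sketch; the `deterministic_id` helper name and the choice of key fields are assumptions about your data.

```python
import hashlib
import json

def deterministic_id(doc, fields=("title",)):
    """Derive a stable _id from chosen document fields so that
    re-sending the same bulk request updates the same document
    instead of creating a duplicate."""
    key = json.dumps({f: doc.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

doc = {"title": "My First Document"}
action = {"index": {"_index": "myindex", "_id": deterministic_id(doc)}}
# Replaying the same request re-indexes under the same _id: no duplicates.
```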
Bulk Request Import Tools
In addition to raw bulk requests via clients, Elasticsearch provides additional tools for bulk importing and ingesting data:
Logstash Bulk Import
Logstash is a popular tool for collecting and transforming data before loading it into Elasticsearch. To bulk import from Logstash:
- Configure an elasticsearch output in the Logstash pipeline
- Logstash batches events and sends them to the Elasticsearch _bulk endpoint automatically
This streams the transformed data into Elasticsearch efficiently, with bulk requests handled under the hood.
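A minimal Logstash pipeline along these lines might look like the following; the host, index name, and stdin input are placeholders for your own sources:

```
input  { stdin { codec => json_lines } }
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "myindex"
  }
}
```

The elasticsearch output plugin accumulates events and flushes them to _bulk in batches, so no manual bulk formatting is needed.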
Kafka Bulk Import
Similarly, Kafka Connect can be used to stream data from Kafka topics directly into Elasticsearch using the bulk endpoint. This leverages Kafka's distributed streaming architecture for scalable ingestion pipelines.
There are also other open source libraries and tools that wrap the bulk API and simplify sending bulk requests from various data stores.
Performance Considerations
When importing data, be aware of the following performance considerations:
Thread Pool
Bulk requests run on a dedicated thread pool (named bulk in older releases, merged into the write pool in Elasticsearch 6.3+). Its size and queue depth can be tuned to scale ingestion:
thread_pool.write.size: 16
thread_pool.write.queue_size: 1000
Refresh Setting
Raise the refresh interval (or disable refresh with -1) while bulk loading, so the cluster spends less time making new documents searchable during the import, then restore the setting afterwards:
POST /my_index/_settings
{
"refresh_interval": "30s"
}
Document Size
Average document size should be kept reasonable to avoid hitting the HTTP request size limit or causing out of memory errors:
- Target 10-15MB payload sizes
- Avoid documents over 100KB
Properly tuning based on document count, size, and throughput requirements ensures stable bulk indexing performance.
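Chunking logic that respects a payload budget can be sketched in Python; `chunk_by_size` is an illustrative helper, not an Elasticsearch API, and the 10 MB default follows the guidance above.

```python
import json

def chunk_by_size(pairs, max_bytes=10 * 1024 * 1024):
    """Split (action, source) pairs into newline-delimited bulk
    bodies no larger than max_bytes each."""
    chunk, size = [], 0
    for action, source in pairs:
        lines = [json.dumps(action)]
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source))
        nbytes = sum(len(l.encode("utf-8")) + 1 for l in lines)
        if chunk and size + nbytes > max_bytes:
            yield "\n".join(chunk) + "\n"
            chunk, size = [], 0
        chunk.extend(lines)
        size += nbytes
    if chunk:
        yield "\n".join(chunk) + "\n"

# 1000 small documents split into ~4 KB request bodies
docs = [({"index": {"_index": "myindex"}}, {"n": i}) for i in range(1000)]
bodies = list(chunk_by_size(docs, max_bytes=4096))
```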
Monitoring Bulk Requests
To monitor running bulk requests, metrics can be checked through the _stats and _tasks APIs:
Bulk Statistics
GET /_stats/bulk
Returns cumulative bulk request counts, payload sizes, and timing statistics per index.
Active Tasks API
GET /_tasks?detailed=true&actions=*bulk
This lists active bulk import tasks including status and runtime statistics.
Additional metrics like thread pool utilization, segment counts, and search latency indicate how well clusters are handling bulk import workloads.
Paginating Bulk Requests
Very large bulk imports can be broken into pages to avoid overloading the cluster. There are two common approaches:
Scrolling
The scroll API can walk over an entire index, exporting documents into bulk files for re-importing:
POST /_search?scroll=1m
{
"size": 10000,
"query": {
"match_all": {}
}
}
This scrolls 10K documents per page, which can be exported and re-imported via the bulk API.
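The scroll loop itself can be sketched as a generator. Here `initial_search` and `continue_scroll` are assumed wrappers around whatever HTTP client you use to call `_search?scroll=1m` and `_search/scroll`; the demo below feeds in canned responses in place of a live cluster.

```python
def scroll_pages(initial_search, continue_scroll):
    """Yield each page of hits from a scrolled search until the
    cluster returns an empty page."""
    page = initial_search()
    while True:
        hits = page["hits"]["hits"]
        if not hits:
            break
        yield hits
        # _search/scroll takes the scroll_id from the previous page
        page = continue_scroll(page["_scroll_id"])

# Canned responses standing in for a live cluster:
pages = [
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "s2", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "s3", "hits": {"hits": []}},
]
it = iter(pages)
collected = [h["_id"]
             for hits in scroll_pages(lambda: next(it), lambda sid: next(it))
             for h in hits]
```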
Search After
The search_after parameter paginates deep result sets by passing the sort values of the last document from the previous page. It is sent in the request body and requires a sort with a tiebreaker field:
POST /_search
{
"size": 1000,
"query": {
"match_all": {}
},
"sort": [{"_id": "asc"}],
"search_after": ["996"]
}
This fetches the next 1K documents after the document whose sort value is "996".
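Building the follow-up request from the previous page can be sketched like this; `next_search_body` and the timestamp field are illustrative, and the tiebreaker field in the sort is what makes the pagination deterministic.

```python
def next_search_body(size, sort, last_hit=None):
    """Build a search request body; when last_hit is given, resume
    after it using its sort values via search_after."""
    body = {"size": size, "query": {"match_all": {}}, "sort": sort}
    if last_hit is not None:
        body["search_after"] = last_hit["sort"]  # sort values of the last hit
    return body

sort = [{"timestamp": "asc"}, {"_id": "asc"}]  # _id as tiebreaker
first = next_search_body(1000, sort)
# Each hit in the response carries a "sort" array to resume from:
last_hit = {"_id": "996", "sort": [1620000000, "996"]}
nxt = next_search_body(1000, sort, last_hit)
```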
Paginating using scrolling or search after works well for breaking up huge bulk imports.
Summary of Bulk API Benefits
Some key benefits of using Elasticsearch's bulk API:
- Increased throughput – Bulk consolidates requests for better network and CPU efficiency
- Faster indexing – Documents index substantially faster than individual create/index calls
- Failure isolation – A failed operation won't prevent other actions in the bulk request from executing (note that the request as a whole is not atomic)
- Background ingestion – Imports can run in the background without blocking searches
When indexing, updating, or deleting batches of documents, the bulk API should be leveraged to improve throughput and reduce latency.
Conclusion
The Elasticsearch bulk API provides an efficient method for performing multiple index, create, update, and delete operations within a single request. Batching actions reduces overhead and improves document indexing speed compared to individual requests.
This guide covered bulk API syntax, performance comparisons, error handling, tools, best practices and considerations when importing or ingesting data in bulk.
Key takeaways include:
- Structure requests using newline delimited JSON
- Check responses for failed actions
- Tune threads and memory for large imports
- Use Logstash/Kafka pipelines for scalable ingestion
- Monitor performance using statistical APIs
Refer to the Elasticsearch documentation on the bulk API for additional technical details.


