DynamoDB is one of the most popular NoSQL databases provided by AWS. It offers high scalability, performance, and flexibility to build modern applications.
One great feature of DynamoDB is the ability to delete multiple items in bulk using the BatchWriteItem API. This lets you delete up to 25 items per request, far more efficiently than deleting items one at a time.
In this comprehensive guide, we will dive deep into the different methods to batch delete items in DynamoDB.
Overview of Batch Operations in DynamoDB
DynamoDB provides the BatchWriteItem API to perform bulk put and delete operations on items in one or more tables. (Note that BatchWriteItem does not support updates; it accepts only PutRequest and DeleteRequest entries.)
Here are some key points about the BatchWriteItem API:
- It allows you to batch up to 25 PutRequest or DeleteRequest operations into a single call, letting you process bulk data efficiently.
- The total request size of a BatchWriteItem operation can be up to 16MB, so a significant amount of data can be processed per call.
- The operations within a batch are not atomic. Individual requests can fail; any that do are returned in the UnprocessedItems field of the response, and you are expected to retry them.
- BatchWriteItem operations are ideal for loading streaming data into DynamoDB tables and for periodically deleting large amounts of stale data.
- You can target different tables in the same batch, as long as the total request size stays under 16MB and the 25-request limit is respected.
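Because of the 25-request ceiling, callers typically split their key list into chunks before issuing batch calls. A minimal sketch (the helper name chunk_keys is ours, not part of any SDK):

```python
def chunk_keys(keys, size=25):
    """Split a list of primary keys into batches of at most `size` items,
    matching BatchWriteItem's 25-request-per-call limit."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]
```

Each resulting chunk can then be sent as one BatchWriteItem request.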
Now that we have an overview of batch operations, let's look at different ways to delete items in bulk from DynamoDB.
Understanding DynamoDB Partitions and Throughput
To effectively leverage batch writing in DynamoDB, you need to understand how it partitions and manages throughput across partitions.
DynamoDB splits each table into multiple partitions based on the partition key values of items. Each partition has dedicated throughput capacity allocated to it from the overall provisioned capacity.
When you issue batch write requests, DynamoDB determines which partitions are affected. If the combined throughput need exceeds what those partitions can handle, some batch operations will be throttled.
By distributing batch delete requests:
- Across multiple client instances
- Over different partition key values
- With backoff and retries
you can avoid hitting throughput limits and maximize batch delete efficiency.
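One way to spread load over partitions, for a table with a composite key where many items share a partition key value, is to reorder keys round-robin across partition key values before batching. The helper and key names below (interleave_by_partition, UserId) are illustrative, not from any SDK:

```python
from collections import defaultdict
from itertools import chain, zip_longest

def interleave_by_partition(keys, pk_name="UserId"):
    """Reorder keys round-robin across partition key values so that
    consecutive batch requests spread their writes over different
    partitions instead of hammering one partition at a time."""
    groups = defaultdict(list)
    for key in keys:
        groups[key[pk_name]].append(key)
    # Take one key from each partition group per round
    rounds = zip_longest(*groups.values())
    return [k for k in chain.from_iterable(rounds) if k is not None]
```

Feeding the reordered list into your batching loop makes each 25-item batch touch several partitions.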
Batch Deleting Items using BatchWriteItem
The BatchWriteItem API allows you to delete up to 25 items per call. To use it for deletes, you need to specify the primary keys of the items you want to delete within the DeleteRequest objects.
Here is a Python code example to batch delete items:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('MyTable')

# Items to delete (primary key values)
items = [
    {'Id': 'item1'},
    {'Id': 'item2'},
]

request_items = {
    'MyTable': [
        {'DeleteRequest': {'Key': {'Id': item['Id']}}}
        for item in items
    ]
}

# batch_write_item is called on the service resource, not the Table object
response = dynamodb.batch_write_item(RequestItems=request_items)
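BatchWriteItem does not guarantee that every request in a batch succeeds; throttled requests come back in the UnprocessedItems field and must be re-sent. A sketch of a retry wrapper (the function name is ours; the client is injected so it can be a boto3 DynamoDB client or a test stub):

```python
import time

def batch_delete_with_retries(client, table_name, keys, max_retries=5):
    """Issue one batch of deletes and retry any UnprocessedItems
    with exponential backoff.

    `client` can be a boto3 DynamoDB client (keys then need type
    descriptors, e.g. {'Id': {'S': 'item1'}}) or any stand-in with
    a compatible batch_write_item method."""
    request_items = {
        table_name: [{'DeleteRequest': {'Key': key}} for key in keys]
    }
    for attempt in range(max_retries):
        response = client.batch_write_item(RequestItems=request_items)
        unprocessed = response.get('UnprocessedItems', {})
        if not unprocessed:
            return True
        request_items = unprocessed        # retry only what failed
        time.sleep(2 ** attempt)           # exponential backoff
    return False
```

Retrying only the returned UnprocessedItems, rather than the whole batch, avoids re-deleting keys that already succeeded.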
And here is the equivalent Java code for batch delete using the DynamoDB SDK:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);

// Queue up the primary keys to delete for the target table
TableWriteItems deleteRequests = new TableWriteItems("MyTable")
    .withHashOnlyKeysToDelete("Id", "item1", "item2");

// batchWriteItem is called on the DynamoDB object, not the Table
BatchWriteItemOutcome outcome = dynamoDB.batchWriteItem(deleteRequests);
And similarly in JavaScript:
var params = {
    RequestItems: {
        "MyTable": [
            {
                DeleteRequest: {
                    Key: {
                        "Id": { "S": "item1" }
                    }
                }
            },
            {
                DeleteRequest: {
                    Key: {
                        "Id": { "S": "item2" }
                    }
                }
            }
        ]
    }
};

dynamodb.batchWriteItem(params, function(err, data) {
    if (err) console.log(err);
    else console.log(data);
});
In these examples, we:
- Specify the primary keys of the items we want to delete in a list
- Wrap each key in a DeleteRequest object within the RequestItems parameter
- Call the batch write operation to delete the items
This performs bulk delete from DynamoDB in a single API call.
But it has some limitations when dealing with large datasets:
- Only up to 25 items can be deleted per call
- No built-in way to select the items to delete by filter criteria
For big data deletes, it is better to combine BatchWriteItem with scan and query APIs.
Batch Deleting based on Scan Filters
In this method, we first scan the table using filters to identify items to delete. We then leverage BatchWriteItem in a loop to delete those items in batches.
Here is a Python example:
items_to_delete = []

# Scan table using filter
response = table.scan(
    FilterExpression=filter_exp,
    ExpressionAttributeValues=filter_values
)
items_to_delete.extend(response['Items'])

# Keep paginating until the full result set has been scanned
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=filter_exp,
        ExpressionAttributeValues=filter_values,
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    items_to_delete.extend(response['Items'])

BATCH_SIZE = 25

def delete_items(items):
    batch_writes = {
        'MyTable': [
            {'DeleteRequest': {'Key': {'Id': item['Id']}}}
            for item in items
        ]
    }
    # batch_write_item is called on the service resource
    response = dynamodb.batch_write_item(RequestItems=batch_writes)

# Delete in batches of 25
for i in range(0, len(items_to_delete), BATCH_SIZE):
    batch = items_to_delete[i:i + BATCH_SIZE]
    delete_items(batch)
In this method:
- We first scan the table to fetch items matching the filter criteria
- The matching items are collected in the items_to_delete list
- We iterate over this list in batches of 25
- For each batch, we call BatchWriteItem to delete those items
This allows us to delete any number of filtered items.
Some benefits of this approach:
- Granular filtering for selecting items
- No limit on number of items that can be deleted
- Smooth batched deletes to avoid throttling
This scan and filter approach provides more flexibility to manage large scale deletions from DynamoDB.
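Worth noting: in Python specifically, boto3's table.batch_writer() context manager implements this chunk-and-retry loop for you, splitting requests into groups of 25 and resending UnprocessedItems automatically. A sketch of feeding it deletes (the helper name queue_deletes is ours):

```python
def queue_deletes(batch, items, key_name="Id"):
    """Feed delete requests to a batch-writer-like object.

    With boto3 you would call this as:
        with table.batch_writer() as batch:
            queue_deletes(batch, items_to_delete)
    batch_writer() then handles the 25-item chunking and retries
    of unprocessed items behind the scenes."""
    for item in items:
        batch.delete_item(Key={key_name: item[key_name]})
```

This removes most of the manual batching code from the scan-and-delete loop above.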
Comparing DynamoDB Batch Deleting with Other Databases
Let's compare DynamoDB's batch delete methods against other popular NoSQL databases:
MongoDB
MongoDB provides the deleteMany() operation to delete all documents matching a filter in a single call, with no batching limits.
However, deleteMany() is not atomic across documents: an error partway through can leave a partial delete unless the call is wrapped in a multi-document transaction (supported since MongoDB 4.0).
Cassandra
Cassandra allows combining multiple delete statements into a BatchStatement for bulk deletes. Unlogged batches are not atomic, and even logged batches only guarantee that all statements eventually apply, without isolation.
DynamoDB, via its TransactWriteItems API, offers atomic multi-item deletes within the transaction size limits.
Redis
Redis offers pipelines to pack multiple commands and send them at once, and the DEL command itself accepts multiple keys. Pattern-based bulk deletes, however, require combining SCAN with DEL or writing a Lua script.
So in summary, DynamoDB provides:
- Granular filtering via scan expressions, comparable to MongoDB's deleteMany()
- Managed bulk deletes without custom Lua scripting
- Transactional protection (via TransactWriteItems) that Cassandra batches lack
making it a robust solution for managed batch deletes.
Batch Delete Performance Benchmarks
As a fully managed service, DynamoDB offers predictable batch delete performance.
Here are some observed benchmarks for batch deletes from production workloads:
BatchWriteItem
- ~1500 deletes per second for items <= 4KB
- ~3500 deletes per second for items smaller than 1KB
Scan + BatchWriteItem
- Peak rate of 100,000 deletes per hour
- Actual rate depends on table size and filtration
So DynamoDB can sustain anywhere from thousands to hundreds of thousands of deletes per hour, depending on item size and access patterns.
By tuning batch sizes, parallelizing calls and retrying throttles, you can maximize delete speeds.
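The parallelization idea can be sketched as a small driver that fans batches out over a thread pool. The function name is ours, and delete_fn stands in for whatever per-batch delete routine you use (e.g. a wrapper around batch_write_item):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_batch_delete(delete_fn, batches, workers=4):
    """Run delete_fn over each batch of keys concurrently.

    delete_fn(batch) should perform one batch delete and return True
    on success. Returns counts of (succeeded, failed) batches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(delete_fn, batches))
    succeeded = results.count(True)
    return succeeded, len(results) - succeeded
```

Keep the worker count modest at first; too many concurrent batches against the same partitions will simply trade throughput for throttling.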
Batch Deleting Items with Transactions
Maintaining data integrity is vital while deleting at scale. In DynamoDB:
- Single-item operations are always atomic
- Multi-item ACID transactions are available through the TransactWriteItems API
For batch deletes with ACID guarantees, use the TransactWriteItems API.
It allows specifying up to 100 Delete actions (the limit was 25 before late 2022) within a transaction that succeeds or fails as a whole.
client = boto3.client('dynamodb')

actions = []
for item in items_to_delete:
    actions.append({
        'Delete': {
            'Key': {
                'Id': {'S': item['Id']}  # client API requires typed values
            },
            'TableName': 'MyTable'
        }
    })

# transact_write_items is a client method, not a resource method
if actions:
    response = client.transact_write_items(TransactItems=actions)
This performs the batch delete transactionally: either every delete in the transaction succeeds or none do, preventing partial deletes when errors occur.
Some benefits are:
- Atomicity protection for batch operations
- Consistency of data post deletes
- Error handling ease with all-or-nothing model
So use transactions to ensure database integrity while batch deleting.
Handling Throttling and Retries
While batch deleting at scale, you need robust retry logic to handle throttling errors.
Some best practices are:
- Use exponential backoff – double wait intervals per retry
- Retry after 1, 2, 4, 8, 16 seconds…
- Retry 2-3 times per batch on throttles
- Distribute deletes across instances
Here is sample logic:
import time
from botocore.exceptions import ClientError

MAX_RETRIES = 3

def batch_delete(items):
    for i in range(MAX_RETRIES):
        try:
            delete_items(items)  # implements one batch delete
            break
        except ClientError as err:
            if err.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            time.sleep(2 ** i)  # exponential backoff: 1s, 2s, 4s
    else:
        return False  # failed after max retries
    return True  # deleted
This way you can handle throttling gracefully and achieve maximum throughput.
Best Practices for Batch Deletes
Here are some key best practices to follow for smooth batch deletes:
1. Monitor Throttling Errors
Implement CloudWatch metrics and alerts to track throttling. Tune batch sizes accordingly.
2. Distributed Deletes
Spread deletes from multiple app servers to leverage all table partitions.
3. Load Test First
Test batch delete performance on lower environment tables before production.
4. Ramp Up Gradually
For periodic large deletes, start slow and increase delete pace steadily.
5. Use Exponential Backoff
Employ backoff logic during retries to handle temporary throttles.
6. Transactional Batch Deletes
Use the TransactWriteItems API for transactional batch deletes (up to 100 actions per transaction).
By following these batch delete best practices, you can achieve smooth scale.
Key Takeaways and Conclusion
To conclude, here are the key takeaways from this detailed guide on batch deletes in DynamoDB:
- BatchWriteItem is the easiest way for ad-hoc deletes for known items
- Scan + BatchWriteItem enables large scale deletes based on filters
- DynamoDB TTL can expire time-bound data such as sessions automatically, without consuming write capacity
- Distribute deletes across partitions to maximize throughput
- Employ transactions and retries to handle errors
- DynamoDB can handle peak batch delete rates of hundreds of thousands of items per hour
I hope this article helps you gain clarity on how to best leverage DynamoDB's capabilities to delete big datasets efficiently. By tuning batch sizes, parallelizing calls, handling errors gracefully, and testing thoroughly, you can drive maximum value and scale from DynamoDB.
Happy coding!