Less is more: Bulk operations in Elasticsearch
Usage Scenarios
The Elasticsearch Bulk API is mainly used for high-throughput indexing, updating, and deleting of large amounts of data. It is useful when you need to insert millions of documents efficiently without overloading the cluster.
Principle
The Bulk API processes multiple index, update, and delete requests in a single call, significantly reducing overhead caused by multiple HTTP requests.
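To make "a single call" more concrete: at the REST level, the `_bulk` endpoint accepts one newline-delimited JSON body, where each operation contributes an action line and, for index and update, a second line with the document data. The sketch below assembles such a body with plain strings. The class and method names are hypothetical, it assumes index names and ids need no JSON escaping, and in practice a client library builds this payload for you.

```java
// Hypothetical helper for illustration; not part of any Elasticsearch client.
final class BulkPayloadSketch {

    // Builds the action/metadata line shared by all operation types.
    // Assumes index and id contain no characters that need JSON escaping.
    private static String metadata(String op, String index, String id) {
        return "{\"" + op + "\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}\n";
    }

    // Index: metadata line followed by the full document source.
    static String indexAction(String index, String id, String sourceJson) {
        return metadata("index", index, id) + sourceJson + "\n";
    }

    // Update: metadata line followed by a partial document wrapped in "doc".
    static String updateAction(String index, String id, String partialDocJson) {
        return metadata("update", index, id) + "{\"doc\":" + partialDocJson + "}\n";
    }

    // Delete: metadata line only, there is no source line.
    static String deleteAction(String index, String id) {
        return metadata("delete", index, id);
    }

    public static void main(String[] args) {
        // Three operations, one request body, one HTTP round trip.
        String body = indexAction("index", "1", "{\"field\":\"value\"}")
                + updateAction("index", "1", "{\"field\":\"new_value\"}")
                + deleteAction("index", "1");
        System.out.print(body);
    }
}
```

The three operations end up as five lines in one body, each terminated by a newline, so the cluster receives them in a single request instead of three.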
Execute Operation
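To execute a bulk operation, I create a BulkRequest and add any mix of IndexRequest, UpdateRequest, and DeleteRequest objects to it. The calls below match the Java High Level REST Client, so client is assumed to be a RestHighLevelClient.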
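import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

BulkRequest bulkRequest = new BulkRequest();
// Index document 1.
bulkRequest.add(new IndexRequest("index").id("1").source(jsonBuilder()
        .startObject()
            .field("field", "value")
        .endObject()));
// Partially update document 1.
bulkRequest.add(new UpdateRequest("index", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("field", "new_value")
            .endObject()));
// Delete document 1.
bulkRequest.add(new DeleteRequest("index", "1"));
// Send all three operations in one round trip (may throw IOException).
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);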
View and Verify Results
I check the response to make sure every operation succeeded. A bulk call can return normally even when individual operations fail, so each item in the response carries its own status.
import org.elasticsearch.action.bulk.BulkItemResponse;

// BulkResponse is iterable: one BulkItemResponse per operation, in request order.
for (BulkItemResponse bulkItemResponse : bulkResponse) {
    if (bulkItemResponse.isFailed()) {
        BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
        System.out.println("Error: " + failure.getMessage());
    } else {
        System.out.println("Operation successful: " + bulkItemResponse.getId());
    }
}
Control Batch Size
Controlling the batch size is crucial to avoid overwhelming the Elasticsearch cluster. If errors occur or the cluster becomes unstable, send smaller batches.
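int batchSize = 1000;
BulkRequest bulkRequest = new BulkRequest();
for (int i = 0; i < documents.size(); i++) {
    bulkRequest.add(new IndexRequest("index")
            .id(documents.get(i).getId())
            .source(documents.get(i).getSource()));
    // Send once the batch is full, then start a fresh request.
    if (bulkRequest.numberOfActions() >= batchSize) {
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        bulkRequest = new BulkRequest();
    }
}
// Send the final, partially filled batch so no documents are left behind.
if (bulkRequest.numberOfActions() > 0) {
    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
}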
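The batch boundaries are easy to get wrong by one document or to lose the last partial batch, so it can help to check the splitting logic on its own, without a cluster. Here is a minimal sketch in plain Java. The class name and the partition helper are hypothetical and not part of the Elasticsearch client. Integers stand in for documents, and each resulting batch would correspond to one BulkRequest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Elasticsearch client.
final class BatchPartitionSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    // The last batch holds whatever remains and may be smaller.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Integers stand in for documents here.
        List<Integer> documents = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            documents.add(i);
        }
        for (List<Integer> batch : partition(documents, 1000)) {
            // In real code, each batch would be sent as one BulkRequest.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With 2500 documents and a batch size of 1000, this yields batches of 1000, 1000, and 500, so the trailing 500 documents are not dropped.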