By default, the Elasticsearch configuration stays the same no matter how much data it receives. As the stored data grows over time, however, the default configuration becomes an obstacle to scaling. The machine that hosts Elasticsearch also has fixed limits on its specifications, such as memory size. Tuning the configuration parameters of Elasticsearch is one way to get better performance within those limits.
Why should you tune your Elasticsearch performance?
The amount of data generated grows every day. As of 2020, 3.8 billion people use the internet, and it is estimated that 1.7 MB of data will be created for every person every second by 2021. Servers also generate data about their own status and store it in files known as log files. Log files can include different types of data, such as web requests from users, user activities, and server events, and they are considered part of big data.
As the number of users and machines increases, so does the amount of data to analyze. One important aspect of working with big data is searching it. For any search method, the time taken to fetch the right information is crucial to the quality of the search engine.
The correct parameters are key for your Elasticsearch performance
Running Elasticsearch with good performance depends in part on the server specifications you have. However, other parameters also affect the performance of Elasticsearch, and they can speed up searching or indexing if configured correctly.
Finding the right configuration parameters is difficult because the best values for an Elasticsearch cluster depend on both the amount of data being indexed and the specifications of the host machine. It is also hard to work out which combination of parameter values gives the best overall result.
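To get a feel for why combining parameters is hard, the sketch below counts the configurations produced by a small grid of candidate values. The parameter names and candidate values here are hypothetical and chosen only for illustration; they are not recommendations.

```python
from itertools import product

# Hypothetical candidate values per parameter (illustrative only).
CANDIDATES = {
    "refresh_interval": ["1s", "10s", "30s", "-1"],
    "number_of_replicas": [0, 1, 2],
    "number_of_shards": [1, 3, 5],
    "translog_flush_threshold_size": ["256mb", "512mb", "1gb"],
}


def enumerate_configs(candidates):
    """Yield every combination of candidate values as a dict."""
    names = list(candidates)
    for values in product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(CANDIDATES))
print(len(configs))  # 4 * 3 * 3 * 3 = 108 combinations to benchmark
```

Even with four parameters and three or four values each, there are 108 configurations to benchmark, and each extra parameter multiplies that number again.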
Several factors play a crucial part in the performance of Elasticsearch.
A few examples:
• Cluster Status
• Node Performance
• Java Heap
• Query Load and Query Latency
• Index Latency and Flush Latency
• Custom Routing
• Force Merging
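Some of these factors, such as query, index, and flush latency, can be estimated from cumulative counters. The sketch below assumes two samples of one node's `indices` statistics shaped like the output of the nodes stats API (`query_total`, `query_time_in_millis`, and so on); the sample numbers are made up, and the helper names `average_latency_ms` and `summarize` are hypothetical.

```python
def average_latency_ms(before, after, total_key, time_key):
    """Average time per operation between two cumulative counter samples."""
    ops = after[total_key] - before[total_key]
    if ops <= 0:
        return None  # no operations completed in the interval
    return (after[time_key] - before[time_key]) / ops


def summarize(before, after):
    """Derive per-operation latencies from two samples of `indices` stats."""
    return {
        "query_latency_ms": average_latency_ms(
            before["search"], after["search"],
            "query_total", "query_time_in_millis"),
        "index_latency_ms": average_latency_ms(
            before["indexing"], after["indexing"],
            "index_total", "index_time_in_millis"),
        "flush_latency_ms": average_latency_ms(
            before["flush"], after["flush"],
            "total", "total_time_in_millis"),
    }


# Made-up samples taken some time apart (not real measurements).
sample_before = {
    "search": {"query_total": 1000, "query_time_in_millis": 5000},
    "indexing": {"index_total": 2000, "index_time_in_millis": 4000},
    "flush": {"total": 10, "total_time_in_millis": 900},
}
sample_after = {
    "search": {"query_total": 1500, "query_time_in_millis": 8000},
    "indexing": {"index_total": 3000, "index_time_in_millis": 6500},
    "flush": {"total": 12, "total_time_in_millis": 1300},
}

summary = summarize(sample_before, sample_after)
print(summary)
```

Working with the difference between two samples matters because the counters only ever grow; dividing the raw totals would give an average over the node's whole uptime rather than the recent interval.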
There are many parameters to consider when it comes to both searching speed and indexing speed in Elasticsearch. The list below summarizes the main parameters that influence indexing performance and, in turn, searching performance.
- refresh.interval: How often the in-memory buffer is refreshed so that newly indexed documents become searchable
- number.of.replicas: The number of replicas each primary shard has
- memory.index.buffer.size: Heap memory allocated to the indexing buffer
- memory.min.index.buffer.size: Minimum heap memory allocated to the indexing buffer
- memory.max.index.buffer.size: Maximum heap memory allocated to the indexing buffer
- translog.flush.threshold.size: Translog size that triggers a flush
- translog.retention.age: How long translog files are kept
- translog.sync.interval: How often the translog is synced to disk
- number.of.shards: The number of primary shards per index
- shard.check.on.startup: Whether shards are checked for corruption before opening
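The names in this list are shorthand. The sketch below maps them to the setting names as they appear, to the best of my knowledge, in the Elasticsearch documentation, and separates the ones assumed to be changeable on a live index from the rest. Which settings are dynamic, and whether a setting is still available, depends on the Elasticsearch version, so treat the `DYNAMIC_INDEX_SETTINGS` set, the helper `plan_settings`, and the example values as assumptions to verify.

```python
import json

# Shorthand from the list above -> assumed full setting name
# (verify against the documentation for your Elasticsearch version).
SETTING_NAMES = {
    "refresh.interval": "index.refresh_interval",
    "number.of.replicas": "index.number_of_replicas",
    "memory.index.buffer.size": "indices.memory.index_buffer_size",
    "memory.min.index.buffer.size": "indices.memory.min_index_buffer_size",
    "memory.max.index.buffer.size": "indices.memory.max_index_buffer_size",
    "translog.flush.threshold.size": "index.translog.flush_threshold_size",
    "translog.retention.age": "index.translog.retention.age",
    "translog.sync.interval": "index.translog.sync_interval",
    "number.of.shards": "index.number_of_shards",
    "shard.check.on.startup": "index.shard.check_on_startup",
}

# Assumption: these can be updated on an existing index; everything else
# is treated as needing index creation or node-level configuration.
DYNAMIC_INDEX_SETTINGS = {
    "index.refresh_interval",
    "index.number_of_replicas",
    "index.translog.flush_threshold_size",
}


def plan_settings(desired):
    """Split shorthand settings into a live-update body and deferred ones."""
    update_body, deferred = {}, {}
    for shorthand, value in desired.items():
        full_name = SETTING_NAMES[shorthand]
        target = update_body if full_name in DYNAMIC_INDEX_SETTINGS else deferred
        target[full_name] = value
    return update_body, deferred


# Example values are illustrative, not tuning advice.
update_body, deferred = plan_settings({
    "refresh.interval": "30s",
    "number.of.replicas": 0,
    "number.of.shards": 3,
    "memory.index.buffer.size": "20%",
})
print(json.dumps(update_body, indent=2))  # candidate body for an index settings update
print(json.dumps(deferred, indent=2))     # needs index creation or node config
```

The function only builds dictionaries and sends nothing to a cluster, which keeps the plan easy to review before any change is applied.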