Reviving a struggling Elastic cluster for Fixami

In today’s data-driven landscape, efficient data retrieval is the backbone of success. Imagine a situation where a company’s Elastic cluster, which used to work smoothly, suddenly starts running very slowly. Searches take forever and finding important documents becomes a real challenge. This customer success story is about a consulting session that turned things around, fixing the struggling Elastic cluster and making document searches seamless again.

About Fixami

Fixami is an e-commerce organization focused on selling power tools to professionals and home handymen. Their headquarters and adjacent warehouse are located in a new building with top facilities in Tilburg.

They have been one of the fastest growing companies in the south of the Netherlands for over five years. Continuous focus on innovation and optimization ensures that Fixami continues to grow and is partly responsible for the transition from ‘bricks to clicks’ of the tool market.

Figuring out the issue

When Fixami came to us, they were dealing with a big problem. Their Elastic cluster, which was the heart of their data system, was not performing well. Searches that used to be fast were now incredibly slow, and they couldn’t find the documents they needed. It was clear that something was seriously wrong and the client was getting frustrated.

The issues

The first step in reviving the Elastic cluster was to figure out what was causing the slowdown. We began by looking at Fixami’s system setup, cluster configuration, and use of data. Here are the main problems we found:

Hardware and Resources: The cluster was hosted on GCP and ran on Kubernetes using ECK (Elastic Cloud on Kubernetes). The system wasn’t using its resources efficiently: the memory and processing power allocated to the cluster didn’t match the workload, which created bottlenecks and slowed performance.
Cluster size and Data Volume: The cluster size was quite large for the volume of data entering the cluster.
Shard Overload: The number of shards in the cluster was much higher than the number of nodes. This imbalance was causing a strain on the system and hampering performance.
Node Configuration: The Elastic nodes, the individual parts of the cluster, were not set up properly, so they did not communicate efficiently and slowed down the entire system. Node roles were left at the default, i.e. all roles were enabled on every node.
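
To make the shard imbalance described above concrete, here is a minimal Python sketch that counts shards per node. It assumes rows shaped like the JSON output of Elasticsearch’s `GET _cat/shards?format=json` API; the sample rows, index names, and node names are hypothetical and are not taken from Fixami’s cluster.

```python
from collections import Counter


def shards_per_node(shard_rows):
    """Count assigned shards per node.

    Each row is a dict with at least a "node" key, as in the JSON
    output of `GET _cat/shards?format=json`.
    """
    counts = Counter()
    for row in shard_rows:
        node = row.get("node")
        # Unassigned shards have no node; leave them out of the count.
        if node:
            counts[node] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"index": "products", "shard": "0", "prirep": "p", "node": "es-node-0"},
    {"index": "products", "shard": "1", "prirep": "p", "node": "es-node-0"},
    {"index": "products", "shard": "0", "prirep": "r", "node": "es-node-1"},
    {"index": "orders", "shard": "0", "prirep": "p", "node": None},
]

print(shards_per_node(sample_rows))
```

A count per node that is far higher than the data volume justifies is one sign of the overload described in this list.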

The solutions to get back on track

To fix these issues, we had to make some changes to how the cluster was set up and how it worked. Below are the main things Devoteam did to make the Elastic cluster work better.

Making better use of the resources

We recommended giving the cluster more memory and processing power so that it could handle searches faster and more efficiently.

Node configuration

We reconfigured the Elastic nodes so that they communicated more efficiently. We updated the existing nodes so that one became a dedicated Master node and the remaining two served as both Data and Master nodes. We also removed the ML, Coordinator and other irrelevant node roles.
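
The role layout described above could be expressed roughly as follows. This is a sketch, not Fixami’s actual manifest: it builds, as a Python dictionary, the `nodeSets` part of an ECK Elasticsearch resource, where each node set carries a `node.roles` list. The node set names (`master`, `data-master`) and the helper `node_set` are hypothetical, and the real configuration may have kept additional roles.

```python
import json


def node_set(name, count, roles):
    """Build one ECK nodeSet entry with an explicit node.roles list."""
    return {"name": name, "count": count, "config": {"node.roles": roles}}


# One dedicated master node, plus two nodes that hold data and are
# also master-eligible. Listing roles explicitly drops every role that
# is not named (for example "ml").
node_sets = [
    node_set("master", 1, ["master"]),
    node_set("data-master", 2, ["master", "data"]),
]

print(json.dumps({"spec": {"nodeSets": node_sets}}, indent=2))
```

Setting `node.roles` explicitly, rather than relying on the default, is what prevents every node from taking on every role.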

Balancing shards

We adjusted the number of shards in the cluster to match the number of nodes. This balanced setup eased the strain on the system and improved performance.

The number of shards you should allocate in an Elasticsearch cluster depends on various factors, including your data volume, use case, hardware resources, and anticipated growth. However, a general guideline is to aim for an even distribution of shards across your data nodes while considering factors like hardware capacity and query performance.

Here’s a simple formula to calculate the number of primary shards based on the number of data nodes:

Number of Primary Shards = Total Number of Data Nodes * Desired Shards per Node

For example, if you have 3 data nodes and you want to allocate 1 shard per node:

Number of Primary Shards = 3 * 1 = 3

This approach aims to evenly distribute the data and workload across the nodes. With one primary shard per data node, each primary shard can be stored on a separate node, allowing Elasticsearch to parallelize search and indexing operations.
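
The formula above can be written as a small helper. This is only an illustrative sketch; the function name `primary_shard_count` is hypothetical and not part of any Elasticsearch API.

```python
def primary_shard_count(data_nodes, shards_per_node):
    """Number of Primary Shards = Data Nodes * Desired Shards per Node."""
    if data_nodes < 1 or shards_per_node < 1:
        raise ValueError("both arguments must be at least 1")
    return data_nodes * shards_per_node


# The example from the text: 3 data nodes, 1 shard per node.
print(primary_shard_count(3, 1))  # prints 3
```

The result is a starting point only; data volume, hardware capacity, and expected growth should still inform the final number.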

Result

With these changes in place, the Elastic cluster’s performance improved significantly. Slow searches were no longer a problem, and finding documents became quick and easy.

Elastic documentation references

Our solutions were in line with the recommendations given in Elastic’s official documentation.

Elastic Documentation: https://www.elastic.co/guide/index.html
Hardware Optimization: https://www.elastic.co/guide/en/elasticsearch/reference/current/hardware.html
Index Configuration: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
Shard Management: https://www.elastic.co/guide/en/elasticsearch/reference/current/shards.html