
Elasticsearch Sizing on Nutanix

One node, one index, one shard

The answer to the question “How big should I size my Elasticsearch VMs, and what kind of performance will I get?” always comes down to the somewhat disappointing “It depends!” It depends on the workload, whether it is index or search heavy, on the type of data being transformed, and so on.

The way to size your Elasticsearch environment is to find your “unit of scale”: the performance characteristics you get for your workload from a single-shard index running in a single Virtual Machine (VM). Once you have a set of numbers for a particular VM configuration, you can scale throughput by increasing the number of VMs and/or indexes to handle additional workload.

Virtual Machine Settings

The accepted sweet spot for VM sizing of an indexing workload is something like 64GB RAM / 8+ vCPUs. You can of course right-size this further where necessary, thanks to virtualisation. I assign just below half the RAM (31GB) to the heap of the Elasticsearch instance. This ensures that the JVM uses compressed Ordinary Object Pointers (OOPs) on a 64-bit system. The heap memory also needs to be locked into RAM:

# grep -v ^# /etc/elasticsearch/elasticsearch.yml

cluster.name: esrally
node.name: esbench

path.data: /elastic/data01    # <<< single striped data volume 
bootstrap.memory_lock: true   # <<< lock heap in RAM
network.host: 10.68.68.202
http.port: 9200
discovery.zen.minimum_master_nodes: 1  # <<< single node test cluster
xpack.security.enabled: false

# grep -v ^# /etc/elasticsearch/jvm.options
…
-Xms31g
-Xmx31g
…

From the section above, notice the single mount point for the path.data entry. I am using a 6-vdisk LVM stripe (a sketch of how to build one follows the startup log entry below). While you can specify per-vdisk mount points in a comma-separated list, unless you have enough indices to keep all the spindles busy all of the time, you are better off with logical volume management. You can confirm that compressed OOPs are in use by checking for the following log entry at startup:

[2017-08-07T11:06:16,849][INFO ][o.e.e.NodeEnvironment ] [esrally02] heap size [30.9gb], compressed ordinary object pointers [true]
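For reference, here is a minimal sketch of how such a 6-vdisk striped data volume might be built. The device names (/dev/sd[b-g]), volume group and logical volume names, and the stripe size are assumptions for illustration only:

# pvcreate /dev/sd[b-g]
# vgcreate es_vg /dev/sd[b-g]
# lvcreate -i 6 -I 256 -n es_lv -l 100%FREE es_vg
# mkfs.xfs /dev/es_vg/es_lv
# mkdir -p /elastic/data01
# mount /dev/es_vg/es_lv /elastic/data01

The mount would also need an /etc/fstab entry to survive a reboot.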

Operating System Settings

Set the required kernel settings 

# sysctl -p 
…
vm.swappiness = 0
vm.overcommit_memory = 0
vm.max_map_count = 262144
…
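To make these kernel settings persistent across reboots, they can live in a sysctl configuration file (the file name below is just an example) and be reloaded with sysctl --system:

# cat /etc/sysctl.d/90-elasticsearch.conf
vm.swappiness = 0
vm.overcommit_memory = 0
vm.max_map_count = 262144

# sysctl --system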

Ensure the file descriptor limit is increased

# ulimit -n 65536

verify...

curl -XGET http://10.68.68.202:9200/_nodes/stats/process?filter_path=**.max_file_descriptors
…
{"process":{"max_file_descriptors":65536}}}}
…
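Note that ulimit only affects the current shell. To make the limit permanent, an entry along these lines can be added to /etc/security/limits.conf (assuming Elasticsearch runs as the elasticsearch user):

# cat /etc/security/limits.conf
…
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
…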

Disable swapping, either via the CLI or by removing the swap entries from /etc/fstab

# sudo swapoff -a
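With swap disabled and bootstrap.memory_lock set, it is worth confirming that the heap really is locked in RAM. The nodes info API should report mlockall as true:

curl -XGET 'http://10.68.68.202:9200/_nodes?filter_path=**.mlockall'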

Elasticsearch Bulk Index Tuning

To improve the indexing rate and increase shard segment size, you can disable the refresh interval during an initial load. Afterwards, setting it to 30s (default 1s) in production means larger segment sizes and potentially less merge pressure later on.

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "-1"
    }
}'
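Once the initial bulk load is complete, the refresh interval can be set back to a production value (30s here), and optionally a force merge can be run to reduce the segment count before searching; something like:

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "30s"
    }
}'

curl -X POST "10.68.68.202:9200/elasticlogs/_forcemerge?max_num_segments=1"

A force merge down to a single segment is I/O heavy, so it is best reserved for indices that are no longer being written to.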

Recall that we only want a single-shard index and no replication for our testing. We can achieve this either by disabling replication on the fly or by creating a template that configures the desired settings at index creation.

Disable replication globally ...

curl -X PUT "10.68.68.202:9200/_settings" -H 'Content-Type: application/json' -d '{"index" : {"number_of_replicas" : 0}}'

or create a template, in this case for a set of index name patterns...

# cat template.json
{
        "index_patterns": [ "ray*", "elasticlogs" ],
        "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0
        }
}
curl -s -X PUT "10.68.68.202:9200/_template/test_template" -H 'Content-Type: application/json' -d @template.json
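To sanity-check the result, the template and the settings of a matching index can be queried back (the second call assumes the elasticlogs index already exists):

curl -s -X GET "10.68.68.202:9200/_template/test_template?pretty"
curl -s -X GET "10.68.68.202:9200/elasticlogs/_settings?pretty"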

Elasticsearch Benchmarking tools

esrally is a macrobenchmarking tool for Elasticsearch. To install and configure it, use the project's quickstart guide (a minimal sketch follows the link); full information is available here:

 https://github.com/elastic/rally
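As a minimal install sketch, assuming Python 3 and a JDK are already present on the load-generation host:

pip3 install esrally
esrally configure

esrally configure runs a one-time interactive setup before any benchmarks can be launched.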

rally-eventdata-track is a repository containing a Rally track for simulating event-based data use cases. The track supports bulk indexing of auto-generated events as well as simulated Kibana queries.

 https://github.com/elastic/rally-eventdata-track

esrally --pipeline=benchmark-only --target-hosts=10.68.68.202:9200 \
  --track=eventdata --track-repository=eventdata --challenge=bulk-size-evaluation
eventdata bulk index, 5000 events/request: highlighted an indexing rate of ~50k docs/sec
httpd logs index test: highlighted an indexing rate of ~80k docs/sec

Elasticsearch is just one of a great many cloud-native applications that can run successfully on Nutanix Enterprise Cloud. I am seeing more and more opportunities to assist our account teams in the sizing and deployment of Elasticsearch. However, unlike other search and analytics platforms, Elasticsearch has no ready-made formula for sizing. This post will hopefully allow people to make a start on their Elasticsearch sizing on Nutanix and, in addition, help identify future steps to improve their performance numbers.

Further Reading

Elasticsearch Reference

ELK on Nutanix: Kibana

It might seem like I am doing things out of sequence by looking at the visualisation layer of the ELK stack next. However, recall from my original post that I wanted to build sets of unreplicated indexes and then use Logstash to fire test workloads at them; hence I am covering Elasticsearch and Kibana initially. This brings me to another technical point that I need to cover. In order for a single set of indexes to actually be recoverable when running on a single node, we need to set the following parameters in our Elasticsearch playbook:

So in file: roles/elastic/vars/main.yml
...
elasticsearch_gateway_recover_after_nodes: 1
elasticsearch_gateway_recover_after_time: 5m
elasticsearch_gateway_expected_nodes: 1
...

These are then set in the elasticsearch.yml.j2 file as follows:

# file: roles/elastic/templates/elasticsearch.yml.j2
#{{ ansible_managed }}

...

# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 2
{% if elasticsearch_gateway_recover_after_nodes is defined %}
gateway.recover_after_nodes: {{ elasticsearch_gateway_recover_after_nodes }}
{% endif %}

and so on ....

This allows the indexes to be recovered when there is only a single node in the cluster. See below for the state of my indexes after a reboot:

[root@elkhost01 elasticsearch]# curl -XGET http://localhost:9200/_cluster/health?pretty
{
 "cluster_name" : "nx-elastic",
 "status" : "yellow",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 4,
 "active_shards" : 4,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 4,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}
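The yellow status is expected: the four unassigned shards are simply the replicas, which can never be allocated on a single-node cluster. All four primaries have recovered, which can be confirmed via the cat APIs:

curl -XGET http://localhost:9200/_cat/indices?v
curl -XGET http://localhost:9200/_cat/shards?v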

Let's now look at the Kibana playbook I am attempting. Unfortunately, Kibana is distributed as a compressed tar archive, which means the yum and dnf modules are no help here. There is, however, a very useful unarchive module, but first we need to download the tar bundle using get_url as follows:

- name: download kibana tar file
  get_url: url=https://download.elasticsearch.org/kibana/kibana/kibana-{{ kibana_version }}-linux-x64.tar.gz
           dest=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz mode=0755
  tags: kibana

I initially tried unarchiving the Kibana bundle into /tmp. I then intended to copy everything below the version-specific directory (/tmp/kibana-4.0.1-linux-x64) into the Ansible-created /opt/kibana directory. This proved problematic, as neither the synchronize nor the copy module seemed set up to do a mass copy/transfer from one directory structure to another. Maybe I am just not getting it; I even tried with_items loops, but no joy, as fileglobs are not recursive. Answers on a postcard are always appreciated (one possible workaround is sketched after the tasks below). In the end I just did this:

- name: create kibana directory
  become: true
  file: owner=kibana group=kibana path=/opt/kibana state=directory
  tags: kibana

- name: extract kibana tar file
  become: true
  unarchive: src=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz dest=/opt/kibana copy=no
  tags: kibana
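As an aside, one possible answer to the postcard question (untested in this playbook) is to strip the leading, version-specific directory at extraction time, for example via a shell/command task wrapping GNU tar:

tar -xzf /tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz -C /opt/kibana --strip-components=1

With that approach the Kibana binary would land directly under /opt/kibana/bin, which would also simplify the ExecStart path in the service unit below.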

The next thing to do was to create a systemd service unit; there isn't one for Kibana, as there is no rpm package available. The usual templating applies here:

- name: install kibana as systemd service
  become: true
  template: src=kibana4.service.j2 dest=/etc/systemd/system/kibana4.service
            owner=root group=root mode=0644
  notify:
    - restart kibana
  tags: kibana

And the service unit file looked like:

[ansible@ansible-host01 templates]$ cat kibana4.service.j2
{{ ansible_managed }}

[Service]
ExecStart=/opt/kibana/kibana-{{ kibana_version }}-linux-x64/bin/kibana
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=kibana4
User=root
Group=root
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
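After templating the unit file, systemd still needs to pick it up, and the service needs to be enabled and started; either via an Ansible service task or by hand, roughly:

# systemctl daemon-reload
# systemctl enable kibana4.service
# systemctl start kibana4.service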

This all seemed to work, as I could now access Kibana via my browser. No indexes yet, of course:

[Screenshot: the initial Kibana page in the browser, with no indexes configured yet]

There are one or two plays that I would still like to document. Firstly, the ‘notify’ actions in some of the plays: these are used to call, in my case, the restart handlers, which in turn restart the service in question. See the next section:

# file: roles/kibana/handlers

- name: restart kibana
  become: true
  service: name=kibana4 state=restarted   # <<< matches the kibana4.service unit installed above

I wanted to document this next feature simply because it's so useful: tags. As you will have noticed, I have assigned a tag to every play/task in the playbook so far. For testing purposes they allow you to run specific plays, so you can troubleshoot just that particular play and see what's going on.

 ansible-playbook -i ./production site.yml --tags "kibana" --ask-sudo-pass

Now that I have the basic plays to get my Elasticsearch and Kibana services up and running via Ansible, it's time to start looking at Logstash. Next time I post on ELK-type stuff, I will try to look at logging and search use cases. Once I crack how they work, of course.