Category Archives: Elasticsearch

Using CALM Blueprints – Automation is the new punk!

Repeatability

I can imagine there are a lot of people like me who are continually setting up and tearing down environments in order to run application benchmarks, test out APIs, try new features and so on. The consequence is that I have umpteen sources of best-practice notes for each and every technology stack I get involved with. What’s worse is that some of the configurations and their changes are identical across multiple applications. So, all too often, I am digging around in directories with unhelpful names like “Notes”, “Best_Practices” or … wait for it … “Tunings”.

As part of an ongoing move towards Infrastructure as Code, I am now on a mission to get all of the crufty bits of info I keep here, there and everywhere into a source code repository. To that end I have been looking at the Multi-VM blueprint functionality of Nutanix Calm (Automated Lifecycle Management). Calm allows me to create a blueprint that reuses all my original code snippets and config edits, whether they are in Bash, Python, PowerShell and so on. Once created, the blueprint can be stored in a repository on GitHub, for example. Then every time I use that blueprint I get the same repeatable deployment, run after run.

Here’s one I made earlier

Let’s take a look at building out a stack to benchmark Elasticsearch using esrally. I covered some of this in my last post. I want to start off by discussing a few prerequisites. First and foremost is the image used to create the virtual machines (VMs). I used CentOS 7 cloud images, which require ssh key based access for the default user (centos). This means I need to store both public and private keys in the relevant parts of the configuration. See below for the Configuration > DOWNLOADABLE IMAGE CONFIGURATION and Credentials sections in the blueprint.

The blueprint automatically creates three virtual machines (VMs): one to host a single Elasticsearch instance, one for the Kibana instance and another that runs the esrally workload generator. See below for the basic layout of the blueprint. As the Kibana instance needs to know the address of the Elasticsearch instance, I need to create a dependency between the Elasticsearch and Kibana services. I do this by creating “an edge” between the services, delineated by the white line. That way the Kibana configuration/install only proceeds once the Elasticsearch configuration/install has completed; all the underlying VMs, however, are created simultaneously.

Services and dependencies

Each service requires a virtual machine in order to provide that service, so each VM is configured with storage (vDisks), network (NIC), ssh access (Credentials), along with any guest customisation and so on. For the Search_Index (Elasticsearch) service, I built the Elasticsearch VMs to host six 200GB vdisks, and used the cloud-config already installed in the image to set access keys and permissions. See below…

Application Profiles and variables

The use of application profiles not only allows you to specify the platform (or substrate, in Calm speak), it also lets you encapsulate variables which are then passed to that application. I am deploying to a Nutanix platform in this case, but this works just as well with AWS, GCP and Azure. You can see from the application profile below the variables I have created. Using this in a blueprint, I could very quickly deploy several application stacks, each with a different Java heap size, and then make performance comparisons between them. Each application stack would be exactly the same apart from the one changed variable. By extension I could add other variables I am interested in, like LVM stripe width or filesystem block size and so on.

Application Installation and Configuration

How the variables in the application profiles get used is shown below, in the package install task. The bulk of any configuration is done here. Tasks can be assigned to any action related to a service or the application profile, so a start, restart, stop or delete can each have an associated task. Each of the services I configured has a package install task, and that is where the application profile variables are used. Below is the task for the Elasticsearch/Search_Index service.

The canvas (above) shows a number of ways to update or edit files based on various patterns. Note that all config file edits/updates are done in place. You should avoid using a CLI that relies on creating temporary files, as your package install script could end up trying to write to or access files outside of the deployment environment, which is a potential security hole that Calm will not allow. Notice how the variable macros in the above package tasks are invoked below:

...
sudo sed -i 's/-Xms1g/-Xms@@{java_heap_size}@@g/' /etc/elasticsearch/jvm.options
...
sudo sed -i 's%path.data: /var/lib/elasticsearch%path.data: @@{elastic_data_path}@@%' /etc/elasticsearch/elasticsearch.yml
...
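Other one-line, in-place edits follow the same pattern. As a hypothetical extra example, this would wire in the elastic_http_port variable used by the Kibana task further down, assuming the stock commented-out http.port line in elasticsearch.yml:

...
sudo sed -i 's/#http.port: 9200/http.port: @@{elastic_http_port}@@/' /etc/elasticsearch/elasticsearch.yml
...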

Calm internal macros are also available, for example to pass the address of one service into another. This is from the package task for the Data Visualisation service (the Kibana instance):

...
sudo sed -i 's%^#elasticsearch.hosts: \["http://localhost:9200"\]%elasticsearch.hosts: \["http://@@{Search_Index.address}@@:@@{elastic_http_port}@@"\]%' /etc/kibana/kibana.yml
...

or the array index can be used to generate unique VM names (see the VM configuration section of any service):

elastic-@@{calm_array_index}@@

Provisioning and Auditing

That’s the blueprint complete. It should save without errors or warnings. Now it’s time to launch the blueprint to build the application stack. At this point you can name what will become your running application instance and change or set any runtime variables. Once launched, the blueprint is queued, verified and then cloned ready to run. While it’s running you can audit the steps of the workflow in the blueprint:

 

Once the application is marked RUNNING, you can either connect to individual VMs or access the application via a browser. It’s common to place all the VM and application access details in the blueprint description (note that it also expands macro variables – see below):

The following is an example of the /etc/motd shown when logging into the VM installed with esrally:

# ssh -i ./keys.pem -l centos 10.68.58.87
Last login: Wed Jul 17 15:46:57 2019 from 10.68.64.60

Configuration successfully written to /home/centos/.rally/rally.ini. Happy benchmarking!

More info about Rally:

* Type esrally --help
* Read the documentation at https://esrally.readthedocs.io/en/1.2.1/
* Ask a question on the forum at https://discuss.elastic.co/c/elasticsearch/rally

To get started:
esrally list tracks

Or....

esrally --pipeline=benchmark-only --target-hosts=10.68.58.177:9200 \
--track=eventdata --track-repository=eventdata --challenge=bulk-size-evaluation

Conclusion 

The final version (for now) of the blueprint is available to clone or download at:

https://github.com/rayhassan/calm-bp-elastic

Upload the blueprint to the Calm service on Prism Central, then work through it as you read this post, making your own changes if required. At the end (~10 minutes) you will have a running environment with which to test various Elasticsearch workloads. I intend to work through more blueprints related to other cloud native applications, with a view to developing larger scale deployments. Stay tuned.

 

Elasticsearch Sizing on Nutanix

One node, one index, one shard

The answer to the question “how big should I size my Elasticsearch VMs and what kind of performance will I get?” always comes down to the somewhat disappointing “It depends!” It depends on the workload – be it index or search heavy – on the type of data being transformed, and so on.

The way to size your Elasticsearch environment is to find your “unit of scale”: the performance characteristics you get for your workload from a single-shard index running in a single virtual machine (VM). Once you have a set of numbers for a particular VM config, you can scale throughput by increasing the number of VMs and/or indexes to handle additional workload.
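As a rough worked example: if a single VM hosting a single-shard index sustains around 50k documents/sec on your bulk indexing workload (in line with the eventdata numbers later in this post), then, assuming near-linear scaling, a target of 200k documents/sec would be sized at roughly four such VM/index units, to be validated as you scale out.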

Virtual Machine Settings

The accepted sweet spot for sizing a VM for an indexing workload is something like 64GB RAM and 8+ vCPUs. You can of course right-size this further where necessary, thanks to virtualisation. I assign just below half the RAM (31GB) to the heap for the Elasticsearch instance; this ensures the JVM uses compressed Ordinary Object Pointers (OOPs) on a 64-bit system. This heap memory also needs to be locked into RAM.

# grep -v ^# /etc/elasticsearch/elasticsearch.yml

cluster.name: esrally
node.name: esbench

path.data: /elastic/data01    # <<< single striped data volume 
bootstrap.memory_lock: true   # <<< lock heap in RAM
network.host: 10.68.68.202
http.port: 9200
discovery.zen.minimum_master_nodes: 1  # <<< single node test cluster
xpack.security.enabled: false

# grep -v ^# /etc/elasticsearch/jvm.options
…
-Xms31g
-Xmx31g
…

From the section above, notice the single mount point for the path.data entry. I am using a six-vdisk LVM stripe. While you can specify per-vdisk mount points in a comma-separated list, unless you have enough indices to keep all the spindles turning all the time, you are better off with logical volume management. You can confirm you are using compressed OOPs by checking for the following log entry at startup:

[2017-08-07T11:06:16,849][INFO ][o.e.e.NodeEnvironment ] [esrally02] heap size [30.9gb], compressed ordinary object pointers [true]
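For reference, the striped data volume behind that mount point can be put together with LVM along these lines. This is only a sketch: the device names (/dev/sd[b-g]) and the choice of XFS are assumptions and will differ per environment.

# 6-way LVM stripe across the data vdisks (device names are assumed)
sudo pvcreate /dev/sd[b-g]
sudo vgcreate esdata /dev/sd[b-g]
sudo lvcreate -n data01 -i 6 -I 1m -l 100%FREE esdata
sudo mkfs.xfs /dev/esdata/data01
sudo mkdir -p /elastic/data01
sudo mount /dev/esdata/data01 /elastic/data01
sudo chown elasticsearch:elasticsearch /elastic/data01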

Operating System Settings

Set the required kernel settings 

# sysctl -p 
…
vm.swappiness = 0
vm.overcommit_memory = 0
vm.max_map_count = 262144
…

Ensure file descriptors limits are increased

# ulimit -n 65536

verify...

curl -XGET http://10.68.68.202:9200/_nodes/stats/process?filter_path=**.max_file_descriptors
…
{"process":{"max_file_descriptors":65536}}}}
…

Disable swapping, either via the CLI or by removing swap entries from /etc/fstab:

# sudo swapoff -a
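Because bootstrap.memory_lock is enabled above, the elasticsearch service also needs permission to lock the heap into RAM. On a systemd-managed package install that is typically handled with a unit override; the following is a sketch, assuming the stock unit name:

# allow the service to lock memory (stock elasticsearch.service unit assumed)
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
sudo tee /etc/systemd/system/elasticsearch.service.d/override.conf <<'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch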

Elasticsearch Bulk Index Tuning

To improve the indexing rate and increase shard segment size, you can disable the refresh interval for an initial load. Afterwards, setting this to 30s (default 1s) in production means larger segment sizes and potentially less merge pressure later on.

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "-1"
    }
}'
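Once the initial load is complete, the same call can put the interval back to the 30s mentioned above (same index and endpoint assumed):

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "30s"
    }
}'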

Recall that we only want a single-shard index and no replication for our testing. We can achieve this either by disabling replication on the fly or by creating a template that configures the desired settings at index creation.

Disable replication globally ...

curl -X PUT "10.68.68.202:9200/_settings" -H 'Content-Type: application/json' -d '{"index" : {"number_of_replicas" : 0}}’

or create a template - in this case, matching a set of index name wildcard patterns...

# cat template.json
{
        "index_patterns": [ "ray*", "elasticlogs"],
        "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0
        }
}
curl -s -X PUT "10.68.68.202:9200/_template/test_template" -H 'Content-Type: application/json' -d @template.json
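It is worth confirming the template actually registered before kicking off a run:

curl -s -X GET "10.68.68.202:9200/_template/test_template?pretty"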

Elasticsearch Benchmarking tools

esrally is a macrobenchmarking tool for Elasticsearch. To install and configure it, use the quickstart guide; full information is available here:

 https://github.com/elastic/rally

rally-eventdata-track is a repository containing a Rally track for simulating event-based data use cases. The track supports bulk indexing of auto-generated events as well as simulated Kibana queries.

 https://github.com/elastic/rally-eventdata-track

esrally --pipeline=benchmark-only --target-hosts=10.68.68.202:9200 \
--track=eventdata --track-repository=eventdata --challenge=bulk-size-evaluation
eventdata bulk index – 5000 events/request, highlighted @ indexing rate of ~50k docs/sec
httpd logs index test – highlighted @ indexing rate of ~80k docs/s

Elasticsearch is just one of a great many cloud native applications that run successfully on Nutanix Enterprise Cloud. I am seeing more and more opportunities to assist our account teams with the sizing and deployment of Elasticsearch. However, unlike other search and analytics platforms, Elasticsearch has no ready-made formula for sizing. This post will hopefully allow people to make a start on their Elasticsearch sizing on Nutanix and, in addition, help identify future steps to improve their performance numbers.

Further Reading

Elasticsearch Reference

ELK on Nutanix : Kibana

It might seem like I am doing things out of sequence by looking at the visualisation layer of the ELK stack next. However, recall from my original post that I wanted to build sets of unreplicated indexes and then use Logstash to fire test workloads at them; hence I am covering Elasticsearch and Kibana first. This brings me to another technical point. In order for a single set of indexes to actually be recoverable when running on a single node, we need to set the following parameters in our Elasticsearch playbook:

So, in file roles/elastic/vars/main.yml:
...
elasticsearch_gateway_recover_after_nodes: 1
elasticsearch_gateway_recover_after_time: 5m
elasticsearch_gateway_expected_nodes: 1
...

These are then set in the elasticsearch.yml.j2 file as follows:

# file: roles/elastic/templates/elasticsearch.yml.j2
#{{ ansible_managed }}

...

# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 2
{% if elasticsearch_gateway_recover_after_nodes is defined %}
gateway.recover_after_nodes: {{ elasticsearch_gateway_recover_after_nodes }}
{% endif %}

and so on ....

This allows the indexes to be recovered when there is only a single node in the cluster. See below for the state of my indexes after a reboot:

[root@elkhost01 elasticsearch]# curl -XGET http://localhost:9200/_cluster/health?pretty
{
 "cluster_name" : "nx-elastic",
 "status" : "yellow",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 4,
 "active_shards" : 4,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 4,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}

Let’s now look at the Kibana playbook I am attempting. Unfortunately, Kibana is distributed as a compressed tar archive, which means that the yum or dnf modules are no help here. There is, however, a very useful unarchive module, but first we need to download the tar bundle using get_url as follows:

- name: download kibana tar file
  get_url: url=https://download.elasticsearch.org/kibana/kibana/kibana-{{ kibana_version }}-linux-x64.tar.gz
           dest=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz mode=755
  tags: kibana

I initially tried unarchiving the Kibana bundle into /tmp, intending to then copy everything below the version-specific directory (/tmp/kibana-4.0.1-linux-x64) into the Ansible-created /opt/kibana directory. This proved problematic, as neither the synchronize nor the copy module seemed set up to do a mass copy/transfer from one directory structure to another. Maybe I am just not getting it – I even tried using with_items loops, but no joy, as fileglobs are not recursive. Answers on a postcard are always appreciated. In the end I just did this:

- name: create kibana directory
  become: true
  file: owner=kibana group=kibana path=/opt/kibana state=directory
  tags: kibana

- name: extract kibana tar file
  become: true
  unarchive: src=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz dest=/opt/kibana copy=no
  tags: kibana

The next thing to do was to create a systemd service unit, as there isn’t one for Kibana (there is no rpm package available). The usual templating applies here:

- name: install kibana as systemd service
  become: true
  template: src=kibana4.service.j2 dest=/etc/systemd/system/kibana4.service owner=root
            group=root mode=0644
  notify:
  - restart kibana
  tags: kibana

And the service unit file looked like:

[ansible@ansible-host01 templates]$ cat kibana4.service.j2
{{ ansible_managed }}

[Service]
ExecStart=/opt/kibana/kibana-{{ kibana_version }}-linux-x64/bin/kibana
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=kibana4
User=root
Group=root
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
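Because kibana4.service is a brand new unit rather than one dropped in by a package, systemd needs to pick it up before the restart handler can act on it. Done by hand, that amounts to something like:

sudo systemctl daemon-reload
sudo systemctl enable kibana4
sudo systemctl start kibana4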

This all seemed to work, as I could now access Kibana via my browser. No indexes yet, of course:

kibana_initial_install

There are one or two plays I would still like to document. Firstly, the ‘notify’ actions in some of the plays. These are used to call – in my case – the restart handlers, which in turn restart the service in question – see the next section:

# file: roles/kibana/handlers

- name: restart kibana
  become: true
  service: name=kibana4 state=restarted

I wanted to document this next feature simply because it’s so useful – tags. As you will have noticed, I have assigned a tag to every play/task in the playbook so far. For testing purposes they allow you to run specific plays, so you can troubleshoot just that particular play and see what’s going on:

 ansible-playbook -i ./production site.yml --tags "kibana" --ask-sudo-pass
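Ansible can also show what a tag would run without executing anything, which is a nice sanity check before a targeted run:

ansible-playbook -i ./production site.yml --tags "kibana" --list-tasks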

Now that I have the basic plays to get my Elasticsearch and Kibana services up and running via Ansible, it’s time to start looking at Logstash. Next time I post on ELK, I will try to look at logging and search use cases – once I crack how they work, of course.

ELK on Nutanix : Elasticsearch

In this second post on using Ansible to deploy the ELK stack on Nutanix, I will cover my initial draft of a playbook for Elasticsearch (ES). Recall from my previous post that the playbook layout looks like this:

[ansible@ansible-host01 roles]$ tree elastic
elastic
├── files
│   └── elasticsearch.repo
├── handlers
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
│   ├── elasticsearch.default.j2
│   ├── elasticsearch.in.sh.j2
│   └── elasticsearch.yml.j2
└── vars
    └── main.yml

There’s also an additional role in play here, config, which handles the basic configuration of the underlying VM guest OS and which we also need to look at:

[ansible@ansible-host01 roles]$ tree config
config
├── files
├── handlers
├── tasks
│   └── main.yml
├── templates
└── vars
    └── main.yml

The config role is where I set things via the Ansible sysctl module, or add entries to files (using lineinfile) in order to set max memory, ulimits and so on. It’s generic system configuration, for example:

#installing java runtime pkgs (pre-req for ELK)
- name: install java 8 runtime
  become: true
  yum: name=java state=installed
  tags: config

#set system max/min numbers...
- name: set maximum map count in sysctl/systemd
  become: true
  sysctl: name=vm.max_map_count value={{ os_max_map_count }} state=present
  tags: config

...

- name: set soft limits for open files
  become: true
  lineinfile: dest=/etc/security/limits.conf line="{{ elasticsearch_user }} soft nofile {{ elasticsearch_max_open_files }}" insertafter=EOF backup=yes
  tags: config

- name: set max locked memory
  become: true
  lineinfile: dest=/etc/security/limits.conf line="{{ elasticsearch_user }} - memlock {{ elasticsearch_max_locked_memory }}" insertafter=EOF backup=yes
  tags: config

...
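Once the config role has run, the results are easy to spot-check on the target VM; the exact values come from the role's vars:

# confirm the sysctl and limits.conf entries applied by the config role
sysctl vm.max_map_count
grep elasticsearch /etc/security/limits.conf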

Here might be a good time to touch upon how Ansible allows you to set variables. Within each role’s directory there’s a subdirectory called vars, and all the variables needed for that role are contained in the YAML file (main.yml) inside it. Here’s a snippet:

# can use vars to set versioning and user 
elasticsearch_version: 1.7.0
elasticsearch_user: elasticsearch
...

# here's how we can specify the data volumes that ES will use 
elasticsearch_data_dir: /esdata/data01,/esdata/data02,/esdata/data03,/esdata/data04,/esdata/data05,/esdata/data06

...

# Virtual memory settings - ES heap is set to half my current VM RAM
# but no greater than 32GB for performance reasons
elasticsearch_heap_size: 16g
elasticsearch_max_locked_memory: unlimited
elasticsearch_memory_bootstrap_mlockall: "true"

....

# Good idea not to go with the ES default names of Franz Kafka etc
elasticsearch_cluster_name: nx-elastic
elasticsearch_node_name: nx-esnode01

# My initial nodes will be both cluster quorum members and data "workhorse" nodes.
# I will separate duties as I scale. Also I set the min master nodes to 1 so that
# my ES cluster comes up while initially testing a single index
elasticsearch_node_master: "true"
elasticsearch_node_data: "true"
elasticsearch_discovery_zen_minimum_master_nodes: 1
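Any of these can also be overridden at launch time with extra vars, which makes it easy to try, say, a different heap size without touching the role. A hypothetical example:

ansible-playbook -i ./production site.yml --tags "config,elastic" \
--ask-sudo-pass -e "elasticsearch_heap_size=8g"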

We’ll see how we use these variables as we cover more features. Next up, I used some nice features like the shell module and registered variables to provide conditional behaviour for the package install:

- name: check for previous elasticsearch installation
  shell: if [ -e /usr/share/elasticsearch/lib/elasticsearch-{{ elasticsearch_version }}.jar ]; then echo yes; else echo no; fi;
  register: version_exists
  always_run: True
  tags: elastic

- name: uninstalling previous version if applicable
  become: true
  command: yum erase -y elasticsearch
  when: version_exists.stdout == 'no'
  ignore_errors: true
  tags: elastic

and similarly for the Marvel plugin:

- name: check marvel plugin installed
  become: true
  stat: path={{ elasticsearch_home_dir }}/plugins/marvel
  register: marvel_installed
  tags: elastic

- name: install marvel plugin
  become: true
  command: "{{ elasticsearch_home_dir }}/bin/plugin -i elasticsearch/marvel/latest"
  notify:
  - restart elasticsearch
  when: not marvel_installed.stat.exists
  tags: elastic

The Marvel plugin stanza above also makes use of the stat module – a really great module. It returns all kinds of goodness you would normally expect from a stat() system call, and yet you are doing it in your Ansible playbook. There are a couple more things I will cover here, and then leave the rest for when I talk about Kibana and Logstash in a follow-up post. First up are templates. Ansible uses Jinja2 templating to transform a file and install it on your host; you create a file with the appropriate templating as below. The variables in {{ .. }} come from the role’s vars directory (the main.yml described earlier).

Note: I stripped all comment lines for the sake of brevity:

[ansible@ansible-host01 templates]$ pwd
/home/ansible/elk/roles/elastic/templates
[ansible@ansible-host01 templates]$ grep -v ^# elasticsearch.yml.j2
{% if elasticsearch_cluster_name is defined %}
cluster.name: {{ elasticsearch_cluster_name }}
{% endif %}

...

{% if elasticsearch_node_name is defined %}
node.name: {{ elasticsearch_node_name }}
{% endif %}

...

{% if elasticsearch_node_master is defined %}
node.master: {{ elasticsearch_node_master }}
{% endif %}
{% if elasticsearch_node_data is defined %}
node.data: {{ elasticsearch_node_data }}
{% endif %}

...

{% if elasticsearch_memory_bootstrap_mlockall is defined %}
bootstrap.mlockall: {{ elasticsearch_memory_bootstrap_mlockall }}
{% endif %}
....

When run in the play, the template file is transformed using the provided variables and copied into place on my ELK host target VM…

- name: copy elasticsearch defaults file
  become: true
  template: src=elasticsearch.default.j2 dest=/etc/sysconfig/elasticsearch owner={{ elasticsearch_user }} group={{ elasticsearch_group }} mode=0644
  notify:
  - restart elasticsearch
  tags: elastic

So let’s see how our playbook runs and what the output looks like:

[ansible@ansible-host01 elk]$ ansible-playbook -i ./production site.yml \
--tags "config,elastic" --ask-sudo-pass
SUDO password:

PLAY [elastic-hosts] **********************************************************

GATHERING FACTS ***************************************************************
ok: [10.68.64.117]

TASK: [config | install java 8 runtime] ***************************************
ok: [10.68.64.117]

TASK: [config | set swappiness in sysctl/systemd] *****************************
ok: [10.68.64.117]

TASK: [config | set maximum map count in sysctl/systemd] **********************
ok: [10.68.64.117]

TASK: [config | set hard limits for open files] *******************************
ok: [10.68.64.117]

TASK: [config | set soft limits for open files] *******************************
ok: [10.68.64.117]

TASK: [config | set max locked memory] ****************************************
ok: [10.68.64.117]

TASK: [config | Install wget package (Fedora based)] **************************
ok: [10.68.64.117]

TASK: [elastic | install elasticsearch signing key] ***************************
changed: [10.68.64.117]

TASK: [elastic | copy elasticsearch repo] *************************************
ok: [10.68.64.117]

TASK: [elastic | check for previous elasticsearch installation] ***************
changed: [10.68.64.117]

TASK: [elastic | uninstalling previous version if applicable] *****************
skipping: [10.68.64.117]

TASK: [elastic | install elasticsearch pkgs] **********************************
skipping: [10.68.64.117]

TASK: [elastic | copy elasticsearch configuration file] ***********************
ok: [10.68.64.117]

TASK: [elastic | copy elasticsearch defaults file] ****************************
ok: [10.68.64.117]

TASK: [elastic | set max memory limit in systemd file (RHEL/CentOS 7+)] *******
changed: [10.68.64.117]

TASK: [elastic | set log directory permissions] *******************************
ok: [10.68.64.117]

TASK: [elastic | set data directory permissions] ******************************
ok: [10.68.64.117]

TASK: [elastic | ensure elasticsearch running and enabled] ********************
ok: [10.68.64.117]

TASK: [elastic | check marvel plugin installed] *******************************
ok: [10.68.64.117]

TASK: [elastic | install marvel plugin] ***************************************
skipping: [10.68.64.117]

NOTIFIED: [elastic | restart elasticsearch] ***********************************
changed: [10.68.64.117]

PLAY RECAP ********************************************************************
10.68.64.117 : ok=19 changed=4 unreachable=0 failed=0

[ansible@ansible-host01 elk]$

I can verify that my ES cluster is working by querying the Cluster API – note that the red status is down to the fact I have no other cluster nodes yet on which to replicate the index shards:

# curl -XGET http://localhost:9200/_cluster/health?pretty
{
 "cluster_name" : "nx-elastic",
 "status" : "red",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 0,
 "active_shards" : 0,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 0,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}

You can use further API queries to verify that the desired configuration is in place, and at that point you have a solid, repeatable deployment with a known outcome – i.e. you are doing DevOps.
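For example, node-level settings and index state can be pulled straight from the APIs to confirm what the playbook laid down; something along these lines against my single test node:

curl -XGET http://localhost:9200/_nodes/settings?pretty
curl -XGET http://localhost:9200/_cat/indices?v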