Elasticsearch Sizing on Nutanix

One node, one index, one shard

The answer to the question “How big should I size my Elasticsearch VMs, and what kind of performance will I get?” always comes down to the somewhat disappointing “It depends!” It depends on the workload – be it index or search heavy – on the type of data being transformed, and so on.

The way to size your Elasticsearch environment is to find your “unit of scale”: the performance characteristics you get for your workload from a single-shard index running in a single Virtual Machine (VM). Once you have a set of numbers for a particular VM config, you can then scale throughput by increasing the number of VMs and/or indexes to handle additional workload.

Virtual Machine Settings

The accepted sweet spot for sizing a VM for an indexing workload is something like 64GB RAM / 8+ vCPUs. You can of course right-size this further where necessary, thanks to virtualisation. I assign just below half the RAM (31GB) to the heap for the Elasticsearch instance. This ensures that the JVM uses compressed Ordinary Object Pointers (OOPs) on a 64-bit system. This heap memory also needs to be locked into RAM:

# grep -v ^# /etc/elasticsearch/elasticsearch.yml

cluster.name: esrally
node.name: esbench

path.data: /elastic/data01    # <<< single striped data volume 
bootstrap.memory_lock: true   # <<< lock heap in RAM
network.host: 10.68.68.202
http.port: 9200
discovery.zen.minimum_master_nodes: 1  # <<< single node test cluster
xpack.security.enabled: false

# grep -v ^# /etc/elasticsearch/jvm.options
…
-Xms31g
-Xmx31g
…
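
For bootstrap.memory_lock to take effect, the elasticsearch user must also be allowed to lock memory. On a systemd-managed install this is typically done with a unit override. A minimal sketch, assuming the stock elasticsearch.service packaging:

# mkdir -p /etc/systemd/system/elasticsearch.service.d
# cat /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitMEMLOCK=infinity
# systemctl daemon-reload && systemctl restart elasticsearch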

From the elasticsearch.yml section above, notice the single mount point for the path.data entry. I am using a 6 vdisk LVM stripe. While you can specify per-vdisk mount points in a comma separated list, unless you have enough indices to keep all the spindles turning all the time, you are better off with logical volume management. You can confirm that compressed OOPs are in use by checking for the following log entry at startup:

[2017-08-07T11:06:16,849][INFO ][o.e.e.NodeEnvironment ] [esrally02] heap size [30.9gb], compressed ordinary object pointers [true]
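
As an aside, a striped data volume like the one referenced in path.data can be built with LVM along the following lines. This is a sketch only, with illustrative device names (/dev/sdb through /dev/sdg) and stripe size:

# pvcreate /dev/sd[b-g]
# vgcreate es_vg /dev/sd[b-g]
# lvcreate -n data01 -i 6 -I 256 -l 100%FREE es_vg    # 6-way stripe, 256KB stripe size
# mkfs.xfs /dev/es_vg/data01
# mount /dev/es_vg/data01 /elastic/data01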

Operating System Settings

Set the required kernel settings:

# sysctl -p 
…
vm.swappiness = 0
vm.overcommit_memory = 0
vm.max_map_count = 262144
…

Ensure file descriptor limits are increased:

# ulimit -n 65536

Verify:

curl -XGET "http://10.68.68.202:9200/_nodes/stats/process?filter_path=**.max_file_descriptors"
…
{"process":{"max_file_descriptors":65536}}}}
…
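
Note that ulimit only sets the limit for the current shell session. To persist it across logins, one common approach is an entry in /etc/security/limits.conf. A sketch, assuming Elasticsearch runs as the elasticsearch user:

# grep elasticsearch /etc/security/limits.conf
elasticsearch  -  nofile  65536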

Disable swapping, either via the CLI or by removing swap entries from /etc/fstab:

$ sudo swapoff -a
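
To keep swap disabled across reboots, comment out (or remove) the swap entries in /etc/fstab, for example:

# sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab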

Elasticsearch Bulk Index Tuning

In order to improve the indexing rate and increase shard segment size, you can disable the refresh interval during an initial load. Afterwards, setting this to 30s (default=1s) in production means larger segment sizes and potentially less merge pressure at a later date.

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "-1"
    }
}'
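
Once the initial load is complete, the refresh interval can be set back to a production value, in this case the 30s discussed above:

curl -X PUT "10.68.68.202:9200/elasticlogs/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "refresh_interval" : "30s"
    }
}'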

Recall that we only want a single-shard index and no replication for our testing. We can achieve this either by disabling replication on the fly or by creating a template that configures the desired settings at index creation.

Disable replication globally ...

curl -X PUT "10.68.68.202:9200/_settings" -H 'Content-Type: application/json' -d '{"index" : {"number_of_replicas" : 0}}'

or create a template, in this case for a series of index name wildcard patterns...

# cat template.json
{
        "index_patterns": [ "ray*", "elasticlogs" ],
        "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0
        }
}
curl -s -X PUT "10.68.68.202:9200/_template/test_template" -H 'Content-Type: application/json' -d @template.json
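
You can confirm the template has been registered, and review the settings it will apply, with:

curl -s -X GET "10.68.68.202:9200/_template/test_template?pretty"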

Elasticsearch Benchmarking Tools

esrally is a macrobenchmarking tool for Elasticsearch. To install and configure it, use the following quickstart. Full information is available here:

https://github.com/elastic/rally

rally-eventdata-track is a repository containing a Rally track for simulating event-based data use cases. The track supports bulk indexing of auto-generated events as well as simulated Kibana queries.

https://github.com/elastic/rally-eventdata-track
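
As a minimal quickstart, assuming a host with Python 3 and pip available, the install and first-run configuration look like this:

$ pip3 install esrally
$ esrally configure

An example benchmark run against the test node then looks like: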

esrally --pipeline=benchmark-only --target-hosts=10.68.68.202:9200 \
--track=eventdata --track-repository=eventdata --challenge=bulk-size-evaluation
eventdata bulk index – 5000 events/request highlighted @ indexing rate of ~50k docs/sec
httpd logs index test – highlighted @ indexing rate ~80k docs/s

Elasticsearch is just one of a great many cloud native applications that can run successfully on Nutanix Enterprise Cloud. I am seeing more and more opportunities to assist our account teams in the sizing and deployment of Elasticsearch. However, unlike other search and analytics platforms, Elasticsearch has no ready-made formula for sizing. This post will hopefully allow people to make a start on their Elasticsearch sizing on Nutanix and, in addition, help identify future steps to improve their performance numbers.

Further Reading

Elasticsearch Reference

Using Nutanix clones to deploy a MongoDB replica set

In this post I am going to look at setting up a replica set to support high availability in a MongoDB environment. Replica sets contain a primary MongoDB database and a number of additional secondary replica databases. Any one of the eligible replicas can become primary in the event that the original primary fails for whatever reason. Replica set membership is usually an odd number, so that new primary elections cannot be tied.

Building out an HA MongoDB setup on Nutanix is relatively easy to do. Each MongoDB instance is hosted in a separate, sandboxed environment, in our case a virtual machine (VM). Each VM is then located on a separate physical hypervisor host. I have a gold image VM that has a MongoDB instance installed along recommended best practice guidelines. This VM gets cloned as required whenever I need to build out a new MongoDB environment, so for a 3 member replica set I need 3 clones.

[diagram: three-member replica set]

From a cluster CVM node, type:

$ acli 

<acropolis> vm.clone mongodb01,mongodb02,mongodb03 clone_from_vm=mongodb30-gold
mongodb01: complete
mongodb02: complete
mongodb03: complete
 
<acropolis> vm.list
...
mongodb01: 2b9498c1-502e-454e-93c8-931a45a321b6
mongodb02: 9a445d26-caf9-4ddf-9d8e-296ea8b6e19e
mongodb03: 9a5512fa-3d19-4ddc-8cac-11721f999459
...

<acropolis> vm.on mongodb01,mongodb02,mongodb03
mongodb01: complete
mongodb02: complete
mongodb03: complete

After powering on the VMs, check that mongod starts correctly on the default port 27017 on each VM. The first thing to make sure of is that the mongod process is listening on the correct address. I have set my VMs to use DHCP, and the DHCP-assigned address is the one the service needs to listen on.

# ip a

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether 52:54:00:db:17:76 brd ff:ff:ff:ff:ff:ff
 inet 10.68.64.111/24 brd 10.68.64.255 scope global eth0


# cat /etc/mongod.conf | grep -i bind_ip
 bind_ip=127.0.0.1,10.68.64.111

# service mongod restart
# service mongod status

Once all of the VMs are up and running on their respective address:port tuples, we need to enable firewall access via iptables. Each VM that will form part of the replica set needs to allow access from the other members on mongod port 27017. So, for a replica set with members 10.68.64.111, 10.68.64.114 and 10.68.64.113, on each member, in this example 10.68.64.111, run…

# iptables -A INPUT -s 10.68.64.113 -p tcp --destination-port 27017 -m state \
--state NEW,ESTABLISHED -j ACCEPT
# iptables -A INPUT -s 10.68.64.114 -p tcp --destination-port 27017 -m state \
--state NEW,ESTABLISHED -j ACCEPT

# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
# service iptables reload

Abridged iptables -L output after the above changes:

Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT tcp -- 10.68.64.113 anywhere tcp dpt:27017 state NEW,ESTABLISHED
ACCEPT tcp -- 10.68.64.114 anywhere tcp dpt:27017 state NEW,ESTABLISHED

Check access by performing a series of bi-directional tests between all the replica set members:

<10.68.64.111>$ mongo --host 10.68.64.113 --port 27017
MongoDB shell version: 3.0.3
connecting to: 10.68.64.113:27017/test
>
> quit()

Should any of the connection tests fail, revisit the iptables entries. The usual troubleshooting applies: telnet or nc, netstat, etc.
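
For example, to check that the mongod port on another member is reachable, and to confirm locally that mongod is listening where expected (illustrative commands):

$ nc -zv 10.68.64.113 27017
# netstat -tlnp | grep 27017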

In order to create the replica set, connect via ssh to each VM and edit /etc/mongod.conf to enable the replSet functionality:

$ grep -i replSet /etc/mongod.conf
replSet=rs01

Restart the mongod process (sudo service mongod restart) and then start a mongo shell session. On the first member of the set (the intended primary), run:

$ mongo
MongoDB shell version: 3.0.3
connecting to: test
> rs.initiate()
{
 "info2" : "no configuration explicitly specified -- making one",
 "me" : "10.68.64.111:27017",
 "ok" : 1
}
rs01:PRIMARY>

You can use the shell commands rs.conf() and rs.status() to check the replica set at any point. We’ll look at one of these outputs after completing the replica set creation. Next, from the same mongo shell session, add the other two replica nodes:

rs01:PRIMARY> rs.add("10.68.64.113")
{ "ok" : 1 }

rs01:PRIMARY> rs.add("10.68.64.114")
{ "ok" : 1 }

Potential error scenarios

  •  if you didn’t clone the VMs for the replica set from a blank gold image, but rather from a VM already running a replicated mongodb configuration, then the replication commands report errors similar to this:
{
 "info2" : "no configuration explicitly specified -- making one",
 "me" : "10.68.64.111:27017",
 "info" : "try querying local.system.replset to see current configuration",
 "ok" : 0,
 "errmsg" : "already initialized",
 "code" : 23
}

On the proviso that this is a greenfield install, delete the local db config files in the data directory and re-run rs.initiate().
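
A minimal sketch of that cleanup, assuming the default RPM data directory of /var/lib/mongo (adjust dbpath to your own configuration):

# service mongod stop
# rm /var/lib/mongo/local.*      # remove the stale replica set configuration
# service mongod start
$ mongo --eval 'rs.initiate()'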

  • if the firewall rules are not set correctly then the following error message is thrown:
 "errmsg" : "Quorum check failed because not enough voting nodes responded; 
required 2 but only the following 1 voting nodes responded: 10.68.64.111:27017; 
the following nodes did not respond affirmatively: 
10.68.64.131:27017 failed with Failed attempt to connect to 10.68.64.131:27017; 
couldn't connect to server 10.68.64.131:27017 (10.68.64.131), 
connection attempt failed",

Ensure that the firewall rules allow proper access between the VMs.

  • if replication is not enabled correctly in the mongod configuration files on each host of the replica set:
"errmsg" : "Quorum check failed because not enough voting nodes responded; 
required 2 but only the following 1 voting nodes responded: 10.68.64.110:27017; 
the following nodes did not respond affirmatively: 
10.68.64.114:27017 failed with not running with --replSet",

Once the replica set configuration is complete, check the setup by running rs.status() or rs.conf() to confirm:

rs01:PRIMARY> rs.conf()
{
 "_id" : "rs01",
 "version" : 3,
 "members" : [
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "10.68.64.113:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "10.68.64.114:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatTimeoutSecs" : 10,
 "getLastErrorModes" : {

 },
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 }
 }
}

From the output above we can see the full replica set membership, along with each member’s function and status: things like priority settings, whether or not the replica is hidden from user application queries, whether a replica is a full mongod instance or an arbiter (there simply to mitigate against tied primary elections), and whether any of the replicas have a delay enabled (used for backup/reporting duties).

In an earlier post I showed the mongo shell commands available to calculate the working set for the database. For read intensive workloads, where your working set is sized to fit the available RAM in the mongod server VMs, a replica set deployment can be used to run MongoDB and support high availability.
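
For reference, and assuming the MMAPv1-era workingSet metrics exposed by the MongoDB 3.0 shell used here (as covered in that earlier post), the estimate can be pulled with:

$ mongo --eval 'printjson(db.serverStatus({ workingSet: 1 }).workingSet)'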