Category Archives: XCP

Xtreme Computing Platform

Openstack + Nutanix : Neutron Networking

In my last post we covered the all-in-one installation of the OpenStack controller with the Nutanix shipped Acropolis Openstack drivers. The install created a single virtual machine, the Openstack Services VM (OVM). In this post I intend to talk about setting up a Network Topology using the Openstack dashboard and the Neutron service integration with Nutanix. I will be able to show how this gets reflected in the Acropolis Prism GUI. First, let’s create a public network for our VMs to reside on. Navigate via the Horizon dashboard to Admin > Networks

Navigate to System > Networks on the Horizon dashboard

1. Navigate to System > Networks on the Horizon dashboard and select +Create Network

Create Network

Currently, only local and VLAN “Provider Network Types” are supported by the Nutanix Openstack drivers. In the screenshot below, I am creating a segmented network (ID 64), named public-network, in the default admin tenant. I specify the network as shared and external.

Pro Tip

Do not use a VLAN/network assignment that has already been defined within the Nutanix cluster. Any network/subnet assignment should be done within Openstack using network parameters reserved specifically for your Openstack deployments.



2. Add your network details – Name, Project (tenant), Network Type, VLAN ID, etc

Create Subnet

Each network needs to have a subnet created with an associated DHCP pool. This DHCP pool information gets sent via the appropriate API call to Acropolis. Acropolis management associates the IP address with the vnic on the Acropolis VM. The Acropolis Openstack driver reads this configuration and, when the cloud instance gets powered on in Openstack, it will register the IP address with the Openstack VM. See setup screenshots below…

Pro Tip

When creating a subnet, you must specify a DNS server.




3. Subnet creation – requires subnet name,  network address (CIDR notation), gateway address, DHCP pool range, DNS servers

We can see the newly created network reflected in Acropolis via the Prism GUI in the screenshot below:



Pro Tip

Once a network has been configured and you decide to add an additional cluster. That network will not be extended across the new cluster. You have a choice, you can either add a new network. Or, you can remove the network and re-add it so that it gets created across all the currently configured clusters.


Now that we have a network configured we can look at setting up cloud instances to run on it. To do that we need to set up the Glance Image Service and that’s the subject of my next post.

Additional reading

Openstack + Nutanix Integration : A Configuration Primer

As of Acropolis Base Software (NOS) version 4.6, Nutanix released a set of Acropolis drivers that provide Openstack + Nutanix integration. These drivers allow an Openstack deployment to consume the Acropolis management infrastructure in a similar way to a cloud service or within a datacenter. I intend to use this series of blog postings to cover a walk-through of setting up the Nutanix Openstack drivers deployment and configuring cloud instances.

The integration stack works by having the Openstack controller installed in a separate Nutanix Openstack Services VM (Nutanix OVM). The Acropolis drivers can be installed into that same OVM. These drivers then interpose on the Openstack services for compute, image, network and volume. By subsequently translating Openstack requests, into the appropriate REST API calls in Acropolis management layer, a series of Nutanix clusters are then managed by the Openstack controller.

Openstack Acropolis integrated stack

Openstack – Acropolis Driver integrated stack

The Acropolis drivers are installed in either one of :

All-In-One Mode: You use the OpenStack controller included in the Nutanix OVM to manage the Nutanix clusters. The Nutanix OVM runs all the OpenStack services and the Acropolis OpenStack drivers.

Driver-Only Mode: You use a remote (or upstream) OpenStack controller  to manage the Nutanix clusters, and the Nutanix OVM includes only the Acropolis OpenStack drivers.

In either case, Nutanix currently only supports the Kilo release of Openstack.

I will go into further detail around Openstack and Acropolis architecture integration in future posts. For now let’s start by getting things set up. First requirement is to download the OVM image – from the Nutanix Portal – and then add it to the Acropolis Image Service….

$ wget

and upload locally....

<acropolis> image.create ovm source_url=nfs://freenas/naspool/openstack/nutanix_openstack-2015.1.0-1.ovm.qcow2 container=Image-Store

Also, Prism allows upload from your desktop if preferred/possible
or, go direct via the internet...
<acropolis> image.create ovm source_url= container=Image-Store

Note : As I had already created several containers on my cluster, I needed to specify the name of the preferred container in the above syntax . Otherwise, the container name of default is expected, if only one container exists and no container name is supplied. The following error is shown otherwise…

 kInvalidArgument: Multiple containers have been created, cannot auto select

Create the Openstack Services VM (OVM) – using Acropolis command line on a CVM on the Nutanix cluster. This can all be done very easily via the Prism GUI but for reasons of space I am going the CLI route. Refer to the Install Guide on the Nutanix Portal. Select Downloads > Tools & Firmware from the menus/drop-downs

<acropolis> vm.create nx-ovm num_vcpus=2 memory=16G
nx-ovm: complete
<acropolis> vm.disk_create nx-ovm clone_from_image=ovm
DiskCreate: complete
<acropolis> vm.nic_create nx-ovm network=vlan.64
NicCreate: complete
<acropolis> vm.on nx-ovm

Note: if you are unfamiliar with creating a network for your VMs to reside on then take a look here  , where I discuss setting up VMs and associated disks and networking on the Nutanix platform.

For now let’s consider the all-in-one install mode , there are just three steps….

o Login to the VM using the supplied credentials (via ssh)

o Add the OVM
[root@nx-ovm]#  ovmctl --add ovm --name nx-ovm --ip --netmask --gateway --nameserver --domain

o Add the Openstack Controller 
[root@nx-ovm]# ovmctl --add controller --name kilo --ip

o Add the Nutanix clusters that you want to manage with OpenStack, 
one cluster at a time.
[root@nx-ovm]# ovmctl --add cluster --name SAFC --ip --username admin --password nutanix/4u --container_name DEFAULT-CTR --num_vcpus_per_core 4

Just to point out one or two things in the above CLI. The IP address for both OVM and Openstack controller are the same – should make sense as they are both part of  the same VM in this case. Also, use the Cluster Virtual IP when registering the cluster. This is for HA reasons. Rather than use the IP of an individual CVM, use the failover IP for the cluster itself.  You need to specify a container name if it is not default (or there are more than one).

Pro Tip


If you remove or rename the container with which you added a Nutanix cluster, you must restart the services on the OpenStack Controller VM by running the following command:

ovmctl --restart services


You can now verify the base install using….

[root@nx-ovm ~]# ovmctl --show

Allinone - Openstack controller, Acropolis drivers

OVM configuration:
1 OVM name : nx-ovm
 IP :
 Netmask :
 Gateway :
 Nameserver :
 Domain :

Openstack Controllers configuration:
1 Controller name : kilo
 IP :
 Auth strategy : keystone
 Auth region : RegionOne
 Auth tenant : services
 Auth Nova password : ********
 Auth Glance password : ********
 Auth Cinder password : ********
 Auth Neutron password : ********
 DB Nova : mysql
 DB Cinder : mysql
 DB Glance : mysql
 DB Neutron : mysql
 DB Nova password : ********
 DB Glance password : ********
 DB Cinder password : ********
 DB Neutron password : ********
 RPC backend : rabbit
 RPC username : guest
 RPC password : ********
 Image cache : disable

Nutanix Clusters configuration:
1 Cluster name : SAFC
 IP :
 Username : admin
 Password : ********
 Vnc : 49795
 Vcpus per core : 4
 Container name : DEFAULT-CTR
 Services enabled : compute, volume, network

Version : 2015.1.0
Release : 1
Summary : Acropolis drivers for Openstack Kilo.

Additionally, pointing your browser at the IP address of the OVM – and navigating to Admin > System Information and selecting the Services Tab, you should see that all Openstack services are provided by the OVM IP address. Similarly, under the Compute, Block Storage and Network tabs it should also report that these services are being provided via the OVM


One other check would be to look at Admin > Hypervisors, a Nutanix cluster reports as a single hypervisor in the Openstack config – see below:


I hope  this is enough to get people started looking at and trying out Openstack deployments using Nutanix. In the next series of posts I will look at configuring images, setting up networks/subnets and then move on to creating instances.

Additional info / reading

Nutanix: Cloud-like DevOps powering NoSQL for BigData

The popularity of NoSQL has increasingly come about as developers want to use the same in-memory data structures in their applications and have them map directly into a database persistence layer. For example, storing data in XML or JSON format is often hierarchical and potentially does not lend itself to being easily stored in row based tables. It becomes more complicated if the data also contains lists and objects. Not having to convert these in-memory structures into relational database structures is a major advantage in terms of time to value. Such considerations have been made all the more acute by the rise of the web as a platform for services. There’s also an economic aspect, like the prohibitive infrastructure costs required to scale up traditional RDBMS to support high availability etc. Compare this to such Web-Scale or cloud aware apps like NoSQL, which expects to “just drop in” commodity hardware at the infrastructure layer and scale out horizontally on demand.

So if we were to consider the requirements from a modern hyper-converged infrastructure (HCI) that employed the same Web-Scale paradigms used by modern cloud-aware applications. Then to deploy apps, like a NoSQL database for example, the first thing I would want to do is virtualise. This means a right-sized, sandboxed environment (ie a virtual machine) to run individual NoSQL instances. If there was a need to scale up, then it’s a simple case of increasing RAM and CPU. As the application landscape grows over time and starts to scale out, there’s increased need for more nodes/VMs.  Hence, any HCI platform needs cloud like provisioning of nodes. So providing faster time to deploy and time to value. The ability to auto-discover and add new nodes by the click of a button is quite compelling. In short, horizontal scale out needs to be easily undertaken. Say, in the middle of the production day, while running the month end workload?

Intelligent, automated data tiering, locality and balancing via post-process techniques like Mapreduce is another key requirement. As any database working set grows over time, ie: more users will mean more queries, new tables, indexes, aggregations, etc. So the ability to maintain a responsive I/O profile via SSD, as more I/O is periodically obtained from disk, will be key. If all VMs are then able to get local access to their data via SSD from a global storage fabric so much the better. While we are here, consider how you would migrate to a new(er) hardware fleet with and without a distributed storage fabric. Far easier to just drop in units of converged compute/storage and then migrate VMs to it. Compare how that would work with a large white box server estate spread across numerous racks in a DC? There’s yet another aspect of economics to all this. In that auto tiering of the storage layer means the current “working set” data is held at the most performant (and by comparison more expensive) layer. While colder data sits on cheaper spinning disk.

Another advantage of a distributed storage fabric is one of data service features. Take point in time (PIT) backups of sharded DBs, which can sometimes be a complicated issue. In which case, a data service that supports VM centric snapshots of key VMs in a consistency group can avoid another potential pain point. Also, rapid cloning of preconfigured VMs will improve deployment times and speaks to the DevOps workflows that many IT shops have increasingly adopted. Consider how easy might it be to create dev/QA environments with production style data using such mechanisms? What about burst workloads? The ability to migrate VMs between public and private cloud would bring further benefits, both as a means to provide offsite backups or move VMs between geographies.

Bear in mind there isn’t 20+ years of ecosystem software (or even tribal knowledge perhaps?) in the NoSQL community – unlike in traditional RDBMS. For this reason continual monitoring is a major requirement. The ability to support a floor to ceiling overview of VMs, hypervisor and hardware platform in terms of performance, alerts and events is paramount. We mentioned briefly above how working set size and IO throughput could affect end user experience. So the ability to predict trends in such behaviour means timely decisions about when to scale or shard an application can be made.  No discussion of any DevOps processes is complete without including REST API and/or Powershell automation capabilities. Automation is key in terms of workflow agility, allowing routine tasks to be performed repeatedly with a well understood outcome. Dev/QA environments can benefit greatly from the features already described. In addition, via the API, developers can build self-service portal software allowing them to spin up new environments in a matter of minutes.

In previous roles I worked with customers running UNIX based failover clusters protecting traditional SQL RDBMS and ERP software. Think Solaris and SUN Cluster, underpinning Oracle and SAP installs.  While running this kind of ‘Big Iron’ was considered ‘state of the art’. Coming up fast on the inside was ‘Big Data’ and with it a complete rethink on how to achieve massive scale. Traditionally, systems had scaled vertically by adding more CPU and RAM to the host platform, and horizontally by adding system boards to a midframe chassis. This came at a price and often a staggering level of administrative complexity. While Web-Scale technologies may not have completely replaced this approach yet, large scale big iron systems will continue to become more niche as time goes on in my opinion.

So, coming back to the beginning of this post. HCI is not only about scaling just to support Big Data workloads, it’s also about creating lower time to value and radical ease of use synergies with the application that sits on top of the stack. Having a HCI platform designed from the ground up with the same underlying principles as modern Web-Scale applications, means we are able to remove the operational delays and complexity that tend to act as drag anchors in today’s rapid deployment environments. IT departments are then free to focus on innovations that help the business succeed.

Configuring Docker Storage on Nutanix

I have recently been looking at how best to deploy a container ecosystem on my Nutanix XCP environment. At present I can run a number of containers in a virtual machine (VM). This will give me the required networking, persistent storage and the ability to migrate between hosts. Something that containers are only just becoming capable of in many cases. So that I can scale out my container deployment within my Docker host VM, I am going to have to consider increasing  the available space within /var/lib/docker. By default, if you provide no additional storage/disk for your Docker install, loopback files get created to store your containers/images. This configuration is not particularly performant, so it’s not supported for production.  You can see below how the default setup looks…

# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.821 GB
 Data Space Total: 107.4 GB
 Data Space Available: 50.2 GB
 Metadata Space Used: 1.479 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-client

Configuring Docker Storage Options

Looking at the various methods we can use to provide dedicated block storage for Docker containers. With the Devicemapper storage driver Docker automatically creates a base thin device, this is two block devices, one for data and one for metadata. The thin device is automatically formatted with an empty filesystem on creation. This device is the base of all docker images and containers. All base images are snapshots of this device and those images are then in turn used as snapshots for other images and eventually containers. This is the Docker supported production setup. Also, by using LVM based devices as the underlying storage you are then accessing them as raw devices and no longer go through the VFS layer.

Devicemapper : direct-lvm

To begin, create two LVM devices, one for container data and another to hold metadata. By default the loopback method creates a storage pool with 100GB of space. In this example I am creating a 200G LVM volume for data and a 5G metadata volume. I prefer separate volumes where possible for performance reasons. We start by hot-adding the required Nutanix vDisks to the virtual machine (docker-directlvm) guest OS (Fedora22)

<acropolis> vm.disk_create docker-directlvm create_size=200g container=DEFAULT-CTR
DiskCreate: complete
<acropolis> vm.disk_create docker-directlvm create_size=10g container=DEFAULT-CTR
DiskCreate: complete

[root@docker-directlvm ~]# lsscsi 
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdb 
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdc 

The next step is to create the individual LVM volumes

# pvcreate /dev/sdb /dev/sdc
# vgcreate direct-lvm /dev/sdb /dev/sdc

# lvcreate --wipesignatures y -n data direct-lvm -l 95%VG
# lvcreate --wipesignatures y -n metadata direct-lvm -l 5%VG

If setting up a new metadata pool need to zero the first 4k to indicate empty metadata:
# dd if=/dev/zero of=/dev/direct-lvm/metadata bs=1M count=1

For sizing the metadata volume above, the rule of thumb seems to be 0.1% of the data volume. This is somewhat anecdotal, so size with a little headroom perhaps? Next, start the Docker daemon using the required options in the file /etc/sysconfig/docker-storage.

DOCKER_STORAGE_OPTIONS="--storage-opt dm.datadev=/dev/direct-lvm/data --storage-opt \
dm.metadatadev=/dev/direct-lvm/metadata --storage-opt dm.fs=xfs"

You can then verify that the requested underlying storage is in use, with the docker info command

# docker info
Containers: 5
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/direct-lvm/data
 Metadata file: /dev/direct-lvm/metadata
 Data Space Used: 10.8 GB
 Data Space Total: 199.5 GB
 Data Space Available: 188.7 GB
 Metadata Space Used: 7.078 MB
 Metadata Space Total: 10.5 GB
 Metadata Space Available: 10.49 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-directlvm

All well and good so far, but the storage options to expose data and metadata locations namely dm.datadev and dm.metadatadev have been deprecated in favour of a preferred model, which is to have a thin pool reserved outside of Docker and passed to the daemon via the dm.thinpooldev storage option.  There’s a helper script in some Linux distros called  /etc/sysconfig/docker-storage-setup. This does all the heavy lifting, you just need to supply a device  and, or a volume group name.

Devicemapper: Thinpool

Once again start by creating the virtual device – a Nutanix vDisk –  and adding it to the virtual machine guest OS

<acropolis> vm.disk_create docker-thinp create_size=200g container=DEFAULT-CTR
DiskCreate: complete

[root@localhost sysconfig]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[2:0:0:0] disk NUTANIX VDISK 0 /dev/sda
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdd

Edit the file /etc/sysconfig/docker-storage-setup as follows:

root@localhost sysconfig]# cat /etc/sysconfig/docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
# For more details refer to "man docker-storage-setup"

Then run the storage helper script:

[root@docker-thinp ~]# pvcreate /dev/sdd
 Physical volume "/dev/sdd" successfully created

[root@docker-thinp ~]# vgcreate docker /dev/sdd 
 Volume group "docker" successfully created 

[root@docker-thinp ~]# docker-storage-setup 
Rounding up size to full physical extent 192.00 MiB 
Logical volume "docker-poolmeta" created. 
Wiping xfs signature on /dev/docker/docker-pool. 
Logical volume "docker-pool" created. 
WARNING: Converting logical volume docker/docker-pool and docker/docker-poolmeta to pool's data and metadata volumes. 
Converted docker/docker-pool to thin pool. 
Logical volume "docker-pool" changed.

You can then verify the underlying storage being used in the usual way

[root@docker-thinp ~]# lsblk

sdd 8:48 0 186.3G 0 disk
├─docker-docker--pool_tmeta 253:5 0 192M 0 lvm
│ └─docker-docker--pool 253:7 0 74.4G 0 lvm
└─docker-docker--pool_tdata 253:6 0 74.4G 0 lvm
 └─docker-docker--pool 253:7 0 74.4G 0 lvm

[root@docker-thinp ~]# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 62.39 MB
 Data Space Total: 79.92 GB
 Data Space Available: 79.86 GB
 Metadata Space Used: 90.11 kB
 Metadata Space Total: 201.3 MB
 Metadata Space Available: 201.2 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-thinp

On completion this will have created the correct entries in /etc/sysconfig/docker-storage

[root@docker-thinp ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs \
 --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool

and runtime looks like...

[root@docker-thinp ~]# ps -ef | grep docker
root 8988 1 0 16:13 ? 00:00:11 /usr/bin/docker daemon --selinux-enabled
--storage-driver devicemapper --storage-opt dm.fs=xfs 
--storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool

Bear in mind that when you are changing the underlying docker storage driver or storage options similar to the examples described above.  Then, typically, the following destructive command sequence is run (be sure to backup any important data before running the following ….

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker 

post changes 

$ systemctl daemon-reload
$ systemctl start docker

Additional Info

ELK on Nutanix : Kibana

It might seem like I am doing things out of sequence by looking at the visualisation layer of the ELK stack next. However, recall in my original post , that I wanted to build sets  of unreplicated indexes and then use Logstash to fire test workloads at them. Hence, I am covering Elasticsearch and Kibana initially. This brings me to another technical point that I need to cover. In order for a single set of indexes to be actually recoverable, when running on a single node, we need to invoke the following parameters in our Elasticsearch playbook :

So in file: roles/elastic/vars/main.yml
elasticsearch_gateway.recover_after_nodes: 1
elasticsearch_gateway.recover_after_time: 5m
elasticsearch_gateway.expected_nodes: 1

These are then set in the elasticsearch.yml.j2 file as follows:

# file: roles/elastic/templates/elasticsearch.yml.j2
#{{ ansible_managed }}


# Allow recovery process after N nodes in a cluster are up:
#gateway.recover_after_nodes: 2
{% if elasticsearch_gateway_recover_after_nodes is defined %}
gateway.recover_after_nodes : {{ elasticsearch_gateway_recover_after_nodes}}
{% endif %}

and so on ....

This allows the indexes to be recovered when there is only a single node in the cluster. See below for the state of my indexes after a reboot:

[root@elkhost01 elasticsearch]# curl -XGET http://localhost:9200/_cluster/health?pretty
 "cluster_name" : "nx-elastic",
 "status" : "yellow",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 4,
 "active_shards" : 4,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 4,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0

Lets now look at the Kibana playbook I am attempting. Unfortunately, Kibana is distributed as a compressed tar archive. This means that the yum or dnf modules are no help here. There is however a very useful unarchive module, but first we need to download the tar bundle using get_url as follows :

- name: download kibana tar file
 get_url: url={{ kibana_version }}-linux-x64.tar.gz
 dest=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz mode=755
 tags: kibana

I initially tried unarchiving the Kibana bundle into /tmp. I then intended to copy everything below the version specific directory (/tmp/kibana-4.0.1-linux-x64) into the Ansible created /opt/kibana directory. This proved problematic as neither the synchronize nor the copy modules seemed setup to do mass copy/transfer between one directory structure to another. Maybe I am just not getting it – I even tried using with_item loops but no joy as fileglobs are not recursive. Answers on a postcard are always appreciated? In the end I just did this :

- name: create kibana directory
 become: true
 file: owner=kibana group=kibana path=/opt/kibana state=directory
 tags: kibana

- name: extract kibana tar file
 become: true
 unarchive: src=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz dest=/opt/kibana copy=no
 tags: kibana

The next thing to do was to create a systemd service unit. There isn’t one for Kibana as there is no rpm package available. Usual templating applies here :

- name: install kibana as systemd service
 become: true
 template: src=kibana4.service.j2 dest=/etc/systemd/system/kibana4.service owner=root \
           group=root mode=0644
 - restart kibana
 tags: kibana

And the service unit file looked like:

[ansible@ansible-host01 templates]$ cat kibana4.service.j2
{{ ansible_managed }}

ExecStart=/opt/kibana/kibana-{{ kibana_version }}-linux-x64/bin/kibana


This all seemed to work as I could now access Kibana via my browser. No indexes yet of course :


There are one or two plays I would like still like to document. Firstly, the ‘notify’ actions in some of the plays. These are used to call – in my case – the restart handlers. Which in turn causes the service in question to be restarted – see the next section :

# file: roles/kibana/handlers

- name: restart kibana
 become: true
 service: name=kibana state=restarted

I wanted to document this next feature simply because it’s so useful – tags. I have assigned a tag to every play/task in the playbook so far you will have noticed. For testing purposes they allow you to run specific plays. You can then troubleshoot just that particular play and see what’s going on.

 ansible-playbook -i ./production site.yml --tags "kibana" --ask-sudo-pass

Now that I have the basic plays to get my Elasticsearch and Kibana services up and running via Ansible, it’s time to start looking at Logstash. Next time I post on ELK type stuff, I will try to look at logging and search use cases. Once I crack how they work of course.

ELK on Nutanix : Elasticsearch

In this second post on using Ansible to deploy the ELK stack on Nutanix, I will cover my initial draft at a playbook for Elasticsearch (ES).  Recall from my previous post, the playbook layout looks like:

[ansible@ansible-host01 roles]$ tree elastic
├── files
│   └── elasticsearch.repo
├── handlers
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
│   ├── elasticsearch.default.j2
│   ├──
│   └── elasticsearch.yml.j2
└── vars
 └── main.yml

There’s also an additional role at play here, config – which is the basic config for the underlying VM guest OS, which we also need to look at :

[ansible@ansible-host01 roles]$ tree config
├── files
├── handlers
├── tasks
│   └── main.yml
├── templates
└── vars
 └── main.yml

the common role is where I set things via the Ansible sysctl module, or add entries to files (using lineinfile) in order to set max memory and ulimits etc. It’s generic system configuration, so for example:

#installing java runtime pkgs (pre-req for ELK)
- name: install java 8 runtime
 become: true
 yum: name=java state=installed
 tags: config

#set system max/min numbers...
- name: set maximum map count in sysctl/systemd
 become: true
 sysctl: name=vm.max_map_count value={{ os_max_map_count }} state=present
 tags: config


- name: set soft limits for open files
 become: true
 lineinfile: dest=/etc/security/limits.conf line="{{ elasticsearch_user }} soft nofile {{ elasticsearch_max_open_files }}" insertafter=EOF backup=yes
 tags: config

- name: set max locked memory
 become: true
 lineinfile: dest=/etc/security/limits.conf line="{{ elasticsearch_user }} - memlock {{ elasticsearch_max_locked_memory }}" insertafter=EOF backup=yes
 tags: config


Here might be a good time to touch upon how Ansible allows you to set variables. Within the directory of each role there’s a subdir called vars and all the variables needed for that role are contained in the YAML file (main.yml). Here’s a snippet:

# can use vars to set versioning and user 
elasticsearch_version: 1.7.0
elasticsearch_user: elasticsearch

# here's how we can specify the data volumes that ES will use 
elasticsearch_data_dir: /esdata/data01,/esdata/data02,/esdata/data03,/esdata/data04,/esdata/data05,/esdata/data06


# Virtual memory settings - ES heap is set to half my current VM RAM
# but no greater than 32GB for performance reasons
elasticsearch_heap_size: 16g
elasticsearch_max_locked_memory: unlimited
elasticsearch_memory_bootstrap_mlockall: "true"


# Good idea not to go with the ES default names of Franz Kafka etc
elasticsearch_cluster_name: nx-elastic
elasticsearch_node_name: nx-esnode01

# My initial nodes will be both cluster quorum members and data "workhorse" nodes.
# I will # separate duties as I scale. Also I set the min master nodes to 1 so that 
# my ES cluster comes up while initially testing a single index 
elasticsearch_node_master: "true"
elasticsearch_node_data: "true"
elasticsearch_discovery_zen_minimum_master_nodes: 1

We’ll see how we use these variables as we cover more features. Next up I used some nice features like shell and also register variables to be able to provide conditional behaviour for package install :

- name: check for previous elasticsearch installation
 shell: if [ -e /usr/share/elasticsearch/lib/elasticsearch-{{ elasticsearch_version }}.jar ]; then echo yes; else echo no; fi;
 register: version_exists
 always_run: True
 tags: elastic

- name: uninstalling previous version if applicable
 become: true
 command: yum erase -y elasticsearch
 when: version_exists.stdout == 'no'
 ignore_errors: true
 tags: elastic

and similarly for the marvel plugin :

- name: check marvel plugin installed
 become: true
 stat: path={{ elasticsearch_home_dir }}/plugins/marvel
 register: marvel_installed
 tags: elastic

- name: install marvel plugin
 become: true
 command: "{{ elasticsearch_home_dir }}/bin/plugin -i elasticsearch/marvel/latest"
 - restart elasticsearch
 when: not marvel_installed.stat.exists
 tags: elastic

The Marvel plugin stanza above also makes use of the stat module – this is a really great module. It returns all kinds of goodness you would normally expect from a stat() system call and yet you are doing it in your Ansible playbook.  There are a couple more things I will cover and then leave the rest for when I talk about Kibana and Logstash in a follow up post. First up then are templates. Ansible uses Jinja2 templating in order to transform a file and install it on your host, you can create a file with appropriate templating as below. The variables in {{ .. }} are from the roles ../var directory containing the yaml file already described earlier.

Note : I stripped all comment lines for sake of brevity:

[ansible@ansible-host01 templates]$ pwd
[ansible@ansible-host01 templates]$ grep -v ^# elasticsearch.yml.j2
{% if elasticsearch_cluster_name is defined %} {{ elasticsearch_cluster_name }}
{% endif %}


{% if elasticsearch_node_name is defined %} {{ elasticsearch_node_name }}
{% endif %}


{% if elasticsearch_node_master is defined %}
node.master: {{ elasticsearch_node_master }}
{% endif %}
{% if elasticsearch_node_data is defined %} {{ elasticsearch_node_data }}
{% endif %}


{% if elasticsearch_memory_bootstrap_mlockall is defined %}
bootstrap.mlockall: {{ elasticsearch_memory_bootstrap_mlockall }}
{% endif %}

The template  file when run in the play is then transformed using the provided variables and copied into place on my  ELK host target VM…

- name: copy elasticsearch defaults file
 become: true
 template: src=elasticsearch.default.j2 dest=/etc/sysconfig/elasticsearch owner={{ elasticsearch_user }} group={{ elasticsearch_group }} mode=0644
 - restart elasticsearch
 tags: elastic

So let’s see how our playbook runs and what the output looks like

[ansible@ansible-host01 elk]$ ansible-playbook -i ./production site.yml \
--tags "config,elastic" --ask-sudo-pass
SUDO password:

PLAY [elastic-hosts] **********************************************************

GATHERING FACTS ***************************************************************
ok: []

TASK: [config | install java 8 runtime] ***************************************
ok: []

TASK: [config | set swappiness in sysctl/systemd] *****************************
ok: []

TASK: [config | set maximum map count in sysctl/systemd] **********************
ok: []

TASK: [config | set hard limits for open files] *******************************
ok: []

TASK: [config | set soft limits for open files] *******************************
ok: []

TASK: [config | set max locked memory] ****************************************
ok: []

TASK: [config | Install wget package (Fedora based)] **************************
ok: []

TASK: [elastic | install elasticsearch signing key] ***************************
changed: []

TASK: [elastic | copy elasticsearch repo] *************************************
ok: []

TASK: [elastic | check for previous elasticsearch installation] ***************
changed: []

TASK: [elastic | uninstalling previous version if applicable] *****************
skipping: []

TASK: [elastic | install elasticsearch pkgs] **********************************
skipping: []

TASK: [elastic | copy elasticsearch configuration file] ***********************
ok: []

TASK: [elastic | copy elasticsearch defaults file] ****************************
ok: []

TASK: [elastic | set max memory limit in systemd file (RHEL/CentOS 7+)] *******
changed: []

TASK: [elastic | set log directory permissions] *******************************
ok: []

TASK: [elastic | set data directory permissions] ******************************
ok: []

TASK: [elastic | ensure elasticsearch running and enabled] ********************
ok: []

TASK: [elastic | check marvel plugin installed] *******************************
ok: []

TASK: [elastic | install marvel plugin] ***************************************
skipping: []

NOTIFIED: [elastic | restart elasticsearch] ***********************************
changed: []

PLAY RECAP ******************************************************************** : ok=19 changed=4 unreachable=0 failed=0

[ansible@ansible-host01 elk]$

I can verify that my ES cluster is working by querying the Cluster API – note that the red status is down to the fact I have no other cluster nodes yet on which to replicate the index shards:

# curl -XGET http://localhost:9200/_cluster/health?pretty
 "cluster_name" : "nx-elastic",
 "status" : "red",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 0,
 "active_shards" : 0,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 0,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0

You can use further API queries to verify that the desired configuration is in place and at that point you have a solid, repeatable deployment with a known outcome ie: you are doing DevOps.