Tag Archives: Acropolis

Openstack + Nutanix : Nova and Cinder integration

Now that we have set up an all-in-one deployment of the Acropolis OVM, configured networking, and populated an image registry, it's time to look at the steps required to launch virtual machine (VM) instances and set up the appropriate storage. The first step is to provide the necessary network access rules for the VMs, if they don't already exist. The easiest way to do this is to create rules that allow SSH (port 22) access from any address range and make the VMs pingable.

Compute > Access & Security > Security Groups
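For anyone who prefers the command line, something along the following lines should achieve the same result. This is just a sketch using the nova client of that era; the "default" security group name is an assumption, so substitute your own group as needed:

source keystonerc_admin

# allow SSH (TCP port 22) from any address range
nova secgroup-add-rule default tcp 22 22 0.0.0.0/0

# make the VMs pingable (ICMP from anywhere)
nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0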

Next, create an SSH key-pair that can be assigned to your instances; remote login to those VMs will then be restricted to holders of the corresponding private key. I will show how this is used later in the post, when we launch an instance. First, select the Key Pairs tab in the Access & Security frame and save the resulting PEM file to be used when accessing your VMs.

access-kp-create

Create a named key-pair (for example fedora-kp) for the set of instances you will create.
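The same key-pair can be created from the OVM command line if preferred – a quick sketch, with the key name matching the fedora-kp example above:

source keystonerc_admin
nova keypair-add fedora-kp > fedora-kp.pem
chmod 600 fedora-kp.pem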

As an example, I am going to create a single volume using the Cinder service, in order to show how we can attach it to a running VM. In this case, the Cinder call gets redirected to the Acropolis Volume API and the resulting volume is attached to the instance as an iSCSI block device.

volume-create
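The equivalent CLI call would look something like the following sketch – the volume name and size here are purely illustrative:

source keystonerc_admin
cinder create --display-name data-vol01 10
cinder list    # confirm the volume reaches the "available" state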

The next step is to spin up a number of VM instances. I have given a generic instance prefix for the name, and I am choosing to boot a Fedora 23 Cloud image. You can see the Flavour Details in the side panel in the screenshot below – note that the root disk size is big enough to accommodate the base image.

instances-launch

I also need to specify the SSH key-pair I am using and the network on which the instances will be launched. See below:

instances-network

instances-kps

At this point I can go ahead and launch my instances. Below we can see that all 10 requested instances have been created, along with their IP addresses assigned from the previously defined network, the instance flavour, and the named key-pair.

instance-list
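For reference, the same launch could be scripted from the CLI along these lines – a sketch only, where the flavour name is an assumption and the network ID is looked up from the public-network created earlier:

source keystonerc_admin
NET_ID=$(neutron net-list | awk '/ public-network / {print $2}')
nova boot --image "Fedora 23 Cloud" --flavor m1.small \
  --key-name fedora-kp --nic net-id=${NET_ID} \
  --min-count 10 --max-count 10 fedvm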

So now, if we were to take a look at the Nutanix cluster backend via Prism, we can see those VM instances created on the cluster and how they are spread across the hypervisor hosts. That’s all down to Acropolis management and placement.

prims-vm-list

We can dig a little deeper into the Acropolis functionality and show how the individual steps taken by the Acropolis REST API calls built and deployed the VMs on the backend. Here's the list of VMs that were created, as shown on the http://<CVM-IP>:2030 page.

2030-vm-list

We can also see the breakdown of the individual task steps: how long each one took, how long it queued for, whether it was ultimately successful, and so on. The key takeaway from all this is that the speed of creation of the VM instances is largely down to the Acropolis management interfaces consumed by the REST API calls.

ergon-task-list

Let’s take one of those VMs and add some volumes to it; we will add a data and a log volume to fedvm-10. First of all we need to create the iSCSI volumes.

volume-attach

 

Then we can attach the volumes to the VM instance ….

attach-volume

We now have the two volumes attached to the VM ….

volume-attachment-list
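Again, the CLI equivalent would be something like the sketch below – the volume names are illustrative, and the sizes match the 10 GiB data and 50 GiB log disks seen later in the guest:

source keystonerc_admin
cinder create --display-name fedvm-10-data 10
cinder create --display-name fedvm-10-log 50
nova volume-attach fedvm-10 $(cinder list | awk '/ fedvm-10-data / {print $2}')
nova volume-attach fedvm-10 $(cinder list | awk '/ fedvm-10-log / {print $2}')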

The two volumes should show up as virtual disks under /dev in the VM itself. We can verify this by logging into the VM directly using the private key I created earlier as part of the key-pair assigned to this series of instances.

# ssh -i ./fedora-kp.pem fedora@10.68.56.29
Last login: Thu Apr 7 21:28:21 2016 from 10.68.64.172
[fedora@fedvm-10 ~]$ 

[fedora@fedvm-10 ~]$ sudo fdisk -l
Disk /dev/sda: 3 GiB, 3221225472 bytes, 6291456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x6e3892a8

Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 6291455 6289408 3G 83 Linux


Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdc: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

So from here, we can format the newly assigned disks and mount them as needed.
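As a rough sketch (filesystem choice and mount points are entirely up to you):

sudo mkfs.xfs /dev/sdb
sudo mkfs.xfs /dev/sdc
sudo mkdir -p /data /log
sudo mount /dev/sdb /data
sudo mount /dev/sdc /log
# add /etc/fstab entries if the mounts need to persist across reboots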

That’s it for this post. Hopefully this series of posts has gone a little way towards clarifying how a Nutanix cluster can be used to scale out an Openstack deployment to form a highly available on-premise cloud, the deployment of which is radically simplified by using Nutanix as the Compute, Volume, Image and Network backend.

In future posts I intend to look at deploying an upstream Openstack controller, and to have a play around with snapshots within Openstack and their use as images. Also, perhaps, some additional troubleshooting. Let me know what you find useful.

Openstack + Nutanix: Glance Image Service

This post will cover the retrieval of base or cloud OS images via the Openstack Glance image service and how the Acropolis driver interacts with Glance and maintains the image data on the Nutanix Distributed Storage Fabric (DSF).

From the Openstack documentation:

  • The Glance image service includes discovering, registering and retrieving virtual machine images
  • It has a RESTful API that allows querying of image metadata as well as retrieval of the actual image
  • It has the ability to copy (or snapshot) a server image and then store it promptly. Stored images can then be used as templates to get new servers up and running quickly, and can also be used to store and catalog unlimited backups.

The Acropolis driver interacts with the Glance service by redirecting an image from the Openstack controller to the Acropolis DSF. Aside from the image metadata (i.e. the image configuration details), which is stored in Glance, the image itself is actually stored on the Nutanix cluster. We do not store any images in the OVM, either in the Glance store or anywhere else within the Openstack Controller.

Images are managed in Openstack via System > Images – see the screenshot below for an example list of available images in an Openstack environment.

glance-images

Images in Openstack are mostly downloaded via an HTTP URL, though file upload also works. The image creation workflow in the screenshot below shows a Fedora 23 Cloud image in QCOW2 format being retrieved. I have left the respective “Minimum Disk” (size) and “Minimum RAM” fields blank, so that no minimum is set for either.

image-create
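The CLI equivalent of the Horizon workflow above would look roughly like this – the Fedora download URL is an assumption on my part, so substitute whichever mirror and release you are after:

source keystonerc_admin
glance image-create --name "Fedora 23 Cloud" \
--is-public true --container-format bare --disk-format qcow2 \
--copy-from https://download.fedoraproject.org/pub/fedora/linux/releases/23/Cloud/x86_64/Images/Fedora-Cloud-Base-23-20151030.x86_64.qcow2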

You can confirm the images are loaded into the Nutanix Cluster backend by viewing the Image Configuration menu in Prism. The images in Prism are stored on a specific container in my case.

prism-images

Similarly, Prism will report the progress of the Image upload to the cluster through the event and progress monitoring facility on the main menu bar.

prism-images-tasks

If all you need is a quick demo, then Openstack suggests the following OS image for test purposes. Use it simply to test and demonstrate basic Glance functionality via the command line; it works exactly the same way via the Horizon GUI.

[root@nx-ovm ~]# source keystonerc_admin
[root@nx-ovm ~(keystone_admin)]# glance image-create --name cirros-0.3.2-x86_64 \
--is-public true --container-format bare --disk-format qcow2 \
--copy-from http://download.cirros-cloud.net/0.3.2/cirros-0.3.2-x86_64-disk.img

+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | None                                 |
| container_format | bare                                 |
| created_at       | 2016-04-05T10:31:53.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | qcow2                                |
| id               | f51ab65b-b7a5-4da1-92d9-8f0042af8762 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | cirros-0.3.2-x86_64                  | 
| owner            | 529638a186034e5daa11dd831cd1c863     |
| protected        | False                                |
| size             | 0                                    |
| status           | queued                               |
| updated_at       | 2016-04-05T10:31:53.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+

This is then reflected in the glance image list:

[root@nx-ovm ~(keystone_admin)]# glance image-list
+--------------------------------------+----------------------------+-------------+------------------+------------+--------+
| ID                                   | Name                       | Disk Format | Container Format | Size       | Status |
+--------------------------------------+----------------------------+-------------+------------------+------------+--------+
| 44b4c9ab-b436-4b0c-ac8d-97acbabbbe60 | CentOS 7 x86_84            | qcow2       | bare             | 8589934592 | active |
| 033f24a3-b709-460a-ab01-f54e87e0e25b | cirros-0.3.2-x86_64        | qcow2       | bare             | 41126400   | active |
| f9b455b2-6fba-46d2-84d4-bb5cfceacdc7 | Fedora 23 Cloud            | qcow2       | bare             | 234363392  | active |
| 13992521-f555-4e6b-852b-20c385648947 | Ubuntu 14.04 - Cloud Image | qcow2       | bare             | 2361393152 | active |
+--------------------------------------+----------------------------+-------------+------------------+------------+--------+

One other thing to be aware of: all network, image, instance and volume manipulation should only be done via the Openstack dashboard. The Openstack elements created this way cannot subsequently be changed or edited with the Acropolis Prism GUI; the two management interfaces are independent of one another. In fact, the Openstack Services VM (OVM) was intentionally designed this way, to be completely stateless. Obviously this could change in future product iterations if it were deemed a better solution going forward.

I have included the Openstack docs URL with additional image locations for anyone wanting to pull images of their own to work with. This is an excellent reference location for potential cloud instance images for both Linux distros and Windows:

http://docs.openstack.org/image-guide/

Next up, we will have a look at using the Acropolis Cinder plugin for Block Storage and the Nova Compute service integration.

Openstack + Nutanix : Neutron Networking

In my last post we covered the all-in-one installation of the Openstack controller with the Nutanix-shipped Acropolis Openstack drivers. The install created a single virtual machine, the Openstack Services VM (OVM). In this post I intend to talk about setting up a network topology using the Openstack dashboard and the Neutron service integration with Nutanix, and I will show how this gets reflected in the Acropolis Prism GUI. First, let’s create a public network for our VMs to reside on.


1. Navigate to System > Networks on the Horizon dashboard and select +Create Network

Create Network

Currently, only local and VLAN “Provider Network Types” are supported by the Nutanix Openstack drivers. In the screenshot below, I am creating a segmented network (ID 64), named public-network, in the default admin tenant. I specify the network as shared and external.

Pro Tip


Do not use a VLAN/network assignment that has already been defined within the Nutanix cluster. Any network/subnet assignment should be done within Openstack using network parameters reserved specifically for your Openstack deployments.


 

create-network

2. Add your network details – Name, Project (tenant), Network Type, VLAN ID, etc
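The same network can also be created with the neutron CLI – a sketch, in which the physical network label (physnet1) is an assumption that depends on your driver configuration, while the VLAN ID matches the 64 used above:

source keystonerc_admin
neutron net-create public-network \
  --provider:network_type vlan \
  --provider:physical_network physnet1 \
  --provider:segmentation_id 64 \
  --shared --router:external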

Create Subnet

Each network needs a subnet created with an associated DHCP pool. The DHCP pool information gets sent via the appropriate API call to Acropolis, and Acropolis management associates an IP address with the vNIC on the Acropolis VM. The Acropolis Openstack driver reads this configuration and, when the cloud instance is powered on in Openstack, registers the IP address with the Openstack VM. See the setup screenshots below.

Pro Tip

When creating a subnet, you must specify a DNS server.


 

create-subnet

subnet-details

3. Subnet creation – requires subnet name, network address (CIDR notation), gateway address, DHCP pool range, and DNS servers
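The CLI version of the subnet creation would be something like the following sketch – the addresses, pool range and DNS server are all illustrative:

source keystonerc_admin
neutron subnet-create public-network 10.68.64.0/24 \
  --name public-subnet \
  --gateway 10.68.64.1 \
  --allocation-pool start=10.68.64.100,end=10.68.64.200 \
  --dns-nameserver 8.8.8.8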

We can see the newly created network reflected in Acropolis via the Prism GUI in the screenshot below:

prism-network

 

Pro Tip

If you add an additional cluster after a network has been configured, that network will not be extended across the new cluster. You have a choice: you can either add a new network, or you can remove the existing network and re-add it so that it gets created across all the currently configured clusters.


network-topology

Now that we have a network configured we can look at setting up cloud instances to run on it. To do that we need to set up the Glance Image Service and that’s the subject of my next post.

Additional reading

http://www.ibm.com/developerworks/cloud/library/cl-openstack-neutron/

http://docs.openstack.org/kilo/install-guide/install/apt/content/neutron-concepts.html

Configuring Docker Storage on Nutanix

I have recently been looking at how best to deploy a container ecosystem on my Nutanix XCP environment. At present I can run a number of containers in a virtual machine (VM), which gives me the required networking, persistent storage and the ability to migrate between hosts – something that containers themselves are only just becoming capable of in many cases. So that I can scale out my container deployment within my Docker host VM, I am going to have to consider increasing the available space within /var/lib/docker. By default, if you provide no additional storage/disk for your Docker install, loopback files get created to store your containers and images. This configuration is not particularly performant, so it’s not supported for production. You can see below how the default setup looks.

# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.821 GB
 Data Space Total: 107.4 GB
 Data Space Available: 50.2 GB
 Metadata Space Used: 1.479 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-client
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED

Configuring Docker Storage Options

Let’s look at the various methods we can use to provide dedicated block storage for Docker containers. With the devicemapper storage driver, Docker automatically creates a base thin device using two block devices, one for data and one for metadata. The thin device is automatically formatted with an empty filesystem on creation. This device is the base of all Docker images and containers: all base images are snapshots of this device, and those images are in turn used as snapshots for other images and eventually containers. This is the Docker-supported production setup. Also, by using LVM-based devices as the underlying storage, you access them as raw devices and no longer go through the VFS layer.

Devicemapper : direct-lvm

To begin, create two LVM devices, one for container data and another to hold metadata. By default the loopback method creates a storage pool with 100GB of space. In this example I am creating a 200G LVM volume for data and a 10G volume for metadata; I prefer separate volumes where possible for performance reasons. We start by hot-adding the required Nutanix vDisks to the virtual machine (docker-directlvm) guest OS (Fedora 22).

<acropolis> vm.disk_create docker-directlvm create_size=200g container=DEFAULT-CTR
DiskCreate: complete
<acropolis> vm.disk_create docker-directlvm create_size=10g container=DEFAULT-CTR
DiskCreate: complete

[root@docker-directlvm ~]# lsscsi 
 
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdb 
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdc 

The next step is to create the individual LVM volumes

# pvcreate /dev/sdb /dev/sdc
# vgcreate direct-lvm /dev/sdb /dev/sdc

# lvcreate --wipesignatures y -n data direct-lvm -l 95%VG
# lvcreate --wipesignatures y -n metadata direct-lvm -l 5%VG

If setting up a new metadata pool, you need to zero the first 4k to indicate empty metadata:
# dd if=/dev/zero of=/dev/direct-lvm/metadata bs=1M count=1

For sizing the metadata volume above, the rule of thumb seems to be around 0.1% of the data volume (0.1% of 200G is only about 200MB, so the 10G volume here has plenty of headroom). This is somewhat anecdotal, so perhaps size with a little extra room. Next, start the Docker daemon with the required options set in the file /etc/sysconfig/docker-storage.

DOCKER_STORAGE_OPTIONS="--storage-opt dm.datadev=/dev/direct-lvm/data --storage-opt \
dm.metadatadev=/dev/direct-lvm/metadata --storage-opt dm.fs=xfs"

You can then verify that the requested underlying storage is in use, with the docker info command

# docker info
Containers: 5
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/direct-lvm/data
 Metadata file: /dev/direct-lvm/metadata
 Data Space Used: 10.8 GB
 Data Space Total: 199.5 GB
 Data Space Available: 188.7 GB
 Metadata Space Used: 7.078 MB
 Metadata Space Total: 10.5 GB
 Metadata Space Available: 10.49 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-directlvm
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED

All well and good so far, but the storage options that expose the data and metadata locations, namely dm.datadev and dm.metadatadev, have been deprecated in favour of a preferred model: a thin pool reserved outside of Docker and passed to the daemon via the dm.thinpooldev storage option. Some Linux distros ship a helper script, docker-storage-setup, which does all the heavy lifting; you just need to supply a device and/or a volume group name in its configuration file, /etc/sysconfig/docker-storage-setup.

Devicemapper: Thinpool

Once again, start by creating the virtual device – a Nutanix vDisk – and adding it to the virtual machine guest OS:

<acropolis> vm.disk_create docker-thinp create_size=200g container=DEFAULT-CTR
DiskCreate: complete

[root@localhost sysconfig]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[2:0:0:0] disk NUTANIX VDISK 0 /dev/sda
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdd

Edit the file /etc/sysconfig/docker-storage-setup as follows:

root@localhost sysconfig]# cat /etc/sysconfig/docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
DEVS=/dev/sdd
VG=docker

Then create the volume group and run the storage helper script:

[root@docker-thinp ~]# pvcreate /dev/sdd
 Physical volume "/dev/sdd" successfully created

[root@docker-thinp ~]# vgcreate docker /dev/sdd 
 Volume group "docker" successfully created 

[root@docker-thinp ~]# docker-storage-setup 
Rounding up size to full physical extent 192.00 MiB 
Logical volume "docker-poolmeta" created. 
Wiping xfs signature on /dev/docker/docker-pool. 
Logical volume "docker-pool" created. 
WARNING: Converting logical volume docker/docker-pool and docker/docker-poolmeta to pool's data and metadata volumes. 
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.) 
Converted docker/docker-pool to thin pool. 
Logical volume "docker-pool" changed.

You can then verify the underlying storage being used in the usual way

[root@docker-thinp ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sdd 8:48 0 186.3G 0 disk
├─docker-docker--pool_tmeta 253:5 0 192M 0 lvm
│ └─docker-docker--pool 253:7 0 74.4G 0 lvm
└─docker-docker--pool_tdata 253:6 0 74.4G 0 lvm
 └─docker-docker--pool 253:7 0 74.4G 0 lvm


[root@docker-thinp ~]# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 62.39 MB
 Data Space Total: 79.92 GB
 Data Space Available: 79.86 GB
 Metadata Space Used: 90.11 kB
 Metadata Space Total: 201.3 MB
 Metadata Space Available: 201.2 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-thinp
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED

On completion this will have created the correct entries in /etc/sysconfig/docker-storage

[root@docker-thinp ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs \
 --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool

and runtime looks like...

[root@docker-thinp ~]# ps -ef | grep docker
root 8988 1 0 16:13 ? 00:00:11 /usr/bin/docker daemon --selinux-enabled
--storage-driver devicemapper --storage-opt dm.fs=xfs 
--storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool

Bear in mind that when you change the underlying Docker storage driver or storage options, as in the examples described above, the following destructive command sequence is typically run (be sure to back up any important data before running it):

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker

# after making the storage changes:

$ sudo systemctl daemon-reload
$ sudo systemctl start docker

Additional Info

http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/

https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/

http://docs.docker.com/engine/reference/commandline/daemon/#storage-driver-options

ELK on Nutanix : Kibana

It might seem like I am doing things out of sequence by looking at the visualisation layer of the ELK stack next. However, recall from my original post that I wanted to build sets of unreplicated indexes and then use Logstash to fire test workloads at them; hence I am covering Elasticsearch and Kibana initially. This brings me to another technical point that I need to cover. In order for a single set of indexes to actually be recoverable when running on a single node, we need to set the following parameters in our Elasticsearch playbook:

So in file: roles/elastic/vars/main.yml
...
elasticsearch_gateway_recover_after_nodes: 1
elasticsearch_gateway_recover_after_time: 5m
elasticsearch_gateway_expected_nodes: 1
...

These are then set in the elasticsearch.yml.j2 file as follows:

# file: roles/elastic/templates/elasticsearch.yml.j2
#{{ ansible_managed }}

...

# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 2
{% if elasticsearch_gateway_recover_after_nodes is defined %}
gateway.recover_after_nodes : {{ elasticsearch_gateway_recover_after_nodes}}
{% endif %}

and so on ....

This allows the indexes to be recovered when there is only a single node in the cluster. See below for the state of my indexes after a reboot:

[root@elkhost01 elasticsearch]# curl -XGET http://localhost:9200/_cluster/health?pretty
{
 "cluster_name" : "nx-elastic",
 "status" : "yellow",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 4,
 "active_shards" : 4,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 4,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}

Let’s now look at the Kibana playbook I am attempting. Unfortunately, Kibana is distributed as a compressed tar archive, which means that the yum or dnf modules are no help here. There is, however, a very useful unarchive module, but first we need to download the tar bundle using get_url as follows:

- name: download kibana tar file
  get_url: url=https://download.elasticsearch.org/kibana/kibana/kibana-{{ kibana_version }}-linux-x64.tar.gz
           dest=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz mode=755
  tags: kibana

I initially tried unarchiving the Kibana bundle into /tmp. I then intended to copy everything below the version-specific directory (/tmp/kibana-4.0.1-linux-x64) into the Ansible-created /opt/kibana directory. This proved problematic, as neither the synchronize nor the copy module seemed set up to do a mass copy/transfer from one directory structure to another. Maybe I am just not getting it – I even tried using with_items loops, but no joy, as fileglobs are not recursive. Answers on a postcard are always appreciated. In the end I just did this:

- name: create kibana directory
  become: true
  file: owner=kibana group=kibana path=/opt/kibana state=directory
  tags: kibana

- name: extract kibana tar file
  become: true
  unarchive: src=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz dest=/opt/kibana copy=no
  tags: kibana

The next thing to do was to create a systemd service unit; there isn’t one for Kibana, as there is no rpm package available. The usual templating applies here:

- name: install kibana as systemd service
  become: true
  template: src=kibana4.service.j2 dest=/etc/systemd/system/kibana4.service
            owner=root group=root mode=0644
  notify:
    - restart kibana
  tags: kibana

And the service unit file looked like:

[ansible@ansible-host01 templates]$ cat kibana4.service.j2
{{ ansible_managed }}

[Service]
ExecStart=/opt/kibana/kibana-{{ kibana_version }}-linux-x64/bin/kibana
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=kibana4
User=root
Group=root
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
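For reference, once the unit file is in place, bringing Kibana up by hand would look roughly like the following (the handler shown later only takes care of the restart step):

sudo systemctl daemon-reload
sudo systemctl enable kibana4
sudo systemctl start kibana4
sudo systemctl status kibana4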

This all seemed to work, as I could now access Kibana via my browser. No indexes yet, of course:

kibana_initial_install

There are one or two plays I would still like to document. Firstly, the ‘notify’ actions in some of the plays. These are used to call – in my case – the restart handlers, which in turn cause the service in question to be restarted – see the next section:

# file: roles/kibana/handlers

- name: restart kibana
  become: true
  service: name=kibana4 state=restarted

I wanted to document this next feature simply because it’s so useful – tags. As you will have noticed, I have assigned a tag to every play/task in the playbook so far. For testing purposes, tags allow you to run specific plays, so you can troubleshoot just that particular play and see what’s going on.

 ansible-playbook -i ./production site.yml --tags "kibana" --ask-sudo-pass

Now that I have the basic plays to get my Elasticsearch and Kibana services up and running via Ansible, it’s time to start looking at Logstash. Next time I post on ELK-type stuff, I will try to look at logging and search use cases – once I crack how they work, of course.

Sharded MongoDB config in Nutanix (3) : Backup & DR

Backing up sharded NoSQL databases can often require some additional consideration. For example, any backup of a sharded MongoDB deployment needs to capture a backup of each shard and of a single member of the configuration database quorum. The configuration database (configdb) holds the cluster metadata and so supports the ability to shard. In a production environment you will need three config databases, and they will all contain the same (meta)data. In this post I intend to cover the steps I recently used to back up a sharded MongoDB deployment using the snapshot technology available on my Nutanix platform.

The first step prior to any backup should always be to stop the balancer. The balancer is responsible for migrating/balancing data “chunks” between the various shards; if such a migration is running while the backup is taken, the resulting backup is potentially invalid.

mongos> use config
switched to db config
mongos> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
mongos>

At this point we can proceed to lock one of the secondary replicas in each shard. I outlined how to do this in my post on backing up replica sets. The command sequence is repeated below; note that this needs to be done on one secondary for each shard (and should only be done if the replica is running the MMAPv1 storage engine):

rs01:SECONDARY> db.fsyncLock()
{
 "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
 "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
 "ok" : 1

Having locked the secondaries against writes, the next step is to create a virtual machine (VM) snapshot of a configdb and of a secondary belonging to each shard (replica set), using the Nutanix Acropolis App Mobility Fabric as follows:

<acropolis> vm.snapshot_create mongo-configdb01,mongodb03,mongowt03 snapshot_name_list=mongoconfigdb01-bk,mongodb03-bk,mongowt03-bk
SnapshotCreate: complete

The above snapshots have all been created at once within a single consistency group. The next step will be to create clones from them…

<acropolis> vm.clone configdb01-clone clone_from_snapshot=mongoconfigdb01-bk
configdb01-clone: complete
<acropolis> vm.clone mongodb03-clone clone_from_snapshot=mongodb03-bk
mongodb03-clone: complete
<acropolis> vm.clone mongowt03-clone clone_from_snapshot=mongowt03-bk
mongowt03-clone: complete

At this point we can unlock each of the secondaries :

rs01:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
rs01:SECONDARY>

and re-enable the balancer:

mongos> use config
switched to db config
mongos> sh.setBalancerState(true)
mongos>

As of now I merely have the “bare bones” of a MongoDB cluster encapsulated in the three VM clones just created. The thing to bear in mind is that each clone generated from the replica snapshots contains only a subset of any sharded collection. Hopefully, ~50% each, if our shard key selection is any good! That means we can’t just proceed as in previous posts and bring up each clone as a standalone MongoDB instance. The simplest way to make use of the current clones might be to just rsync any data to new hosts in a freshly sharded deployment. So essentially, we would just transfer the data to the required volumes on the newly set up VMs. In any case, there would still be some work to do around the replica set memberships and associated config.
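A rough sketch of that rsync approach, assuming a dbpath of /data/db on both the clone and the freshly built shard member – the hostname and paths here are purely illustrative:

# on the clone, stop mongod so the data files are quiescent
sudo systemctl stop mongod
# push the data files across to the new shard member
rsync -avz /data/db/ mongodb-new01:/data/db/
# then start mongod on the target and fold it into the new replica set / sharded deployment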

Alternatively, to get access to any sharded collection held in the newly created clones above, I could begin by reconfiguring each replica clone as the new primary of its replica set and creating additional configdb VMs that can be registered with a new mongos VM. Recall that mongos is stateless and gets its information from the configdbs. At that stage we can re-register the replica shards within the configdb service. For example, here’s the state of the replica sets after they have been cloned:

> rs.status()
{
 "state" : 10,
 "stateStr" : "REMOVED",
 "uptime" : 97,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "ok" : 0,
 "errmsg" : "Our replica set config is invalid or we are not a member of it",
 "code" : 93
}
> rs.conf()
{
 "_id" : "rs01",
 "version" : 7,
 "members" : [
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "10.68.64.131:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "10.68.64.144:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatTimeoutSecs" : 10,
 "getLastErrorModes" : {

 },
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 }
 }
}

So first off, we need to set each cloned replica VM as the new replica set primary and remove the no-longer-required (or available) hosts from the set membership:

> cfg=rs.conf()
> printjson(cfg) 
> cfg.members = [cfg.members[0]]
[
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
 },
 "slaveDelay" : 0,
 "votes" : 1
 }
]
 
> cfg.members[0].host="10.68.64.153:27017"
10.68.64.153:27017

> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }

rs01:PRIMARY> rs.status()
{
 "set" : "rs01",
 "date" : ISODate("2015-10-06T14:02:23.263Z"),
 "myState" : 1,
 "members" : [
 {
 "_id" : 0,
 "name" : "10.68.64.152:27017",
 "health" : 1,
 "state" : 1,
 "stateStr" : "PRIMARY",
 "uptime" : 396,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "electionTime" : Timestamp(1444140137, 1),
 "electionDate" : ISODate("2015-10-06T14:02:17Z"),
 "configVersion" : 97194,
 "self" : true
 }
 ],
 "ok" : 1
}

Once you have done this for all the required replica sets (these are your shards, don’t forget), the next step is to set up the configdb clone and create additional identical VMs that will contain the cluster metadata. The configdbs can be verified for correctness as follows:

configsvr> db.runCommand("dbhash")
{
 "numCollections" : 14,
 "host" : "localhost.localdomain:27019",
 "collections" : {
 "actionlog" : "bd8d8c2425e669fbc55114af1fa4df97",
 "changelog" : "fcb8ee4ce763a620ac93c5e6b7562eda",
 "chunks" : "bd7a2c0f62805fa176c6668f12999277",
 "collections" : "f8b0074495fc68b64c385bf444e4cc90",
 "databases" : "c9ee555dde6fc84a7bbdb64b74ef19bd",
 "lockpings" : "ba67ca64d12fd36f8b35a54e167649a8",
 "locks" : "c226b1a2601cf3e61ba45aeab146663d",
 "mongos" : "690326c2edcb410eeeb9212ad7c6c269",
 "settings" : "ce32ef7c2b99ca137c5a20ea477062f7",
 "shards" : "77d49755ba04fe38639c5c18ee5be78d",
 "tags" : "d41d8cd98f00b204e9800998ecf8427e",
 "version" : "14e1d35ba0d32a5ff393ddc7f16125a1"
 },
 "md5" : "61bde8ac240aead03080f4dde3ec2932",
 "timeMillis" : 43,
 "fromCache" : [ ],
 "ok" : 1
}

The above collection and md5 hashes need to agree across the configdb membership; they are key to having all configdb servers in agreement. Once you have the configdbs enabled, register them with a newly created mongos VM. Below, I am just using a single configdb to test for correctness; a production setup should always have three per cluster:

 mongos --configdb 10.68.64.151:27019

The next task is to correct the configdb shard info. As you can see from the mongos session below, the replica info in the configdb still refers to the previous deployment:

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "rs01/10.68.64.111:27017,10.68.64.131:27017,10.68.64.144:27017"
 },
 {
 "_id" : "rs02",
 "host" : "rs02/10.68.64.110:27017,10.68.64.114:27017,10.68.64.137:27017"
 }
 ],
 "ok" : 1
}

We can correct the above setup to reflect our newly cloned shard/replica VMs, in a mongo shell session on the configdb server VM:

use config
configsvr> db.shards.update({_id: "rs01"} , {$set: {"host" : "10.68.64.152:27017"}})
configsvr> db.shards.update({_id: "rs02"} , {$set: {"host" : "10.68.64.153:27017"}})

You will have to restart the mongos server so that it picks up the new info from the configdb server.

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "10.68.64.152:27017"
 },
 {
 "_id" : "rs02",
 "host" : "10.68.64.153:27017"
 }
 ],
 "ok" : 1

And that, as they say, is how babies get made. At this stage you have a MongoDB cluster consisting of a configdb, registered with a mongos server, that can access both shards, each formed of a replica set with a single primary member. To flesh this out to production standards you could increase the configdb count (to three) and add secondaries to the replica sets for higher availability. With some additional work perhaps (i.e. renaming replica sets?), this could form the basis of a Dev/QA system containing a potential production workload.