
Configuring Docker Storage on Nutanix

I have recently been looking at how best to deploy a container ecosystem on my Nutanix XCP environment. At present I can run a number of containers inside a virtual machine (VM), which gives me the required networking, persistent storage and the ability to migrate between hosts, capabilities that containers themselves are only just starting to acquire in many cases. To scale out the container deployment within my Docker host VM, I need to consider increasing the available space under /var/lib/docker. By default, if you provide no additional storage/disk for your Docker install, loopback files are created to store your containers and images. This configuration is not particularly performant and is not supported for production. You can see below how the default setup looks…

# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 1.821 GB
 Data Space Total: 107.4 GB
 Data Space Available: 50.2 GB
 Metadata Space Used: 1.479 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-client
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED
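
If you want to confirm that the default pool really is backed by sparse loopback files, a couple of standard tools will show it. This is just a quick sketch; the loop devices and pool name below are taken from the docker info output above and will differ on other systems:

# losetup -a
# dmsetup status docker-253:1-33883287-pool

The first command lists the loop devices and the /var/lib/docker/devicemapper/devicemapper files backing them; the second shows the status of the thin pool Docker has built on top of them.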

Configuring Docker Storage Options

Let's look at the various methods we can use to provide dedicated block storage for Docker containers. With the devicemapper storage driver, Docker automatically creates a base thin device on top of two block devices, one for data and one for metadata. The thin device is formatted with an empty filesystem on creation and is the base of all Docker images and containers: base images are snapshots of this device, and those images are in turn used as snapshots for other images and, eventually, containers. This is the Docker-supported production setup. In addition, by using LVM-based devices as the underlying storage you access them as raw devices and no longer go through the VFS layer.

Devicemapper : direct-lvm

To begin, create two LVM devices, one for container data and another to hold metadata. By default the loopback method creates a storage pool with 100GB of space; in this example I am providing roughly 200G of LVM storage for data and 10G for metadata. I prefer separate volumes where possible for performance reasons. We start by hot-adding the required Nutanix vDisks to the virtual machine (docker-directlvm) guest OS (Fedora 22):

<acropolis> vm.disk_create docker-directlvm create_size=200g container=DEFAULT-CTR
DiskCreate: complete
<acropolis> vm.disk_create docker-directlvm create_size=10g container=DEFAULT-CTR
DiskCreate: complete

[root@docker-directlvm ~]# lsscsi 
 
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdb 
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdc 
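
If a freshly hot-added vDisk does not show up straight away, a SCSI bus rescan from inside the guest usually makes it appear. A sketch only; host2 is assumed from the [2:x:x:x] entries in the lsscsi output above, so adjust for your system:

[root@docker-directlvm ~]# echo "- - -" > /sys/class/scsi_host/host2/scan
[root@docker-directlvm ~]# lsscsi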

The next step is to create the individual LVM volumes:

# pvcreate /dev/sdb /dev/sdc
# vgcreate direct-lvm /dev/sdb /dev/sdc

# lvcreate --wipesignatures y -n data direct-lvm -l 95%VG
# lvcreate --wipesignatures y -n metadata direct-lvm -l 5%VG

If setting up a new metadata pool, you need to zero the first 4k to indicate empty metadata:
# dd if=/dev/zero of=/dev/direct-lvm/metadata bs=1M count=1
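
Before continuing, it is worth a quick sanity check that the new volumes came out at the expected sizes:

# lvs -o lv_name,lv_size,vg_name direct-lvm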

For sizing the metadata volume above, the rule of thumb seems to be around 0.1% of the data volume. This is somewhat anecdotal, so size with a little headroom. Next, point the Docker daemon at the new volumes by setting the required options in /etc/sysconfig/docker-storage:

DOCKER_STORAGE_OPTIONS="--storage-opt dm.datadev=/dev/direct-lvm/data --storage-opt \
dm.metadatadev=/dev/direct-lvm/metadata --storage-opt dm.fs=xfs"
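
With the options in place, restart the Docker daemon so it picks them up. This assumes a systemd-managed Docker service, as on Fedora 22 (the same steps appear again at the end of this post):

# systemctl daemon-reload
# systemctl restart docker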

You can then verify that the requested underlying storage is in use with the docker info command:

# docker info
Containers: 5
Images: 2
Storage Driver: devicemapper
 Pool Name: docker-253:1-33883287-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/direct-lvm/data
 Metadata file: /dev/direct-lvm/metadata
 Data Space Used: 10.8 GB
 Data Space Total: 199.5 GB
 Data Space Available: 188.7 GB
 Metadata Space Used: 7.078 MB
 Metadata Space Total: 10.5 GB
 Metadata Space Available: 10.49 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-directlvm
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED

All well and good so far, but the storage options that expose the data and metadata locations, namely dm.datadev and dm.metadatadev, have been deprecated in favour of a preferred model: a thin pool reserved outside of Docker and passed to the daemon via the dm.thinpooldev storage option. Some Linux distros ship a helper script, docker-storage-setup, configured via /etc/sysconfig/docker-storage-setup, which does all the heavy lifting; you just need to supply a device and/or a volume group name.

Devicemapper: Thinpool

Once again, start by creating the virtual device (a Nutanix vDisk) and adding it to the virtual machine guest OS:

<acropolis> vm.disk_create docker-thinp create_size=200g container=DEFAULT-CTR
DiskCreate: complete

[root@localhost sysconfig]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[2:0:0:0] disk NUTANIX VDISK 0 /dev/sda
[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdd

Edit the file /etc/sysconfig/docker-storage-setup as follows:

[root@localhost sysconfig]# cat /etc/sysconfig/docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
DEVS=/dev/sdd
VG=docker

Then prepare the physical volume and volume group, and run the storage helper script:

[root@docker-thinp ~]# pvcreate /dev/sdd
 Physical volume "/dev/sdd" successfully created

[root@docker-thinp ~]# vgcreate docker /dev/sdd 
 Volume group "docker" successfully created 

[root@docker-thinp ~]# docker-storage-setup 
Rounding up size to full physical extent 192.00 MiB 
Logical volume "docker-poolmeta" created. 
Wiping xfs signature on /dev/docker/docker-pool. 
Logical volume "docker-pool" created. 
WARNING: Converting logical volume docker/docker-pool and docker/docker-poolmeta to pool's data and metadata volumes. 
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.) 
Converted docker/docker-pool to thin pool. 
Logical volume "docker-pool" changed.

You can then verify the underlying storage being used in the usual way:

[root@docker-thinp ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

sdd 8:48 0 186.3G 0 disk
├─docker-docker--pool_tmeta 253:5 0 192M 0 lvm
│ └─docker-docker--pool 253:7 0 74.4G 0 lvm
└─docker-docker--pool_tdata 253:6 0 74.4G 0 lvm
 └─docker-docker--pool 253:7 0 74.4G 0 lvm
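
The same pool can also be inspected from the LVM side. As an optional extra check, lvs -a includes the hidden metadata and data sub-volumes backing the thin pool:

[root@docker-thinp ~]# lvs -a docker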


[root@docker-thinp ~]# docker info
Containers: 0
Images: 0
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 62.39 MB
 Data Space Total: 79.92 GB
 Data Space Available: 79.86 GB
 Metadata Space Used: 90.11 kB
 Metadata Space Total: 201.3 MB
 Metadata Space Available: 201.2 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Library Version: 1.02.93 (2015-01-30)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.5-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.5 MiB
Name: docker-thinp
ID: VHCA:JO3X:IRF5:44RG:CFZ6:WETN:YBJ2:6IL5:BNDT:FK32:KH6E:UZED

On completion, this will have created the correct entries in /etc/sysconfig/docker-storage:

[root@docker-thinp ~]# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS=--storage-driver devicemapper --storage-opt dm.fs=xfs \
 --storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool

and at runtime it looks like this...

[root@docker-thinp ~]# ps -ef | grep docker
root 8988 1 0 16:13 ? 00:00:11 /usr/bin/docker daemon --selinux-enabled
--storage-driver devicemapper --storage-opt dm.fs=xfs 
--storage-opt dm.thinpooldev=/dev/mapper/docker-docker--pool
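
As a quick optional smoke test, run a throwaway container and watch the thin pool usage move. This assumes the host can pull the busybox image from Docker Hub:

[root@docker-thinp ~]# docker run --rm busybox echo "thin pool storage OK"
[root@docker-thinp ~]# docker info | grep -i "data space"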

Bear in mind that when you change the underlying Docker storage driver or storage options, as in the examples described above, the following destructive command sequence is typically run first (be sure to back up any important data before running it):

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker 

and then, after making the storage changes:

$ systemctl daemon-reload
$ systemctl start docker

Additional Info

http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/

https://jpetazzo.github.io/2014/01/29/docker-device-mapper-resize/

http://docs.docker.com/engine/reference/commandline/daemon/#storage-driver-options

Sharded MongoDB config on Nutanix (2): High Availability

One of the prime availability considerations for any horizontally scaled-out application, like a MongoDB cluster, is how that cluster behaves under a failure event. We have seen (in the MongoDB case) how replica sets are configured with additional secondary instances to handle the failure of a primary instance. We also create a mini quorum of configuration database servers and query routers to give redundancy to the cluster “infrastructure”. However, the Nutanix XCP environment provides further protection through certain features of the Acropolis management interface. Your key VMs need to be enabled for high availability, so that when the underlying hypervisor host fails for any reason these VMs fail over to another host with sufficient CPU and RAM resources. The screenshot below shows how this (Tech Preview) feature can be enabled (pre NOS 4.5) on a per-VM basis:

[Screenshot: enable-HA]

The underlying migration functionality is also used for the manual placement of key VMs. As an example, let’s consider the following layout, where two of the configdb VMs in a MongoDB cluster are co-located on the same AHV host:

[Screenshot: mongodb-colocated-vms]

Notice in the screen capture above that there are two configdb VMs on host “D”. Ideally we want to migrate one of these MongoDB config DB VMs to another AHV host, so let’s move the VM mongo-configdb02 to AHV host “C”…

[Screenshot: mongodb-migrate-VM]

Note that the migration process could have automatically chosen an appropriate AHV host to receive the VM. In the above case, however, we have specified the desired host ourselves.

We can monitor the progress and duration of any migration via the VM tasks frame in Prism:

[Screenshot: mongodb-vm-tasks-migration]

As always, this workflow can also be done manually (or scripted) through the acli interface. In this example I am migrating the VM running a query router (mongos process)….

<acropolis> vm.migrate mongos01 host=10.68.64.41 
mongos01: complete 
<acropolis>
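
To confirm where a VM has landed after a migration, you can also query it via acli. For example, vm.get dumps the VM's configuration and placement details (the exact fields returned may vary between NOS/AOS releases):

<acropolis> vm.get mongos01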

At the time of writing this post, Acropolis Base Software (NOS) 4.5 has been released and this feature has become generally available (GA). It can now be enabled cluster-wide:

[Screenshot: ha-enable-menu-4]

[Screenshot: enable-ha-4]

Nutanix customers who require HA functionality for their VMs are strongly recommended to enable this feature.

In my next post I will be completing this short blog series on sharded MongoDB configs on Nutanix. I intend to cover how Nutanix Acropolis managed snapshots and cloning are employed to create backups, and how those backups can then be used for rapid build-out of dev/QA type environments. Stay tuned.