Monthly Archives: June 2015

Installing MongoDB on Nutanix XCP

As part of the recent MongoDB certification of Nutanix XCP as an Infrastructure as a Service (IaaS) platform, I thought I might collate some of the info I collected while working to get the certification process completed. There are a lot of great docs over at www.mongodb.com, but I want to condense everything into a series of posts. This first post deals with the initial install of a standalone MongoDB instance.

We saw in my previous post here how to create a Linux VM and add networking and vDisks. In this instance I have added 6 x 200GB vDisks for a data volume, and an additional 2 vDisks – one for the journal volume (50GB) and one volume to hold the log file (100GB). Here’s the output from /usr/bin/lsscsi showing the disks and their SCSI assignments :

[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdj
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdk
[2:0:7:0] disk NUTANIX VDISK 0 /dev/sdb
[2:0:8:0] disk NUTANIX VDISK 0 /dev/sdc
[2:0:9:0] disk NUTANIX VDISK 0 /dev/sdd
[2:0:10:0] disk NUTANIX VDISK 0 /dev/sde
[2:0:11:0] disk NUTANIX VDISK 0 /dev/sdf
[2:0:12:0] disk NUTANIX VDISK 0 /dev/sdg
[2:0:13:0] disk NUTANIX VDISK 0 /dev/sdh
[2:0:14:0] disk NUTANIX VDISK 0 /dev/sdi

Create a user/group mongod that will own the MongoDB software :

# groupadd mongod
# useradd -g mongod mongod

To install the MongoDB Enterprise packages, create a new repo file with the required information and then install using yum:

# pwd
/etc/yum.repos.d
# cat mongodb-enterprise.repo
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/stable/$basearch/
gpgcheck=0
enabled=1

$ sudo yum install -y mongodb-enterprise

We use LVM to create a six-column striped data volume. All Nutanix vDisks are already redundant (RF=2), so striping the vDisks gives the equivalent of a RAID10 data volume. We then create two further linear volumes for the journal and the log. First create the underlying physical volumes:

# lsscsi | awk '{print $6}' | grep /dev/sd | grep -v sda | xargs pvcreate
 Physical volume "/dev/sdb" successfully created
 Physical volume "/dev/sdc" successfully created
 Physical volume "/dev/sdd" successfully created
 Physical volume "/dev/sde" successfully created
 Physical volume "/dev/sdf" successfully created
 Physical volume "/dev/sdg" successfully created
 Physical volume "/dev/sdh" successfully created
 Physical volume "/dev/sdi" successfully created
 Physical volume "/dev/sdj" successfully created
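
Because pvcreate is destructive, it can be worth previewing which devices the pipeline selects before handing them to xargs. The following is a minimal sketch, assuming lsscsi output where the device node is the last field and /dev/sda is the boot disk. The function name select_pv_candidates is hypothetical, and the sample lines are illustrative rather than taken from a live system.

```shell
#!/bin/sh
# Hypothetical helper: read lsscsi-style lines on stdin and print the
# device nodes that would be passed to pvcreate, one per line.
# Assumes the device node is the last field and /dev/sda is the boot disk.
select_pv_candidates() {
    awk '$NF ~ /^\/dev\/sd/ && $NF != "/dev/sda" { print $NF }'
}

# Dry run against sample data rather than the live system.
printf '%s\n' \
    '[2:0:0:0] disk NUTANIX VDISK 0 /dev/sda' \
    '[2:0:7:0] disk NUTANIX VDISK 0 /dev/sdb' \
    '[2:0:8:0] disk NUTANIX VDISK 0 /dev/sdc' |
    select_pv_candidates
```

On a real host the same check would be lsscsi | select_pv_candidates. Unlike grep -v sda, the exact comparison does not also drop devices such as /dev/sdaa.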

Then create both the volume groups and the required volumes:

# vgcreate mongodata /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# vgcreate mongojournal /dev/sdh
# vgcreate mongolog /dev/sdi
# lvcreate -i 6 -l 100%VG -n mongodata mongodata
# lvcreate -l 100%VG -n mongojournal mongojournal 
# lvcreate -l 100%VG -n mongolog mongolog

Create an XFS filesystem on each volume:

# mkfs.xfs /dev/mapper/mongodata-mongodata
# mkfs.xfs /dev/mapper/mongojournal-mongojournal
# mkfs.xfs /dev/mapper/mongolog-mongolog

Create the required mountpoints:

# mkdir -p /mongodb/data /mongodb/journal /mongodb/log

Add the following entries to /etc/fstab and mount the filesystems, setting the noatime option on the data volume:

/dev/mapper/mongodata-mongodata /mongodb/data xfs defaults,auto,noatime,noexec 0 0
/dev/mapper/mongojournal-mongojournal /mongodb/journal xfs defaults,auto,noexec 0 0
/dev/mapper/mongolog-mongolog /mongodb/log xfs defaults,auto,noexec 0 0
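
Assuming the three lines above live in /etc/fstab, a quick read-only check can confirm that a given mount point carries the option you expect. This is a sketch only, and the helper name fstab_has_option is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the fstab-style FILE lists OPTION
# for MOUNTPOINT. Usage: fstab_has_option FILE MOUNTPOINT OPTION
fstab_has_option() {
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Read-only check against the system fstab.
if fstab_has_option /etc/fstab /mongodb/data noatime 2>/dev/null; then
    echo "/mongodb/data: noatime is set"
else
    echo "/mongodb/data: noatime not found in /etc/fstab"
fi
```

The check only inspects the file. To see the options actually in effect after mounting, look at the output of mount for the same mount point.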

Set up a soft link to redirect the journal I/O to a separate volume:

# ln -s /mongodb/journal /mongodb/data/journal
...
lrwxrwxrwx. 1 root root 21 Nov 21 14:13 journal -> /mongodb/journal
...

At this point set the filesystem ownership to the MongoDB user:

# chown -R mongod:mongod /mongodb/data /mongodb/journal /mongodb/log

Prior to starting MongoDB there are a few well-known best practices to follow. Firstly, we reduce the readahead on the data volume. MongoDB documents are typically quite small, and a large readahead setting fills RAM with additional pages of data that then have to be evicted to make room for pages that are actually required, which can have an adverse effect on performance. The usual recommendation is to start with a setting of 16KB (32 x 512-byte sectors) and adjust upwards from there.

lrwxrwxrwx. 1 root root 7 Feb 4 11:50 /dev/mapper/mongodata-mongodata -> ../dm-3

# blockdev --setra 32 /dev/dm-3
# blockdev --getra /dev/dm-3
32
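
The value passed to blockdev --setra is a count of 512-byte sectors, which is easy to confuse with a size in KB. As a small sketch of the arithmetic (the function names here are hypothetical), the conversion in both directions looks like this:

```shell
#!/bin/sh
# blockdev --setra / --getra work in 512-byte sectors.
# Convert a sector count to KiB, and a KiB target back to sectors.
ra_sectors_to_kib() { echo $(( $1 * 512 / 1024 )); }
ra_kib_to_sectors() { echo $(( $1 * 1024 / 512 )); }

ra_sectors_to_kib 32    # prints 16
ra_kib_to_sectors 64    # prints 128
```

So if you later decide to adjust upwards to, say, 64KB of readahead, the value to pass to --setra would be 128.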

MongoDB recommends that you disable transparent huge pages (THP). Edit your startup files as follows:

 # disable THP at boot time
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
     echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
 fi
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
     echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
 fi
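
To confirm the setting took effect after a reboot, you can read the same sysfs files back: the active value is the one shown in square brackets. The sketch below assumes that bracketed format and checks both the Red Hat path used above and the upstream kernel path. The function name thp_active_setting is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (active) value from a THP
# sysfs file, e.g. "always madvise [never]" gives "never".
thp_active_setting() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$1"
}

# Read-only check; only paths that exist on this kernel are reported.
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active_setting "$f")"
    fi
done
```

The same function works for the corresponding defrag file.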

Set swappiness = 1: MongoDB is a memory-based database; if the nodes are sized correctly, then we won’t need to swap. However, setting swappiness=0 could cause unexpected invocations of the OOM (Out of Memory) killer in certain Linux distros.

$ sudo sysctl vm.swappiness=1    # for current runtime
$ echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf    # make permanent
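
It is easy to end up with a runtime value that differs from what is persisted, so a read-only comparison of the two can be useful. The following is a minimal sketch, assuming the persistent setting lives in /etc/sysctl.conf as above (it does not look at any drop-in directories). The helper name sysctl_conf_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last value assigned to KEY in a
# sysctl.conf-style FILE (later assignments override earlier ones).
# Usage: sysctl_conf_value FILE KEY
sysctl_conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        {
            k = $1; gsub(/[ \t]/, "", k)        # key with whitespace stripped
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); val = v }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Compare the running value with the persisted one (read-only checks).
if [ -r /proc/sys/vm/swappiness ]; then
    echo "runtime:   $(cat /proc/sys/vm/swappiness)"
fi
if [ -r /etc/sysctl.conf ]; then
    echo "persisted: $(sysctl_conf_value /etc/sysctl.conf vm.swappiness)"
fi
```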

Disable NUMA, either in the VM BIOS or by invoking mongod through numactl with memory interleaved across all nodes. All supported versions of MongoDB ship with an init script that automates this as follows:

numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf

Also ensure that zone reclaim is disabled:

$ sudo cat /proc/sys/vm/zone_reclaim_mode
0

Finally, once you have configured the /etc/mongod.conf file (as root), you can start the mongod service. The output from grep -v ^# /etc/mongod.conf is shown below. Note that I have added the address of the primary NIC to bind_ip in addition to the local loopback.

logpath=/mongodb/log/mongod.log 
logappend=true
fork=true
dbpath=/mongodb/data
pidfilepath=/var/run/mongodb/mongod.pid
bind_ip=127.0.0.1,10.68.64.110

$ sudo service mongod start

Once the database has started then you can connect via the mongo shell and verify the database is up and running :

$ mongo
MongoDB shell version: 3.0.3
connecting to: test
>

Now that we have our MongoDB instance installed, we can use it as a template to clone additional MongoDB hosts on demand. I will cover this in future posts when I create replica sets, shards and so on. For now, we need to get some data loaded, perform a few CRUD operations and carry out some additional testing. I’ll cover this in my next post.


That ‘One Click’ upgrade again, in full

One way of demonstrating the concept of ‘Invisible Infrastructure’ is the ability to complete a full system upgrade with minimal service interruption. In this post I will show the “One Click” upgrade facility that’s available on the Nutanix platform. This facility allows the admin to upgrade the Nutanix Operating System (NOS), the hypervisor, any required storage firmware and the appropriate version of Nutanix Cluster Check (NCC) for the target NOS release.

You can choose to either upload the NOS upgrade tarball or have it automatically downloaded to a landing area. Just check the Enable Automatic Download box. Here I am uploading the software to the platform.

Similar to the NOS version, the hypervisor can also be upgraded to a newer version when available.

You can either run the pre-upgrade checks standalone without performing an upgrade, or upgrade directly. The same checks are run before the start of the upgrade in any case.

Selecting upgrade will show the progress of the various stages of the upgrade as they occur. CVMs are upgraded sequentially and only one CVM is rebooted at a time. A CVM is always back in the cluster membership before the next CVM is restarted.

You can choose to upgrade the underlying hypervisor as well at this stage.

As always you can check progress in the Prism main window. Here we see the upgrade process has completed successfully.

Nutanix Prism also shows the individual task info, i.e. task stage, CVM/host involved, time taken, etc.

The Nutanix platform upgrade takes care of all the intermediate steps and just works, regardless of the size of the cluster. There’s minimal impact and disruption as the upgrade takes place, which lets you carry out such tasks within normal working hours rather than losing a weekend to the usual rigours of a traditional hardware upgrade cycle.

Webscalin’ – adding Nutanix nodes

Most modern web-scale applications (NoSQL, Search, Big Data, etc.) achieve massive elastic scale through horizontal scale-out techniques. The admins for such apps need to be able to add nodes and storage for the required scale-out without interruption to service. The workflow for adding a node to a Nutanix cluster allows such seamless addition, without any of the complex storage operations such as multipathing, zoning/masking, etc. A node is simply added to the chassis, the autodiscovery service detects the new node and the user is then asked to push a button to complete the process. The following are some screenshots of the prescribed workflow…

After inserting the new node into the chassis slot, connect to the node’s lights-out management or IPMI web app via a browser (enter the IPMI address) and log in using the ADMIN credentials. You may need to enable Java in your browser and configure Java to allow the IPMI address.

Launch the Console to enable remote access to the hypervisor.

Using the ‘Power Control’ drop-down on the menu bar across the top of the frame, power on the node (if needed). At this point you can set up any L2 networking, such as VLAN tagging.

Select ‘Expand Cluster’ from the right drop down menus in the Prism GUI. The node should be auto-discovered.

Configure the required network addresses and select ‘Save’ to add the node to the cluster.

The progress of the node addition can be monitored in the Prism GUI. Note that the hypervisor was automatically upgraded in order to maintain the same software functionality across the cluster nodes.

That’s it. Once the node is added and the metadata is re-balanced across all the nodes, the new node’s storage (HDD/SSD) is added to the storage pool with the rest of the cluster nodes. At that point all containers (datastores) are automatically mounted onto the newly added host and the new host is ready to receive guests! This kind of ease of use is becoming paramount in terms of time to value for many web-scale applications. It’s all well and good having applications on top of NoSQL DBs that allow for rapid development and deployment. However, if the upfront planning for the underlying architecture holds everything back for days if not weeks, then modern DevOps-style operations are much harder to achieve.

Switch to Simplicity …

With the recent announcement by Nutanix of the Xtreme Computing Platform (XCP), built on a KVM-based hypervisor and the Acropolis management solution, I thought I would use this step change in technology as the basis for my inaugural blog! What I would like to highlight is how much simpler this has made deploying applications in virtual machines, particularly on a KVM platform. As most of us who have had some exposure to KVM know, KVM is in fact the amalgamation of three distinct open source projects. These are:

QEMU (Quick Emulator): A machine emulator and virtualizer. When used with KVM, QEMU provides the userspace side of the virtual machine (device emulation and so on), while guest code executes directly on the host CPU to achieve near-native performance.

KVM kernel modules: Loadable kernel components which provide the core virtualization infrastructure. Specifically, kvm.ko provides the core virtualization infrastructure and a processor-specific module (kvm-intel.ko or kvm-amd.ko) exposes the CPU's hardware virtualization support to QEMU.

libvirt: An API for the management of virtualization environments
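
As a quick illustration of the processor-specific module mentioned above, the CPU flags in /proc/cpuinfo indicate which one applies: vmx for Intel VT-x and svm for AMD-V. The sketch below is a read-only check. The function name is hypothetical, and note that loaded modules are listed with underscores (kvm_intel, kvm_amd) rather than hyphens.

```shell
#!/bin/sh
# Hypothetical helper: given a cpuinfo-style FILE, print the KVM module
# that matches the CPU's virtualization flag, or fail if neither is present.
kvm_module_for_cpuinfo() {
    if grep -qw vmx "$1"; then
        echo kvm_intel      # Intel VT-x
    elif grep -qw svm "$1"; then
        echo kvm_amd        # AMD-V
    else
        return 1
    fi
}

# Read-only check on the local host.
if [ -r /proc/cpuinfo ]; then
    kvm_module_for_cpuinfo /proc/cpuinfo ||
        echo "no hardware virtualization flag found"
fi
```

You could then compare the result with the output of lsmod to see whether that module is actually loaded.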

Let’s take a look at how a VM is created using the Nutanix Prism GUI…

Selecting the Network Create box in the VM tab: we are assigning a VLAN tag (64) and leaving the network externally managed, i.e. the current (external to Nutanix) network infrastructure provides network services such as DHCP.

Next select +VM Create and fill out the details as required above. We will add a NIC, a boot Disk and attach the CDROM image in the next steps.

Add a NIC from the previously created L2 network (VLAN 64)

Attach the CDROM image by selecting CLONE FROM NDFS FILE. Specify the PATH to the image. Images are stored on an NFS container created specifically for that purpose.

Add Disk – create a 100GB vDisk to act as the permanent boot disk stored on DEFAULT-CTR.

Power the VM and launch the console from the Prism GUI. The VM should power on and install.

The finished product (remember to “eject” the cdrom)

Next, I am going to step through the manual creation of a VM using the standard APIs, to show how much complexity has been abstracted away by doing things the Nutanix way. First off, we are going to need a virtual disk image:

$ qemu-img create -f qcow2 libvirt-example.qcow 4G

Formatting 'libvirt-example.qcow', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 lazy_refcounts=off

Here’s the syntax to create a very basic VM using the libvirt API. I am specifying the cdrom image, the virtual disk location, a name for the VM and the connection to the local libvirt instance:

$ sudo virt-install \
--cdrom=/var/lib/libvirt/images/ttylinux-virtio_x86_64-16.1.iso \
--disk=/var/lib/libvirt/images/libvirt-example.qcow,format=qcow2 \
--name=libvirt-example --ram=512 --connect qemu:///system

You can obtain the above ttylinux image here. Note also that libvirt has created a default network for the VM:

$ sudo virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

Next, we can create another VM but this time using the QEMU interface. In this example we create a VNC endpoint to connect to the VM after start up:

$ sudo qemu-system-x86_64 -enable-kvm -name qemu-example \
-m 1G -hda /var/lib/libvirt/images/qemu-example.qcow2 \
-cdrom /var/lib/libvirt/images/ttylinux-virtio_x86_64-16.1.iso \
-vnc 127.0.0.1:1

These images can of course be managed by utilities such as virt-manager, virt-viewer, etc. Equally, I have not shown the full complexity of the command line options exposed by the standard KVM APIs. I have shown, though, how the Nutanix software simplifies and abstracts away the complexity of these APIs that most provisioning and orchestration stacks have to deal with. The Nutanix platform does provide a management API and a command line syntax to build out your VMs, but I will leave that for another post. Thanks for reading.