Tag Archives: NUMA

Installing MongoDB on Nutanix XCP

As part of the recent MongoDB certification of Nutanix XCP as an Infrastructure as a Service  (IaaS) platform,  I thought I might collate some of the info I have collected while working to get the certification process completed. There’s a lot of great docs over at www.mongodb.com but I want to condense everything into a series of posts. This first post will deal with the initial install of a standalone MongoDB instance.

We saw in my previous post here how to create a Linux VM and add networking and vDisks. In this instance I have added 6 x 200GB vDisks for a data volume, and an additional 2 vDisks – one for the journal volume (50GB) and one volume to hold the log file (100GB). Here’s the output from /usr/bin/lsscsi showing the disks and their SCSI assignments :

[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdj
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdk
[2:0:7:0] disk NUTANIX VDISK 0 /dev/sdb
[2:0:8:0] disk NUTANIX VDISK 0 /dev/sdc
[2:0:9:0] disk NUTANIX VDISK 0 /dev/sdd
[2:0:10:0] disk NUTANIX VDISK 0 /dev/sde
[2:0:11:0] disk NUTANIX VDISK 0 /dev/sdf
[2:0:12:0] disk NUTANIX VDISK 0 /dev/sdg
[2:0:13:0] disk NUTANIX VDISK 0 /dev/sdh
[2:0:14:0] disk NUTANIX VDISK 0 /dev/sdi

Create a user/group mongod that will own the MongoDB software :

# groupadd mongod 
# useradd mongod

To install the MongoDB Enterprise packages, create a new repo with the required information and then install as MongoDB user using yum :

# pwd
/etc/yum.repos.d
# cat mongodb-enterprise.repo
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/stable/$basearch/
gpgcheck=0
enabled=1
$ sudo yum install -y mongodb-enterprise

We use LVM to create a 6 column striped data volume. All Nutanix vDIsks are redundant (RF=2) so to create a RAID10 data volume just stripe the vDisks, and then create 2 further linear volumes. First create the underlying physical volumes :

# lsscsi | awk '{print $6}' | grep /dev/sd | grep -v sda | xargs pvcreate
 Physical volume "/dev/sdb" successfully created
 Physical volume "/dev/sdc" successfully created
 Physical volume "/dev/sdd" successfully created
 Physical volume "/dev/sde" successfully created
 Physical volume "/dev/sdf" successfully created
 Physical volume "/dev/sdg" successfully created
 Physical volume "/dev/sdh" successfully created
 Physical volume "/dev/sdi" successfully created
 Physical volume "/dev/sdj" successfully created

Then create both the volume groups and the required volumes

vgcreate mongodata /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg 
vgcreate mongojournal /dev/sdh 
vgcreate mongolog /dev/sdi
# lvcreate -i 6 -l 100%VG -n mongodata mongodata
# lvcreate -l 100%VG -n mongojournal mongojournal 
# lvcreate -l 100%VG -n mongolog mongolog

Create an XFS filesystem on each volume:

mkfs.xfs /dev/mapper/mongodata-mongodata
mkfs.xfs /dev/mapper/mongojournal-mongojournal
mkfs.xfs /dev/mapper/mongolog-mongolog

Create the required mountpoints:

mkdir -p /mongodb/data mongodb/journal /mongodb/log

Mount the filesystems – setting noatime option on the data volume

/dev/mapper/mongodata-mongodata /mongodb/data xfs defaults,auto,noatime,noexec 0 0
/dev/mapper/mongojournal-mongojournal /mongodb/journal xfs defaults,auto,noexec 0 0
/dev/mapper/mongolog-mongolog /mongodb/log xfs defaults,auto,noexec 0 0

Set up a  soft link to re-direct the journal I/O to a separate volume:

# ln -s /mongodb/journal /mongodb/data/journal
...
lrwxrwxrwx. 1 root root 21 Nov 21 14:13 journal -> /mongodb/journal
...

At this point set the filesystem ownership to the MongoDB user:

# chown -R mongod:mongod /mongodb/data mongodb/journal mongodb/log

Prior to starting MongoDB there are a few well known best practices that need to be adhered to. Firstly, we reduce the read ahead on the data volume in order to avoid filling RAM with unwanted pages of data. MongoDB documents are quite small and a large readahead figure will fill RAM with additional pages of data that will have to then be evicted to make room for other required pages. Filling virtual memory with this superfluous data can have an adverse effect on performance. Usual recommendation is to start with a setting of 16K (32 * 512M sectors) and then adjust upwards from there.

rwxrwxrwx. 1 root root 7 Feb 4 11:50 /dev/mapper/mongodata-mongodata -> ../dm-3 

# blockdev --setra 32 /dev/dm-3
# blockdev --getra /dev/dm-3
32

MongoDB recommends that you disable transparent huge pages, edit your startup files as follows :

 #disable THP at boot time
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
 fi
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
 fi

Set swappiness = 1: MongoDB is a memory-based database; if the nodes are sized correctly, then we won’t need to swap. However, setting swappiness=0 could cause unexpected invocations of the OOM (Out of Memory) killer in certain Linux distros.

$ sudo sysctl vm.swappiness=1 (for current runtime)
$ sudo echo 'vm.swappiness=1' >> /etc/sysctl.conf (make permanent)

Disable NUMA, either in VM BIOS or, invoke mongod with NUMA disabled. All supported versions of MongoDB ship with an init script that automates this as follows:

numactl –interleave=all /usr/bin/mongod –f /etc/mongod.conf

Also ensure:

$ sudo cat /proc/sys/vm/zone_reclaim_mode
0

Finally, once you have configured the /etc/mongod.conf file (as root), you can start the mongod service –  see output from grep -v ^# /etc/mongod.conf below. Note, I have added the address for the primary NIC interface to the bind_ip in addition to the local loopback.

logpath=/mongodb/log/mongod.log 
logappend=true
fork=true
dbpath=/mongodb/data
pidfilepath=/var/run/mongodb/mongod.pid
bind_ip=127.0.0.1,10.68.64.110
sudo service mongod start

Once the database has started then you can connect via the mongo shell and verify the database is up and running :

$ mongo
MongoDB shell version: 3.0.3
connecting to: test
>

Now that we have our mongodb instance installed, we can use it as a template to clone additional MongoDB hosts on demand. I will cover this in future posts when I create replica sets and shards etc. For now, we need to get some data loaded and perform a few CRUD operations and perform some additional testing. I’ll cover this in my next post.