
Sharded MongoDB config in Nutanix (3) : Backup & DR

Backing up sharded NoSQL databases often requires some additional consideration. For example, any backup of a sharded MongoDB deployment needs to capture a backup of each shard plus a single member of the configuration database quorum. The configuration database (configdb) holds the cluster metadata and is what makes sharding possible. In a production environment you will need three config databases, all containing the same (meta)data. In this post I cover the steps I recently used to back up a sharded MongoDB deployment using the snapshot technology available on my Nutanix platform.

The first step prior to any backup should always be to stop the balancer. The balancer is responsible for migrating/balancing data “chunks” between the various shards, and if such a migration is running while the backup is taken then the resulting backup is potentially invalid.

mongos> use config
switched to db config
mongos> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
mongos>
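
Before taking any snapshots, it is also worth confirming that the balancer really has stopped and that no chunk migration is still in flight. A minimal check from the same mongos session:

mongos> sh.getBalancerState()
false
mongos> sh.isBalancerRunning()
false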

Next we can proceed to lock one of the secondary replicas in each shard. I outlined how to do this in my post on backing up replica sets. The command sequence is repeated below; note that this needs to be done on one secondary for each shard, and should only be done if the replica is running the MMAPv1 storage engine:

rs01:SECONDARY> db.fsyncLock()
{
 "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
 "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
 "ok" : 1

Having locked the secondaries against writes, the next step is to create a virtual machine (VM) snapshot of one configdb and of a secondary belonging to each shard (replica set), using the Nutanix Acropolis App Mobility Fabric as follows:

<acropolis> vm.snapshot_create mongo-configdb01,mongodb03,mongowt03 snapshot_name_list=mongoconfigdb01-bk,mongodb03-bk,mongowt03-bk
SnapshotCreate: complete

The above snapshots have all been created at once within a single consistency group. The next step will be to create clones from them…

<acropolis> vm.clone configdb01-clone clone_from_snapshot=mongoconfigdb01-bk
configdb01-clone: complete
<acropolis> vm.clone mongodb03-clone clone_from_snapshot=mongodb03-bk
mongodb03-clone: complete
<acropolis> vm.clone mongowt03-clone clone_from_snapshot=mongowt03-bk
mongowt03-clone: complete

At this point we can unlock each of the secondaries :

rs01:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
rs01:SECONDARY>

and re-enable the balancer:

mongos> use config
switched to db config
mongos> sh.setBalancerState(true)
mongos>

As of now I merely have the “bare bones” of a MongoDB cluster encapsulated in the three VM clones just created. The thing to bear in mind is that each clone generated from the replica snapshots contains only a subset of any sharded collection: hopefully ~50% each, if our shard key selection is any good! That means we can’t simply proceed as in previous posts and bring up each clone as a standalone MongoDB instance. The simplest way to make use of the clones might be to rsync their data onto new hosts in a freshly built sharded deployment, essentially just transferring the data files onto the required volumes of the newly set up VMs. Even then, there would still be some work to do around the replica set memberships and associated config.
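
As a rough sketch of the rsync route (the target hostname here is hypothetical, the /mongodb/data path is the one used in my install post, and the target mongod should be stopped while the copy runs):

$ rsync -av /mongodb/data/ root@new-shard01:/mongodb/data/
$ ssh root@new-shard01 "chown -R mongod:mongod /mongodb/data && service mongod start"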

Alternatively, to get access to the sharded collections held in the newly created clones, I could begin by reconfiguring each replica clone as the new primary of its replica set, and create additional configdb VMs that can be registered with a new mongos VM. Recall that mongos is stateless and gets its info from the configdbs. At that stage we can re-register the replica shards within the configdb service. For example, here’s the state of the replica sets after they have been cloned:

> rs.status()
{
 "state" : 10,
 "stateStr" : "REMOVED",
 "uptime" : 97,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "ok" : 0,
 "errmsg" : "Our replica set config is invalid or we are not a member of it",
 "code" : 93
}
> rs.conf()
{
 "_id" : "rs01",
 "version" : 7,
 "members" : [
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "10.68.64.131:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "10.68.64.144:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatTimeoutSecs" : 10,
 "getLastErrorModes" : {

 },
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 }
 }
}

So first off we need to set each cloned replica VM as the new replica set primary and remove the no longer required (or available) hosts from the set membership :

> cfg=rs.conf()
> printjson(cfg) 
> cfg.members = [cfg.members[0]]
[
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
 },
 "slaveDelay" : 0,
 "votes" : 1
 }
]
 
> cfg.members[0].host="10.68.64.152:27017"
10.68.64.152:27017

> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }

rs01:PRIMARY> rs.status()
{
 "set" : "rs01",
 "date" : ISODate("2015-10-06T14:02:23.263Z"),
 "myState" : 1,
 "members" : [
 {
 "_id" : 0,
 "name" : "10.68.64.152:27017",
 "health" : 1,
 "state" : 1,
 "stateStr" : "PRIMARY",
 "uptime" : 396,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "electionTime" : Timestamp(1444140137, 1),
 "electionDate" : ISODate("2015-10-06T14:02:17Z"),
 "configVersion" : 97194,
 "self" : true
 }
 ],
 "ok" : 1
}

Once you have done this for all the required replica sets (these are your shards, don’t forget), the next step is to set up the configdb clone and create additional identical VMs to hold the cluster metadata. The configdbs can be verified for correctness as follows:

configsvr> db.runCommand("dbhash")
{
 "numCollections" : 14,
 "host" : "localhost.localdomain:27019",
 "collections" : {
 "actionlog" : "bd8d8c2425e669fbc55114af1fa4df97",
 "changelog" : "fcb8ee4ce763a620ac93c5e6b7562eda",
 "chunks" : "bd7a2c0f62805fa176c6668f12999277",
 "collections" : "f8b0074495fc68b64c385bf444e4cc90",
 "databases" : "c9ee555dde6fc84a7bbdb64b74ef19bd",
 "lockpings" : "ba67ca64d12fd36f8b35a54e167649a8",
 "locks" : "c226b1a2601cf3e61ba45aeab146663d",
 "mongos" : "690326c2edcb410eeeb9212ad7c6c269",
 "settings" : "ce32ef7c2b99ca137c5a20ea477062f7",
 "shards" : "77d49755ba04fe38639c5c18ee5be78d",
 "tags" : "d41d8cd98f00b204e9800998ecf8427e",
 "version" : "14e1d35ba0d32a5ff393ddc7f16125a1"
 },
 "md5" : "61bde8ac240aead03080f4dde3ec2932",
 "timeMillis" : 43,
 "fromCache" : [ ],
 "ok" : 1
}

The collection hashes above, along with the overall md5, need to agree across the configdb membership; they are the key to having all configdb servers in agreement. Once you have the configdbs enabled, register them with a newly created mongos VM. Below I am just using a single configdb to test for correctness; a production setup should always have three per cluster:

 mongos --configdb 10.68.64.151:27019
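
For a production restore, the mongos would instead be pointed at all three configdb VMs; for example (the two extra addresses here are placeholders):

mongos --configdb 10.68.64.151:27019,10.68.64.154:27019,10.68.64.155:27019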

The next issue is to correct the configdb shard info. As you can see from the mongos session below, the replica set info in the configdb still refers to the previous deployment:

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "rs01/10.68.64.111:27017,10.68.64.131:27017,10.68.64.144:27017"
 },
 {
 "_id" : "rs02",
 "host" : "rs02/10.68.64.110:27017,10.68.64.114:27017,10.68.64.137:27017"
 }
 ],
 "ok" : 1
}

We can correct the above setup to reflect our newly cloned shard/replica VMs, from a mongo shell session on the configdb server VM:

configsvr> use config
configsvr> db.shards.update({_id: "rs01"} , {$set: {"host" : "10.68.64.152:27017"}})
configsvr> db.shards.update({_id: "rs02"} , {$set: {"host" : "10.68.64.153:27017"}})

You will have to restart the mongos server so that it picks up the new info from the configdb server.

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "10.68.64.152:27017"
 },
 {
 "_id" : "rs02",
 "host" : "10.68.64.153:27017"
 }
 ],
 "ok" : 1

And that, as they say, is how babies get made. At this stage you have a MongoDB cluster consisting of a configdb, registered with a mongos server, that can access both shards, each shard being a replica set with a single primary member. With some additional work (renaming the replica sets, perhaps?) this could form the basis of a Dev/QA system containing a potential production workload. To flesh it out to production standards you would increase the configdb count to three and add secondaries to each replica set for higher availability, as sketched below.
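
For example, each new secondary would itself be a fresh clone brought up on its own VM and then added into the relevant replica set from a mongo shell on the primary (the addresses below are examples only, not VMs from this lab):

rs01:PRIMARY> rs.add("10.68.64.160:27017")
{ "ok" : 1 }
rs01:PRIMARY> rs.add("10.68.64.161:27017")
{ "ok" : 1 }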

Sharded MongoDB config on Nutanix (1) : Deployment

So far I have posted on MongoDB deployments either as standalone or as part of a replica set. This is fine when you can size your VM memory to hold the entire database working set. However, if your VM’s RAM will not accommodate the working set in memory, you will need to shard to aggregate RAM from multiple replica sets and form a MongoDB cluster.

Having already discussed using clones of gold image VMs to create the members of a replica set, note that the most basic MongoDB cluster requires at least two replica sets. On top of these we need a number of MongoDB “infrastructure” VMs that make cluster operation possible: a minimum of three (3) Configuration Databases (mongod --configsvr) per cluster and around one (1) Query Router (mongos) for every two shards. Here is the layout of a cluster deployment on my lab system:

[Figure: two-shard MongoDB cluster layout (2shard-system)]

In the above lab deployment, for availability reasons, I avoid co-locating any primary replica VM on the same physical host, and likewise any of the Query Router or ConfigDB VMs. One thing to bear in mind is that sharding is done on a per-collection basis. Simply put, the idea behind sharding is that you split a collection across the replica sets, and by connecting to a mongos process you are routed to the appropriate shard holding the part of the collection that can serve your query. The following commands show the syntax used to start one of the three required configdbs (run on three separate VMs, which need to be started first) and a Query Router, or mongos process (where we add the IP addresses of each configdb server VM):

Config DB servers – each run as:
mongod --configsvr --dbpath /data/configdb --port 27019

Query Router – run as:
mongos --configdb 10.68.64.142:27019,10.68.64.143:27019,10.68.64.145:27019

(The IP addresses on the mongos command line are the addresses of the three config DB servers.)

This brings up an issue if you are not cloning your replica VMs from “blank” gold VMs. If you clone a new replica set from a current working replica set, so that each replica set essentially holds a full copy of all your databases and their collections, then when you come to add such a replica set as a shard you generate the error condition shown below.

Here’s an example of what can happen when you attempt to add the shard and your new replica set (rs02) is simply cloned off a currently running replica set (rs01):

mongos> sh.addShard("rs02/192.168.1.52")
{
 "ok" : 0,
 "errmsg" : "can't add shard rs02/192.168.1.52:27017 because a local database 'ycsb' 
exists in another rs01:rs01/192.168.1.27:27017,192.168.1.32:27017,192.168.1.65:27017"
}
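
If you do hit this, one way out (assuming the duplicated databases on the clone really are disposable) is to drop the conflicting database on the new replica set’s primary and then retry the sh.addShard() call:

rs02:PRIMARY> use ycsb
switched to db ycsb
rs02:PRIMARY> db.dropDatabase()
{ "dropped" : "ycsb", "ok" : 1 }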

This is the successful workflow adding both shards (the primary of each replica set) via the mongos router VM:

$ mongo --host localhost --port 27017
MongoDB shell version: 3.0.3
connecting to: localhost:27017/test
mongos>
 
mongos> sh.addShard("rs01/10.68.64.111")
{ "shardAdded" : "rs01", "ok" : 1 }
mongos> sh.addShard("rs02/10.68.64.110")
{ "shardAdded" : "rs02", "ok" : 1 }

We next need to enable sharding on the database and subsequently shard the collection we want to distribute across the available replica sets. The choice of shard key is crucial to future MongoDB cluster performance; issues such as read and write scaling, cardinality, etc. are covered in the MongoDB documentation. For my test cluster I am using the _id field for demonstration purposes.

mongos> sh.enableSharding("ycsb")
{ "ok" : 1 }

mongos> sh.shardCollection("ycsb.usertable", { "_id": 1})
{ "collectionsharded" : "ycsb.usertable", "ok" : 1 }

The balancer process will run for the period of time needed to migrate data between the available shards. This can take anywhere from a number of hours to a number of days depending on the size of the collection, the number of shards, the current workload, etc. Once complete, it results in the following sharding status output. Notice that the “chunks” of the usertable collection held in the ycsb database are now split across both shards (522 chunks in each shard):

 mongos> sh.status()
--- Sharding Status ---
 sharding version: {
 "_id" : 1,
 "minCompatibleVersion" : 5,
 "currentVersion" : 6,
 "clusterId" : ObjectId("55f96e6c5dfc4a5c6490bea3")
}
 shards:
 { "_id" : "rs01", "host" : "rs01/10.68.64.111:27017,10.68.64.131:27017,10.68.64.144:27017" }
 { "_id" : "rs02", "host" : "rs02/10.68.64.110:27017,10.68.64.114:27017,10.68.64.137:27017" }
 balancer:
 Currently enabled: yes
 Currently running: no
 Failed balancer rounds in last 5 attempts: 0
 Migration Results for the last 24 hours:
 No recent migrations
 databases:
 { "_id" : "admin", "partitioned" : false, "primary" : "config" }
 { "_id" : "enron_mail", "partitioned" : false, "primary" : "rs01" }
 { "_id" : "mydocs", "partitioned" : false, "primary" : "rs01" }
 { "_id" : "sbtest", "partitioned" : false, "primary" : "rs01" }
 { "_id" : "ycsb", "partitioned" : true, "primary" : "rs01" }
 ycsb.usertable
 shard key: { "_id" : 1 }
 chunks:
 rs01 522
 rs02 522
 too many chunks to print, use verbose if you want to force print
 { "_id" : "test", "partitioned" : false, "primary" : "rs02" }


Using Nutanix clones to deploy MongoDB replica set

In this post I am going to look at setting up a replica set to support high availability in a MongoDB environment. Replica sets contain a primary MongoDB database and a number of additional secondary replica databases. Any one of the allowed replicas can become primary in the event that the original primary fails for whatever reason. Replica set membership count is usually an odd number in order that new primary elections are not tied.

Building out an HA MongoDB setup on Nutanix is relatively easy to do. Each MongoDB instance is hosted in a separate, sandboxed environment, in our case a virtual machine (VM), and each VM is located on a separate physical hypervisor host. I have a gold image VM with a MongoDB instance installed along recommended best-practice guidelines; this VM gets cloned as required whenever I need to build out a new MongoDB environment. So for a 3-member replica set I need 3 clones.

[Figure: three-member replica set layout (three-replicaset)]

From a cluster CVM node, type:

$ acli 

<acropolis> vm.clone mongodb01,mongodb02,mongodb03 clone_from_vm=mongodb30-gold
mongodb01: complete
mongodb02: complete
mongodb03: complete
 
<acropolis> vm.list
...
mongodb01: 2b9498c1-502e-454e-93c8-931a45a321b6
mongodb02: 9a445d26-caf9-4ddf-9d8e-296ea8b6e19e
mongodb03: 9a5512fa-3d19-4ddc-8cac-11721f999459
...

<acropolis> vm.on mongodb01,mongodb02,mongodb03
mongodb01: complete
mongodb02: complete
mongodb03: complete

After powering on the VMs, check that mongod starts correctly on the default port 27017 on each VM. The first thing to make sure of is that the mongod process is listening on the correct address; I have set my VMs to use DHCP, so the DHCP-assigned address is the one the service needs to listen on.

# ip a

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether 52:54:00:db:17:76 brd ff:ff:ff:ff:ff:ff
 inet 10.68.64.111/24 brd 10.68.64.255 scope global eth0


# cat /etc/mongod.conf | grep -i bind_ip
 bind_ip=127.0.0.1,10.68.64.111

# service mongod restart
# service mongod status

Once all of the VMs are up and running on their respective address:port tuples, make sure firewall access is enabled via iptables. Each VM that will form part of the replica set needs to allow access from the other members on the mongod port 27017. So for a replica set with members 10.68.64.111, 10.68.64.114 and 10.68.64.113, run the following on each member (in this example 10.68.64.111):

# iptables -A INPUT -s 10.68.64.113 -p tcp --destination-port 27017 -m state \
--state NEW,ESTABLISHED -j ACCEPT
# iptables -A INPUT -s 10.68.64.114 -p tcp --destination-port 27017 -m state \
--state NEW,ESTABLISHED -j ACCEPT

# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
# service iptables reload

Abridged iptables -L output after the above changes:

Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT tcp -- 10.68.64.113 anywhere tcp dpt:27017 state NEW,ESTABLISHED
ACCEPT tcp -- 10.68.64.114 anywhere tcp dpt:27017 state NEW,ESTABLISHED

Check access by performing a series of bi-directional tests between all the replica set members:

<10.68.64.111>$ mongo --host 10.68.64.113 --port 27017
MongoDB shell version: 3.0.3
connecting to: 10.68.64.113:27017/test
>
> quit()

Should any of the connection tests fail, revisit the iptables entries. The usual troubleshooting with telnet or nc, netstat, etc. applies.
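
For example, quick checks from one member against another (10.68.64.113 here) confirm basic reachability and that mongod is listening:

$ nc -zv 10.68.64.113 27017
$ sudo netstat -plnt | grep 27017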

In order to create the replica set, connect via ssh to each VM and edit the mongod.conf to include the replSet functionality:

$ grep -i replSet /etc/mongod.conf
replSet=rs01

Restart the mongod process (sudo service mongod restart) and then start a mongo shell session; on the first member of the set (the intended primary) run:

$ mongo
MongoDB shell version: 3.0.3
connecting to: test
> rs.initiate()
{
 "info2" : "no configuration explicitly specified -- making one",
 "me" : "10.68.64.111:27017",
 "ok" : 1
}
rs01:PRIMARY>

You can use the shell commands rs.conf() and rs.status() to check the replica set at any point. We’ll look at one of these outputs after completing the replica set creation. Next, from the same mongo shell session, add the other two replica nodes:

rs01:PRIMARY> rs.add("10.68.64.113")
{ "ok" : 1 }

rs01:PRIMARY> rs.add("10.68.64.114")
{ "ok" : 1 }

Potential error scenarios

  •  if you didn’t clone the VMs for the replica set from a blank gold image, but rather from a VM already running a replicated mongodb configuration, then the replication commands report errors similar to this:
{
 "info2" : "no configuration explicitly specified -- making one",
 "me" : "10.68.64.111:27017",
 "info" : "try querying local.system.replset to see current configuration",
 "ok" : 0,
 "errmsg" : "already initialized",
 "code" : 23
}

On the proviso that this is a greenfield install, delete the local database files in the data directory and re-run rs.initiate(), as sketched below.
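
A minimal sketch of that clean-up, assuming the MMAPv1 /mongodb/data path used in my install post and a data directory with nothing worth preserving:

$ sudo service mongod stop
$ sudo rm /mongodb/data/local.*
$ sudo service mongod start
$ mongo --eval 'rs.initiate()'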

  • if the firewall rules are not set correctly then the following error message is thrown:
 "errmsg" : "Quorum check failed because not enough voting nodes responded; 
required 2 but only the following 1 voting nodes responded: 10.68.64.111:27017; 
the following nodes did not respond affirmatively: 
10.68.64.131:27017 failed with Failed attempt to connect to 10.68.64.131:27017; 
couldn't connect to server 10.68.64.131:27017 (10.68.64.131), 
connection attempt failed",

Ensure that the firewall rules allow proper access between the VM’s.

  • if replication is not enabled correctly in the mongod configuration file on each host of the replica set, the following error is reported:
"errmsg" : "Quorum check failed because not enough voting nodes responded; 
required 2 but only the following 1 voting nodes responded: 10.68.64.110:27017; 
the following nodes did not respond affirmatively: 
10.68.64.114:27017 failed with not running with --replSet",

Once the replica set configuration is complete, check the setup by running rs.status() or rs.conf() to confirm :

rs01:PRIMARY> rs.conf()
{
 "_id" : "rs01",
 "version" : 3,
 "members" : [
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "10.68.64.113:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "10.68.64.114:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatTimeoutSecs" : 10,
 "getLastErrorModes" : {

 },
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 }
 }
}

From the output above we can see the full replica set membership, along with each member’s role and status: things like priority settings, whether or not the replica is hidden from user application queries, whether a replica is a full mongod instance or an arbiter (there simply to avoid tied primary elections), and whether any of the replicas has a delay enabled (used for backup/reporting duties).
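
All of those member attributes can be adjusted with rs.reconfig(). For instance, a sketch that takes the third member out of primary elections and hides it from application reads (a typical backup/reporting configuration) would look something like this, run on the primary:

rs01:PRIMARY> cfg = rs.conf()
rs01:PRIMARY> cfg.members[2].priority = 0
rs01:PRIMARY> cfg.members[2].hidden = true
rs01:PRIMARY> rs.reconfig(cfg)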

In an earlier post I showed the mongo shell commands available to calculate the working set for the database. For read-intensive workloads, where the working set is sized to fit the available RAM in the mongod server VMs, a replica set deployment can be used to run MongoDB and support high availability.

Installing MongoDB on Nutanix XCP

As part of the recent MongoDB certification of Nutanix XCP as an Infrastructure as a Service (IaaS) platform, I thought I might collate some of the info I have collected while working to get the certification process completed. There are plenty of great docs over at www.mongodb.com, but I want to condense everything into a series of posts. This first post will deal with the initial install of a standalone MongoDB instance.

We saw in my previous post how to create a Linux VM and add networking and vDisks. In this instance I have added 6 x 200GB vDisks for a data volume, plus an additional 2 vDisks: one for the journal volume (50GB) and one to hold the log file (100GB). Here’s the output from /usr/bin/lsscsi showing the disks and their SCSI assignments:

[2:0:1:0] disk NUTANIX VDISK 0 /dev/sdj
[2:0:2:0] disk NUTANIX VDISK 0 /dev/sdk
[2:0:7:0] disk NUTANIX VDISK 0 /dev/sdb
[2:0:8:0] disk NUTANIX VDISK 0 /dev/sdc
[2:0:9:0] disk NUTANIX VDISK 0 /dev/sdd
[2:0:10:0] disk NUTANIX VDISK 0 /dev/sde
[2:0:11:0] disk NUTANIX VDISK 0 /dev/sdf
[2:0:12:0] disk NUTANIX VDISK 0 /dev/sdg
[2:0:13:0] disk NUTANIX VDISK 0 /dev/sdh
[2:0:14:0] disk NUTANIX VDISK 0 /dev/sdi

Create a user/group mongod that will own the MongoDB software :

# groupadd mongod 
# useradd -g mongod mongod

To install the MongoDB Enterprise packages, create a new yum repo file with the required information and then install with yum:

# pwd
/etc/yum.repos.d
# cat mongodb-enterprise.repo
[mongodb-enterprise]
name=MongoDB Enterprise Repository
baseurl=https://repo.mongodb.com/yum/redhat/$releasever/mongodb-enterprise/stable/$basearch/
gpgcheck=0
enabled=1
$ sudo yum install -y mongodb-enterprise

We use LVM to create a 6-column striped data volume. All Nutanix vDisks are already redundant (RF=2), so to create a RAID 10-style data volume we just stripe the vDisks, and then create 2 further linear volumes for the journal and log. First create the underlying physical volumes:

# lsscsi | awk '{print $6}' | grep /dev/sd | grep -v sda | xargs pvcreate
 Physical volume "/dev/sdb" successfully created
 Physical volume "/dev/sdc" successfully created
 Physical volume "/dev/sdd" successfully created
 Physical volume "/dev/sde" successfully created
 Physical volume "/dev/sdf" successfully created
 Physical volume "/dev/sdg" successfully created
 Physical volume "/dev/sdh" successfully created
 Physical volume "/dev/sdi" successfully created
 Physical volume "/dev/sdj" successfully created

Then create the volume groups and the required logical volumes:

# vgcreate mongodata /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# vgcreate mongojournal /dev/sdh
# vgcreate mongolog /dev/sdi
# lvcreate -i 6 -l 100%VG -n mongodata mongodata
# lvcreate -l 100%VG -n mongojournal mongojournal 
# lvcreate -l 100%VG -n mongolog mongolog

Create an XFS filesystem on each volume:

mkfs.xfs /dev/mapper/mongodata-mongodata
mkfs.xfs /dev/mapper/mongojournal-mongojournal
mkfs.xfs /dev/mapper/mongolog-mongolog

Create the required mountpoints:

mkdir -p /mongodb/data /mongodb/journal /mongodb/log

Mount the filesystems, setting the noatime option on the data volume; the corresponding /etc/fstab entries are:

/dev/mapper/mongodata-mongodata /mongodb/data xfs defaults,auto,noatime,noexec 0 0
/dev/mapper/mongojournal-mongojournal /mongodb/journal xfs defaults,auto,noexec 0 0
/dev/mapper/mongolog-mongolog /mongodb/log xfs defaults,auto,noexec 0 0
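
Assuming those lines go into /etc/fstab, mount everything and check the result:

# mount -a
# df -h /mongodb/data /mongodb/journal /mongodb/log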

Set up a soft link to redirect the journal I/O to the separate journal volume:

# ln -s /mongodb/journal /mongodb/data/journal
...
lrwxrwxrwx. 1 root root 21 Nov 21 14:13 journal -> /mongodb/journal
...

At this point set the filesystem ownership to the MongoDB user:

# chown -R mongod:mongod /mongodb/data /mongodb/journal /mongodb/log

Prior to starting MongoDB there are a few well-known best practices that need to be adhered to. Firstly, we reduce the read-ahead on the data volume in order to avoid filling RAM with unwanted pages of data. MongoDB documents are quite small, and a large readahead figure will fill RAM with additional pages of data that will then have to be evicted to make room for other required pages. Filling virtual memory with this superfluous data can have an adverse effect on performance. The usual recommendation is to start with a setting of 16KB (32 x 512-byte sectors) and then adjust upwards from there.

lrwxrwxrwx. 1 root root 7 Feb 4 11:50 /dev/mapper/mongodata-mongodata -> ../dm-3

# blockdev --setra 32 /dev/dm-3
# blockdev --getra /dev/dm-3
32

MongoDB recommends that you disable transparent huge pages; edit your startup scripts as follows:

 #disable THP at boot time
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
 fi
 if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
 echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
 fi

Set swappiness = 1: MongoDB is a memory-based database; if the nodes are sized correctly, then we won’t need to swap. However, setting swappiness=0 could cause unexpected invocations of the OOM (Out of Memory) killer in certain Linux distros.

$ sudo sysctl vm.swappiness=1 (for current runtime)
$ echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf (make permanent)

Disable NUMA, either in the VM BIOS or by invoking mongod with NUMA disabled via numactl. All supported versions of MongoDB ship with an init script that automates this as follows:

numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf

Also ensure:

$ sudo cat /proc/sys/vm/zone_reclaim_mode
0

Finally, once you have configured the /etc/mongod.conf file (as root), you can start the mongod service. See the output of grep -v ^# /etc/mongod.conf below; note that I have added the address of the primary NIC interface to bind_ip, in addition to the local loopback.

logpath=/mongodb/log/mongod.log 
logappend=true
fork=true
dbpath=/mongodb/data
pidfilepath=/var/run/mongodb/mongod.pid
bind_ip=127.0.0.1,10.68.64.110

$ sudo service mongod start

Once the database has started then you can connect via the mongo shell and verify the database is up and running :

$ mongo
MongoDB shell version: 3.0.3
connecting to: test
>

Now that we have our MongoDB instance installed, we can use it as a template to clone additional MongoDB hosts on demand. I will cover this in future posts when I create replica sets and shards. For now, we need to get some data loaded, run a few CRUD operations and do some additional testing, which I’ll cover in my next post.
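
In the meantime, as a quick sanity check, a throwaway document can be inserted and read back from the mongo shell (the database and collection names here are arbitrary):

> use testdb
switched to db testdb
> db.kvstore.insert({ _id: 1, msg: "hello nutanix" })
WriteResult({ "nInserted" : 1 })
> db.kvstore.find({ _id: 1 })
{ "_id" : 1, "msg" : "hello nutanix" }
> db.kvstore.drop()
true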