Tag Archives: Dev/QA

Nutanix: Cloud-like DevOps powering NoSQL for BigData

The popularity of NoSQL has increasingly come about as developers want to use the same in-memory data structures in their applications and have them map directly into a database persistence layer. For example, storing data in XML or JSON format is often hierarchical and potentially does not lend itself to being easily stored in row based tables. It becomes more complicated if the data also contains lists and objects. Not having to convert these in-memory structures into relational database structures is a major advantage in terms of time to value. Such considerations have been made all the more acute by the rise of the web as a platform for services. There’s also an economic aspect, like the prohibitive infrastructure costs required to scale up traditional RDBMS to support high availability etc. Compare this to such Web-Scale or cloud aware apps like NoSQL, which expects to “just drop in” commodity hardware at the infrastructure layer and scale out horizontally on demand.

So if we were to consider the requirements from a modern hyper-converged infrastructure (HCI) that employed the same Web-Scale paradigms used by modern cloud-aware applications. Then to deploy apps, like a NoSQL database for example, the first thing I would want to do is virtualise. This means a right-sized, sandboxed environment (ie a virtual machine) to run individual NoSQL instances. If there was a need to scale up, then it’s a simple case of increasing RAM and CPU. As the application landscape grows over time and starts to scale out, there’s increased need for more nodes/VMs.  Hence, any HCI platform needs cloud like provisioning of nodes. So providing faster time to deploy and time to value. The ability to auto-discover and add new nodes by the click of a button is quite compelling. In short, horizontal scale out needs to be easily undertaken. Say, in the middle of the production day, while running the month end workload?

Intelligent, automated data tiering, locality and balancing via post-process techniques like Mapreduce is another key requirement. As any database working set grows over time, ie: more users will mean more queries, new tables, indexes, aggregations, etc. So the ability to maintain a responsive I/O profile via SSD, as more I/O is periodically obtained from disk, will be key. If all VMs are then able to get local access to their data via SSD from a global storage fabric so much the better. While we are here, consider how you would migrate to a new(er) hardware fleet with and without a distributed storage fabric. Far easier to just drop in units of converged compute/storage and then migrate VMs to it. Compare how that would work with a large white box server estate spread across numerous racks in a DC? There’s yet another aspect of economics to all this. In that auto tiering of the storage layer means the current “working set” data is held at the most performant (and by comparison more expensive) layer. While colder data sits on cheaper spinning disk.

Another advantage of a distributed storage fabric is one of data service features. Take point in time (PIT) backups of sharded DBs, which can sometimes be a complicated issue. In which case, a data service that supports VM centric snapshots of key VMs in a consistency group can avoid another potential pain point. Also, rapid cloning of preconfigured VMs will improve deployment times and speaks to the DevOps workflows that many IT shops have increasingly adopted. Consider how easy might it be to create dev/QA environments with production style data using such mechanisms? What about burst workloads? The ability to migrate VMs between public and private cloud would bring further benefits, both as a means to provide offsite backups or move VMs between geographies.

Bear in mind there isn’t 20+ years of ecosystem software (or even tribal knowledge perhaps?) in the NoSQL community – unlike in traditional RDBMS. For this reason continual monitoring is a major requirement. The ability to support a floor to ceiling overview of VMs, hypervisor and hardware platform in terms of performance, alerts and events is paramount. We mentioned briefly above how working set size and IO throughput could affect end user experience. So the ability to predict trends in such behaviour means timely decisions about when to scale or shard an application can be made.  No discussion of any DevOps processes is complete without including REST API and/or Powershell automation capabilities. Automation is key in terms of workflow agility, allowing routine tasks to be performed repeatedly with a well understood outcome. Dev/QA environments can benefit greatly from the features already described. In addition, via the API, developers can build self-service portal software allowing them to spin up new environments in a matter of minutes.

In previous roles I worked with customers running UNIX based failover clusters protecting traditional SQL RDBMS and ERP software. Think Solaris and SUN Cluster, underpinning Oracle and SAP installs.  While running this kind of ‘Big Iron’ was considered ‘state of the art’. Coming up fast on the inside was ‘Big Data’ and with it a complete rethink on how to achieve massive scale. Traditionally, systems had scaled vertically by adding more CPU and RAM to the host platform, and horizontally by adding system boards to a midframe chassis. This came at a price and often a staggering level of administrative complexity. While Web-Scale technologies may not have completely replaced this approach yet, large scale big iron systems will continue to become more niche as time goes on in my opinion.

So, coming back to the beginning of this post. HCI is not only about scaling just to support Big Data workloads, it’s also about creating lower time to value and radical ease of use synergies with the application that sits on top of the stack. Having a HCI platform designed from the ground up with the same underlying principles as modern Web-Scale applications, means we are able to remove the operational delays and complexity that tend to act as drag anchors in today’s rapid deployment environments. IT departments are then free to focus on innovations that help the business succeed.

Using Nutanix snapshots to backup MongoDB

In an earlier post I described how Nutanix clones can speed up deployment workflows for MongoDB replica sets. In this post we will cover the Nutanix VM-centric snapshot functionality to backup a MongoDB database instance. Even when the database is sharded, the ability to take a snapshot of a set of VMs at once (as part of a consistency group) ensures that a point in time (PIT) backup can easily be taken.

Let’s first consider the 3 member replica set as described in my last post. In order to take an OS level backup we would need to quiesce the I/O to one of the secondary members of the replica set. The command sequence below is for MMAPv1 storage engine only. The db.fsyncLock() flushes all pending writes and locks the instance for further writes until we run db.fsyncUnlock(). Even though we have journaling enabled, we have separated the I/O for data and journal files to different volumes (as part of our standard config). Hence the need to quiesce I/O to make sure we take a consistent snapshot.

rs01:SECONDARY> db.fsyncLock()
 "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
 "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
 "ok" : 1

Nutanix snapshots work at the VM level. This means we take a snapshot of all the vDisks in the VM hosting a MongoDB secondary at the same time. However, MongoDB can’t guarantee backup consistency if the journal and data are located in separate volumes. Hence, we still need db.fsyncLock()/db.fsyncUnlock().

<acropolis> vm.snapshot_create mongodb03
SnapshotCreate: complete

<acropolis> vm.snapshot_create mongodb03 snapshot_name_list=mongodb03-snap1
SnapshotCreate: complete

<acropolis> vm.snapshot_list mongodb03
Snapshot name Snapshot UUID
mongodb03-snap1 41c77ddc-8cc7-49bf-a250-23d52031b76e
mongodb03_2015-09-08T10:04:00.805780 7cb77f7f-1eb6-4103-8909-e8e50c318151

Above we have taken two snapshots (by way of example) and show the naming conventions returned for both a named snapshot and the default. See the rest of the workflow below…

<acropolis> vm.clone mongodb04 clone_from_snapshot=mongodb03-snap1
mongodb03-backup: complete

<acropolis> vm.restore mongodb03 mongodb03-snap1
VmRestore: complete

<acropolis> snapshot.delete mongodb03-snap1
Delete 1 snapshots? (yes/no) y
Please type 'yes' or 'no': yes
mongodb03-snap1: complete

We can carry out the same workflow via Nutanix Prism. I have taken a screenshot of the available commands (see below) in the VM Snapshots frame. The command sequence is issued via the  VM tab in the GUI. The desired CLI arguments such as clone and snapshot names for example, are entered in the resulting popups. For reasons of brevity I have not included all the required screenshots.


Having taken our snapshot, we can now unlock the replica – make sure you have kept the session where you called the db.fsyncLock() command open until this time.

rs01:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }

Any newly created clones from the Nutanix VM snapshot can be used to seed a test environment with production strength data. Bearing in mind that the backup/clone was made from a VM that formed part of a replica set, we would like to be able to start the MongoDB instance in standalone mode. You can do this as follows ….

  • power off the newly created clone (mongodb04)
  • Login and create a sub-directory in the MongoDB data directory in order to save the local db files to…
[root@mongodb04 data]# mkdir local-old
[root@mongodb04 data]# mv local.* local-old/
  • edit /etc/mongod.conf : comment out the line naming the replica set and make sure the bind_ip reflects the IP address assigned to the VM (via DHCP in my case)
[mongod@mongodb04 ~]$ ip a | grep eth0
2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 inet brd scope global eth0

[mongod@mongodb04 ~]$ egrep "bind_ip|replSet" /etc/mongod.conf
  • Start up the mongod (now as a standalone) instance with the newly edited configuration file
  • Drop into a mongo shell. You can then verify the consistency of the copied data (in my example I am using a synthetic database created by the YCSB benchmark software) …
[mongod@mongodb04 ~]$ mongo
MongoDB shell version: 3.0.3
connecting to: test
> show dbs
enron_mail 3.952GB
local 0.078GB
ycsb 207.853GB
> use ycsb
switched to db ycsb
> show collections
> db.usertable.findOne()
 "_id" : "user6284781860667377211",
 "field5" : BinData(0,"PTo4JCo4JiskPDU2PiIzPikmMycrNSQjJzo3NiEqJjImIyY0PSE5KS43ICM4Liw+MCclLyE1JCA3PDM3OCsoMz0nKD8rISYrKCw8Pik4JT0jIyYuKDkoOy4gJCkoPSMkOywnNA=="),
 "field4" : BinData(0,"PyI/NSIqITEhKzQxPTwlNyIzJy0zLz05IzUvLDkwLi0uNjQwKysnMSQ5Lj8hMDozPDo+JTg/IDw6NTI9PzAkNjIjLTw8OyYvOTsgLTM2IjcvPCk4Ij84Ny45LiYhJC0mKT0qKw=="),
 "field3" : BinData(0,"MTkwLj0sPTYtPTg7OTUmIz0xJSM3OzM+Kyw3PyckPDYnMyotMC84JDIjMzEnITktJi4mMz0rLTkzODkoPCUkMT4pJzEpKiM8JDUlKjUxLiY5PzIxIiM0OCgpLTUpLyQ1JCogLw=="),
 "field2" : BinData(0,"Pz4jNTcnLDgwKS04ISc7Oi01IDkrNDclPTsuMyElOSU/KywvMzY6PDU0Jyo3NyYjKig8NzMrOTs7LiQiMy0hKioiPiMkPTgyLDAyPSoxJS4tOiUyLzE9Pz8rJy0zPjo2PCszNA=="),
 "field9" : BinData(0,"KTknMSs2PyEiMjUiPz4qKiM9MS0iIj4vMyMyNCsjPSUkJyYsJz0nLD0/NSY3MyQ7KCo/OikiNyU3MjQ8MCArLj8wMyM0MiAgJiUzOCc5KjYiJTw8Izw+OzA7NCAiPzA5ISYxOg=="),
 "field8" : BinData(0,"KCgoLTQ4ICE0IDY0ODAyIS86LiYrKzQmIyIpJy0nKz0gICYpMTQsLT86PyAiKjs/Kzw3MSciJDYkJTssLiMsLiUtLzMtOiYpND8nMCQpOzopMyQ1OyYsNy8uPysxNyg1PCQ6MA=="),
 "field7" : BinData(0,"ITM1KiUmKCcnKjA8IT8xPCIjKzc4Ijs+IyYhICMlKCsjPy09MygxKTohPDgjLjIzMzwpPj45Oyg2JTgsLjYlJS0xPCc+IiolKDw0JiQkLyg4NiohID0xPjIyMic9KDgqKD0zKg=="),
 "field6" : BinData(0,"IjovMTotODUxPD0mPikqLyo4IzonKjc+LDwuLCEhICghOzYxJTw5PiAzIzktNjA9OzUqNiovMTszMDMqLDQmNjk8IDA0NjAvPz8mPygnITU+OSY8JCIzLDAgOSEmNyA1KCQ+Jw=="),
 "field1" : BinData(0,"ID0sMjg8JiAiKz4gITYjNCkxLSs+IC8/JTAiNCA8PignMTEjNyogKyU0Lz4lNz0mIzohNCQ8LSwhICw/OT82PCYyPjk2KSElIDo5MSU2Kig9NiUnOjwpLi0vOjIqJz0sKDExOw=="),
 "field0" : BinData(0,"OikgJykuOTchIjUlISghLCkwOis/Pyw4ISo7Jyw6Kz46NTwlODIsKDQqLT0uNi0rOTcsOzA4Lzo3PDcyMjozLiItKSY1JSsoLSIkPio7KC4xOzs7MzY7Oj0nOzU+IyEuJDMwJA==")

The VM can now be easily migrated anywhere across the Nutanix cluster using the features of the Nutanix Distributed filesystem (NDFS). If additional availability is required then we can clone “blank” gold image VMs and form a new replica set. To avoid confusion the naming convention of the new replica set(s) can reflect the intended use case perhaps : dev, test, QA etc.