Tag Archives: Provisioning

Nutanix: Cloud-like DevOps powering NoSQL for BigData

The popularity of NoSQL has increasingly come about as developers want to use the same in-memory data structures in their applications and have them map directly into a database persistence layer. For example, storing data in XML or JSON format is often hierarchical and potentially does not lend itself to being easily stored in row based tables. It becomes more complicated if the data also contains lists and objects. Not having to convert these in-memory structures into relational database structures is a major advantage in terms of time to value. Such considerations have been made all the more acute by the rise of the web as a platform for services. There’s also an economic aspect, like the prohibitive infrastructure costs required to scale up traditional RDBMS to support high availability etc. Compare this to such Web-Scale or cloud aware apps like NoSQL, which expects to “just drop in” commodity hardware at the infrastructure layer and scale out horizontally on demand.

So if we were to consider the requirements from a modern hyper-converged infrastructure (HCI) that employed the same Web-Scale paradigms used by modern cloud-aware applications. Then to deploy apps, like a NoSQL database for example, the first thing I would want to do is virtualise. This means a right-sized, sandboxed environment (ie a virtual machine) to run individual NoSQL instances. If there was a need to scale up, then it’s a simple case of increasing RAM and CPU. As the application landscape grows over time and starts to scale out, there’s increased need for more nodes/VMs.  Hence, any HCI platform needs cloud like provisioning of nodes. So providing faster time to deploy and time to value. The ability to auto-discover and add new nodes by the click of a button is quite compelling. In short, horizontal scale out needs to be easily undertaken. Say, in the middle of the production day, while running the month end workload?

Intelligent, automated data tiering, locality and balancing via post-process techniques like Mapreduce is another key requirement. As any database working set grows over time, ie: more users will mean more queries, new tables, indexes, aggregations, etc. So the ability to maintain a responsive I/O profile via SSD, as more I/O is periodically obtained from disk, will be key. If all VMs are then able to get local access to their data via SSD from a global storage fabric so much the better. While we are here, consider how you would migrate to a new(er) hardware fleet with and without a distributed storage fabric. Far easier to just drop in units of converged compute/storage and then migrate VMs to it. Compare how that would work with a large white box server estate spread across numerous racks in a DC? There’s yet another aspect of economics to all this. In that auto tiering of the storage layer means the current “working set” data is held at the most performant (and by comparison more expensive) layer. While colder data sits on cheaper spinning disk.

Another advantage of a distributed storage fabric is one of data service features. Take point in time (PIT) backups of sharded DBs, which can sometimes be a complicated issue. In which case, a data service that supports VM centric snapshots of key VMs in a consistency group can avoid another potential pain point. Also, rapid cloning of preconfigured VMs will improve deployment times and speaks to the DevOps workflows that many IT shops have increasingly adopted. Consider how easy might it be to create dev/QA environments with production style data using such mechanisms? What about burst workloads? The ability to migrate VMs between public and private cloud would bring further benefits, both as a means to provide offsite backups or move VMs between geographies.

Bear in mind there isn’t 20+ years of ecosystem software (or even tribal knowledge perhaps?) in the NoSQL community – unlike in traditional RDBMS. For this reason continual monitoring is a major requirement. The ability to support a floor to ceiling overview of VMs, hypervisor and hardware platform in terms of performance, alerts and events is paramount. We mentioned briefly above how working set size and IO throughput could affect end user experience. So the ability to predict trends in such behaviour means timely decisions about when to scale or shard an application can be made.  No discussion of any DevOps processes is complete without including REST API and/or Powershell automation capabilities. Automation is key in terms of workflow agility, allowing routine tasks to be performed repeatedly with a well understood outcome. Dev/QA environments can benefit greatly from the features already described. In addition, via the API, developers can build self-service portal software allowing them to spin up new environments in a matter of minutes.

In previous roles I worked with customers running UNIX based failover clusters protecting traditional SQL RDBMS and ERP software. Think Solaris and SUN Cluster, underpinning Oracle and SAP installs.  While running this kind of ‘Big Iron’ was considered ‘state of the art’. Coming up fast on the inside was ‘Big Data’ and with it a complete rethink on how to achieve massive scale. Traditionally, systems had scaled vertically by adding more CPU and RAM to the host platform, and horizontally by adding system boards to a midframe chassis. This came at a price and often a staggering level of administrative complexity. While Web-Scale technologies may not have completely replaced this approach yet, large scale big iron systems will continue to become more niche as time goes on in my opinion.

So, coming back to the beginning of this post. HCI is not only about scaling just to support Big Data workloads, it’s also about creating lower time to value and radical ease of use synergies with the application that sits on top of the stack. Having a HCI platform designed from the ground up with the same underlying principles as modern Web-Scale applications, means we are able to remove the operational delays and complexity that tend to act as drag anchors in today’s rapid deployment environments. IT departments are then free to focus on innovations that help the business succeed.

Webscalin’ – adding Nutanix nodes

Most modern web-scale applications (NoSQL, Search, Big Data, etc) are achieving massive elastic scale though horizontal scale out techniques. The admins for such apps require the ability to add nodes and storage for the required scale out without interruption to service. The workflow for adding a node to a Nutanix cluster allows such seamless addition, without any of the complex storage operations such as multipathing, zoning/masking, etc. A node is simply added to the chassis, the autodiscovery service detects the new node and the user is then simply asked to push a button to complete the process. The following are some screenshots of the prescribed workflow…

Connect to the nodes lights out management or IPMI webapp via a browser (enter the IPMI address) and login using the ADMIN credentials. You may need enable java im yor browser and configure java to allow the IPMI address.

After inserting the new node into the chassis slot, connect to the nodes lights out management or IPMI webapp via a browser (enter the IPMI address) and login using the ADMIN credentials. You may need enable Java in your browser and configure Java to allow the IPMI address.

Launch the remote console to access the Hypervisor

Launch the Console to enable remote access the Hypervisor.

Using the menu bar power on the node (if needed) otherwise login and configure network addressing.

Using the ‘Power Control’ drop down on the Menu bar across the top of the frame- Power On the node (if needed). You can at this point set up any L2 networking such vlan tagging etc.

Select 'Expand Cluster' from the right drop down menus in the Prism GUI. The node should be auto-discovered.

Select ‘Expand Cluster’ from the right drop down menus in the Prism GUI. The node should be auto-discovered.

Configure the required network addresses and select 'Save' to add the node to the cluster.

Configure the required network addresses and select ‘Save’ to add the node to the cluster.

The progress of the node addition can be monitored in the Prism GUI. Note that the hypervisor was automatically upgraded in order to maintain the same software functionality across the cluster nodes.

The progress of the node addition can be monitored in the Prism GUI. Note that the hypervisor was automatically upgraded in order to maintain the same software functionality across the cluster nodes.

That’s it, once the node is added and the metadata is re-balanced across all the nodes,  then the new nodes storage (HDD/SSD) is added to the storage pool with the rest of the cluster nodes. At which point all containers (datastores) are automatically mounted onto the newly added host and the new host is ready to receive guests! This kind of ease of use story is becoming paramount in terms of  time to value for many webscale applications. Its all well and good having applications on top of NoSQL DBs that allow for rapid development and deployment. However, if the upfront planning for the underlying architecture holds everything back for days if not weeks, then modern DevOps style operations are much harder to achieve..