Category Archives: Big Data

Automating S3 compliant Object stores via Nutanix Objects API

As part of an API first strategy within the company, the Objects team at Nutanix has developed a REST API to enable the automated creation, deletion, management and monitoring of S3 compliant Object stores. I was fortunate to be given early access to the developing API. As part of this preview work, I have been looking at how to use CALM’s built-in support for “chaining” REST calls together, in order to build a JSON payload that creates an object store via its API. 

POST /objectstores

Let’s take a brief look at a subset of the Objects API. In order to create our objectstore, we need to make several intermediate calls to the standard v3 API. These calls are used to obtain (for example) reference UUIDs from entities like the underlying Nutanix cluster or required networks. The image below shows how the desired objectstore payload is pre-populated using macro variables that are either entered as part of the initial CALM blueprint configuration – @@{objectstore_name}@@ or generated from CALM tasks that pass in a variable at runtime – @@{CLUSTER}@@. We’ll discuss the latter shortly.

The Objects API (OSS) is accessed via a Prism Central (PC) endpoint. Notice the Objects API endpoint URL, where @@{address}@@ defines the PC IP address.


The REST call to create the objectstore is then handled by the CALM provided URL request function, urlreq(). The underlying call is still made via the Python requests module however. See below for how it was used in this scenario. More details on the various supported CALM functions can be found on the Nutanix documentation portal

Task type: Set Variable

Let’s look at how we generate the various saved UUIDs and other required entities, in order to pass them around our code. Recall that such entities are used to build the final JSON payload for the objectstore creation step we have already covered above. CALM provides a task framework that performs various functions. For example, to run a script or some Python code. There’s also a task option that results in the setting of a required variable. Once such a variable is created or set, it is then available to all other tasks. The next image below shows how we configure a task to set a variable.

Application profile : Objects

On the left hand pane in the above image, you will see an Application profile, entitled Objects. This profile gives me a set of default actions for my object store, such as Create, Start, Restart, etc. It also allows the creation of custom actions. We will look at REST_Create as an example of a custom action. From the list of tasks associated with REST_Create in the central canvas, we have an a task entitled, GetClusterUUID. The right hand pane shows how this task is configured. Note the task type is “Set Variable”. We also run a Python request, in the Script canvas. This populates an Output variable entitled CLUSTER. CLUSTER contains the Nutanix cluster UUID. We can see how this works in a little more detail below.


First, we set the credentials for Prism Central access. How credentials get set up in this kind of configuration, will be discussed later in the post. Next step is to populate the REST headers, URL and the JSON payload. Payload here is empty, but you can choose to either limit the number of clusters returned or use pagination if preferred. Pagination will require additional coding however.

We cycle through the response content of cluster entities looking for a match against our supplied cluster name – @@{cluster_name}@@. If found, we have guardrail code that ensures we only proceed if both hypervisor and version of AOS are supported. We do this in the GetClusterUUID task as its the first call we make. In doing so we exit as early as possible if we find a problem.

The matching cluster UUID from the response is saved into the CLUSTER variable. This UUID is then available to other tasks in the blueprint. Similar patterns are repeated in the tasks GetInfraNetUUID and GetClientNetUUID. Both tasks populate a variable with their respective network references (UUIDs). These variables are both used in the CreateObjectstore task, covered above. Without going into too much architecture detail, the Objects feature set is built on a microservice architecture. The networks mentioned are required for the internal Kubernetes inter-node/pod communication.

CALM Service

I will quickly go over the creation of the required Objects_Store service in CALM. This will cover the previously only mentioned credentials setup and so on. I think the image below is fairly self explanatory. It shows how to configure a blueprint to run against the incumbent Prism Central instance, and deploy the application (in our case an Object store) on an existing cluster infrastructure.

The CALM blueprint discussed here for automated Object store creation is available here (in its current form):

As the API develops towards General Availability, I hope to add more functionality to the blueprint (DELETE, Replace Certs, and so on). For now, here’s a quick run through of how the blueprint deploys the Objectstore via API. The image below shows the running application after the blueprint is launched.

The objectstore is then “managed” via the now provisioned application. To then create an objectstore according to the options set at the blueprint launch, we run the custom actions we previously created. Select first the Manage tab and then the REST_Create task

While the objectstore is being created, we can run other tasks that perform API calls that monitor objectstore progress and status. The output from the Audit tab is how ever we decided to format the JSON response in our REST_Status task. For example….

This ties in with exactly what we see in the Prism GUI at that time.

Big Data use case

In addition to the use cases outlined below, I am interested in investigating how Nutanix Objects  will play in the Big Data space. In particular, how Objects can be used to create standby environments for an Hadoop ecosystem. Ideally in another location. This is something that usually requires a large amount of work. Using Objects there’s the potential to de-risk the data lake replication part to a large extent. I hope to make this investigation a part of our upcoming Hadoop certification work.

Current Use Cases

  • Backup: Consolidate Nutanix and non-Nutanix primary infrastructure.
  • Long Term Retention (e.g.Splunk cold tier, Doc archives,Images/Videos): Cheap & deep,
    with regulatory content retention.
  • DevOps: Enable IT to provide an AWS S3 like service, on-premises, for cloud-native

Let me know if you find the Objects blueprint useful or feel free to share your experience of Nutanix Objects and how we can make things work better.

Elasticsearch Sizing on Nutanix

One node, one index, one shard

The answer to the question : “how big should I size my Elasticsearch VMs and how what kind of performance will I get?”, always comes down to the somewhat disappointing answer of “It depends!?” It depends on the workload – be it index or search heavy, on the type of data being transformed and so on. 

The way to size your Elasticsearch environment is by finding your “unit of scale”, this is the performance characteristics you will get for your workload via a single shard index running in a single Virtual Machine (VM). Once you have a set of numbers for a particular VM config then you can scale throughput etc, via increasing the number of VMs and/or indexes to handle additional workload.

Virtual Machine Settings

The accepted sweet spot for VM sizing an indexing workload is something like 64GB RAM/ 8+ vCPUs. You can of course right size this further where necessary, thanks to virtualisation. I assign just below half the RAM (31GB) to the heap for the Elasticsearch instance. This is to ensure that the JVM uses compressed Ordinary Object Pointers (OOPs) on a 64 bit system. This heap memory also needs to be locked into RAM

# grep -v ^# /etc/elasticsearch/elasticsearch.yml esrally esbench /elastic/data01    # <<< single striped data volume 
bootstrap.memory_lock: true   # <<< lock heap in RAM
http.port: 9200
discovery.zen.minimum_master_nodes: 1  # <<< single node test cluster false

# grep -v ^# /etc/elasticsearch/jvm.options

From the section above , notice the single mount point for the entry. I am using a 6 vdisk LVM stripe. While you can specify per-vdisk mount points in a comma separated list, unless you have enough indices to make sure all the spindles turn (all the time) then you are better off with logical volume management. You can ensure you are using compressed OOPs by checking for the following log entry at startup

[2017-08-07T11:06:16,849][INFO ][o.e.e.NodeEnvironment ] [esrally02] heap size [30.9gb], compressed ordinary object pointers [true]

Operation System Settings

Set the required kernel settings 

# sysctl -p 
vm.swappiness = 0
vm.overcommit_memory = 0
vm.max_map_count = 262144

Ensure file descriptors limits are increased

# ulimit –n 65536


curl –XGET**.max_file_descriptors

Disable swapping, either via the cli or remove swap entries from /etc/fstab

# sudo swapoff –a 

Elasticsearch Bulk Index Tuning

In order to improve indexing rate and increase shard segment size, you can disable refresh interval on an initial load.  Afterwards, setting this to 30s (default=1s) in production means larger segments sizes and potentially less merge pressure at a later date.

curl -X PUT "" -H 'Content-Type: application/json' -d'
    "index" : {
        "refresh_interval" : "-1"

Recall that we only want a single shard index and no replication for our testing. We can achieve this by either disabling replication on the fly or creating a template that configures the desired settings at index creation 

Disable replication globally ...

curl -X PUT "" -H 'Content-Type: application/json' -d '{"index" : {"number_of_replicas" : 0}}’

or create a template - in this case, for a series of index name regex patterns...

# cat template.json
        “index_patterns": [ “ray*”, "elasticlogs”],
        "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0
curl -s -X PUT "" -H 'Content-Type: application/json' -d @template.json

Elasticsearch Benchmarking tools

esrally is a macrobenchmarking tool for elasticsearch. To install and configure – use the following quickstart guide. Full information is available here :

rally-eventdata-track –  is repository containing a Rally track for simulating event-based data use-cases. The track supports bulk indexing of auto-generated events as well as simulated Kibana queries.

esrally --pipeline=benchmark-only --target-hosts= 
--track=eventdata --track-repository=eventdata --challenge=bulk-size-evaluation
eventdata bulk index - 5000 events/request highlighted @indexing rate of ~50k docs/sec
eventdata bulk index – 5000 events/request highlighted @indexing rate of ~50k docs/sec
httpd logs index test - highlighted @indexing rate ~80k docs/s
httpd logs index test – highlighted @indexing rate ~80k docs/s

Elasticsearch is just one of a great many cloud native applications that can run successfully on Nutanix Enterprise Cloud. I am seeing more and more opportunities to assist our account teams in the sizing and deployment of Elasticsearch. However, unlike other Search and Analytics platforms Elasticsearch has no ready made formula for sizing. This post will hopefully allow people to make a start on their Elasticsearch sizing on Nutanix and, in addition, help identify future steps to improve their performance numbers.

Further Reading

Elasticsearch Reference

Nutanix: Cloud-like DevOps powering NoSQL for BigData

The popularity of NoSQL has increasingly come about as developers want to use the same in-memory data structures in their applications and have them map directly into a database persistence layer. For example, storing data in XML or JSON format is often hierarchical and potentially does not lend itself to being easily stored in row based tables. It becomes more complicated if the data also contains lists and objects. Not having to convert these in-memory structures into relational database structures is a major advantage in terms of time to value. Such considerations have been made all the more acute by the rise of the web as a platform for services. There’s also an economic aspect, like the prohibitive infrastructure costs required to scale up traditional RDBMS to support high availability etc. Compare this to such Web-Scale or cloud aware apps like NoSQL, which expects to “just drop in” commodity hardware at the infrastructure layer and scale out horizontally on demand.

So if we were to consider the requirements from a modern hyper-converged infrastructure (HCI) that employed the same Web-Scale paradigms used by modern cloud-aware applications. Then to deploy apps, like a NoSQL database for example, the first thing I would want to do is virtualise. This means a right-sized, sandboxed environment (ie a virtual machine) to run individual NoSQL instances. If there was a need to scale up, then it’s a simple case of increasing RAM and CPU. As the application landscape grows over time and starts to scale out, there’s increased need for more nodes/VMs.  Hence, any HCI platform needs cloud like provisioning of nodes. So providing faster time to deploy and time to value. The ability to auto-discover and add new nodes by the click of a button is quite compelling. In short, horizontal scale out needs to be easily undertaken. Say, in the middle of the production day, while running the month end workload?

Intelligent, automated data tiering, locality and balancing via post-process techniques like Mapreduce is another key requirement. As any database working set grows over time, ie: more users will mean more queries, new tables, indexes, aggregations, etc. So the ability to maintain a responsive I/O profile via SSD, as more I/O is periodically obtained from disk, will be key. If all VMs are then able to get local access to their data via SSD from a global storage fabric so much the better. While we are here, consider how you would migrate to a new(er) hardware fleet with and without a distributed storage fabric. Far easier to just drop in units of converged compute/storage and then migrate VMs to it. Compare how that would work with a large white box server estate spread across numerous racks in a DC? There’s yet another aspect of economics to all this. In that auto tiering of the storage layer means the current “working set” data is held at the most performant (and by comparison more expensive) layer. While colder data sits on cheaper spinning disk.

Another advantage of a distributed storage fabric is one of data service features. Take point in time (PIT) backups of sharded DBs, which can sometimes be a complicated issue. In which case, a data service that supports VM centric snapshots of key VMs in a consistency group can avoid another potential pain point. Also, rapid cloning of preconfigured VMs will improve deployment times and speaks to the DevOps workflows that many IT shops have increasingly adopted. Consider how easy might it be to create dev/QA environments with production style data using such mechanisms? What about burst workloads? The ability to migrate VMs between public and private cloud would bring further benefits, both as a means to provide offsite backups or move VMs between geographies.

Bear in mind there isn’t 20+ years of ecosystem software (or even tribal knowledge perhaps?) in the NoSQL community – unlike in traditional RDBMS. For this reason continual monitoring is a major requirement. The ability to support a floor to ceiling overview of VMs, hypervisor and hardware platform in terms of performance, alerts and events is paramount. We mentioned briefly above how working set size and IO throughput could affect end user experience. So the ability to predict trends in such behaviour means timely decisions about when to scale or shard an application can be made.  No discussion of any DevOps processes is complete without including REST API and/or Powershell automation capabilities. Automation is key in terms of workflow agility, allowing routine tasks to be performed repeatedly with a well understood outcome. Dev/QA environments can benefit greatly from the features already described. In addition, via the API, developers can build self-service portal software allowing them to spin up new environments in a matter of minutes.

In previous roles I worked with customers running UNIX based failover clusters protecting traditional SQL RDBMS and ERP software. Think Solaris and SUN Cluster, underpinning Oracle and SAP installs.  While running this kind of ‘Big Iron’ was considered ‘state of the art’. Coming up fast on the inside was ‘Big Data’ and with it a complete rethink on how to achieve massive scale. Traditionally, systems had scaled vertically by adding more CPU and RAM to the host platform, and horizontally by adding system boards to a midframe chassis. This came at a price and often a staggering level of administrative complexity. While Web-Scale technologies may not have completely replaced this approach yet, large scale big iron systems will continue to become more niche as time goes on in my opinion.

So, coming back to the beginning of this post. HCI is not only about scaling just to support Big Data workloads, it’s also about creating lower time to value and radical ease of use synergies with the application that sits on top of the stack. Having a HCI platform designed from the ground up with the same underlying principles as modern Web-Scale applications, means we are able to remove the operational delays and complexity that tend to act as drag anchors in today’s rapid deployment environments. IT departments are then free to focus on innovations that help the business succeed.