
ELK on Nutanix: Kibana

It might seem like I am doing things out of sequence by looking at the visualisation layer of the ELK stack next. However, recall from my original post that I wanted to build sets of unreplicated indexes and then use Logstash to fire test workloads at them. Hence I am covering Elasticsearch and Kibana initially. This brings me to another technical point that I need to cover. In order for a single set of indexes to actually be recoverable when running on a single node, we need to set the following parameters in our Elasticsearch playbook:

So, in the file roles/elastic/vars/main.yml:
...
elasticsearch_gateway_recover_after_nodes: 1
elasticsearch_gateway_recover_after_time: 5m
elasticsearch_gateway_expected_nodes: 1
...

These are then set in the elasticsearch.yml.j2 file as follows:

# file: roles/elastic/templates/elasticsearch.yml.j2
#{{ ansible_managed }}

...

# Allow recovery process after N nodes in a cluster are up:
#
#gateway.recover_after_nodes: 2
{% if elasticsearch_gateway_recover_after_nodes is defined %}
gateway.recover_after_nodes: {{ elasticsearch_gateway_recover_after_nodes }}
{% endif %}

and so on for the other two parameters…
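For reference, rendered on my single-node cluster, the relevant section of elasticsearch.yml comes out like this (values taken from the vars file above):

gateway.recover_after_nodes: 1
gateway.recover_after_time: 5m
gateway.expected_nodes: 1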

This allows the indexes to be recovered when there is only a single node in the cluster. See below for the state of my indexes after a reboot:

[root@elkhost01 elasticsearch]# curl -XGET http://localhost:9200/_cluster/health?pretty
{
 "cluster_name" : "nx-elastic",
 "status" : "yellow",
 "timed_out" : false,
 "number_of_nodes" : 1,
 "number_of_data_nodes" : 1,
 "active_primary_shards" : 4,
 "active_shards" : 4,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 4,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}
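The yellow status is expected here: the four unassigned shards are replicas, and a single node has nowhere to place a replica of its own primaries. Since I want unreplicated indexes for this exercise anyway, one option (a sketch, using the standard indices settings API) is to drop the replica count to zero, after which the cluster should report green:

curl -XPUT http://localhost:9200/_settings -d '{ "index" : { "number_of_replicas" : 0 } }'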

Let's now look at the Kibana playbook I am attempting. Unfortunately, Kibana is distributed as a compressed tar archive, which means the yum and dnf modules are no help here. There is, however, a very useful unarchive module, but first we need to download the tar bundle using get_url, as follows:

- name: download kibana tar file
  get_url: url=https://download.elasticsearch.org/kibana/kibana/kibana-{{ kibana_version }}-linux-x64.tar.gz
           dest=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz mode=0755
  tags: kibana

I initially tried unarchiving the Kibana bundle into /tmp. I then intended to copy everything below the version-specific directory (/tmp/kibana-4.0.1-linux-x64) into the Ansible-created /opt/kibana directory. This proved problematic, as neither the synchronize nor the copy module seems set up to do a mass copy/transfer from one directory structure to another. Maybe I am just not getting it – I even tried using with_items loops, but no joy, as fileglobs are not recursive. Answers on a postcard are always appreciated. In the end I just did this:

- name: create kibana directory
  become: true
  file: owner=kibana group=kibana path=/opt/kibana state=directory
  tags: kibana

- name: extract kibana tar file
  become: true
  unarchive: src=/tmp/kibana-{{ kibana_version }}-linux-x64.tar.gz dest=/opt/kibana copy=no
  tags: kibana

The next thing to do was to create a systemd service unit. There isn't one for Kibana, as there is no rpm package available. The usual templating applies here:

- name: install kibana as systemd service
  become: true
  template: src=kibana4.service.j2 dest=/etc/systemd/system/kibana4.service
            owner=root group=root mode=0644
  notify:
    - restart kibana
  tags: kibana
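One wrinkle worth flagging: systemd will not see a brand-new unit file until a daemon reload has been run, so a task (or handler) along these lines, a minimal sketch, probably belongs between installing the unit and restarting the service:

- name: reload systemd units
  become: true
  command: systemctl daemon-reload
  tags: kibana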

And the service unit file looked like:

[ansible@ansible-host01 templates]$ cat kibana4.service.j2
# {{ ansible_managed }}

[Service]
ExecStart=/opt/kibana/kibana-{{ kibana_version }}-linux-x64/bin/kibana
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=kibana4
User=root
Group=root
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target
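With the unit file in place, a task to enable the service at boot and start it might look something like this (a sketch, assuming the unit is installed as kibana4.service as above):

- name: enable and start kibana4
  become: true
  service: name=kibana4 enabled=yes state=started
  tags: kibana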

This all seemed to work, as I could now access Kibana via my browser. No indexes yet, of course:

[screenshot: kibana_initial_install]

There are still one or two plays I would like to document. Firstly, the ‘notify’ actions in some of the plays. These are used to call – in my case – the restart handlers, which in turn cause the service in question to be restarted – see the next section:

# file: roles/kibana/handlers/main.yml

- name: restart kibana
  become: true
  service: name=kibana4 state=restarted

I wanted to document this next feature simply because it's so useful: tags. As you will have noticed, I have assigned a tag to every play/task in the playbook so far. For testing purposes, tags allow you to run specific plays; you can then troubleshoot just that particular play and see what's going on.

 ansible-playbook -i ./production site.yml --tags "kibana" --ask-sudo-pass

Now that I have the basic plays to get my Elasticsearch and Kibana services up and running via Ansible, it's time to start looking at Logstash. Next time I post on ELK-type stuff, I will try to look at logging and search use cases. Once I crack how they work, of course.

Using Ansible to deploy ELK stack on Nutanix

Just recently my colleague Andrew Nelson (@vmwnelson) posted an article on setting up Ansible on the Nutanix platform. I am also using Ansible, developing playbooks and the like to deploy the ELK stack components (Elasticsearch, Logstash, Kibana) on a block here at Nutanix. My initial aim is to set up a single index in a (single-node, for now) Elasticsearch cluster and use Logstash to pipe in data to be indexed. On top of that I intend to use Kibana and the Marvel plugin to measure the point at which my index begins to struggle (based on things like OS-level resource consumption), as viewed from Marvel.

From a virtual machine perspective I have a Fedora 22 based gold image. From this base image I clone one VM to be the Ansible master, from which I will run playbooks (orchestration), and another VM to which I will deploy my ELK stack. This second “target” VM has had 7 vDisks added to it. The idea here is that Elasticsearch (ES) can use a comma-separated list of data paths (in my case I created six of them as linear LVM volumes, with the seventh vDisk used for logs). These are written to in a round-robin fashion by ES, so the data gets “striped”. Nutanix vDisks are already redundant, so we are getting a kind of RAID 10 for free! Here's how my disk layout looks once configured and mounted (I am using XFS as my filesystem) on the target VM:

[root@elkhost01 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/esdata05-esdata05 200G 271M 200G 1% /esdata/data05
/dev/mapper/esdata03-esdata03 200G 291M 200G 1% /esdata/data03
/dev/mapper/esdata04-esdata04 200G 273M 200G 1% /esdata/data04
/dev/mapper/esdata02-esdata02 200G 271M 200G 1% /esdata/data02
/dev/mapper/esdata06-esdata06 200G 291M 200G 1% /esdata/data06
/dev/mapper/eslog-eslog 100G 150M 100G 1% /var/log/elasticsearch
/dev/mapper/esdata01-esdata01 200G 279M 200G 1% /esdata/data01

and

[root@elkhost01 ~]# lvs
 LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
 esdata01 esdata01 -wi-ao---- 200.00g
 esdata02 esdata02 -wi-ao---- 200.00g
 esdata03 esdata03 -wi-ao---- 200.00g
 esdata04 esdata04 -wi-ao---- 200.00g
 esdata05 esdata05 -wi-ao---- 200.00g
 esdata06 esdata06 -wi-ao---- 200.00g
 eslog eslog -wi-ao---- 100.00g
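For completeness, each volume group above sits on its own vDisk. Here is a sketch of how the first data disk might be carved up (assuming the vDisk appears as /dev/sdb; yours may differ), followed by how the resulting mount points get handed to ES as a comma-separated path.data list in elasticsearch.yml:

# sketch: one linear LVM volume per vDisk, XFS on top (device name assumed)
pvcreate /dev/sdb
vgcreate esdata01 /dev/sdb
lvcreate -l 100%FREE -n esdata01 esdata01
mkfs.xfs /dev/esdata01/esdata01
mkdir -p /esdata/data01
mount /dev/esdata01/esdata01 /esdata/data01

# elasticsearch.yml - ES round-robins writes across these paths
path.data: /esdata/data01,/esdata/data02,/esdata/data03,/esdata/data04,/esdata/data05,/esdata/data06
path.logs: /var/log/elasticsearch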

The next step is to install and configure Ansible. First off, configure an ansible user on both the orchestration host and the target host, and sync ssh keys between the two (there's a module that does ssh key exchange in Ansible, and I will cover that at some stage), like so:

On both VMs:

useradd ansible
passwd ansible

# generate pub and priv keys ....
ssh-keygen -t rsa

# If using StrictModes (the default) in the sshd_config file,
# ensure correct perms on the .ssh directory and files:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
[ansible@elkhost01 ~]$ ls -l ~/.ssh
total 12
-rw-------. 1 ansible ansible 404 Oct 1 13:38 authorized_keys
-rw-------. 1 ansible ansible 1675 Oct 1 13:31 id_rsa
-rw-------. 1 ansible ansible 402 Oct 1 13:31 id_rsa.pub

# Exchange public keys (copy into the remote host's authorized_keys file)
# for passwordless access:

[ansible@ansible-host01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.68.64.117
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ansible@10.68.64.117's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh '10.68.64.117'"
and check to make sure that only the key(s) you wanted were added.

[ansible@ansible-host01 ~]$ 

[ansible@ansible-host01 ~]$ ssh 10.68.64.117
Last login: Thu Oct 1 13:38:35 2015 from 10.68.64.113
[ansible@elkhost01 ~]$

Once you have passwordless ssh configured between your hosts – go ahead and install Ansible on the orchestration host:

# yum install ansible -y

Once installed, there are a few post-install steps and tests to make sure that Ansible is working. First off, set up an Ansible hosts inventory file that will eventually contain all the hostnames, broken out by deployment type. The default location for this file is /etc/ansible/hosts. In this instance I have chosen to specify a non-standard name/location in order to keep my hosts file within my proposed playbook.

[ansible@ansible-host01 elk]$ pwd
/home/ansible/elk
[ansible@ansible-host01 elk]$ cat production
# file: production

[elastic-hosts]
10.68.64.117

[kibana-hosts]
10.68.64.117

[nginx-hosts]
10.68.64.126

And if the passwordless ssh setup is correct, we can test as follows:

[ansible@ansible-host01 elk]$ ansible all -i ./production -m ping
10.68.64.117 | success >> {
 "changed": false,
 "ping": "pong"
}
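Ad-hoc commands against a specific inventory group work the same way; for example, to check uptime across the Elasticsearch hosts:

[ansible@ansible-host01 elk]$ ansible elastic-hosts -i ./production -m command -a uptime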

Ansible machine configuration is done via playbooks, which are based on YAML syntax. There's a great best-practice guide here, and I have followed its recommended playbook directory layout below…

elk
├── elastic.yml
├── group_vars
├── host_vars
├── kibana.yml
├── production
├── roles
│   ├── common
│   │   ├── files
│   │   ├── handlers
│   │   ├── tasks
│   │   │   └── main.yml
│   │   ├── templates
│   │   └── vars
│   │       └── main.yml
│   ├── elastic
│   │   ├── files
│   │   │   └── elasticsearch.repo
│   │   ├── handlers
│   │   │   └── main.yml
│   │   ├── tasks
│   │   │   └── main.yml
│   │   ├── templates
│   │   │   ├── elasticsearch.default.j2
│   │   │   ├── elasticsearch.in.sh.j2
│   │   │   └── elasticsearch.yml.j2
│   │   └── vars
│   │       └── main.yml
│   └── kibana
│       ├── files
│       ├── handlers
│       │   └── main.yml
│       ├── tasks
│       │   └── main.yml
│       ├── templates
│       │   └── kibana4.service.j2
│       └── vars
│           └── main.yml
└── site.yml

I am going to cover the individual roles for Elasticsearch, Logstash and Kibana in subsequent posts. For now there's a main site-wide playbook:

[ansible@ansible-host01 elk]$ cat site.yml
---
# file: site.yml
- include: elastic.yml
- include: kibana.yml
- include: logstash.yml
#- include: log-forwarder.yml
#- include: redis.yml
#- include: nginx.yml

This is then broken up into individual, service-specific playbooks:

[ansible@ansible-host01 elk]$ cat elastic.yml
---
#file: elastic.yml
- hosts: elastic-hosts
  roles:
    - common
    - elastic
[ansible@ansible-host01 elk]$ cat kibana.yml
---
#file: kibana.yml
- hosts: kibana-hosts
  roles:
    - kibana
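The logstash.yml playbook referenced in site.yml follows the same pattern. A sketch only, assuming a logstash role and a logstash-hosts inventory group (neither of which I have shown yet):

---
#file: logstash.yml
- hosts: logstash-hosts
  roles:
    - logstash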

I will discuss the individual roles and their associated tasks next time. For now, this should be enough to get basic Ansible functionality going.

Sharded MongoDB config on Nutanix (3): Backup & DR

Backing up sharded NoSQL databases can often require some additional consideration. For example, any backup of a sharded MongoDB config needs to capture a backup of each shard and of a single member of the configuration database quorum. The configuration database (configdb) holds the cluster metadata and so underpins the ability to shard. In a production environment you will need three config databases, and they will all contain the same (meta)data. In this post I intend to cover the steps I recently used to back up a sharded MongoDB deployment using the snapshot technology available on my Nutanix platform.

The first step prior to any backup should always be to stop the balancer. The balancer is responsible for migrating/balancing data “chunks” between the various shards. If such a migration is running while the backup is taken, then the resultant backup is potentially invalidated.

mongos> use config
switched to db config
mongos> sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Waiting again for active hosts after balancer is off...
mongos>
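It is worth confirming that the balancer really has stopped before locking anything; sh.getBalancerState() should now return false:

mongos> sh.getBalancerState()
false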

At which point we can proceed to lock one of the secondary replicas in each shard. I outlined how to do this in my post on backing up replica sets. The command sequence is repeated below; note that this needs to be done on one secondary for each shard (and should only be done if the replica is running the MMAPv1 storage engine):

rs01:SECONDARY> db.fsyncLock()
{
 "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
 "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
 "ok" : 1

Having locked the secondaries against writes, the next step is to create a virtual machine (VM) snapshot of a configdb and of a secondary belonging to each shard (replica set), using the Nutanix Acropolis App Mobility Fabric as follows:

<acropolis> vm.snapshot_create mongo-configdb01,mongodb03,mongowt03 snapshot_name_list=mongoconfigdb01-bk,mongodb03-bk,mongowt03-bk
SnapshotCreate: complete

The above snapshots have all been created at once within a single consistency group. The next step will be to create clones from them…

<acropolis> vm.clone configdb01-clone clone_from_snapshot=mongoconfigdb01-bk
configdb01-clone: complete
<acropolis> vm.clone mongodb03-clone clone_from_snapshot=mongodb03-bk
mongodb03-clone: complete
<acropolis> vm.clone mongowt03-clone clone_from_snapshot=mongowt03-bk
mongowt03-clone: complete

At this point we can unlock each of the secondaries:

rs01:SECONDARY> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
rs01:SECONDARY>

and re-enable the balancer:

mongos> use config
switched to db config
mongos> sh.setBalancerState(true)
mongos>

As of now, I merely have the “bare bones” of a MongoDB cluster encapsulated in the three VM clones just created. The thing to bear in mind is that each clone generated from the replica snapshots contains only a subset of any sharded collection – hopefully ~50% each, if our shard key selection is any good! That means we can't just proceed as in previous posts and bring up each clone as a standalone MongoDB instance. The simplest way to make use of the current clones might be to just rsync any data to new hosts in a freshly sharded deployment – essentially, transferring the data to the required volumes on the newly set up VMs. In any case, there would still be some work to do around the replica set memberships and associated config.
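As a sketch of that first option (the paths and hostname below are placeholders, not from my actual setup), the transfer itself would be little more than:

# sketch only: push one shard's data files from a clone to the
# equivalent volume on a freshly built VM (placeholder paths/host)
rsync -av /path/to/mongo/dbpath/ ansible@new-shard-host:/path/to/mongo/dbpath/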

Alternatively, to have access to any sharded collection held in my newly created clones above, I could begin by reconfiguring each replica clone as the new primary in its replica set and create additional configdb VMs that can be registered with a new mongos VM. Recall that mongos is stateless and gets its info from the configdbs. At which stage we can re-register the replica shards within the configdb service. For example, here's the state of the replica sets after they have been cloned:

> rs.status()
{
 "state" : 10,
 "stateStr" : "REMOVED",
 "uptime" : 97,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "ok" : 0,
 "errmsg" : "Our replica set config is invalid or we are not a member of it",
 "code" : 93
}
> rs.conf()
{
 "_id" : "rs01",
 "version" : 7,
 "members" : [
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 1,
 "host" : "10.68.64.131:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 },
 {
 "_id" : 2,
 "host" : "10.68.64.144:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {

 },
 "slaveDelay" : 0,
 "votes" : 1
 }
 ],
 "settings" : {
 "chainingAllowed" : true,
 "heartbeatTimeoutSecs" : 10,
 "getLastErrorModes" : {

 },
 "getLastErrorDefaults" : {
 "w" : 1,
 "wtimeout" : 0
 }
 }
}

So first off we need to set each cloned replica VM as the new replica set primary and remove the no longer required (or available) hosts from the set membership:

> cfg=rs.conf()
> printjson(cfg) 
> cfg.members = [cfg.members[0]]
[
 {
 "_id" : 0,
 "host" : "10.68.64.111:27017",
 "arbiterOnly" : false,
 "buildIndexes" : true,
 "hidden" : false,
 "priority" : 1,
 "tags" : {
 },
 "slaveDelay" : 0,
 "votes" : 1
 }
]
 
> cfg.members[0].host="10.68.64.152:27017"
10.68.64.152:27017

> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }

rs01:PRIMARY> rs.status()
{
 "set" : "rs01",
 "date" : ISODate("2015-10-06T14:02:23.263Z"),
 "myState" : 1,
 "members" : [
 {
 "_id" : 0,
 "name" : "10.68.64.152:27017",
 "health" : 1,
 "state" : 1,
 "stateStr" : "PRIMARY",
 "uptime" : 396,
 "optime" : Timestamp(1443441939, 1),
 "optimeDate" : ISODate("2015-09-28T12:05:39Z"),
 "electionTime" : Timestamp(1444140137, 1),
 "electionDate" : ISODate("2015-10-06T14:02:17Z"),
 "configVersion" : 97194,
 "self" : true
 }
 ],
 "ok" : 1
}

Once you have done this for all the required replica sets (these are your shards, don't forget), the next step is to set up the configdb clone and create additional identical VMs that will contain the cluster metadata. The configdbs can be verified for correctness as follows:

configsvr> db.runCommand("dbhash")
{
 "numCollections" : 14,
 "host" : "localhost.localdomain:27019",
 "collections" : {
 "actionlog" : "bd8d8c2425e669fbc55114af1fa4df97",
 "changelog" : "fcb8ee4ce763a620ac93c5e6b7562eda",
 "chunks" : "bd7a2c0f62805fa176c6668f12999277",
 "collections" : "f8b0074495fc68b64c385bf444e4cc90",
 "databases" : "c9ee555dde6fc84a7bbdb64b74ef19bd",
 "lockpings" : "ba67ca64d12fd36f8b35a54e167649a8",
 "locks" : "c226b1a2601cf3e61ba45aeab146663d",
 "mongos" : "690326c2edcb410eeeb9212ad7c6c269",
 "settings" : "ce32ef7c2b99ca137c5a20ea477062f7",
 "shards" : "77d49755ba04fe38639c5c18ee5be78d",
 "tags" : "d41d8cd98f00b204e9800998ecf8427e",
 "version" : "14e1d35ba0d32a5ff393ddc7f16125a1"
 },
 "md5" : "61bde8ac240aead03080f4dde3ec2932",
 "timeMillis" : 43,
 "fromCache" : [ ],
 "ok" : 1
}

The hashes above (the per-collection hashes and the top-level md5) need to agree across the configdb membership; they are key to having all configdb servers in agreement. Once you have the configdbs enabled, register them with a newly created mongos VM. Below, I am just using a single configdb to test for correctness. A production setup should always have three per cluster:

 mongos --configdb 10.68.64.151:27019

The next issue will be to correct the configdb shard info. As you can see from the mongos session below, the replica info in the configdb is still referring to the previous deployment:

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "rs01/10.68.64.111:27017,10.68.64.131:27017,10.68.64.144:27017"
 },
 {
 "_id" : "rs02",
 "host" : "rs02/10.68.64.110:27017,10.68.64.114:27017,10.68.64.137:27017"
 }
 ],
 "ok" : 1
}

We can correct the above setup to reflect our newly cloned shard/replica VMs, in a mongo shell session on the configdb server VM:

configsvr> use config
configsvr> db.shards.update({_id: "rs01"} , {$set: {"host" : "10.68.64.152:27017"}})
configsvr> db.shards.update({_id: "rs02"} , {$set: {"host" : "10.68.64.153:27017"}})

You will have to restart the mongos server so that it picks up the new info from the configdb server.

mongos> db.adminCommand( { listShards: 1 } )
{
 "shards" : [
 {
 "_id" : "rs01",
 "host" : "10.68.64.152:27017"
 },
 {
 "_id" : "rs02",
 "host" : "10.68.64.153:27017"
 }
 ],
 "ok" : 1

And that, as they say, is how babies get made. At this stage you have a MongoDB cluster consisting of a configdb, registered with a mongos server, that can access both shards, each formed of a replica set with a single primary member. To flesh this out to production standards you could increase the configdb count (to three) and add secondaries to the replica sets for higher availability. With some additional work (renaming replica sets, perhaps?) this could form the basis of a dev/QA system, carrying a potential production workload.

Sharded MongoDB config on Nutanix (2): High Availability

One of the prime availability considerations for any horizontal scale-out application, like a MongoDB cluster, is how that cluster behaves under a failure event. We have seen (in the MongoDB case) how replica sets are configured with additional secondary instances to handle the failure of a primary instance in a replica set. We also create a mini quorum of configuration database servers and query routers to give redundancy to the cluster “infrastructure”. However, the Nutanix XCP environment provides further protection through certain features of the Acropolis management interface. Your key VMs need to be enabled for high availability, so that when the underlying hypervisor host fails for any reason, these VMs fail over to another host with sufficient CPU and RAM resources. The screenshot below shows how this (Tech Preview) feature can be enabled (pre NOS 4.5) on a per-VM basis:

[screenshot: enable-HA]

The underlying migration functionality is also used for the manual placement of key VMs. As an example, let’s consider the following layout, where two of the configdb VMs in a MongoDB cluster are co-located on the same AHV host:

[screenshot: mongodb-colocated-vms]

Notice in the screen capture above that there are two configdb VMs on host “D”, which means that ideally we want to migrate one of them to another AHV host. Let's move the VM mongo-configdb02 to AHV host “C”…

[screenshot: mongodb-migrate-VM]

Note that the migration process could have automatically chosen an appropriate AHV host to receive the VM. In the above case, however, we have specified the desired host ourselves.

We can monitor the progress and duration of any migration via the VM tasks frame in Prism:

[screenshot: mongodb-vm-tasks-migration]

As always, this workflow can also be done manually (or scripted) through the acli interface. In this example I am migrating the VM running a query router (mongos process)…

<acropolis> vm.migrate mongos01 host=10.68.64.41 
mongos01: complete 
<acropolis>

As of the time of writing this post, Acropolis Base Software (NOS) 4.5 has been released and this feature has become generally available (GA). It can now be enabled cluster-wide:

[screenshot: ha-enable-menu-4]

[screenshot: enable-ha-4]

Nutanix customers are strongly encouraged to enable this feature when they require HA functionality for their VMs.

In my next post I will be completing this short blog series on sharded MongoDB configs on Nutanix. I intend to cover how Nutanix Acropolis managed snapshots and cloning are employed to create backups and then use them to perform rapid build out of potential dev/QA type environments. Stay tuned.


That ‘One Click’ upgrade again, in full

One way of demonstrating the concept of ‘Invisible Infrastructure’ is the ability to complete a full system upgrade with minimal service interruption. In this post I will show the “One Click” upgrade facility that's available on the Nutanix platform. This facility allows the admin to upgrade the Nutanix Operating System (NOS), the hypervisor, any required storage firmware, and the appropriate version of Nutanix Cluster Check (NCC) for the target NOS release.


You can choose to either upload the NOS upgrade tarball or have it automatically downloaded to a landing area. Just check the Enable Automatic Download box. Here I am uploading the software to the platform.

Similar to the NOS version, the hypervisor can also be upgraded to a newer version when available.



You can either run the preupgrade checks standalone, without performing an upgrade, or just upgrade directly; the same checks will be run before the start of the upgrade in any case.


Selecting upgrade will show the progress of the various stages of the upgrade as they occur. CVMs are upgraded sequentially and only one CVM is rebooted at a time. A CVM is always back in the cluster membership before the next CVM is restarted.

[screenshot: kvm-preupgrade]
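Progress can also be followed from any CVM shell (assuming you have ssh access to a CVM); to the best of my knowledge the upgrade_status command shows the per-CVM state:

nutanix@cvm$ upgrade_status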

You can choose to upgrade the underlying hypervisor as well at this stage.



As always you can check progress in the Prism main window. Here we see the upgrade process has completed successfully.

[screenshot: kvm-upgrade-events]

Nutanix Prism also shows the individual task info: task stage, CVM/host involved, time taken, etc.


The Nutanix platform upgrade takes care of all the intermediate steps and just works, regardless of the size of the cluster. There's minimal impact and disruption as the upgrade takes place, and it enables you to carry out such tasks within normal working hours, rather than losing a weekend to the usual rigours of a traditional hardware upgrade cycle.