Veeam One – Free Monitoring and Alerting of Hypervisors Review

Veeam One Monitor – Free Edition – Review

[Screenshot: Veeam One Monitor, taken 2014-10-07]

So with the migration of the majority of our servers to Hyper-V, I’ve been on the lookout for software that will manage and monitor our VMs and Hyper-V hosts. Microsoft leaves a lot to be desired in the realm of monitoring and alerting for Server 2012 Hyper-V, so when I read a post on www.reddit.com/r/sysadmin saying that Veeam One has a free version, I took the opportunity to try it out.

Veeam has a good reputation for backup and replication of VMware systems, so I was hoping to find good support for Hyper-V, which I’m happy to say I did. After registering for a free account, I downloaded a 600MB .iso file. This initially led me to believe I would be running yet another virtual machine built from the .iso, similar to PHD Virtual. However, this was not the case: it installs a fat client application (yay, not another web-based console!). I mounted the .iso as a virtual DVD drive by right-clicking the file in Server 2012 and then ran setup. The installer set up Veeam One and also installed SQL Server 2008 R2 Express Edition (64-bit). I initially had trouble adding Hyper-V hosts into the software until I provided the server name as an FQDN along with domain credentials (servername.domain.root.company.com / domain.root.company.com\username).

Once I was able to add servers, the software recognized all of the virtual machines running on each host and populated the local “storages” of the hosts (both RAID and iSCSI volumes). I went on to set up alerting through our SMTP relay, and soon my mailbox was being flooded with alerts. I happened to finish the installation during our scheduled backups, so I wasn’t surprised to get Disk Queue Length alerts. However, I was concerned by the many warnings about Memory Pressure. Hyper-V has a feature called Dynamic Memory: when you create a virtual machine, you set a Startup RAM amount in MB (the amount of memory the machine has available during start-up), and you can optionally enable Dynamic Memory and set a Minimum RAM and a Maximum RAM. In our case, one of our hosts had 2 VMs that did NOT use Dynamic Memory and were hard-set to 8GB and 16GB of Startup RAM respectively, out of a total of 32GB. A 3rd VM on the same host was set to 4GB of Startup RAM with Dynamic Memory enabled (512MB minimum / 10GB maximum). This 3rd VM was the one complaining about this “Memory Pressure” stuff, and earlier in the month it had spontaneously restarted, which prodded me to dig deeper and optimize our Hyper-V setup.
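
A quick back-of-the-envelope check shows why the 3rd VM was under strain: the two static VMs pin 24GB of the 32GB, so the dynamic VM can never reach its 10GB maximum, and the host OS needs some memory too. The sketch below just does that arithmetic. The VM labels are hypothetical, and HOST_RESERVE_GB is an assumed allowance for the host OS, not a figure reported by Hyper-V or Veeam One.

```python
# Memory figures for the host described above, in GB.
# VM labels are hypothetical; HOST_RESERVE_GB is an assumed allowance
# for the host OS, not a value reported by Hyper-V or Veeam One.
HOST_TOTAL_GB = 32
HOST_RESERVE_GB = 2  # assumption

# (label, startup_gb, dynamic_max_gb or None when Dynamic Memory is off)
VMS = [
    ("static-vm-a", 8, None),
    ("static-vm-b", 16, None),
    ("dynamic-vm", 4, 10),
]


def startup_total(vms):
    """Memory needed just to start every VM."""
    return sum(startup for _, startup, _ in vms)


def worst_case_total(vms):
    """Memory needed if every Dynamic Memory VM grows to its maximum."""
    return sum(startup if dyn_max is None else dyn_max
               for _, startup, dyn_max in vms)


def headroom_for_dynamic(vms, available):
    """Memory left for dynamic VMs once the static VMs are placed."""
    static = sum(startup for _, startup, dyn_max in vms if dyn_max is None)
    return available - static


available = HOST_TOTAL_GB - HOST_RESERVE_GB
print(f"startup total: {startup_total(VMS)} GB")                         # 28 GB
print(f"worst case: {worst_case_total(VMS)} GB of {available} GB")       # 34 of 30
print(f"room for dynamic VM: {headroom_for_dynamic(VMS, available)} GB")  # 6 GB
```

Even with no reserve at all, the worst case (34GB) exceeds the host’s 32GB, which is consistent with the pressure warnings going away once the VM was moved to a host with more free memory.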

Memory Pressure is described by Veeam as:

“Dynamic Memory determines the amount of memory needed by a virtual machine by calculating memory pressure. To perform this calculation, Hyper-V looks at the total committed memory of the guest operating system running in the virtual machine and then calculates pressure as the ratio of how much memory the virtual machine wants, to how much it has. The higher VM average pressure you have, the more the VM starts paging memory to disk with disastrous effect on the VM’s performance.”

and further with a Resolution:

“Most likely high demand for Memory resources is caused by application workloads inside the VM. To troubleshoot memory pressure check dynamic memory allocation, buffer, and limits for this VM. If there is not enough physical memory available on the host to give the VM its requested amount of memory, consider relocating this VM to another host or verify memory usage for existing VMs on the same host.”
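
To make Veeam’s description concrete, here is a simplified model. It assumes pressure is just committed memory divided by assigned memory, expressed as a percentage, and that Hyper-V aims to give a VM its committed memory plus the buffer percentage, clamped to the Minimum/Maximum RAM limits. The real Dynamic Memory balancer is more involved; the function names and sample numbers are illustrative only.

```python
def memory_pressure(committed_mb: float, assigned_mb: float) -> float:
    """Pressure as a percentage: memory the guest wants / memory it has."""
    if assigned_mb <= 0:
        raise ValueError("assigned memory must be positive")
    return committed_mb / assigned_mb * 100.0


def target_memory(committed_mb: float, buffer_pct: float,
                  min_mb: float, max_mb: float) -> float:
    """Simplified model: committed memory plus the buffer, clamped to limits."""
    wanted = committed_mb * (1 + buffer_pct / 100.0)
    return max(min_mb, min(max_mb, wanted))


# Hypothetical numbers for a guest configured like the 3rd VM (512MB to 10GB).
print(memory_pressure(6144, 4096))            # 150.0: wants 50% more than it has
print(target_memory(12288, 20, 512, 10240))   # 10240: capped at Maximum RAM
```

The second call shows why “limits” appear in the resolution: once the guest wants more than Maximum RAM, no amount of free host memory will bring pressure down.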

By first selecting the host running the problem VM and then opening the Memory tab, I found that the Hyper-V Services Memory Consumed average percentage was hovering near 130%.

[Screenshot, 2014-10-07: Veeam One memory graph for the Hyper-V host]

You can see that mid-day on the 6th, I ‘moved’ the problematic VM off the host, and the memory pressure was relieved enough that the alerts stopped.

A couple of things to note about the “Move” of the VM from one Server 2012 Hyper-V host to another. First, I had to edit the VM’s Processor settings and, under Compatibility Configuration, check “Migrate to a physical computer with a different processor version”. Second, although the live migration (move) completed successfully and the system stayed up and in service afterwards, it lost its IP configuration after a restart. The new host had more memory available, so I allotted more memory to the VM while it was shut down. On power-up I noticed that the IP address had changed, and found that the Network Adapter’s Advanced Features had reverted to a Dynamic MAC address instead of the hard-coded Static MAC address. After changing it back to its previous MAC address, the server picked up the correct IP address and the move was complete.

In the graph above you’ll also notice a dip in memory. This was due to one of our database servers going down. This is a critical VM for our site, and it was very beneficial to have Veeam One available to help troubleshoot the problem. First I checked the Server 2012 event logs and found the following events:

Info, 5:52 PM, Event 18510, Hyper-V-Worker: ‘DBServer’ saved successfully. (Virtual machine ID 7DFDBBE8-CD76-4EBC-B015-27FF251BCFA5)

Info, 5:52 PM, Event 16642, Hyper-V-Hypervisor: Hyper-V successfully deleted a partition (partition 2).

Error, 5:51 PM, Event 4096, Hyper-V-Integration-Data-Exchange: ‘DBServer’: The Data Exchange integration service is either not enabled, not running or not initialized. (Virtual machine ID 7DFDBBE8-CD76-4EBC-B015-27FF251BCFA5)
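
Read oldest-first, these entries show the Data Exchange error at 5:51 coming before the save and partition deletion at 5:52. The sketch below hard-codes the three entries quoted above (times converted to 24-hour format) and sorts them chronologically, then filters on the virtual machine ID. It is only an illustration; reading real exported logs is not shown, and the field layout is an assumption.

```python
import re
from datetime import datetime

VM_ID = "7DFDBBE8-CD76-4EBC-B015-27FF251BCFA5"
VM_ID_RE = re.compile(r"Virtual machine ID ([0-9A-Fa-f-]{36})")

# The three entries quoted above as (time, level, event_id, message),
# in the order they are listed in the text.
EVENTS = [
    ("17:52", "Information", 18510,
     f"'DBServer' saved successfully. (Virtual machine ID {VM_ID})"),
    ("17:52", "Information", 16642,
     "Hyper-V successfully deleted a partition (partition 2)."),
    ("17:51", "Error", 4096,
     "'DBServer': The Data Exchange integration service is either not "
     f"enabled, not running or not initialized. (Virtual machine ID {VM_ID})"),
]


def oldest_first(events):
    """Sort by time; sorted() is stable, so same-minute entries keep their order."""
    return sorted(events, key=lambda e: datetime.strptime(e[0], "%H:%M"))


def events_for_vm(events, vm_id):
    """Keep only events whose message carries the given virtual machine ID."""
    matches = []
    for event in events:
        m = VM_ID_RE.search(event[3])
        if m and m.group(1).upper() == vm_id.upper():
            matches.append(event)
    return matches


for time_str, level, event_id, _ in oldest_first(EVENTS):
    print(time_str, level, event_id)
```

The partition-deletion event carries no VM ID, so filtering on the ID alone would miss it; keeping the full timeline around the failure is what ties it to ‘DBServer’.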

Veeam says:
The VM Configuration is no longer accessible.

Cause: The virtual machine might not have the right permissions on the vsconfig.dll file, or the VMMS service is not started.

Resolution: Check that the virtual machine configuration module (also called the “store”) is contained in the vsconfig.dll file and that the path to the vsconfig.dll file is valid and has the right permissions. By default the vsconfig.dll file is installed at \Windows\System32. Verify that the VMMS service is in the running state in the Service Control Manager and that the failed operation now succeeds.

I checked hyperv001: the Hyper-V Virtual Machine Management Service was in the Running state with an Automatic startup type, logged on as Local System. I modified the service’s Recovery options from “1st failure: Restart the Service, 2nd failure: Restart the Service, Subsequent failures: Take No Action” so that subsequent failures also restart the service, and reduced “Restart service after” from 2 minutes to 1 minute.
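
The same recovery settings can be scripted instead of clicked through in the service’s properties. The Python sketch below builds an `sc.exe failure` command line. It assumes `vmms` is the service name of the Hyper-V Virtual Machine Management Service, and the one-day failure-count reset period is a value chosen for illustration, since the text doesn’t say what it was set to. Actually running it requires Windows and an elevated prompt.

```python
import subprocess


def sc_failure_args(service: str, delay_ms: int, reset_seconds: int = 86400):
    """Build an sc.exe 'failure' command line that restarts the service
    after delay_ms on the 1st, 2nd and subsequent failures.

    reset_seconds (failure-count reset period) is an assumed value.
    """
    actions = "/".join(f"restart/{delay_ms}" for _ in range(3))
    # sc.exe expects each option name to end in '=' followed by a separate value.
    return ["sc.exe", "failure", service,
            "reset=", str(reset_seconds), "actions=", actions]


def apply_recovery(args):
    """Run the command; only works on Windows, from an elevated prompt."""
    subprocess.run(args, check=True)


args = sc_failure_args("vmms", delay_ms=60_000)
print(" ".join(args))
# sc.exe failure vmms reset= 86400 actions= restart/60000/restart/60000/restart/60000
```

Keeping the command in a script makes it easy to apply identical recovery settings on every Hyper-V host rather than only on hyperv001.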

I spun up the failed VM successfully (without changing any of the VM’s settings) and DB services were restored. I then ran a Hyper-V Export of ‘DBServer’ to D:\Exports\DBServer to make sure we had a good image of the VM backed up.

The Hyper-V Data Exchange Service is described as: “Provides a mechanism to exchange data between the virtual machine and the operating system running on the physical computer.” Its Startup Type is Manual (Trigger Start). This service seems related to “Integration Services”, which you install inside a Windows guest from a virtual DVD for driver compatibility and similar features. I also saw indications that we might need to “modify” NUMA or “extra processor features”. It could be that the Linux-based DBServer doesn’t play well with Hyper-V’s CPU and/or integration services, and we may need to power down the VMs, modify the processor features or NUMA settings, and then power them up again.

I double-checked the NUMA/Compatibility settings on both WebServer and DBServer:
DBServer:
Compatibility checked, 6 processors, 3 NUMA nodes, 2 sockets
WebServer:
Compatibility unchecked, 8 processors, 2 NUMA nodes, 1 socket

For the DBServer, I believe we may need to shut it down and uncheck the Compatibility option, in effect the opposite of “dumbing down” the CPU on the problematic server.
If the services fail again and the DBServer VM shuts down again, I’m going to:
1. Shut down the DBServer.
2. Do another export of the VM to get another good backup.
3. Give it the same Hyper-V CPU settings as WebServer (8 processors, Compatibility unchecked, etc.).
4. Start the system up again and monitor it.
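
Step 3 of this plan amounts to reconciling DBServer’s processor settings with WebServer’s. The small sketch below lists the differences using the values recorded above. Treating NUMA nodes and sockets as derived values to re-check after the change, rather than settings to edit directly, is an assumption of this sketch.

```python
# Processor settings for the two VMs, as recorded above.
DBSERVER = {"compatibility": True, "processors": 6, "numa_nodes": 3, "sockets": 2}
WEBSERVER = {"compatibility": False, "processors": 8, "numa_nodes": 2, "sockets": 1}

# Settings changed directly in step 3. NUMA nodes and sockets are assumed
# to follow from the processor/NUMA configuration, so they are only re-checked.
CONFIGURABLE = ("compatibility", "processors")


def settings_diff(current, reference):
    """Return {name: (current, reference)} for every setting that differs."""
    return {k: (current.get(k), v) for k, v in reference.items() if current.get(k) != v}


for name, (now, target) in settings_diff(DBSERVER, WEBSERVER).items():
    action = "change" if name in CONFIGURABLE else "check after restart"
    print(f"{name}: {now} -> {target} ({action})")
```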

In conclusion, I think Veeam One Monitor is a well-rounded monitoring system: it provides descriptive error analysis with causes and resolutions, easy-to-understand alerting and alert management, a well-laid-out interface, and easy setup and configuration. I’m glad I found this software at the right time, because I’m not sure I could have set up the same amount of alerting and visibility into Hyper-V in the same amount of time with something like Nagios. I also believe the software helped me become aware of potential problems with our Hyper-V resource allocations and helped me resolve a downed critical VM.

Alerts have died down significantly to a trickle, and now I’m only getting alerts when backups are running. I may need to modify my thresholds for disk queue length, or look further into optimizing my storage and/or networking. Because I have a background in resolving alerts and in excluding and editing exclusions, fine-tuning alerts is relatively easy for me; however, Veeam One also has a nicely intuitive way of adjusting alerting to fit your comfort level. I found that by excluding the really noisy top-level alerts until I got a handle on things, I was better able to go back, re-enable the alerts, and check into why they were occurring. I’m happy with the monitoring solution from Veeam, and although the free edition seems sufficient without buying the Pro version, I’d be inclined to get budget approval and see what the finer details of the Pro version have in store.
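
One way to adjust the disk queue length threshold without losing daytime visibility is to make it depend on the time of day, with a more tolerant limit while backups run. This is a generic Python sketch of that idea, not a description of how Veeam One’s alert rules work; the backup window and both threshold values are hypothetical placeholders.

```python
from datetime import time

# All values below are hypothetical; take them from your own backup
# schedule and from what "normal" looks like on your storage.
BACKUP_START = time(22, 0)
BACKUP_END = time(4, 0)      # window crosses midnight
NORMAL_THRESHOLD = 2.0       # disk queue length outside backups
BACKUP_THRESHOLD = 10.0      # more tolerant while backups run


def in_window(t: time, start: time, end: time) -> bool:
    """True if t is in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_alert(t: time, queue_length: float) -> bool:
    """Alert only when the queue exceeds the threshold for that time of day."""
    limit = BACKUP_THRESHOLD if in_window(t, BACKUP_START, BACKUP_END) else NORMAL_THRESHOLD
    return queue_length > limit


print(should_alert(time(23, 0), 5.0))   # False: inside the backup window
print(should_alert(time(12, 0), 5.0))   # True: midday, above the normal limit
```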