CockroachDB – how to build a 4-node SQL cluster on Ubuntu and Hyper-V

CockroachDB Overview

Description: CockroachDB is an open source, survivable, strongly consistent, scale-out SQL database. If you wonder where Google engineers go when they leave Google, they go out on their own and build unbelievably great scalable and distributed open source software. Essentially, if you want to run your own fault-tolerant SQL database across multiple datacenters and cloud services, on your own servers, with complete control of your database and without paying hefty licensing fees, then run CockroachDB. This post is not a review of CockroachDB, but rather a demonstration of a lab setup and proof of concept (POC).

To get started in our lab, first we want to build 3 or 4 test clone servers, or “nodes”. I use Ubuntu on top of Hyper-V, but you can use any flavor of Linux or macOS you want. CockroachDB can also run on Windows under Docker.

If you’re like me and use Hyper-V on Win10, make 4 x Ubuntu 16.04 “clones” – first build a ‘goldmaster’ image, and clone it 4 times – guide here: https://4sysops.com/archives/clone-a-ubuntu-server-in-hyper-v-2012-r2/ – or use something like virtualboxes.org.

Create 4 virtual machines, each having its own IP address:
Node1: inet addr:10.0.10.169
Node2: inet addr:10.0.10.170
Node3: inet addr:10.0.10.171
Node4: inet addr:10.0.10.172

Make sure each node is up to date and has ntp installed and synchronized. Install ntp with the command:

sudo apt-get install ntp

Then run

timedatectl

and check that the output includes:

NTP synchronized: yes
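If you have four nodes to check, eyeballing the output gets old quickly. Here is a minimal sketch of a helper that tests the timedatectl output for you. The function name clock_is_synced is my own invention, and the second spelling it accepts ("System clock synchronized") is what newer systemd releases print instead of the Ubuntu 16.04 wording.

```shell
# Return success if the given timedatectl output reports a synchronized clock.
# Ubuntu 16.04 prints "NTP synchronized: yes"; newer systemd releases print
# "System clock synchronized: yes", so both spellings are accepted.
clock_is_synced() {
  printf '%s\n' "$1" | grep -Eq '^[[:space:]]*(NTP|System clock) synchronized: yes'
}

# Typical use on a node:
#   clock_is_synced "$(timedatectl)" && echo "clock OK" || echo "clock NOT synced"
sample='      NTP synchronized: yes'
if clock_is_synced "$sample"; then echo "clock OK"; else echo "clock NOT synced"; fi
```

The sample string only stands in for real output so the snippet can be tried anywhere; on a node, pass it "$(timedatectl)" as shown in the comment.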

At this point, before you install and run CockroachDB, it’s wise to export each node VM with Hyper-V as a backup.

On Nodes 1, 2, 3, and 4, download the latest binary (see https://www.cockroachlabs.com/docs/install-cockroachdb.html) with the command:

sudo wget https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz

Extract the binary with the command:

tar -xvf cockroach-latest.linux-amd64.tgz

Move the binary to a location in your PATH, or add its directory to your PATH. On Ubuntu the system-wide PATH is set in /etc/environment, which you can view or edit with the command:

sudo vi /etc/environment

And then move your extracted cockroach to /usr/sbin with the command:

sudo mv cockroach-latest.linux-amd64/cockroach /usr/sbin/

Do a sanity check with the command:

cockroach version
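If the sanity check fails with "command not found", the binary is probably not in a directory on your PATH. A minimal sketch of a check that doesn't need an editor or sudo is below; the helper name on_path is just something I made up for illustration.

```shell
# Report whether a command can be found on the current PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

if on_path cockroach; then
  echo "cockroach found at: $(command -v cockroach)"
else
  echo "cockroach is not on PATH; current PATH is: $PATH"
fi
```

Either branch is harmless to run, so you can paste this on each node before moving on.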

Start CockroachDB in insecure mode in the background on Node1 (our “master server”, meaning the first node that the others will join) with the command:

sudo cockroach start --background --insecure --host=10.0.10.169

Result should be something like below:

CockroachDB node starting at 2017-03-15 23:16:23.118419329 -0700 PDT
 build: CCL beta-20170309 @ 2017/03/09 16:31:10 (go1.8)
 admin: http://10.0.10.169:8080
 sql: postgresql://root@10.0.10.169:26257?sslmode=disable
 logs: cockroach-data/logs
 store[0]: path=cockroach-data
 status: restarted pre-existing node
 clusterID: 08b6bfe6-4886-466b-a9c6-bc58a3809113
 nodeID: 1

Go ahead and browse to the admin page http://10.0.10.169:8080

On your other nodes:

sudo cockroach start --background --insecure --host=10.0.10.170 --join=10.0.10.169:26257

*where --host is the IP address of the node you are starting, and --join points it at the master server, 10.0.10.169:26257.

Your results should look something like the following:

CockroachDB node starting at 2017-03-15 23:23:43.783097234 -0700 PDT
 build: CCL beta-20170309 @ 2017/03/09 16:31:10 (go1.8)
 admin: http://10.0.10.170:8080
 sql: postgresql://root@10.0.10.170:26257?sslmode=disable
 logs: cockroach-data/logs
 store[0]: path=cockroach-data
 status: initialized new node, joined pre-existing cluster
 clusterID: 08b6bfe6-4886-466b-a9c6-bc58a3809113
 nodeID: 2

Your web interface should provide you with performance graphs:

Identify the new nodes in the View Nodes List link:

Go on and add the remaining Nodes to the cluster.
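For the remaining nodes the only thing that changes is the --host value. As a rough sketch, the loop below prints (it does not run) the start command for Node3 and Node4, using the lab IP addresses from earlier and the same flags as above. The variable FIRST and the function print_join_cmds are names I picked for this example.

```shell
# Print (not run) the start command for each remaining node.
# FIRST is the node the others join; the IPs are the lab addresses listed earlier.
FIRST=10.0.10.169
print_join_cmds() {
  for ip in "$@"; do
    echo "sudo cockroach start --background --insecure --host=$ip --join=$FIRST:26257"
  done
}

print_join_cmds 10.0.10.171 10.0.10.172
```

Copy each printed line to the matching node and run it there.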

???

Profit! – just kidding

Now you can go on to learn about CockroachDB SQL, create some databases and tables, and test how pulling the plug on one of your nodes doesn’t bring down the DB, and how the data is replicated across the nodes (by default CockroachDB keeps three copies of each piece of data, so with 4 nodes not every node holds everything). It’s recommended you don’t run this lab on a single workstation-class system, but on something that meets the CockroachDB minimum system requirements. This product is still in beta and features are subject to change. Regardless, CockroachDB is an incredible addition to the open-source community and I’m sure it will be very useful to a lot of systems admins and application developers.
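To make the pull-the-plug test concrete, here is a minimal sketch that writes a few SQL statements to a scratch file. The database and table names (labtest, pings) are made up, and the cockroach sql line in the comments is an assumption about how you would feed the file to the cluster from one of the lab nodes; it is not executed by the snippet.

```shell
# Write a few test statements to a scratch file.
# The database and table names (labtest, pings) are made up for this sketch.
sql_file="${TMPDIR:-/tmp}/labtest.sql"
cat > "$sql_file" <<'SQL'
CREATE DATABASE IF NOT EXISTS labtest;
CREATE TABLE IF NOT EXISTS labtest.pings (id INT PRIMARY KEY, note STRING);
INSERT INTO labtest.pings VALUES (1, 'written before pulling the plug');
SELECT * FROM labtest.pings;
SQL
echo "wrote $sql_file"

# On any node (not run here):
#   cockroach sql --insecure --host=10.0.10.169 < "$sql_file"
# Then power off one VM and re-run only the SELECT against a surviving node.
```

Running the file a second time will fail on the INSERT because the row already exists, which is fine for a one-off lab test.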

Install OpenDNS Umbrella Virtual Appliances on Hyper-V 2012 R2

You’re probably already familiar with OpenDNS; the service has long been trusted with consumer-grade firewalls and Wi-Fi Access Points. OpenDNS is now owned by Cisco, and the service is relatively inexpensive at approximately $115 for a three-year, 250-license package on CDW. OpenDNS Umbrella extends that protection to your enterprise by categorizing your DNS traffic in the OpenDNS data centers, rather than relying on your own firewall’s DNS capabilities. This is especially useful if you are running pfSense firewalls, as the packaged domain blocking and reporting is minimal in several areas.

In addition to Umbrella, OpenDNS can protect your roaming devices by installing a remote client. For now, we’ll look at the default reporting and why it’s necessary to set up virtual appliances.

Read the rest of the article here:

Install OpenDNS Umbrella Virtual Appliances on Hyper-V 2012 R2

Set up Ubuntu as a domain controller with SAMBA on VirtualBox

If you want to run a domain controller on your network but don’t have access to a Windows Server license, you can use SAMBA, the free open-source software, and VirtualBox, the free virtualization software. We’ll describe the procedure for setting up a virtual server using VirtualBox and netboot.xyz iPXE and move on to setting up your domain controller with SAMBA.

Read my full article here:

Set up Ubuntu as a domain controller with SAMBA on VirtualBox

Clone a Ubuntu server in Hyper-V 2012 R2

Ubuntu runs on Hyper-V perfectly fine, so you may want to run many Ubuntu Virtual Machines (VMs) on Hyper-V Server 2012 R2. This article will show you how to clone or duplicate a single Ubuntu server on Hyper-V with different network interfaces and host names. Cloning Linux servers on Hyper-V is easy and quick when you have the right knowledge and tools.

Read my full article here:

Clone a Ubuntu server in Hyper-V 2012 R2

Microsoft BizSpark – free business software for 3 years

If you’re thinking about which cloud service to use for a startup business, Microsoft just upped the ante with BizSpark.

Microsoft BizSpark (https://www.microsoft.com/bizspark#start-two) is really an amazing deal for business start-ups. If you wish you could get Microsoft software for free or at a huge discount, check out their offer. BizSpark offers the following services and software for free for three years:

BizSpark gives startups 3 years of free stuff – software, services, tech support, and Azure cloud. Your startup qualifies if it is less than 5 years old, is privately held, and earns less than $1M annually. And at the end of your 3 years, you keep all the software you’ve downloaded – at no cost.

To expand on this, here is what you get with Microsoft BizSpark in detail:

Get up to $750 per month of FREE Azure cloud services for 3 years; that’s $150 per month each for up to 5 developers.

This is potentially a $27,000 value!
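In case you want to check that figure, here is the arithmetic as a quick sketch. The numbers come straight from the offer described above (5 developers, $150 each per month, 3 years); the variable names are mine.

```shell
# $150 per developer per month, up to 5 developers, for 36 months.
per_dev=150; devs=5; months=36
per_month=$((per_dev * devs))
total=$((per_month * months))
echo "per month: \$$per_month, over 3 years: \$$total"
```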

Membership puts all Microsoft development and test software at your fingertips, including Azure, Windows, and Office 365 – for free. Plus, enjoy access to hundreds of free training classes, technical content, and 4 break-fix phone support incidents to help you on your journey.

It’s pretty amazing that BizSpark also offers up to $120,000 worth of Azure credit.

Makes me want to go out and start a new business – hmm, maybe jasoncoltrin.com would qualify?

Hyper-V failed to generate initial replica for server


I recently found that one of my virtual machines had failed its initial replication to a server located at a different site. I looked at a few of the Hyper-V-VMMS Admin Event Logs and found some of the following errors:

Event ID: 32042 – Hyper-V failed to generate initial replica for ‘040REP001’: General access denied error (0x80070005). (Virtual machine ID GUID)

Event ID: 33680 – Replication operation for virtual machine ‘ServerName’ failed.

Event ID: 32086 – Hyper-V suspended replication for virtual machine ‘ServerName’ due to a non-recoverable failure. Resume replication after correcting the failure.

Also Event IDs 33676, 18012, and 16370.

I googled around and found that most resolutions revolved around fixing the “Virtual Machine” group permissions of the parent Folder or Volume by using the icacls command that looks something like this:

icacls "C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines\5A6F2E44-7F95-4CF8-89E5-AE8A6648C93A.xml" /grant "NT VIRTUAL MACHINE\5A6F2E44-7F95-4CF8-89E5-AE8A6648C93A":(F) /L

or this:

icacls "C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks\MountPointHere" /grant "NT VIRTUAL MACHINE\Virtual Machines":F /T

The folder that contained the .vhd (c:\hyper-v\exports) did not have “Virtual Machines” group permissions, so I decided to move the VM and its virtual hard disk to a new location that does have the correct permissions. Since this was an old export, I found the original location of the .vhd in the config file at c:\hyper-v\exports\config.xml.

Since I didn’t like the current location of the virtual machine and .vhd anyway, I decided rather than trying to fix permissions, I’d move it to a better location in a folder that would inherit the correct permissions.

To resolve the replica problem I did the following:

1. Disabled alerting on the server, then shut down the VM.

2. Disabled replication on the VM: right-click on the VM in Hyper-V Manager -> Replication -> Remove Replication

3. Moved all Virtual Machine folders associated with MyServerName, including the .vhd within, out of “c:\Hyper-V\Exports” (a folder I had created, which did not have the permissions) and cut/pasted them into the existing “C:\Users\Public\Documents\Hyper-V\Virtual hard disks\” folder. This folder inherits permissions from its parent folder, including the “Virtual Machines” group.

4. On the shut-down VM, I then went into the VM’s settings and then IDE Controller 0 -> Hard Drive -> Changed Location of Virtual Hard Disk by hitting the “Browse…” button, browsed to the new .vhd location folder with the correct permissions -> OK

5. Started the Virtual Machine successfully, then re-enabled replication with the remote site Replica server and (yes!) the initial replica started normally and health is normal again. Sending initial replica now at 3%.

6. Re-enabled alerting on the server.

Hope this helps someone else whose replications are failing and who is struggling with the “General access denied error” problems associated with snapshots and replication.

Microsoft Hyper-V 2012 High Availability and Live Migration

I’m about 75% of the way through the Introduction to Hyper-V Jump Start in the Microsoft Virtual Academy. There is a lot of great information contained in the videos and .ppt downloads. Below are some of the highlights of the material that I am not as familiar with: High Availability and Clustering. Microsoft Hyper-V in Server 2012 provides “Complete Redundancy In the Box.”

Virtualization can cause problems: if you take 10 or 20 servers and virtualize them all onto a single piece of hardware (your host server), you’ve created a single point of failure. If your single host goes down, you lose the whole workload. Because Microsoft realizes that the value of these hosts increases dramatically as workloads are consolidated, they’ve worked hard to create complete redundancy and protection in all core services, from the bottom of the stack to the top. Below are the five levels, from the physical to the virtual, where Server 2012 can protect your data.

1. Hardware Fault:

  • Windows Hardware Error Architecture (WHEA)
  • Reliability, Availability, Serviceability (RAS)

Server 2012 provides RAS hardware fault detection capabilities. For example, if a memory controller detects that an address is failing, Hyper-V is notified and the address space is taken offline. The information about the hardware fault is sent to the BCD store so it survives reboots, and the server never allocates this area of memory again. This is a nice feature that works around hardware errors without user intervention.

2. Physical Node Redundancy:

  • Live Migration for Planned Downtime
  • Failover Cluster for Unplanned Downtime

If a server is going to go down unplanned, we want the workloads to failover without any user intervention. In planned downtime, we can live-migrate, perform maintenance on our host, then live-migrate our workloads back over to our repaired host. This physical node redundancy is provided with Hyper-V 2012.

3. I/O Redundancy:

  • Network Load Balancing & Failover via Windows NIC Teaming
  • Storage Multi-Path I/O (MPIO)
  • Multi-Channel SMB (Server 2012 Fileserver)

Storage Multi-Path I/O provides redundancy for storage such as iSCSI and Fiber Channel, at both the host level and the guest level. If you’re using a Server 2012 fileserver as back-end storage, you get redundancy here as well.

4. Application/Service Failover:

  • Non-Cluster Aware Apps: Hyper-V App Monitoring
  • VM Guest Cluster: iSCSI, Fiber Channel
  • VM Guest Teaming of SR-IOV NICs

If you’re running VMs and you want to provide failover of applications within the virtual machine, you can already cluster the applications. For non-cluster-aware (legacy) apps, Server 2012 provides App Monitoring, which does light health monitoring such as automatically restarting processes and sending notifications.

5. Disaster Recovery:

  • Hyper-V Replica for Asynchronous Replication
  • CSV 2.0 Integration with Storage Arrays for Synchronous Replication

If you want to fail over to another site completely, with cluster shared volumes, you need redundancy: two of everything. All nodes simultaneously monitor each other through a heartbeat network. Every node keeps track of every other node in the cluster in a registry database (states/properties). If a node crashes, all the other nodes know what workloads and VMs were running on the crashed node. One of them will connect to the appropriate VHDs on the network and pick up that workload. There are two types of clustering that we can consider: Host Clustering and Guest Clustering.

Host Clustering: The most common type of clustering, where we are clustering the physical servers and can move apps and VMs between the servers.

  • Avoids a single point of failure when consolidating
  • VMs can survive a host crash because the replica VM is restarted on another node; VMs can also be restarted on the same node when the VM OS crashes or hangs.
  • Zero downtime maintenance and patching (live-migrate VMs to other hosts)
  • Mobility and Load distribution – Live Migrate VMs to different servers to load balance.

Guest Cluster: Two virtual machines running Windows Server form a cluster themselves for high availability. If one of the two needs to be patched, we can fail over to the other VM. If one of the two crashes, the workload (for example, a SQL Server instance) fails over to the live VM. The difference from physical clusters is in the storage options: a Guest Cluster requires virtualized HBAs, meaning Virtualized Fiber Channel, Fiber Channel over Ethernet, or iSCSI (not Serial Attached SCSI).

Combining Host and Guest Clustering:

It’s recommended to combine Host and Guest clustering for flexibility and protection. You can combine all VMs as long as your VMs pass the Clusters Best Practice Analyzer, known as “Validate”.

 

 

Kudos to Microsoft Virtual Academy, Symon Perriman, and Jeff Woolsey

Veeam One – Free Monitoring and Alerting of Hypervisors Review

Veeam One Monitor – Free Edition – Review


So, with the migration of the majority of our servers to Hyper-V, I’ve been on the lookout for software that will manage and monitor our VMs and Hyper-V hosts. Microsoft does leave a lot to be desired in the realm of monitoring and alerting for Server 2012 Hyper-V, so when I read a post on www.reddit.com/r/sysadmin that Veeam One has a free version, I took the opportunity to try it out.

Veeam has a good reputation in the backup and replication of VMware systems, so I was hoping to find good support for Hyper-V, which I’m happy to say I did. After registering for a free account, I began to download a 600MB .iso file. This initially led me to believe I would be creating another virtual machine from the .iso, similar to PHD Virtual. However, this was not the case; it installs a fat client application (yay, not another web-based console!). I was easily able to mount the .iso as a virtual DVD drive by right-clicking on the file in Server 2012, and then ran setup. The installer went through a setup for Veeam One and also installed SQL Server 2008 R2 Express (64-bit). I initially had trouble adding Hyper-V hosts into the software until I provided the server name as an FQDN and fully qualified credentials (servername.domain.root.company.com / domain.root.company.com\username).

Once I was able to add servers, the software recognized all of the virtual machines running on each host and populated the local “storages” of the hosts (both RAID and iSCSI volumes). I continued on and set up alerting to use our SMTP relay, and soon my mailbox was being flooded with alerts. I happened to finish the installation during our scheduled backups, so I wasn’t surprised that I was getting Disk Queue Length alerts. However, I was concerned that there were many warnings regarding Memory Pressure. Hyper-V has a feature called Dynamic Memory. When a virtual machine is first created, you are given the option of setting a Startup RAM amount in MB, which is the amount of memory you want the machine to have available during start-up. You are also given the option to Enable Dynamic Memory and set a Minimum RAM and a Maximum RAM. In our case, one of our hosts had 2 VMs set NOT to use Dynamic Memory; they were hard-set to 8GB and 16GB of Startup RAM out of a total available 32GB. Additionally, I had a 3rd VM on the same host set to use 4GB Startup RAM with Dynamic Memory enabled (512 min/10GB max). This 3rd VM is the one that was complaining about this “Memory Pressure” stuff, and earlier in the month it had spontaneously restarted, which prodded me to dig deeper and optimize our Hyper-V setup.
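It helps to put the numbers for that host side by side. The sketch below is just the arithmetic from the paragraph above (32GB host, two static VMs at 8GB and 16GB, a dynamic VM with a 10GB maximum); the variable names are mine, and it ignores whatever memory the host OS itself needs, which would only make things tighter.

```shell
host_gb=32
fixed_gb=$((8 + 16))      # the two VMs with static Startup RAM
dynamic_max_gb=10         # the third VM's Dynamic Memory maximum
headroom_gb=$((host_gb - fixed_gb))
worst_case_gb=$((fixed_gb + dynamic_max_gb))
echo "left for the dynamic VM (and the host OS): ${headroom_gb} GB"
echo "worst-case demand: ${worst_case_gb} GB of ${host_gb} GB"
```

So the dynamic VM could never actually reach its 10GB maximum on this host, which fits with it being the one raising Memory Pressure warnings.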

Memory Pressure is described by Veeam as:

“Dynamic Memory determines the amount of memory needed by a virtual machine by calculating memory pressure. To perform this calculation, Hyper-V looks at the total committed memory of the guest operating system running in the virtual machine and then calculates pressure as the ratio of how much memory the virtual machine wants, to how much it has. The higher VM average pressure you have, the more the VM starts paging memory to disk with disastrous effect on the VM’s performance.”

and further with a Resolution:

“Most likely high demand for Memory resources is caused by application workloads inside the VM. To troubleshoot memory pressure check dynamic memory allocation, buffer, and limits for this VM. If there is not enough physical memory available on the host to give the VM it’s requested amount of memory, consider relocating this VM to another host or verify memory usage for existing VMs on the same host.”
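The ratio in the first quote is easy to work through by hand. Here is a minimal sketch that expresses pressure as a percentage of how much memory the VM wants versus how much it has. The function name pressure_pct and the sample figures are hypothetical, and this is only the ratio as Veeam describes it, not Hyper-V's actual internal calculation.

```shell
# pressure_pct WANTED_MB HAS_MB -> integer percentage (wanted / has * 100)
pressure_pct() {
  echo $(( $1 * 100 / $2 ))
}

pressure_pct 6144 4096   # hypothetical VM that wants 6144 MB but has 4096 MB
pressure_pct 2048 4096   # hypothetical VM that wants 2048 MB and has 4096 MB
```

Anything over 100 means the VM wants more memory than it has been given, which is when paging to disk starts to hurt.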

By highlighting first the Host with the problem VM, then selecting the memory tab, I found that the Hyper-V Services Memory Consumed average percentage was hovering near 130%.

[Screenshot: host memory usage graph, Screen Shot 2014-10-07 at 11.03.11 AM]

You can see that mid-day on the 6th, I ‘moved’ the problematic VM off of the host and the memory pressure was relieved enough to not receive alerts.

A couple of things to note about the “Move” of the VM from one Server 2012 Hyper-V host to another: I did have to modify the Processor settings of the VM and place a checkmark under Compatibility Configuration to enable “Migrate to a physical computer with a different processor version”. Additionally, although the live migration completed successfully and the system stayed up and remained in service, after a restart it lost its IP configuration. The new host had more memory available for the moved VM, so I allotted more memory to it while the system was shut down. I noticed on power-up that the IP address had changed, and found that the Network Adapter Advanced Features had reverted to a dynamic MAC address instead of the hard-coded “Static” MAC address. After changing this back to its previous MAC address, the server picked up the correct IP address and the move was complete.

In the graph above you’ll also notice a dip in memory. This was due to one of our database servers going down. This is a critical VM for our site, and it was very beneficial to have Veeam One available to help troubleshoot the problem. First I checked the Server 2012 event logs and found the following events.

info: 5:52pm Event 18510, Hyper-V-Worker ‘DBServer’ saved successfully. (Virtual machine ID 7DFDBBE8-CD76-4EBC-B015-27FF251BCFA5)

info: 5:52pm Event 16642, Hyper-V Hypervisor Hyper-V successfully deleted a partition (partition 2).

Error 5:51 Event 4096, Hyper-V-Integration-Data-Exchange: ‘DBServer’: The Data Exchange integration service is either not enabled, not running or not initialized. (Virtual machine ID 7DFDBBE8-CD76-4EBC-B015-27FF251BCFA5)

Veeam says:
The VM Configuration is no longer accessible.

Cause: Virtual Machine might not have right permissions on vsconfig.dll file or VMMS service is not started.

Resolution: Check that the virtual machine configuration module (also called the “store”) is contained in the vsconfig.dll file and that the path to the vsconfig.dll file is valid and has the right permissions. By default the vsconfig.dll file is installed at \windows\system32. Verify that the VMMS service is in the running state in the Service Control Manager and that the failed operation now succeeds.

I checked hyperv001: the Hyper-V Virtual Machine Management Service is in the Running/Automatic state, logged on as Local System. I modified the Recovery options on the service. They were 1st failure: Restart the Service, 2nd failure: Restart the Service, Subsequent failures: Take No Action. I changed Subsequent failures to Restart the Service, and changed “Restart service after” from 2 minutes to 1 minute.

I spun up the failed VM successfully (without changing any of the VM’s settings) and DB services were restored. I ran a Hyper-V Export successfully on ‘DBServer’ to D:\Exports\DBServer to make sure we have a good image of the VM backed up.

The Hyper-V Data Exchange Service is described as: Provides a mechanism to exchange data between the virtual machine and the operating system running. The Startup Type for the service is Manual (Trigger Start). This type of data integration seems related to “Integration Services”, which is what you install inside a Windows OS with a virtual DVD for driver compatibility etc. I also saw indications that we might need to “modify” NUMA or “extra processor features”. It could be that the Linux DBServer doesn’t play well with Hyper-V CPU features and/or Integration Services, and we may need to power down the VMs, modify the processor features or NUMA, then power them up again.

I double-checked NUMA/Compatibility on both WebServer and DBServer:
DBServer:
Compatibility is checked, 6 processors, 3 NUMA nodes, 2 sockets
WebServer:
Compatibility is unchecked, 8 processors, 2 NUMA nodes, 1 socket

For the DBServer, I believe we may need to shut down and uncheck the Compatibility option, in effect the opposite of “dumbing down” the CPU on the problematic server.
If the services fail again and the DBServer VM shuts down again, I’m going to:
1. Shut down the DBServer.
2. Do another Export of the VM so as to get another good backup.
3. Give it the same Hyper-V VM CPU settings as WebServer (8 processors, Compatibility unchecked, etc.).
4. Start up the system again and monitor.
In regards to Veeam One Monitor, in conclusion I think it’s a well-rounded monitoring system: it contains descriptive error analysis with causes and resolutions, easy-to-understand alerting, and management of alerts, and it is well laid out and easy to set up and configure. I’m glad that I found this software at the right time, because I’m not sure I could have set up the same amount of alerts and visibility into Hyper-V in the same amount of time with something like Nagios. I also believe the software helped me become aware of potential problems with our Hyper-V resource allocations, and find resolutions to a downed critical VM.
Alerts have died down significantly to a trickle, and now I’m only getting alerts when backups are running. I may need to modify my thresholds for disk queue length, or look further into optimizing my storage and/or networking. Because I have a background in resolving, excluding, and editing exclusions, fine-tuning alerts is relatively easy for me; however, Veeam One has a nicely intuitive way of changing alerting to fit your comfort level. I found that by excluding the really noisy top-level alerts until I got a handle on things, I was then better able to go back, re-enable the alerts, and check into why they were occurring. I’m happy with the monitoring solution from Veeam, and although the free version seems sufficient without buying the Pro version, I’d be inclined to get approval for the money and see what the finer details of the Pro version have in store.