SQL Server with Cluster Shared Volumes (CSV) – Part 1

Microsoft SQL Server provides us with a wide variety of options to architect high availability (HA) and disaster recovery (DR) solutions for mission-critical workloads. In this article, let's focus on HA, specifically failover clustering. Failover clustering is probably the most mature, robust and stable high availability solution the Windows Server operating system offers. It has been around for a few decades now and has evolved over time along with SQL Server. In this article, let's look at a hidden feature of Windows Server Failover Clustering which helps make our already highly available SQL Server Failover Clustered Instances even more highly available: Cluster Shared Volumes, aka CSVs. With Windows Server 2019 around the corner, I'd say CSVs are not a new concept in clustering; they have been around for almost a decade. Microsoft introduced CSVs in Windows Server 2008 R2, but at that time SQL Server was not supported on them. CSVs were originally designed for Hyper-V workloads, were later enhanced for file servers, and eventually landed in SQL Server beginning with version 2014.

Fair enough, but why should we care about Cluster Shared Volumes?

Well, the idea behind CSVs is to provide truly shared disks to a failover cluster: volumes that are available to all the nodes for read and write operations. Let's talk about a traditional Failover Clustered Instance setup for a moment. During a failover, to bring the SQL Server resource online, the drives have to be unmounted from the previous owner and remounted on the node which will act as the primary after the failover. Should your IO subsystem become a bottleneck for whatever reason, the unmounting and mounting process takes longer, impacting the availability of the system. With CSVs there is no unmounting and mounting of disks, since they are already available for read and write operations across all the nodes. In other words, downtime is reduced because the SQL Server resource no longer depends on the disks coming online. Let's talk about one more scenario where CSVs outperform traditional shared storage. Assume the disk(s) lose connectivity from the node which is currently running SQL Server in the middle of the day; under those circumstances the cluster can leverage another path to the shared disk (through another node) without having to fail the resource group over. That saves us from a potential unplanned outage during business hours.
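If you want to see this in action in a lab, converting a clustered disk into a CSV is a one-liner in PowerShell. Here is a quick sketch (the disk name "Cluster Disk 2" is just a placeholder; use whatever sits in Available Storage in your cluster):

# List the clustered disks that are candidates for CSV
Get-ClusterResource | Where-Object ResourceType -eq 'Physical Disk'

# Convert a clustered disk to a Cluster Shared Volume;
# it then shows up on every node under C:\ClusterStorage\
Add-ClusterSharedVolume -Name 'Cluster Disk 2'

# Check how each node currently sees the CSV (direct vs. redirected IO)
Get-ClusterSharedVolumeState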

How do CSVs work?

Further reading: Deploy SQL Server with CSVs


Upgrading Windows Failover Cluster 2012R2 to 2016

In this blog post, let's see how to upgrade a Windows Server 2012 R2 failover cluster to Windows Server 2016. My current lab setup is a 3-node cluster, all running 2012 R2, and I decided to upgrade the nodes to 2016 to learn the new features/enhancements of Windows Server 2016. I thought I'd put together a short blog post on how to perform this, in case any of you are in the same boat as me.

Anyways….Here’s my current lab setup.

[Screenshot: current cluster configuration]

Before moving on…what options do I have for upgrading my cluster?
Option 1: Install Windows Server 2016 on a completely new machine, introduce it to the existing cluster, move the roles over, and remove the old 2012 R2 node. Then work on the next node and do the same, and so on, until all the nodes in your cluster are 2016 machines. As the last step, raise the cluster functional level to 2016.

Option 2: Select the node you want to upgrade, drain its roles -> evict the node -> perform an in-place upgrade from 2012 R2 to 2016 -> introduce the upgraded node back into the cluster. Repeat until all the nodes in your cluster are 2016 machines. As the last step, raise the cluster functional level to 2016.

NOTE: Yes, you can have Windows Server 2016 and Windows Server 2012 R2 nodes participating in the same cluster. It's called mixed mode, a compatibility feature/enhancement introduced so you can transition from Server 2012 R2 to 2016 without downtime. However, you can't leave the cluster in that state forever; you have about four weeks to get back to a supported state. As soon as all the nodes in the cluster are upgraded to Server 2016, the cluster functional level should be upgraded. Once this is done, we can't revert/rollback, and we can't add 2012 R2 nodes to this cluster anymore.

I've chosen to perform in-place upgrades (Option 2 as mentioned above). Would I do this in Production? Probably not.

Now…I've drained the node which I would like to upgrade to 2016 and evicted it from the cluster.
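For reference, here is roughly the PowerShell equivalent of those two GUI steps (the node name below is a placeholder for whichever node you are upgrading):

# Pause the node and drain its roles to the remaining nodes
Suspend-ClusterNode -Name 'SQLCLU-N3' -Drain

# Confirm it shows as Paused and owns nothing, then evict it from the cluster
Get-ClusterNode
Remove-ClusterNode -Name 'SQLCLU-N3'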

[Screenshots: draining roles and evicting the node in Failover Cluster Manager]

It’s gone! My current cluster state at this point is shown below.

[Screenshot: cluster after the node was evicted]

Now…I inserted the 2016 media on the server which I just evicted and kicked off the upgrade.

[Screenshots: Windows Server 2016 setup and upgrade drive selection]

I had no free space on C$, so I ended up adding a new drive (U$) to facilitate the upgrade process. Well, it failed…I had to expand drive C$ and restart the process!!


After struggling for around 90 minutes or so, I am all set.

[Screenshot: upgrade to Windows Server 2016 completed]

Now, I am reintroducing the node back into the cluster.
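The PowerShell equivalent is a single cmdlet (cluster and node names below are placeholders from my lab):

# Run from any existing cluster node, or point -Cluster at the cluster name
Add-ClusterNode -Cluster 'SQLCLU' -Name 'SQLCLU-N3'

# Verify all nodes are Up
Get-ClusterNode | Format-Table Name, State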

[Screenshots: Add Node wizard steps]

Alrighttttttttttttttty…..Here it is!

[Screenshot: the upgraded node back in the cluster]

As you can see below, the current cluster functional level is set to 8. Once all the nodes in this cluster have been upgraded to 2016, we should upgrade the functional level as I already mentioned, and the command below would return 9 as the output.

[Screenshot: Get-Cluster output showing the cluster functional level as 8]
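For reference, here is roughly what that looks like in PowerShell (run it from one of the 2016 nodes):

# 8 = 2012 R2 functional level (mixed mode), 9 = 2016 functional level
(Get-Cluster).ClusterFunctionalLevel

# Once every node is on Windows Server 2016, commit the upgrade; this is irreversible
Update-ClusterFunctionalLevel
(Get-Cluster).ClusterFunctionalLevel    # should now return 9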

There you go guys, Hope this helps. Happy weekend!

Update: After spending a few more hours, I was able to upgrade all my nodes to 2016 and upgraded the cluster functional level. It's no longer in mixed mode (in other words, I won't be able to add a Windows 2012 R2 node to my cluster). See below…as I mentioned above, the cluster functional level is set to 9 after I issued "Update-ClusterFunctionalLevel".

[Screenshot: Get-Cluster output showing the cluster functional level as 9]

 

Lab setup – AlwaysOn AGs in a Multi Subnet Cluster – Part 2

In part 1, I've shown how to create a Windows cluster in a multi-subnet setup. In this post let's see how to create an AG and the corresponding listener.

In my lab, I will be creating two AGs and two corresponding listeners.

Details:
I've two databases -> sales and customers.
Two AGs -> Sales_AG and Customers_AG.
Two listeners -> sqllst_Sales and sqllst_Cust.
For the Sales AG, I've disabled "Database level health detection", a new feature introduced in SQL Server 2016.

[Screenshots: New Availability Group wizard for Sales_AG]
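The screenshots above are from the New Availability Group wizard. If you prefer PowerShell, the sketch below is roughly what the Sales AG would look like using the SqlServer module (the endpoint URLs, domain name and port are assumptions from my lab; restoring/seeding the database on the secondaries is not shown, and database level health detection is left at its default here):

Import-Module SqlServer

# Replica templates: two synchronous replicas in PRD, one asynchronous replica in DR
$r1 = New-SqlAvailabilityReplica -Name 'STLSQLAG1' -EndpointUrl 'TCP://STLSQLAG1.mylab.local:5022' -AvailabilityMode SynchronousCommit -FailoverMode Automatic -AsTemplate -Version 13
$r2 = New-SqlAvailabilityReplica -Name 'STLSQLAG2' -EndpointUrl 'TCP://STLSQLAG2.mylab.local:5022' -AvailabilityMode SynchronousCommit -FailoverMode Automatic -AsTemplate -Version 13
$r3 = New-SqlAvailabilityReplica -Name 'AZSQLAG3' -EndpointUrl 'TCP://AZSQLAG3.mylab.local:5022' -AvailabilityMode AsynchronousCommit -FailoverMode Manual -AsTemplate -Version 13

# Create the AG on the primary, then join the other replicas
New-SqlAvailabilityGroup -Name 'Sales_AG' -Path 'SQLSERVER:\SQL\STLSQLAG1\DEFAULT' -AvailabilityReplica @($r1, $r2, $r3) -Database 'sales'
Join-SqlAvailabilityGroup -Path 'SQLSERVER:\SQL\STLSQLAG2\DEFAULT' -Name 'Sales_AG'
Join-SqlAvailabilityGroup -Path 'SQLSERVER:\SQL\AZSQLAG3\DEFAULT' -Name 'Sales_AG'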

Now, for the listener, two IPs, one from each subnet, have been provided.

[Screenshots: listener configuration with a static IP from each subnet]
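The listener can be scripted too; a sketch (the IP addresses and subnet masks are placeholders from my lab ranges):

# One static IP from each subnet, registered against the Sales_AG availability group
New-SqlAvailabilityGroupListener -Name 'sqllst_Sales' -Port 1433 -StaticIp '192.168.1.151/255.255.255.0','192.168.2.151/255.255.255.0' -Path 'SQLSERVER:\SQL\STLSQLAG1\DEFAULT\AvailabilityGroups\Sales_AG'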

The same process has been followed for creating the Customers AG and its listener as well (but this time I've enabled Database level health detection).

[Screenshot: Customers_AG and its listener]

Since this is a multi-subnet setup, two entries (one from each subnet) will be created in DNS for each listener name, as shown below.

[Screenshot: DNS entries for the listener names]
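A quick way to verify this from any machine in the domain is to resolve the listener name and confirm you get one A record per subnet. Also keep in mind that client applications should add MultiSubnetFailover=True to their connection strings so both IPs are tried in parallel during connection.

# Should return two A records, one from each subnet
Resolve-DnsName -Name 'sqllst_Sales' -Type A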

That’s about it folks.

Lab setup – AlwaysOn AGs in a Multi Subnet Cluster – Part 1

Let's see how to set up an AG (SQL Server 2016) in a multi-subnet cluster (geo cluster) in a lab environment.

Below is my lab setup:

Two replicas sitting in my production data center (subnet 192.168.1.x), synchronous commit with automatic failover.
A third (far) replica sitting in my DR data center (subnet 192.168.2.x), asynchronous commit with manual failover.

So, what do we need to be able to set up multiple subnets and routing in a lab environment? The answer is "Routing and Remote Access". Have that installed by going to Add Roles/Features on your AD/DNS server.
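If you prefer PowerShell over the Add Roles and Features wizard, something like this should do it (a sketch; run it on the AD/DNS server):

# Installs the Routing role service (part of the Remote Access role) plus its management tools
Install-WindowsFeature -Name Routing -IncludeManagementTools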

Pre-req Step: Created 2 NICs on my SANDC machine with IPs 192.168.1.100 and 192.168.2.100

Open the Routing and Remote Access console; right-click on the root node and select "Configure and Enable Routing and Remote Access".

Now…under IPv4 -> General, right-click and select New Routing Protocol, then choose "RIP Version 2 for Internet Protocol".

[Screenshot: adding RIP Version 2 for Internet Protocol]

Now right-click on RIP, select New Interface, select your NIC1 and hit OK; then repeat the same step, selecting NIC2 this time, and click OK. You are done with routing…that's all you need for routing to work (as long as you got all the IPs and DNS details right).

[Screenshots: adding NIC1 and NIC2 as RIP interfaces]

Now, I've set up 3 nodes (two PRD nodes in 1.x and one DR node in 2.x), installed the Failover Clustering feature on all of them, and disabled all firewalls.

My PRD nodes: STLSQLAG1 and STLSQLAG2.
My DR node: AZSQLAG3.
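For reference, the feature install and the firewall part can be scripted as well; a sketch to run on each node (disabling the firewall completely is a lab-only shortcut):

# Failover Clustering feature plus the management tools
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

# Lab only: turn off all firewall profiles
Set-NetFirewallProfile -Profile Domain,Private,Public -Enabled False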

Do find all my NIC settings from all my nodes at the very end of this post.

Now, Let’s create Windows Cluster:

Please refer to my earlier posts under the "Clustering" category for detailed steps on how to create a cluster. Below are the steps at a high level.

[Screenshots: Create Cluster wizard]
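The PowerShell version of those steps looks roughly like this (the cluster name and the two static IPs, one per subnet, are placeholders from my lab ranges):

# Validate first, then create the multi-subnet cluster with one IP from each subnet
Test-Cluster -Node STLSQLAG1, STLSQLAG2, AZSQLAG3
New-Cluster -Name 'SQLAGCLU' -Node STLSQLAG1, STLSQLAG2, AZSQLAG3 -StaticAddress 192.168.1.150, 192.168.2.150 -NoStorage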

Now…my cluster is ready, but it's missing a quorum witness, which is critical for the cluster to stay healthy. For that I've configured a file share witness.

[Screenshots: configuring a file share witness]
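For reference, the same thing in PowerShell (the share path is a placeholder; it just needs to be a share the cluster name object can read and write to):

# Configure a file share witness and confirm the quorum configuration
Set-ClusterQuorum -FileShareWitness '\\SANDC\FSW'
Get-ClusterQuorum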

FYI, NIC settings from Cluster manager are shown below.

[Screenshots: cluster networks for both subnets in Failover Cluster Manager]

NIC settings on all of my nodes:

STLSQLAG1: (1 NIC Card)

[Screenshot: STLSQLAG1 NIC settings]

STLSQLAG2: ( 1 NIC Card)

[Screenshot: STLSQLAG2 NIC settings]

AZSQLAG3(DR Server): 1 NIC Card

[Screenshot: AZSQLAG3 NIC settings]

AD/DNS Server: ( 2 NICs one for 1.x and other for 2.x)

[Screenshots: AD/DNS server NIC settings for the 1.x and 2.x subnets]

In this post we've seen how to set up a geo cluster in a lab environment. This completes the prep work needed from the Windows standpoint…let's see how to create AGs and listeners in our multi-subnet environment in the next part of this series.

Setting up TDE on SQL Server Failover Cluster

Back in 2011 I wrote a couple of blog posts when I was initially exploring the encryption options we have in SQL Server. I never really got a chance to work on TDE since then. Fast forward to 2016: I am participating in a TDE project, where we are enabling TDE for a few of our databases which are hosted on SQL Server Failover Clustered Instances. Currently we are in the testing/POC phase. In this post, I will share my findings about TDE on clustered instances. Let's get started…

Enabling TDE on a clustered database is no different from how we enable it on a standalone instance. The steps are as follows:

--a.
Create database TDE_Test
GO
--1.create a master key in master database.
USE master
GO
CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'Very$ecurepwd123'
--2.Create a certificate(name it)
USE master
GO
CREATE CERTIFICATE CLUSTPRD1_TDECert
WITH SUBJECT = 'Transparent Data Encryption Certificate'
--3. Create a database encryption key, based on the certificate from Step 2. Supported algorithms include AES_128, AES_192, AES_256 and TRIPLE_DES_3KEY.
USE TDE_Test--The db to protect
GO
CREATE DATABASE ENCRYPTION KEY WITH
ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE
CLUSTPRD1_TDECert--Cert from Step 2

--4. Turn it on!
USE TDE_Test--The db to protect
GO
ALTER DATABASE TDE_Test
SET ENCRYPTION ON
GO
-----Back up the certificate and its private key---very, very important!
USE MASTER
GO
BACKUP CERTIFICATE CLUSTPRD1_TDECert
TO FILE = 'C:\TDE_Keys\CLUSTPRD1_TDECert.cer'
WITH PRIVATE KEY (FILE = 'C:\TDE_Keys\CLUSTPRD1_TDECert_Key.pvk' ,
ENCRYPTION BY PASSWORD = 'MyAw3S0m3Pwd123#' )
GO
--Back up the Service Master Key---very, very important!
BACKUP SERVICE MASTER KEY TO FILE = 'c:\keys\SQLPRD1_SMK.bak' ENCRYPTION BY PASSWORD = 'gytj6%&5gjOUytp';

Okay…so what happens when SQL Server gets failed over to another node? Will TDE work? Will the database still be in a usable state? The answer is "YES" from my initial testing. Everything remains intact. Nothing breaks!
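For reference, this is roughly how such a test can be scripted: fail the SQL Server role over to the other node and then check the encryption state of the database (the role name, node name and virtual server name below are placeholders; assumes the SqlServer PowerShell module is available):

# Fail the SQL Server clustered role over to another node
Move-ClusterGroup -Name 'SQL Server (MSSQLSERVER)' -Node 'CLUSTNODE2'

# encryption_state = 3 means the database is encrypted and usable
Invoke-Sqlcmd -ServerInstance 'CLUSTPRD1' -Query @'
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;
'@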

Note: This is different from the way it works in AlwaysOn AGs.

Reason: See the screenshot below, taken from BOL.

[Screenshot: TDE encryption hierarchy from BOL]

The database master key which we created in step 1 is protected by the Service Master Key (SMK), which is scoped at the instance level and is the root of our hierarchy. The certificate and the database encryption key are scoped at the master and user database levels, so they fail over and move between nodes along with the instance. So…nothing special is required for FCIs. Please correct me if I am wrong and post any thoughts in the comments.
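To make the AG contrast concrete: every AG replica is a separate instance with its own SMK and its own master database, so the certificate has to be recreated from the backup on each replica before a TDE database can be restored or joined there. A hedged sketch of that step (the instance name is a placeholder; the file paths and passwords are the ones from my script above):

# Run against each secondary replica; this assumes no database master key exists there yet
Invoke-Sqlcmd -ServerInstance 'SECONDARYSQL' -Database master -Query @'
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Very$ecurepwd123';
CREATE CERTIFICATE CLUSTPRD1_TDECert
    FROM FILE = 'C:\TDE_Keys\CLUSTPRD1_TDECert.cer'
    WITH PRIVATE KEY (FILE = 'C:\TDE_Keys\CLUSTPRD1_TDECert_Key.pvk',
                      DECRYPTION BY PASSWORD = 'MyAw3S0m3Pwd123#');
'@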

Cheers!