Always On availability group(s) – Rolling upgrades

Aim: To upgrade/migrate (side-by-side) SQL Server 2014 Availability Group(s) running on Windows Server 2012 R2 to SQL Server 2019 running on Windows Server 2016 with the least amount of downtime.

A couple of years ago, I wrote a blog post explaining how to upgrade the Windows OS from 2012 R2 to 2016 on nodes participating in failover clustering with minimal downtime using the rolling upgrade technique. In this blog post I will be sharing something similar, but throwing SQL Server Availability Groups into the mix. So let me briefly explain what we are trying to achieve here.

Current environment:
I have a two-node failover cluster (Windows Server 2012 R2) hosting a SQL Server 2014 Always On Availability Group in synchronous commit mode, with a listener configured for my applications to connect to. These replicas are running on the latest build of SQL 2014 as of the date this post was published.

As you can see, W12SQL2016A/B are my two replicas (nodes), which are running Win 2012 R2 + SQL 2014.

Originally I thought of installing SQL 2016 (hence the host names), but ended up installing SQL 2014 for now based on our specific requirements. I didn’t want to change the host names, since my Windows failover cluster was already set up by this time and I really didn’t want to deal with fixing any annoying errors that might pop up from messing with the host names of my nodes. Anyways, the bottom line is that I have a SQL 2014 AG running on Win 2012 R2 which needs to be upgraded/migrated to SQL 2019 running on Windows 2016.


Listener Name: RestAGL

Goal:
To upgrade these SQL instances to 2019 running on Windows Server 2016 with very minimal downtime and no configuration changes for the app teams, assuming in-place upgrades are not allowed.

What’s the high-level plan?

  • Take full backups.
  • Add the W16SQL2019A and W16SQL2019B nodes to the same Windows cluster, leveraging mixed mode.
  • Install SQL 2019 and add these two nodes as replicas at the SQL Server AOAG layer.
  • Join the databases and let the magic happen.
  • On the final cutover date/time, fail over to SQL 2019 and remove the old replicas from the AG.
  • Evict both Windows 2012 R2 nodes from the cluster and raise the functional level to 2016.

Now, let’s see this in action one step at a time.

Below is a screenshot of all the SQL instances I will be working on. To begin with, I have two brand new SQL Server 2019 standalone instances (W16SQL2019A and W16SQL2019B), on which I just enabled the HADR feature.

Let’s go: I added the new Windows Server 2016 nodes to the existing Windows failover cluster, which is running at the 2012 R2 functional level.
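If you prefer PowerShell over Failover Cluster Manager for this step, a minimal sketch would look like the following (the cluster name CLUSTER1 is a placeholder; run it from one of the existing nodes with the FailoverClusters module available):

    # Add the new Windows Server 2016 nodes to the existing 2012 R2 cluster.
    # -NoStorage prevents the new nodes from claiming any clustered disks on join.
    Import-Module FailoverClusters
    Add-ClusterNode -Cluster CLUSTER1 -Name W16SQL2019A -NoStorage
    Add-ClusterNode -Cluster CLUSTER1 -Name W16SQL2019B -NoStorage

The cluster enters mixed mode automatically as soon as a 2016 node joins a cluster running at the 2012 R2 functional level.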

Note: You don’t want to run a WSFC in mixed mode for long periods; Microsoft might not support you if you stay in mixed mode for more than 4 weeks. This is only to perform rolling upgrades and keep your systems truly highly available. Wrap up the entire process in a day or two and be done with it.

This is expected while the cluster is in mixed mode. For more details, hop over to the blog post I linked at the beginning of this post.

Now it’s time to jump into SQL Server to add these servers as replicas into our AG.
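I used the Add Replica wizard, but if you’d rather script it, a rough sketch run against the current primary would look something like this. The AG name RestAG, the sqltrek.local domain suffix, and the default endpoint port 5022 are my assumptions; adjust to your environment:

    # Assumed names: AG 'RestAG', domain suffix 'sqltrek.local', endpoint port 5022.
    Import-Module SqlServer
    $tsql = "
    ALTER AVAILABILITY GROUP [RestAG]
        ADD REPLICA ON N'W16SQL2019A' WITH (
            ENDPOINT_URL = N'TCP://W16SQL2019A.sqltrek.local:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL);
    ALTER AVAILABILITY GROUP [RestAG]
        ADD REPLICA ON N'W16SQL2019B' WITH (
            ENDPOINT_URL = N'TCP://W16SQL2019B.sqltrek.local:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL);
    "
    Invoke-Sqlcmd -ServerInstance 'W12SQL2016A' -Query $tsql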

Awesome, so far so good 🙂

Let’s move on… Connecting to one of the SQL 2019 instances, below is what I saw. Oops!!


I had to join the individual databases manually to the AG from both of the newly added SQL 2019 replicas. Once that was done, below is how my dashboard looked.
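For reference, the manual join is plain T-SQL on each new secondary. A minimal sketch, assuming the AG is named RestAG (a placeholder) and the database has already been restored on the new replicas WITH NORECOVERY:

    # Assumed AG name 'RestAG'; database 'American' restored WITH NORECOVERY beforehand.
    Import-Module SqlServer
    foreach ($replica in 'W16SQL2019A','W16SQL2019B') {
        # Join the database to the AG on this secondary replica.
        Invoke-Sqlcmd -ServerInstance $replica `
            -Query 'ALTER DATABASE [American] SET HADR AVAILABILITY GROUP = [RestAG];'
    }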

Also, I changed the failover mode to manual just to make sure the cluster has no control over failing over my AG; I want total control over how and when my AG fails over until the entire upgrade process is complete. Hey, BTW, did you take full backups?
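Scripted, that change is one MODIFY REPLICA per replica, run on the primary. A minimal sketch with the same placeholder AG name:

    # Assumed AG name 'RestAG'. Run against the current primary (W12SQL2016A).
    Import-Module SqlServer
    foreach ($r in 'W12SQL2016A','W12SQL2016B','W16SQL2019A','W16SQL2019B') {
        Invoke-Sqlcmd -ServerInstance 'W12SQL2016A' `
            -Query "ALTER AVAILABILITY GROUP [RestAG] MODIFY REPLICA ON N'$r' WITH (FAILOVER_MODE = MANUAL);"
    }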

Did I mention I have a table called “McD” in the “American” database with one row in it? See below…

Now comes the fun part. Set one of the SQL 2019 instances’ availability mode to synchronous commit and perform a controlled manual failover. In my case, I selected W16SQL2019A, changed it to synchronous mode, and failed over my AG from W12SQL2016A (my current primary) to W16SQL2019A.
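If you prefer T-SQL over the dashboard, a minimal sketch (again, RestAG is a placeholder AG name):

    # Step 1: on the current primary, flip W16SQL2019A to synchronous commit.
    Import-Module SqlServer
    Invoke-Sqlcmd -ServerInstance 'W12SQL2016A' -Query "
        ALTER AVAILABILITY GROUP [RestAG]
        MODIFY REPLICA ON N'W16SQL2019A'
        WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);"
    # Step 2: wait until W16SQL2019A reports SYNCHRONIZED, then issue the
    # failover FROM the target secondary itself.
    Invoke-Sqlcmd -ServerInstance 'W16SQL2019A' `
        -Query 'ALTER AVAILABILITY GROUP [RestAG] FAILOVER;'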

Awesome. At this point, W16SQL2019A has taken over the primary role, all the databases participating in the AG have been upgraded to SQL 2019, and the other SQL 2019 instance (W16SQL2019B in my case) will stay in sync from now on. The two SQL 2014 instances, however, will be in an unhealthy state; in fact, those databases become inaccessible at this point, since logs can’t be shipped from a higher version (2019) to a lower version (2014). Duh!

Perfectoo! Also, I have my table and data intact, double perfectoo!

If you are curious, this is how the error log looked. You can clearly see the internal database version being upgraded from SQL 2014 all the way to 2019.

Below is a screenshot showing what to expect on old SQL instances after failing over AG to newer version.

Time to do some cleanup now. I removed both SQL 2014 instances from the AG as replicas and boom… PRESTO!
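The cleanup itself is just two REMOVE REPLICA statements on the new primary (RestAG is still my placeholder AG name):

    # Run on the new primary (W16SQL2019A): drop the old SQL 2014 replicas.
    Import-Module SqlServer
    Invoke-Sqlcmd -ServerInstance 'W16SQL2019A' -Query "
        ALTER AVAILABILITY GROUP [RestAG] REMOVE REPLICA ON N'W12SQL2016A';
        ALTER AVAILABILITY GROUP [RestAG] REMOVE REPLICA ON N'W12SQL2016B';"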

The only thing left now is to take care of the WSFC by evicting the old Windows Server 2012 R2 nodes and raising the functional level of the cluster to come out of mixed mode.
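A minimal PowerShell sketch for that last bit, run from one of the 2016 nodes:

    # Evict the old 2012 R2 nodes...
    Import-Module FailoverClusters
    Remove-ClusterNode -Name W12SQL2016A -Force
    Remove-ClusterNode -Name W12SQL2016B -Force
    # ...then raise the functional level. This is one-way: once raised,
    # 2012 R2 nodes can never be added back to this cluster.
    Update-ClusterFunctionalLevel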

That’s it folks. Hope this is helpful, Cheers!

Simulating a Multi Subnet cluster for setting up SQL Server Always On Availability Groups – lab setup

In this article, we are going to see how to create a cluster spanning multiple subnets for lab purposes. Creating such an environment should help you create Availability Groups that simulate a remote replica residing in a different data center (subnet) and acting as a disaster recovery site, so you can learn and experiment with real-world scenarios in your lab.

Let’s get into action. Below is what we are going to set up in our lab to simulate a multi-subnet cluster environment and create an Always On Availability Group.

  • Two nodes (replicas) residing in my production/primary data center (subnet 192.168.1.x). Let’s say these two replicas are used for high availability; assuming low latency, since they reside in the same data center (probably in the same rack), I will set them up in synchronous commit mode with automatic failover.
  • A third node (replica) sitting in my disaster recovery data center, which is geographically dispersed (subnet 192.168.2.x). Assuming it’s connected over a WAN and used purely for DR purposes, I will set this replica to asynchronous commit mode with manual failover. (See the sketch after this list.)
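Put together, the AG we will end up with looks roughly like the sketch below. The node names, database, and listener IPs are placeholders I made up for illustration; only the two subnets come from the layout above. Note the listener gets one IP per subnet, which is the multi-subnet part:

    # Placeholder names throughout; run on the intended primary (NODE1).
    Import-Module SqlServer
    Invoke-Sqlcmd -ServerInstance 'NODE1' -Query "
    CREATE AVAILABILITY GROUP [LabAG]
    FOR DATABASE [LabDB]
    REPLICA ON
        N'NODE1' WITH (ENDPOINT_URL = N'TCP://NODE1:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
        N'NODE2' WITH (ENDPOINT_URL = N'TCP://NODE2:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC),
        N'NODE3' WITH (ENDPOINT_URL = N'TCP://NODE3:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL)
    LISTENER N'LabAGL' (WITH IP ((N'192.168.1.50', N'255.255.255.0'),
                                 (N'192.168.2.50', N'255.255.255.0')),
                        PORT = 1433);"

Clients connecting through such a listener should use MultiSubnetFailover=True in their connection strings for faster cross-subnet failover.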

For more reading, head to the link below.

Simulating a Multi Subnet cluster for setting up SQL Server Always On Availability Groups – lab setup

SQL Server with Cluster Shared volumes (CSV) – Part 2

In the previous part of this series, we saw what a cluster shared volume is and what advantages and other considerations to keep in mind when deploying CSVs for SQL Server workloads. In this article, I will walk through the actual installation of a failover cluster instance leveraging CSVs.

To begin with, I will walk you through my cluster setup from a 20,000-foot view. I created two brand new VMs running Windows Server 2012 R2 and renamed them accordingly. Nothing special w.r.t. disk drives at this point; just basic VMs with a system drive (C$).

I also created two virtual networks in VMware Workstation, which I will be using to configure the public and private NICs on my nodes.

On my first node:

192.168.1.100 is the IPv4 address of my DNS server and below are my NIC settings.

Once the IP addresses were configured, below is how I joined my node to the domain (sqltrek.local in my case).
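For the curious, the equivalent PowerShell for the IP configuration and domain join would be roughly this. The node’s IP, the gateway, and the interface alias ‘Public’ are assumptions; 192.168.1.100 is the DNS server mentioned above:

    # Static IP and DNS on the public NIC (alias and addresses are lab assumptions).
    New-NetIPAddress -InterfaceAlias 'Public' -IPAddress 192.168.1.101 `
        -PrefixLength 24 -DefaultGateway 192.168.1.1
    Set-DnsClientServerAddress -InterfaceAlias 'Public' -ServerAddresses 192.168.1.100
    # Join the domain and reboot.
    Add-Computer -DomainName sqltrek.local -Credential (Get-Credential) -Restart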

Further reading: Deploy SQL with CSVs – Part 2

SQL Server with Cluster Shared volumes (CSV) – Part 1

Microsoft SQL Server provides us with a wide variety of solutions for architecting High Availability (HA) and Disaster Recovery (DR) for mission-critical workloads. In this article, let’s focus on HA, specifically failover clustering. Failover clustering is probably the most mature, robust and stable high availability solution the Windows Server operating system offers. It has been around for a few decades now and has evolved over time along with SQL Server. In this article, let’s look at a hidden feature of Windows Server failover clustering which helps make our already highly available SQL Server failover clustered instances even more highly available: Cluster Shared Volumes, AKA CSVs. Even with Windows Server 2019 around the corner, CSVs are not a new concept in clustering; they have been there for almost a decade. Microsoft introduced CSVs in Windows Server 2008 R2, but at that time SQL Server was not supported on CSVs. CSVs were originally designed for Hyper-V workloads, later enhanced for file servers, and eventually landed in SQL Server beginning with version 2014.

Fair enough, but why should we care about Cluster shared volumes?

Well, the idea behind CSVs is to provide truly shared disks to a failover cluster, available to all nodes for read and write operations. Let’s talk about a traditional failover clustered instance setup for a moment. During failover, to bring the SQL Server resource online, the drives have to be unmounted on the previous owner and remounted on the node which will act as the primary after failover. Should your IO subsystem become a bottleneck for whatever reason, the unmounting and mounting process takes longer, impacting the availability of the system. With CSVs, there is no unmounting and mounting of disks, since they are already available for read and write operations across all the nodes. In other words, downtime is reduced since the SQL Server resource is no longer dependent on disks to come online. Let’s talk about one more scenario where CSVs outperform traditional shared storage. Assume the disk(s) lose connectivity from the node currently running SQL Server in the middle of the day; under these circumstances, the cluster can leverage another available path to the shared disk without having to fail the resource group over to another node. This saves us from potential unplanned downtime during business hours.
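To make this concrete, here is a minimal sketch of turning an ordinary clustered disk into a CSV; the disk name ‘Cluster Disk 1’ is a placeholder:

    # Convert an available clustered disk into a Cluster Shared Volume.
    Import-Module FailoverClusters
    Add-ClusterSharedVolume -Name 'Cluster Disk 1'
    # CSVs then surface on every node under C:\ClusterStorage\VolumeN.
    Get-ClusterSharedVolume | Format-Table Name, State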

How do CSVs work?

Further reading : Deploy SQL Server with CSVs

Upgrading Windows Failover Cluster 2012R2 to 2016

In this blog post, let’s see how to upgrade a Windows Server 2012 R2 failover cluster to Windows Server 2016. My current lab setup is a 3-node cluster, all running 2012 R2, and I decided to upgrade the nodes to 2016 to learn the new features/enhancements of Windows Server 2016. I thought I’d put together a short blog post on how to perform this, in case any of you are in the same boat as me.

Anyways… Here’s my current lab setup.

(Screenshot: current cluster configuration)

Before moving on, what options do I have for upgrading my cluster?
Option 1: Install Windows Server 2016 on a completely new machine, introduce it to the existing cluster, move the roles over, and remove the old 2012 R2 node. Then work on the next node and do the same, and so on, until all the nodes in your cluster are 2016 machines. As the last step, raise the cluster functional level to 2016.

Option 2: Select a node you want to upgrade, drain the roles -> evict the node -> perform an in-place upgrade from 2012 R2 to 2016 -> introduce the upgraded node back into the cluster. Repeat until all the nodes in your cluster are 2016 machines. As the last step, raise the cluster functional level to 2016.

NOTE: Yes, you can have Windows Server 2016 and Windows Server 2012 R2 nodes participating in the same cluster. This is called mixed mode, a compatibility feature/enhancement introduced to let you transition from Server 2012 R2 to 2016 without downtime. However, you can’t leave the cluster in that state forever; you have 4 weeks to remain in a supported state. As soon as all the nodes in the cluster are upgraded to Server 2016, the cluster functional level should be raised. Once this is done, we can’t revert/roll back, and we can’t add 2012 R2 nodes to this cluster anymore.

I’ve chosen to perform in-place upgrades (Option 2 as mentioned above). Would I do this in production? Probably not.

Now, I’ve drained the node which I would like to upgrade to 2016 and evicted it from the cluster.
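In PowerShell terms, the drain-and-evict amounts to something like this (the node name is a placeholder):

    # Drain the roles off the node, wait for the moves to finish, then evict it.
    Import-Module FailoverClusters
    Suspend-ClusterNode -Name NODE3 -Drain -Wait
    Remove-ClusterNode -Name NODE3 -Force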


It’s gone! My current cluster state at this point is shown below.

(Screenshot: node evicted from the cluster)

Now, I inserted the Windows Server 2016 media on the server I had just evicted and performed the upgrade.

(Screenshots: Windows Server 2016 setup and upgrade drive selection)

I had no free space on C$, so I ended up adding a new drive (U$) to facilitate the upgrade process. Well, it failed… I had to expand drive C$ and restart the process!!


After struggling for around 90 minutes, I am all set.

(Screenshot: upgrade completed)

Now, I am reintroducing the node into the cluster.


Alrighttttttttttttttty…..Here it is!

(Screenshot: node back in the cluster)

As you can see below, the current cluster functional level is set to 8. Once all the nodes in this cluster have been upgraded to 2016, we should raise the functional level as I already mentioned, and the same command would then return 9 as the output.
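The check itself is a PowerShell one-liner (8 = 2012 R2, 9 = 2016):

    # Query the current cluster functional level.
    Import-Module FailoverClusters
    (Get-Cluster).ClusterFunctionalLevel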


There you go guys, Hope this helps. Happy weekend!

Update: After spending a few more hours, I was able to upgrade all my nodes to 2016 and raised the cluster functional level. It’s no longer in mixed mode (in other words, I won’t be able to add a Windows 2012 R2 node to my cluster). As I mentioned above, the cluster functional level returns 9 after I issued “Update-ClusterFunctionalLevel”.
