SQL Server 2019 – VLDB struck in “In Recovery”!

Getting straight to the point, Chances are very high that you will run into the issue that we are going to see in this blog post when you try to restore a database larger than 30 TB running on SQL Server 2019 CU1.

Environment:

SQL Server 2019 CU1
DB1 – 6 TB.
DB2 – 30+TB.

Behavior:

Restoring 6 TB database – Everything goes normal as it should.
Restoring 30 TB database – Database stays in “In Recovery” state indefinitely. Infamous “Non yielding scheduler” stack dump gets created when the recovery process enters roll forward(REDO) phase. To give you an idea, look at below log entries.

Message

Recovery of database ‘VLDB’ (8) is 0% complete (approximately 138 seconds remain). Phase 2 of 3.
Recovery of database ‘VLDB’ (8) is 22% complete (approximately 7 seconds remain). Phase 2 of 3.

***Unable to get thread context for spid 0
* *******************************************************************************
** BEGIN STACK DUMP:
* MM/DD/YY HH:MM:SS spid 4368
** Non-yielding Scheduler
** *******************************************************************************
Stack Signature for the dump is 0x00000000000001BC
External dump process return code 0x20000001.
External dump process returned no errors.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.
Approx Thread CPU Used: kernel 0 ms, user 70375 ms. Process Utilization 16%. System Idle 79%. Interval: 70579 ms.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.
Process 0:0:0 (0x110c) Worker 0x000001B39D42C160 appears to be non-yielding on Scheduler 1. Thread creation time: 13234458222710.

Note: I noticed this exact behavior consistently on multiple servers irrespective of what method we use to restore this DB.
Native restore – Fails
Attach/Detach – Fails
Restored using EMC DDBoost – Fails
Restore virtual database by presenting virtual data copies using third party tools such as Delphix – Fails.

Fix:
Drop the database which got struck in “In recovery” and apply 2019 CU4 or above.

I am not sure at what size SQL 2019 CU1 starts to panic, but it was happy with 6TB DB and caused misery for 30 TB DB. BTW…At the time of writing this blog post, we have CU5 available which I strongly recommend if you are planning to patch.

 

Happy Monday!

 

Uninstalling CUs/Downgrading SQL Server in Linux(RHEL7)

Back in 2011 I wrote an article on how to uninstall service packs in version SQL 2008/R2 which was quite a popular post at that time based on my wordpress stats. (Well, SQL 2008 was the very first version that allowed us to do that) Fast forward to year 2020 where we can have SQL Server running on Linux in production environments, let’s see how to achieve the same if you ever want to uninstall a cumulative update. Did you notice, I said CU, not service packs this time. See this post for more info on MSM.

First thing first, let’s see what we have got on the box. Pulling that information is quite simple, Run “sudo yum info mssql-server” which should return something like this.

Of course you can get this info from SQL or several other ways in Linux. Okay, now we know we got SQL Server 2019 CU5 running on this server to work with. Let’s just assume CU5 broke something in my database and I want to go back to CU4. How do I do that?

Run “sudo yum downgrade mssql-server-<your_desired_version_number>.x86_64“. Okay, so how do I get those version number details? Microsoft has those details maintained and updated regularly in their release notes. Check below links for 2019 or 2017 based on your version.
SQL 2019 release notes.
SQL 2017 release notes.

Also, We can get this information directly using YUM super powers as shown below…
sudo yum list mssql-server –showduplicates

If you want additional details for individual packages as shown below, just replace “list” with “info”, the command would be “sudo yum info mssql-server –showduplicates

Now that we have all the information what we need, let’s go to the actual fun part.

Downgrading from SQL 2019 Cu5 to CU4:
Command:
sudo yum downgrade mssql-server-15.0.4033.1-2.x86_64

As you can see it clearly states that CU5 was removed and CU4 was downloaded and installed successfully 🙂

BTW, the basic golden rule still remains the same,  you can’t downgrade to a lower version(from SQL 2019 to 2017, –> Nope, that’s not how downgrade works)! You are allowed to do whatever you want as long as you are staying at same version.

Always On availability group(s) – Rolling upgrades

Aim: To upgrade/migrate (side-side) SQL Server 2014 Availability group(s) running on Windows Server 2012 R2 to SQL Server 2019 running on Win 2016 with the least amount of downtime.

Couple of years ago, I wrote a blog post explaining how to upgrade Windows OS from 2012 R2 to 2016 on nodes participating in fail over clustering with minimal downtime using rolling upgrade technique. In this blog post I will be sharing something similar but throwing SQL Server availability groups into the mix. So let me briefly explain what we are trying to achieve here.

Current environment:
I’ve a two node Failover cluster (Windows Server 2012 R2) hosting SQL Serevr 2014 Always On Availability Group with synchronous commit mode. I have a listener configured for my applications to connect. These replicas are running on the latest build of SQL 2014 as of the date this post is published.

As you can see, W12SQL2016A/B are my two replicas(Nodes) which are running Win2012R2+SQL 2014.

Originally I thought of Installing SQL 2016(hence the host names), but ended up installing SQL 2014 for now based on our specific requirement. I didn’t want to change the host names as I had my windows Fail over cluster all setup by this time and I really don’t want to deal with fixing any annoying errors that might popup because of messing up my host names of my nodes. Anyways…the bottom line is I have SQL 2014 AG running on Win 2012R2 which needs to be upgraded/migrated to SQL 2019 running on Windows 2016.


Listener Name: RestAGL

Goal:
To upgrade these SQL Instances to 2019  running on windows server 2016 with a very minimal downtime and no configuration changes for the App teams, assuming In-place upgrades are not allowed.

What’s the high level plan:

Take Full Backups.

  • Add W16SQL2019A and W16SQL2019B nodes to the same windows cluster leveraging mixed mode.
  • Install SQL 2019 and add these two nodes as replicas at SQL Server AOAG layer.
  • Join the databases and let the magic happen.
  • During the final cutover date/time, failover to SQL 2019 and remove the old replicas from AG.
  • Evict both windows 2012R2 nodes from the cluster and raise the functionality level to 2016.

Now, let’s see this in action one step at a time.

Below is the screenshot of all my SQL Instances which I will be working on. To begin with I have two brand new SQL Server 2019 standalone Instances(W16SQL2019A and W16SQL2019B), on which I just enabled HADR feature.

Let’s go, I added the new Win2019 nodes to the existing windows failover cluster which is running on 2012 R2 functionality level.

Note: You don’t want to run in mixed mode of WSFC for long periods. Microsoft might not support if you stay in mixed mode for more than 4 weeks. This is only to perform rolling upgrades to make your systems really highly available. Wrap up the entire process in a day or two and be done with it.

This is expected.  For more details on this, hop on to the blog post that I provided in the beginning of this blog post.

Now it’s time to jump into SQL Server to add these servers as replicas into our AG.

Awesome, so far so good 🙂

Let’s move on….Connecting to one of the SQL 2019 instances, below is what I have. Oops!!


I had to join the individual databases manually to the AG from both the newly added SQL 2019 replicas. Once that is done, below is how my dashboard looked like.

Also, I changed the failover mode to manual just to make sure cluster has no control over failing over my AG. I want to have total control over how and when to failover my AG till the entire upgrade process is complete. Hey BTW, did you take Full Backups?

Did I mention, I have a table called “McD” in “American” database with one row in it? See below…

Now comes the fun part. Set one of the SQL 2019 Instances availability mode to Synchronous commit and perform a controlled manual failover. In my case, I selected W16SQL2019A on which I changed it to Synchronous mode and failed over my AG from W12SQL2016A(Which is my current primary) to W16SQL2019A .

Awesome, At this point, W16SQL2019A took over the primary role all your databases participating in your AG have been upgraded to SQL 2019 and the other SQL 2019(W16SQL2019B in my case) Instance will be in sync from now on, but the two SQL 2014 Instances will be in unhealthy state, In fact those databases become inaccessible at this time, since Logs can’t be shipped from higher(2019) to lower(2014) version. Duh!!!!!….

Perfectoo! Also, I have my table and data intact, double perfectoo!

If you are curious, this is how the error log looked like. You can clearly see, the internal database version is getting upgraded from SQL 2014 all the way to 2019.

Below is a screenshot showing what to expect on old SQL instances after failing over AG to newer version.

Time to do some clean up now. I removed both SQL 2014 Instances from AG as replicas and boom……PRESTO!

The only thing left now is to take care of WSFC by evicting old windows server 2012 R2 nodes and raise the functional level of the cluster to come out of mixed mode.

That’s it folks. Hope this is helpful, Cheers!

Installing SQL 2019 on LInux (RHEL/CENTOS)

Wish you all a very happy new year folks!  Been a while since I posted something, so here I am 🙂  I would like to begin my blogging in year 2020 with Linux. Long ago, I wrote an article on how to setup a VM with CentOS and install SQL 2017 on top of it. I am not going to walk through those steps again., Instead let’s see if the setup experience is any different from SQL 2017 on RHEL.

Before begin installing, Let me examine repos on my linux machine. you can do that by navigating to /etc/yum.repos.d folder.

Well, this is a brand new installation. That explains all the repositories I have on my machine! Now, It’s time to download SQL Server repo from microsoft.
command: 
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/8/mssql-server-2019.repo

Once this part is done, it’s time for actual installation.
Command:
sudo yum install -y mssql-server

Alright, let’s perform the configuration portion(EULA,Edition,sa pwd) as instructed by setup program.
Command:
sudo /opt/mssql/bin/mssql-conf setup

Oops! My configuration failed. you can clearly see the reason why my installation failed. I was way too conservative in terms of granting resources to my VM. Once I bumped up my RAM to 4 GB, all went well.

Check status using either ps-ef | grep mssql-server or systemctl command as shown below.

Now it is time to install client tools on the server. Microsoft has provided a separate repo for client tools which we need to download similar to what we did in step 1.
Command:
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/8/prod.repo
Note: Check RHEL version (7 or 8) and download appropriate repos.

Don’t forget to add sqlcmd and bcp tools to path to avoid providing literal path every time we need to access sqlcmd. I hope microsoft will take care of this as part of post installation in future releases instead of manual intervention. ( Symbolic links may be??)

Testing what we have done so far…..

That’s it for this blog post folks! As you can see the installation experience hasn’t been changed much from the last version of SQL on Linux!

Note: This is online setup experience, assuming your Linux machine can talk to internet. Microsoft did a great job on documenting online and offline installation methods. I would definitely recommend anyone to go through the documentation for more details.

Happy learning!

SSMS 18.0 – SQL Server Priority Boost

I’ve been playing with SQL Server 2019/SSMS 18.0 since past few days and noticed something really awesome, there’s no more “Boost SQL Server Priority” option is SSMS 18.0 !!!! It’s probably one of the most requested feature in SSMS by DBA community.

The name “Boost SQL Server Priority” sounds like an amazing switch which makes SQL run on steroids, but in reality it’s the other way around.

So, what exactly does this option do?
Ans: It doesn’t matter in year 2019, Let’s just pretend we never had this option 🙂  Well, If you are still curious what this is about, enabling this setting will make SQL Server process run in “HIGH_PRIORITY_CLASS”. Almost every process in windows run in “NORMAL_EXECUTION_CONTEXT” including network drivers, Storage drivers, I/O drivers and etc etc etc. In other words, you are making SQL Server run in higher execution context on CPU than almost any other process on your server.

So, when do we need to enable this on our SQL Servers?
Ans: Never. There’s a reason why microsoft removed this option from SSMS.

Enough background, see what I am talking about in the below screenshots.

Cheers!