In the previous part of this series, we have seen what a cluster shared volume is and what are the advantages and other considerations to keep in mind when deploying CSVs for SQL Server workloads. In this article, I will walk though actual installation of a failover cluster Instance leveraging CSVs.
To begin with, I will walk you through my cluster setup from 20,000 foot view. I created two brand new VMs running windows server 2012 R2 and renamed them accordingly. Nothing special w.r.t disk drives at this point, Just basic VMs with a system drive(C$).
I also created 2 virtual networks in my VMWare workstation which I will be using for configuring my Public and private NICs on my nodes.
On my first node:
192.168.1.100 is the IPv4 address of my DNS server and below are my NIC settings.
Once IP address has been configured, below is how I joined my node to domain (sqltrek.local in my case).
Further reading: Deploy SQL with CSVs – Part 2
Microsoft SQL Server provides us with a wide variety of solutions to architect High availability (HA) and Disaster Recovery (DR) solutions for mission-critical workloads. In this article, let’s just focus on HA, specifically Failover Clustering. Failover clustering is probably the most mature, robust and stable high availability solution which Windows Server Operating system offers. It’s been there around for few decades now and did evolve over time along with SQL Server. In this article Let’s see a hidden feature of windows server failover cluster which helps in making our already highly available SQL Server Failover clustered instances even more highly available. The new feature which we are going to talk about is Cluster Shared Volumes, AKA CSVs. Considering windows server 2019 is around the corner, I say CSVs are not a new concept in clustering, it’s been there for almost a decade now. Microsoft introduced CSVs in windows server 2008R2, but at that time SQL Server was not supported on CSVs. Well, CSVs were originally designed for Hyper-V workloads and later on enhanced for File servers and eventually landed into SQL Server beginning version 2014.
Fair enough, but why should we care about Cluster shared volumes?
Well, the idea behind introducing CSVs is to provide truly shared disks to a failover cluster which are available to all the nodes for reading and write operations. Let’s talk about a traditional Failover clustered Instance setup for a moment. During Failover, to bring SQL Server resource online, the drives should be unmounted on the previous owner and remounted on the node which will act as the primary after failover. Should your IO subsystem become bottleneck for whatever reason, Unmounting and mounting process takes longer time thus impacting the availability of the system. Whereas with CSVs, there is no unmounting and mounting of disks since they are already made available for reading and write operations across all the nodes. In other words, it reduces downtime since SQL Server resource is no longer dependent on disks to come online. Let’s talk about one more scenario where CSVs outperform traditional shared storage. Let’s assume disk(s) loses connectivity from the node which is currently running SQL Server in middle of the day, under these circumstances the cluster can leverage another path (s) available to the shared disk without having to failover the resource group to another node. This will save us from potential unplanned downtime during business hours.
How CSVs work?
Further reading : Deploy SQL Server with CSVs
Couple of weeks ago I was analyzing a server for space and noticed SSISDB database was abnormally huge (this Instance was running just a handful of packages). I noticed couple of internal schema tables in SSISDB were huge (with some hundreds of millions of rows), well that’s not right. There should be SSIS Server maintenance job which SQL server creates to purge older entries based on the retention settings right? My immediate action was to check the retention period set and what’s the status of the job. As I suspected, the job was failing (looks like this has been failing since ages) with below error.
The job failed. The Job was invoked by Schedule 9 (SSISDB Scheduler). The last step to run was step 1 (SSIS Server Operation Records Maintenance).
Execute as Login failed for the requested login ‘##MS_SSISServerCleanupJobLogin##’
Next step is to investigate whats up with the user account which the job is complaining about and I noticed ‘##MS_SSISServerCleanupJobUser##’ is orphaned in SSIS database as shown below.
Compare with how it looks on an Instance where this Job is running successfully.
Well, with this the fix is pretty evident, Map the poor fella to the corresponding login (Syntax shown below).
Once this has been done, I kicked off the job and it ran successfully (It took almost 2 hours to finish – had to purge few millions of records) and the subsequent scheduled runs were quick (Just few seconds as expected)
In this blog post, let’s see how to regain admin access on a SQL Server Instance in case you lost it by mistake or for whatever reason. It’s not a very common scenario, but hey you never know. I ran into this some time last week(Fortunately it’s in our POC environment), Okay, Here’s the deal – we have a POC SQL Instance which was installed by an individual who is no longer working with us and apparently he forgot to make our DBA grp as sysadmins. Basically we don’t have admin rights to our own SQL Instance, SA account is disabled(Well, No one has no clue what that pwd was to begin with). So, how did we recover from this disastrous event? Before going to the details on how I fixed it, below is where I started with.
- Since this is a POC box, we(DBA Grp) are already members of Local Admins on the server level. Okay, that’s a good start. In case you are not local admins at windows level, You should request your Windows admins to grant you those rights(At least temporarily for the duration you regain access to SQL)
- I stopped SQL Server services and started(Made sure Just the DB Engine is started) in single user mode by using “-m” switch as a startup parameter.
3. Now I connected to my SQL Server instance(You can use either SSMS or SQLCMD). I will show you how to use SSMS(Run as Admin just to be sure) using the trick I showed here.
Note: when using SSMS method, don’t hit connect in your “Connect to Server” window. You will probably end up with multiple connections and yours might not be the first connection(Spoiler: Object Explorer)
I hit cancel in the above window and I selected “Database Engine Query” and created our DBA Grp login and granted sysadmin rights.
4. Once that’s done, All I had to do was go to configuration manager and remove SQL from single user mode and restart my services and Voilaa we are in as Sysadmins! If you are curios why it worked, SQL Server when started in single user mode allows any user who is a member of BUILTIN\ADMINISTRATORS group at windows level login as sql server admin. This is by design to recover form these kind of situations.
I hope you won’t get into this awkward situation, but you are probably here reading this blog post because you are dealing with this right now 🙂 Hope this helps. Cheers!
In this short blog post, Let’s see a strange error which one could encounter when dealing with permissions in SQL Server Analysis Services(SSAS). So, I was working on a DEV SSAS Instance to grant admin rights for our DW team this morning and ran into something which is very interesting. Basically I was unable to add any new AD user accounts/Groups to the server administrator role. SQL Server was throwing below error at me.
The following system error occurred: No mapping between account names and security IDs was done.
If you notice in the above screenshot, there are couple of stale AD accounts(Shown as random GUIDs). These accounts were added couple of years ago(These guys are no longer working here and they were deactivated in AD) and SSAS is basically validating all the accounts which are listed under server administrator role when I am trying to add a new valid account, not just the one which I am currently adding.
Simple, I just removed those stale SIDs and presto!! I ran into a similar issue just few weeks ago when I was using Kerberos configuration manager where I had to remove invalid accounts(listed as Random GUIDs) from local admin group on the windows server.
So there you go, if you ever run into this issue, the fix is simple, Just get rid of those garbage accounts.
Transparent Data Encryption (TDE) was originally introduced in SQL Server 2008 (Enterprise Edition) with a goal to protect SQL Server data at rest. In other words, the physical data and log files along with the database backup sitting on file system are protected (encrypted).
Few things to be aware of when implementing TDE:
With TDE, The data transmitted over the network is not encrypted and the data at the object level remains unencrypted. In other words, if a user has select access to a table(s) within the TDE enabled database, he/she will be able to read data with simple select statements, as the name suggests, it’s transparent. TDE does not protect FILESTREAM data and any files related to Buffer Pool Extension (BPE) are not encrypted as well, you should use file system encryption tools like windows Bit-locker or any other third party tools for this purpose. Another caveat is TDE doesn’t support Instant File Initialization for database files. Also, when TDE is enabled on a user database your tempdb database gets encrypted behind the scenes.
Okay, let’s move on to our topic. In this article, we will see how to monitor and manage TDE progress, not essentially how to setup TDE on a user database. Before moving on to our main topic, here is a quick refresher on how Transparent Data Encryption works.
Enabling TDE on a given database is a very straightforward process. 20,000-foot view of the process is basically creating a DMK (Master DB) which is protected by Service Master Key, Cert (Master DB), DEK (User DB) and enable TDE(User DB) and you are done. But things get little tricky when you are dealing with VLDBs. What if you have a ginormous database (Let’s say a 30 TeraBytes monster) on which you have to enable TDE? Enabling TDE is not instantaneous, the SQL Server Encryption Scanner has to read all the underlying database pages and encrypt them, For a 30 TB database it might take multiple days for SQL Server to encrypt the entire database and we as DBAs should monitor the encryption progress making sure there are no side effects. In this context I am not talking about server resources, I am talking about the impact on the transaction log file when encryption scanner is in progress. So, why should we worry about LDF file when TDE scanner is in progress? Well, SQL Server doesn’t truncate the transaction log file of your database when TDE Encryption Scanner is doing its job. Things get more complicated if you are not allowed to run TDE during business hours and let’s say you have nightly ETL loads or some other scheduled job(s) which generates a considerable amount of log records. In this article, let’s see how to monitor TDE progress and how to manage transaction log when TDE scanner is in progress.
Further reading : Monitoring/Managing TDE
The already awesome Ola’s Maintenance solution for SQL Server became more awesome with his latest release. If you are not aware of Ola’s maintenance solution, you should definitely check it out. In the July 2018 release, quite a few enhancements were introduced. My personal favorites are ability to add percentage of modified rows to update stats, resumable index rebuilds support and support for EMC DDboost(Data Domain Boost). In fact I was badly waiting for DDboost support last year(we were using DDboost to backup our VLDBs). Unfortunately, DDboost was not something Ola’s solution suported at that time and what I ended up was writing up my own stored procs calling “dbo.emc_run_backup” and “ddbmexptool.exe” based on our Lockbox settings and scheduled the jobs as needed. Well, I got it worked without any issues but had to spend quite a bit of time to understand all the bells and whistles of DDboost and come up with my own procs and error logging.
If you happen to have any plans evaluating DDBoost for SQL backups now or in near future, well you have an awesome solution already available. All you need to do is pass ‘DATA_DOMAIN_BOOST’ as a parameter for third-part backup software in Ola’s backup job and specify below parameters.
Image source: https://ola.hallengren.com/sql-server-backup.html
A big shout-out to Ola! Cheers, Happy weekend.