Cluster Logs in Windows Server 2008?

In this blog post, let’s focus on where and how we can review Failover Cluster logs on Windows Server 2008 and above. As most of us know, on a Windows Server 2003 cluster we had a “cluster.log” file on each node participating in the cluster, which contained debug information. FYI, you can locate these files in the “%systemroot%\cluster” folder. But how about cluster log files in Windows Server 2008/2008 R2? Uhuhh… it’s not something you can review directly by navigating to the systemroot folder. Below is the screenshot of that folder in my cluster.

You can see a folder called “Reports” in the above screenshot, where all the cluster validation reports are stored by default. I’m attaching the screenshot below just to prove that the cluster.log file can’t be found in the “Reports” folder either 🙂

Starting with Windows Server 2008, cluster logs are managed by something called “Event Tracing for Windows” (ETW). Just an FYI: if you are interested, you can see all the currently running traces by opening “perfmon” and navigating to Data Collector Sets (shown below in the screenshot).

So, like any other logs, cluster logs are stored in the “C:\Windows\System32\winevt\Logs” folder with an “.etl” extension, as you can see below.

Well, so how do we read those .etl files?

For that, we have to use the “cluster.exe” command with the “/gen” switch. Basically, this will generate a human-readable text file in your “Reports” folder.

Syntax: Cluster log /gen

Output:

 

As you can see in the above screenshot, it communicates with all the nodes in your cluster. In my scenario, Node2 is offline (powered down). BTW, even though Node2 is down, it still creates the “Cluster.log” file in your Reports folder with the related information.

So, how do we generate logs for a specific node?

You have to use the “/NODE” switch with your cluster log syntax. Please see the screenshot below.

As you can see, this time we had no RPC errors.
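
If you ever want to script the two commands above, here is a minimal Python sketch. It only assembles the “cluster log /gen” command line (optionally with “/node:”) and can hand it to cluster.exe. The function names and the node name NODE1 are my own placeholders for illustration, and actually running it assumes you are on a cluster node where cluster.exe is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_cluster_log_command(node: Optional[str] = None) -> List[str]:
    """Return the argument list for `cluster log /gen` (hypothetical helper).

    When `node` is given, /node:<name> limits log generation to that node.
    """
    args = ["cluster", "log", "/gen"]
    if node:
        args.append(f"/node:{node}")
    return args


def generate_cluster_log(node: Optional[str] = None) -> int:
    """Run the command and return its exit code.

    Assumption: this is executed on a cluster node where cluster.exe exists.
    """
    completed = subprocess.run(build_cluster_log_command(node))
    return completed.returncode


if __name__ == "__main__":
    # Only print the command lines here; nothing is executed.
    print(" ".join(build_cluster_log_command()))
    print(" ".join(build_cluster_log_command("NODE1")))  # NODE1 is a placeholder
```

Keeping the command building separate from the execution means you can check the exact command line on any machine before running it on a real cluster node.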

So, there is a lot to explore and learn in 2008 Failover Clustering. If you have been using 2003 for a long time, things have changed drastically. There are a lot of other options/switches available with cluster.exe; you can even limit the log size if you are interested. BTW, everything I’ve shown here can be achieved via PowerShell cmdlets as well!

Hope this is informative….

DPM (Data Protection Manager 2010) Gotchas…

In the previous post, we saw what DPM is and what can be achieved with it at a very high level. In this blog post, let’s look at a few GOTCHAS (the Oops moments 😉, things you should be aware of) to keep in mind when implementing DPM.

Let me get this very straight: absolute 100% protection/uptime is of course not possible. Yes, this is the same with any other technology out there in the market. In most cases, we the administrators hear the business saying, “We can’t tolerate even a single second of downtime, we can’t tolerate any data loss. We absolutely need all the data to be recoverable at any given point in time.” Let me say this: in the practical world, this is not possible. Step back a little and work with your business to understand the real business needs. Help them understand how the technology works. Define RPOs and RTOs, and document SLAs. Define an allowable maintenance window in which we can perform maintenance tasks on our servers. The bottom line is, we should be able to recover from a disaster with as little data loss and in as little time as we can!

Anyways, here are a few things one should be aware of before implementing DPM:

  • DPM is heavily dependent on your network bandwidth, like any other backup tool out there. Of course, it has to copy data over the wire, possibly across multiple sites. With compression enabled, make sure your servers have enough CPU cycles available.
  • Of course, it’s also heavily dependent on your disk performance.
  • The initial replica might take a considerable amount of time, depending on the amount of data and network speed.
  • DPM doesn’t support FAT-based disks. Disks can be either DAS or SAN based, with NTFS.
  • Disks within the DPM storage pool should not contain any other application data, volumes, etc. All partitions/volumes will be erased during the initial configuration of your storage pool.
  • 15 minutes is the shortest possible interval between recovery points.
  • DPM can’t be installed on machines with Failover Clustering enabled; you have to remove it prior to installing DPM. It also can’t be installed on a machine with the SCOM agent on it.
  • Be prepared for multiple reboots for a successful installation of DPM.
  • You can either encrypt your data or compress your data, but not both!
  • If you have to use BMR (Bare Metal Recovery), System State protection must be enabled.
  • You can’t back up junction points, the Recycle Bin, the paging file, or the System Volume Information folder.
  • Protected systems can’t be moved between protection groups on the fly; it’s not supported. You have to manually remove the system from its protection group and add it to the new one.
  • Don’t use SQL Server application protection for backing up your MOSS (SharePoint) databases if you are already backing up the MOSS application. DPM will get confused with the LSNs and your backups will fail.
  • If you are backing up your SQL databases, make sure that no other tool besides DPM is truncating your T-logs; otherwise, DPM backups will fail.
  • If you plan to host the DPM SQL databases remotely, make sure you acquire a license for that SQL Server. If you choose to install locally on the same machine, DPM will cover the SQL license for you!
  • DPM installation will fail if your SQL installation fails. Make sure to install the DB Engine, SSRS, SQL Client Connectivity SDK, and Management Tools as a bare minimum. For remote SQL deployments, Named Pipes must be enabled. We also have to install the SQL Server DPM support files prior to DPM installation (they can be found in the SQLPREPINSTALLER folder on your DPM media).

Hope this is informative! BTW, there’s fabulous documentation from Microsoft on troubleshooting issues related to DPM.

It can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=15954.

Data Protection Manager (DPM) – Protected System Requirements!

Let’s deviate a little bit from SQL Server in this post and see what DPM is and how one can benefit from implementing it in their environment, at a very high level.

Why have I started exploring and teaching myself DPM? Well, being a consultant, I have to work with multiple clients, and each client uses their own set of tools (MSFT or third-party). That being said, the more insight I have into Microsoft tools and products, and other third-party tools related to my skills, the more I can succeed in my career. The one thing I love about working as a consultant: I get an opportunity to learn and explore various technologies, which helps me grow, and in return I can help another client (maybe in the next assignment). As I always say, the best way to learn something is to teach yourself. Create a lab, simulate a corporate environment, and start getting familiar with whatever the tool/product is!

Anyways… so, what is DPM? Well, as the name indicates, it is a tool for protecting our mission-critical data. That can be a file server, an Exchange server, a SharePoint server, a SQL Server, your Windows Server itself, or even your client/desktop machines! DPM 2010 has some really cool features, and I’ve observed recently that many companies have started leaning towards implementing DPM rather than depending on third-party products as their backup/recovery solution. I don’t mean third-party tools can be ignored either… IMHO, I prefer being a complete Microsoft shop rather than dealing with multiple vendors whenever support is needed. YMMV! I may come up with a few other posts showing some demos once I install DPM in my lab! Okay… enough blabbering, it’s time to do some justice to the post title now 😉

So, what are the minimum requirements for your protected systems if you want DPM 2010 to support them?

Supported Windows Servers: Windows Server 2003 through 2008 R2. Yes, DPM 2010 won’t support Windows 2000 Server (hope you don’t have any in your shop) or Small Business Server 2008. For Windows Server 2003, hotfix KB940349 is mandatory. For Windows Server 2008 R2, you have to enable the Windows Server Backup feature manually (basically adding a new feature from Server Manager).

Application Servers:

SQL Server 2000 through 2008 R2. Note: SQL Server 2000 should be running SP4 at minimum, and SQL Server 2005 RTM is not supported either. VSS (Volume Shadow Copy Service) must be enabled.

Exchange Server 2003 through 2010.

MOSS (SharePoint Server) 2003 through 2010.

Other general considerations:

It won’t support FAT!! Only NTFS disks are supported. If you are considering System State backup/BMR (Bare Metal Recovery), you should have an additional 20 GB of free space available.

Note: It’s always better to have the latest service pack/hotfixes installed on your servers to avoid potential issues.

Once you install DPM on a server, it will automatically create a SQL instance of its own. You can choose the local server (the DPM server) or a remote server. I prefer the local server, keeping the DPM SQL instance dedicated to those databases. Don’t create your application databases on the DPM SQL Server!

One interesting point with DPM 2010: it will back up your SQL Server 2008 R2 servers, but the actual DPM SQL instance should be on SQL Server 2008. In other words, the DPM databases cannot be hosted on SQL Server 2008 R2. Also, the Agent service should be set to Automatic. Clustering your DPM SQL Server is not supported, but you can back up clustered SQL Servers… Hope I’m not confusing you 🙂

In short: I’m really glad that I got an opportunity to explore this awesome tool in recent days. Thanks to the person (shh… it’s a secret ;)) who made me think about this tool! As with any other tool, you have to remember a few GOTCHAS before rolling this guy out into production. Let’s see those in future posts…

Hope this is informative… Cheers!

Quorum Drive in Failover Cluster (Windows Server 2008 R2)?

Let me share one of the interesting conversations I had a couple of days ago with one of my colleagues. It was about the quorum drive in a Failover Cluster (this was Windows Server 2008 R2).

Let me tell you the story in short. She had a question regarding the quorum (typically the Q drive) disappearing from Failover Cluster Manager. She also checked all 3 nodes in the cluster (remember, this was a 3-node cluster) and was not able to locate the Q drive.

So, is this normal in a Failover Cluster? The answer is absolutely yes; this is perfectly normal for any Failover Cluster (starting with Windows Server 2008) with an odd number of nodes. A quorum drive is no longer mandatory starting with Windows Server 2008. This might be really confusing for folks coming from Windows Server 2003! Yes, you heard it right… there is no need for a dedicated drive to act as the quorum. In this case, our cluster quorum was configured as “Node Majority”. In other words, all the nodes participate in forming a quorum, and each one votes. If one node dies, we still have 2 nodes up and running (so we still have a majority of votes), and hence our cluster keeps running without any issues. If 2 nodes die, the majority of nodes are down, and at this point our cluster goes down. The basic idea is to avoid a single point of failure!

Quorum modes starting with Windows Server 2008:

Node Majority: Each node that is available can vote. The cluster functions only with a majority of the votes. (For an odd number of nodes.)

Node and Disk Majority: Each node plus a dedicated drive (typically Q) votes. (For an even number of nodes.)

Node and File Share Majority: Each node plus a file share witness votes. (Personally, I wouldn’t prefer this for any number of nodes.)

No Majority (aka Disk Only): This is what we used to have up to Windows Server 2003, where a dedicated disk acts as the quorum.
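
To make the vote counting concrete, here is a small Python sketch of the majority arithmetic behind the first three modes. It is a simplified model under my own assumptions: every node has exactly one vote, the witness (disk or file share) adds one more, and the cluster stays up only while more than half of all configured votes are present. The function names are made up for illustration, and the No Majority (Disk Only) mode is not modeled.

```python
def total_votes(node_count: int, has_witness: bool = False) -> int:
    """Total configured votes: one per node, plus one for a disk/file share witness."""
    return node_count + (1 if has_witness else 0)


def has_quorum(nodes_up: int, node_count: int,
               has_witness: bool = False, witness_up: bool = True) -> bool:
    """True when the votes still present form a strict majority of all votes."""
    votes_present = nodes_up + (1 if has_witness and witness_up else 0)
    return votes_present * 2 > total_votes(node_count, has_witness)


if __name__ == "__main__":
    # 3-node cluster, Node Majority: one node down vs. two nodes down
    print(has_quorum(nodes_up=2, node_count=3))
    print(has_quorum(nodes_up=1, node_count=3))
    # 2-node cluster, Node and Disk Majority: one node down, disk still online
    print(has_quorum(nodes_up=1, node_count=2, has_witness=True))
    # 2-node cluster without any witness: one node down
    print(has_quorum(nodes_up=1, node_count=2))
```

The last two calls show why an even number of nodes gets the extra witness vote: without it, losing one of two nodes leaves exactly half of the votes, which is not a majority.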

Note: Leave the quorum configuration as is, as chosen by Windows by default when configuring the cluster. Windows is smart enough to choose the appropriate quorum configuration for your cluster. Change it only if you are sure about what you are doing. Also, please note that the quorum configuration can be changed at any time, even after creating the cluster.

In my lab, I have a 2-node, single-instance cluster, so I have Node and Disk Majority as my quorum. Please see the screenshots below.

See the quorum configuration shown as Node and Disk Majority. (Just ignore the warning above :D, my Node2 is turned off as of now, hence that warning!)

How to change the Quorum Configuration?

Right-click on your Windows cluster and choose More Actions, as shown below.

Now, you can click Next and choose your Quorum Configuration type and proceed further.

Just want to remind you again: don’t change anything unless you completely understand what you are doing!!…

CNOs/VCOs (Computer Objects) and a few ways to protect them…!

If you already have experience working in clustered environments, you might already know about the CNO (Cluster Name Object) and VCO (Virtual Computer Object). For newbies, let me explain what the CNO and VCO are in a line or two…

CNO: This is the core piece of your Windows cluster and acts as its identity. It is a computer object that gets created in your AD under the Computers node (under your domain, or an OU if you have any). It has the same name as your cluster.

VCO: Again, these are objects created in AD under the Computers node, depending on the services and applications you create inside your cluster. Yes, the CNO is responsible for creating those VCOs. CNOs should not be deleted, or even touched in terms of security, by any means or by any person. Services won’t come online if the CNO’s permissions are modified or the CNO gets deleted accidentally, which is a potential threat to your cluster.

In order to recover from a deleted CNO, your Domain Admin has to be involved, and he/she needs to restore your Active Directory objects, which is not a simple task, especially in larger enterprises. The good news is that starting with Windows Server 2008 R2, we have something called the Active Directory Recycle Bin, which is an awesome way to recover AD objects. Hold on, guys… there’s a GOTCHA though!

Gotcha: The AD Recycle Bin is not enabled by default. It has to be enabled within your domain by your Domain Admin!

What if we ask our Domain/OU/Server Admins to enable a setting that prevents accidental deletion of computer objects? It would be really nice if we could prevent the deletion in the first place, right, instead of recovering after a disaster. So what can be done here? Windows Server 2008/2008 R2 offers a really simple way to prevent these accidental operations (mostly human mistakes). There’s a small checkbox we should enable to make this happen. Once it’s enabled, it won’t let anyone delete that object.

Demo:

I’m on my Domain Controller and I’ve opened AD Users and Computers from Administrative Tools. You can see WINCLUST is my CNO.

Very important: Now you have to go to View and select Advanced Features to be able to see and perform all the available options/operations. You can see this below.

Now, I’m trying to protect my CNO from accidental deletion. All I have to do is right-click on the CNO, select Properties, navigate to the “Object” tab, and check that tiny box as shown below :)

That’s it! Is it really hard? Nope. You can check with your Domain Admins to make sure this is checked on all your CNOs and VCOs.

Note:

All new OUs are automatically set to be protected.

New users/groups are not automatically set to be protected.

New computers are not automatically set to be protected.

With the protection enabled, let’s now see what happens if we try to delete that CNO manually from my AD.

Note: Don’t even think about doing this in your company (in the first place, we won’t have that level of privileges). I bet you’ll be fired the very next moment!

I got this warning message asking, “Are you really sure about what you are trying to do here?” See the screenshot below.

Let’s say I’m one stupid guy and went ahead and clicked Yes. Below is the screenshot of what I got.

Remember, I logged on as a Domain Administrator. Even then, Windows is saying, “Uhuhhhhh… No, idiot! I’m not letting you perform this operation unless you uncheck that tiny box which we checked earlier” 😀

Isn’t it something awesome, guys? I really, really encourage you to check with your Server Admins that this option is enabled if you are responsible/accountable for mission-critical production SQL Server clusters.

Hope this is useful info and you learnt something new! Cheers!…