Simulating a 824 IO Error in SQL ServerJ. Sounds Cool huh! Not really, Trust me, you won’t and never say 823/824 (and more often ignored Informational Sev 10 825 Message) is cool once you encounter it on your Production Database! Okay…..Enough scaring you people.
I tried to play a little bit with my test databases on my test server simulating these errors and I would like to share it with my peers.
As I posted in my previous post, I learnt this (in fact most of the SQL internals) from Paul Randal presentations and demos. Again….trust me if you want to dig into real internals of SQL Server, Paul Randal is the one for you. I will post my personal favorites in SQL Server world in a future blog (no disrespect to many other fabulous MCM’s and MVP’s around the world). I admire Paul at most…that’s it!
Note: you should not try this on any of your Server anywhere near your Domain. Try this on your own personal laptop just to get familiarize with these errors instead of surprises on your Servers; I’ve to warn you all sometimes it even takes reinstalling SQL Server!!
HEX EDITOR XVI32 – You can also, use your favorite editor for messing up with your M/N/LDF file.
Ok, It’s time to corrupt your Database (yes, you heard it right ;p)
Let’s choose “AdventureworksLT” database as a victim for our experiment. You can download that database from codeplex.com.
Let’s choose product table and PK_PRODUCT_PRODUCTID cluster Index. Before that we’ve to identify what underlying page is storing that clustered index. For that we can use DBCC IND.
DBCC IND(‘AdventureWorksLT’,‘SalesLT.Product’, 1) —Am using 1 since I’m targetting clustered index. From the results pane choose any page you want.
I choose PagePID = 500 as my target page.As I mentioned in my last post under DBCC lable, you can make use of DBCC PAGE to look at the underlying contents of that page. Also, you need to findout the starting offset for the page you’ve chosen(in my case i’ve chosen Poor Page 500 ;p) to change/edit/Damage using Editor. That becomes 4096000 for me(Since each page is 8KB). Use your own Math for your page!
Let’s do that just as an FYI. (I’m still learning how to exaclty interpret entire DBCC PAGE Output.)
Now, in order to access the Database file(s) you’ve to take the respective database offline, If not SQL Server will hold lock on it and you can’t edit succesfully using hexeditor.
It’s time to corrupt your Page:
Take DB Offline. (Your DB might not go Offline sometimes if it still has any Open connections in some cases.) Open your MDF file with Hex editor.
Search for your offset value, in my case it’s 4096000. I changed my values to x x x x from 0 0 0 0 (FYI i changed Column 12 to 16 in the highlighted row) and save the file and bring the database Online! Voilaaaa..That’s it. You’ve introduced corruption to your database.
Note: Don’t choose Page 9 of your First file in your Database. That’s the Boot Page and the most important piece for you database. DB may not come Online if you mess around your 9th page of 1st File.
Now It’s time to Test your work:
Run this Query on your Database, it’s nothing but you are hitting that index with a Select Statement.
SELECTTOP 1000 [ProductID],[Name],[ProductNumber],[Color],[StandardCost],[ListPrice],[Size]
Hohooooooooo…In results pane I’m seeing
Msg 824, Level 24, State 2, Line 2
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x3be14ad4; actual: 0x071b76e8).
It occurred during a read of page (1:500) in database ID 12 at offset 0x000000003e8000 in file
‘C:\Program Files\Microsoft SQL Server\MSSQL10_50.PROD\MSSQL\DATA\AdventureWorksLT_Data.mdf’.
Additional messages in the SQL Server error log or system event log may provide more detail.
This is a severe error condition that threatens database integrity and must be corrected immediately.
Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors;
for more information, see SQL Server Books Online.
Let’s run CHECKDB on this Database:
DBCC CHECKDB (‘AdventureWorksLT’) WITH NO_INFOMSGS,ALL_ERRORMSGS
Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 229575856, index ID 1, partition ID 72057594039828480, alloc unit ID 72057594041270272 (type In-row data). Page (1:499) is missing a reference from previous page (1:500). Possible chain linkage problem.
Msg 8928, Level 16, State 1, Line 1
Object ID 229575856, index ID 1, partition ID 72057594039828480, alloc unit ID 72057594041270272 (type In-row data): Page (1:500) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 229575856, index ID 1, partition ID 72057594039828480, alloc unit ID 72057594041270272 (type In-row data), page (1:500). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
Msg 8976, Level 16, State 1, Line 1
Table error: Object ID 229575856, index ID 1, partition ID 72057594039828480, alloc unit ID 72057594041270272 (type In-row data). Page (1:500) was not seen in the scan although its parent (1:182) and previous (1:501) refer to it. Check any previous errors.
CHECKDB found 0 allocation errors and 4 consistency errors in table ‘SalesLT.Product’ (object ID 229575856).
CHECKDB found 0 allocation errors and 4 consistency errors in database ‘AdventureWorksLT’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (AdventureWorksLT).
And ofcourse your ErrorLog will scream with whole bunch of StackDumps created bla bla…. As you can see below.
That’s it Folks! Pretty Simple huh!!
Play with DBCC IND and PAGE commands and get familiarize with them if you want to become a Geek.
Note: These are Undocumented commands. Use @ Ur own risk! Just want to remind you again, you should not practice this on any of your server on your domain. In other words if you care about that Server/Database don’t play with it with the above demo.