Wednesday, February 11, 2015

Troubleshooting - Failed to write file data on cluster disk 1 partition 1, failure reason: The disk structure is corrupted and unreadable

One of the initial tasks in setting up a Windows cluster for a SQL Server failover cluster is to validate that the servers fulfill all the network, storage, hardware and software requirements. It helps you work through the setup and straighten things out before the real deal.

Sometimes it is a step full of surprises, especially when you don't have control over how the servers were set up. To be fair, it is quite a long list of items, and it is acceptable if my network/system team misses one or two things. Validation is like a QC check of the built servers against the standard. On the other hand, some issues are just the pure kind of jerk that shows up to make your hard day harder.

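I had the following issue returned in the storage section, which made me and my network/SAN dude scratch our heads a bit. The "Validate a Configuration" check showed the error from the title: "Failed to write file data on cluster disk 1 partition 1, failure reason: The disk structure is corrupted and unreadable".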

Interestingly, a new shared drive created on the SAN showed up as corrupted. I tested the drive and could read and write to it; it just did not pass validation. The first thing that came to mind was to reformat disk 1 from Disk Management (diskmgmt.msc). That didn't help. Indeed, I realised that the approach was wrong.

I searched a few forums for a solution. First, everyone pointed out that (potential) Cluster Disk 1 does not necessarily correspond to the physical disk with the same number listed in Disk Management. To determine which disk is the troublemaker, we need to check List Potential Cluster Disks in the storage section of the validation report and take note of the disk ID. For example, I had an issue with Cluster Disk 1, and looking it up in the list of potential cluster disks gave me the ID c188f5ac:

Then, from List All Disks, which lists every disk attached to the servers, we can identify the disk in trouble using the disk ID noted above.
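The lookup described above can be sketched in a few lines of Python. This is only an illustration: it assumes the two report sections have been copied out by hand into small lists, and every value except the ID c188f5ac from this post (the other IDs, the disk numbers, the field names) is made up.

```python
# Hypothetical data copied by hand from the two report sections.
# Only the ID c188f5ac comes from this post; everything else is invented.
potential_cluster_disks = [
    {"name": "Cluster Disk 0", "disk_id": "a1b2c3d4"},
    {"name": "Cluster Disk 1", "disk_id": "c188f5ac"},
]
all_disks = [
    {"number": 0, "disk_id": "0f0f0f0f"},
    {"number": 3, "disk_id": "c188f5ac"},
    {"number": 4, "disk_id": "a1b2c3d4"},
]


def find_physical_disk(cluster_disk_name, potential, disks):
    """Return the physical disk number behind a potential cluster disk, or None."""
    wanted = None
    # Step 1: look up the disk ID of the named potential cluster disk.
    for entry in potential:
        if entry["name"].lower() == cluster_disk_name.lower():
            wanted = entry["disk_id"].lower()
            break
    if wanted is None:
        return None
    # Step 2: find the attached disk that carries the same ID.
    for disk in disks:
        if disk["disk_id"].lower() == wanted:
            return disk["number"]
    return None


print(find_physical_disk("Cluster Disk 1", potential_cluster_disks, all_disks))
```

With this made-up data, Cluster Disk 1 turns out to be physical disk 3, not disk 1, which is exactly the kind of mismatch that makes reformatting "disk 1" the wrong move.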

Now that we know which disk requires our attention, let's get to the solution. There are a few potential reasons that may contribute to the symptom "The disk structure is corrupted and unreadable". I have a disk drive issue, what should I do? It sounds like one of my old job interview questions.

The answer is the old-school "chkdsk /f". In fact, if one drive shows up with a disk issue, I would run chkdsk against all the drives, just in case. In my case, 3 drives out of 8 were reported with issues, and "chkdsk /f" fixed them all.
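If there are many drives to check, the "run it against everything" step can be scripted. Below is a minimal Python sketch, not the way I actually did it: the helper names and the drive letters are hypothetical, and it defaults to a dry run that only prints the commands. Actually running chkdsk needs Windows and an elevated prompt.

```python
import subprocess


def build_chkdsk_commands(drives):
    """Turn drive letters such as 'e', 'F:' or 'G:\\' into chkdsk /f commands."""
    commands = []
    for drive in drives:
        letter = drive.strip().rstrip(":\\").upper()
        if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
            raise ValueError(f"not a drive letter: {drive!r}")
        commands.append(["chkdsk", f"{letter}:", "/f"])
    return commands


def run_chkdsk(drives, dry_run=True):
    """Print the commands (dry run) or execute them and collect exit codes."""
    results = {}
    for command in build_chkdsk_commands(drives):
        if dry_run:
            print(" ".join(command))
            results[command[1]] = None
        else:
            # Requires Windows and administrator rights.
            results[command[1]] = subprocess.run(command).returncode
    return results


# Hypothetical drive letters; replace with the volumes on your shared disks.
run_chkdsk(["E", "F", "G"])
```

Keeping the dry run as the default makes it easy to review the exact commands before letting them touch the shared disks.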
