Steps required to repair voting disks
An alternative suggestion – undrop ASM disk
Although this document covers the steps followed at the time for this case, another suggestion came in once the work was completed:
alter diskgroup OCR_VOTE undrop disks;
This seems a sensible approach, given an error received earlier in the day when trying to add the erroneous disk back without clearing it out first:
ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"
Had this been attempted and succeeded, it would probably have been enough once the final checks had been carried out. I say this because ocrcheck had shown a clean bill of health for the OCR throughout this process.
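A minimal sketch of what that alternative might have looked like. Note the caveat: UNDROP DISKS only cancels a drop that is still pending; it cannot resurrect a disk once the drop has completed. The script below only builds and prints the SQL rather than executing it, so it can be shown safely here:

```shell
#!/bin/sh
# Sketch only: emit the UNDROP statement for a diskgroup. In anger this
# would be piped into "sqlplus / as sysasm" on the ASM node, e.g.:
#   build_undrop_sql OCR_VOTE | sqlplus -s / as sysasm
build_undrop_sql() {
    dg="$1"
    printf "ALTER DISKGROUP %s UNDROP DISKS;\n" "$dg"
}

build_undrop_sql OCR_VOTE
```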
Reference
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/votocr.htm#CHDHBBIJ
Steps required to restore a voting disk
Make a note of the current Voting disk details.
[root]# /oracle/dbadmin/scripts/multipath_l.ksh -a
RAW Device      Size   ASM Disk    Based on     Minor,Major
==========      ====   ========    ========     ===========
:
VOTE1_01        2.0G   OCR_VOTE1   /dev/dm-54   [253,54]
VOTE2_01        2.0G   OCR_VOTE2   /dev/dm-56   [253,56]
VOTE3_01        2.0G   OCR_VOTE3   /dev/dm-58   [253,58]
VOTE4_01        2.0G   OCR_VOTE4   /dev/dm-59   [253,59]
VOTE5_01        2.0G   OCR_VOTE5   /dev/dm-60   [253,60]
Take a Manual Backup (just in case)
As root on one node
cd /oracle/GRID/11203/bin
./ocrconfig -manualbackup
:
2012/06/21 15:30:34    /oracle/GRID/11203/cdata/clustername/backup_20120621_153034.ocr
./ocrconfig -showbackup
:
wyclorah011    2012/06/21 15:30:34    /oracle/GRID/11203/cdata/racsaplp1a/backup_20120621_153034.ocr
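It is also worth confirming that the backup actually landed and is non-empty before touching anything else. A small illustrative check along these lines (not part of the original procedure; the file used below is a throwaway stand-in for the real .ocr path reported above):

```shell
#!/bin/sh
# Illustrative pre-flight check: refuse to continue unless the OCR
# manual backup exists and has a non-zero size. The real argument is
# whatever path "ocrconfig -manualbackup" reports.
check_backup() {
    if [ -s "$1" ]; then
        echo "backup OK: $1"
    else
        echo "backup MISSING or empty: $1" >&2
        return 1
    fi
}

# Demonstrated against a throwaway file rather than a real .ocr backup.
tmp=$(mktemp)
echo "dummy ocr payload" > "$tmp"
check_backup "$tmp"
```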
Shut down CRS and restart on one node in exclusive mode.
[root]# pwd
/oracle/GRID/11203/bin
[root]# ./crsctl stop crs
[root]# ./crsctl stop crs
[root]# ./crsctl stop crs
CRS-4000: Command Stop failed, or completed with errors.
So I forced the issue
[root]# ./crsctl stop crs -f
This hung trying to stop the ASM instance (the ASM alert log showed this). So I killed the ASM pmon process, which immediately freed up the stop crs, which in turn completed successfully.
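Finding the right process to kill can be scripted rather than eyeballed. A sketch that pulls the ASM pmon PID out of ps output; it is demonstrated here against a canned sample line (PID and instance name invented), since the live command would be `ps -ef | awk '/asm_pmon/ && !/awk/ {print $2}'`:

```shell
#!/bin/sh
# Sketch: extract the PID of the ASM pmon background process from
# "ps -ef"-style output, where the PID is the second column.
# The sample line below stands in for real ps output.
sample="oracle    4711     1  0 10:02 ?        00:00:01 asm_pmon_+ASM1"
pid=$(printf '%s\n' "$sample" | awk '/asm_pmon/ {print $2}')
echo "$pid"
# Live use would then be: kill -9 "$pid"  (the step taken in this case)
```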
Then restart on one node in exclusive mode.
[root]# ./crsctl start crs -excl -nocrs
Ensure that the crsd process did not start.
[root]# ./crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME                             TARGET    STATE     SERVER    STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm                        1    ONLINE    ONLINE    node1    Started
ora.cluster_interconnect.haip  1    ONLINE    ONLINE    node1
ora.crf                        1    OFFLINE   OFFLINE
ora.crsd                       1    OFFLINE   OFFLINE
ora.cssd                       1    ONLINE    ONLINE    node1
ora.cssdmonitor                1    ONLINE    ONLINE    node1
ora.ctssd                      1    ONLINE    ONLINE    node1    OBSERVER
ora.diskmon                    1    OFFLINE   OFFLINE
ora.drivers.acfs               1    ONLINE    ONLINE    node1
ora.evmd                       1    OFFLINE   OFFLINE
ora.gipcd                      1    ONLINE    ONLINE    node1
ora.gpnpd                      1    ONLINE    ONLINE    node1
ora.mdnsd                      1    ONLINE    ONLINE    node1
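The "crsd must not be running" condition can be checked mechanically rather than by eye. A hedged sketch that parses the ora.crsd state out of `crsctl stat res -init -t`-style output; the here-document below is a canned two-line sample, where live use would pipe in the real command instead:

```shell
#!/bin/sh
# Sketch: confirm ora.crsd shows OFFLINE before doing any OCR surgery.
# Live use would replace the here-doc with:
#   /oracle/GRID/11203/bin/crsctl stat res -init -t
crsd_state=$(awk '/^ora.crsd/ {print $3}' <<'EOF'
ora.cssd 1 ONLINE ONLINE node1
ora.crsd 1 OFFLINE OFFLINE
EOF
)
echo "ora.crsd: $crsd_state"
```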
Re-create the errant OCR Disk.
We can see from this query that the disk is still a valid ASM disk and marked as a Voting disk.
oracle wyclorah010> . ./crs_env
wyclorah010[+ASM1]>sqlplus / as sysasm
SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';
GROUP_NUMBER NAME            FAILGROUP       PATH
           0                                 /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1
Earlier in the day I had tried to add it back into the diskgroup and was given short shrift.
From the ASM alert log :
ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"
ERROR: ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M /* ASMCA */
So I deleted it, followed by a scandisks on the other nodes.
[root]# oracleasm querydisk '/dev/oracleasm/disks/OCR_VOTE5'
Device "/dev/oracleasm/disks/OCR_VOTE5" is marked an ASM disk with the label "OCR_VOTE5"
[root]# oracleasm deletedisk OCR_VOTE5
Clearing disk header: done
Dropping disk: done
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...
And then re-created the ASM disk. Good job I made a note of this earlier.
[root]# oracleasm createdisk OCR_VOTE5 /dev/mapper/VOTE5_01
Writing disk header: done
Instantiating disk: done
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"
Add the disk to the diskgroup.
[root]# su – oracle
Emergency Local Admin Environment configured
oracle > . ./crs_env
[+ASM1]>sqlplus / as sysasm
SQL> ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M;
Diskgroup altered.
I'm not sure that I needed to do this; ocrcheck always returned a valid status when run before attempting this fix. I wish I had run another ocrcheck and crsctl query css votedisk before doing this restore.
Anyway, the restore was run as follows:
[root]# ./ocrconfig -restore /oracle/GRID/11203/cdata/clustername/day.ocr
The note I was following suggested that I should run the following on the other nodes:
ocrconfig -repair -replace
but I missed this; it doesn't seem to have mattered.
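Had that step not been missed, it would have been run on each of the remaining nodes. A hypothetical dry-run loop (node names invented; it only prints what it would execute, which is the safe way to sketch a destructive cluster-wide step):

```shell
#!/bin/sh
# Hypothetical dry-run of the per-node repair step from the note.
# Node names below are stand-ins; the loop prints rather than runs.
NODES="node2 node3 node4"
out=$(for n in $NODES; do
    echo "ssh root@$n /oracle/GRID/11203/bin/ocrconfig -repair -replace"
done)
echo "$out"
```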
Check Voting Diskgroup and OCR Integrity.
[+ASM1]>sqlplus / as sysasm
SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';
GROUP_NUMBER NAME            FAILGROUP       PATH
          16 OCR_VOTE_0004   OCR_VOTE_0004   /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1
[root]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262120
   Used space (kbytes)      :       5260
   Available space (kbytes) :     256860
   ID                       :  207396515
   Device/File Name         :  +OCR_VOTE
   Device/File integrity check succeeded
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Cluster registry integrity check succeeded
   Logical corruption check succeeded
[root]# ./crsctl query css votedisk
## STATE    File Universal Id                File Name                          Disk group
1. ONLINEÂ Â 16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
2. ONLINEÂ Â 01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
3. ONLINEÂ Â a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
4. ONLINEÂ Â 32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
5. ONLINEÂ Â 1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
Located 5 (yes five) voting disk(s).
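That "yes five" count is exactly the kind of thing worth asserting in a script rather than counting by hand. A sketch that tallies ONLINE voting disks from `crsctl query css votedisk`-style output; the here-document is a canned sample (FUIDs truncated), where live use would pipe in the real command:

```shell
#!/bin/sh
# Sketch: count ONLINE voting disks. Live use would replace the
# here-doc with: /oracle/GRID/11203/bin/crsctl query css votedisk
online=$(grep -c 'ONLINE' <<'EOF'
 1. ONLINE  16ab (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
 2. ONLINE  01d6 (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
 3. ONLINE  a06e (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
 4. ONLINE  32b3 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
 5. ONLINE  1ff5 (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
EOF
)
echo "online voting disks: $online"
```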
Stop CRS on the exclusive node and restart on the other three.
[root]# ./crsctl stop crs
And then restart
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Check everything comes up on all nodes.
[root]# ./crsctl stat res -init -t
On all nodes
[root]# ./crsctl query css votedisk
## STATE    File Universal Id                File Name                          Disk group
1. ONLINEÂ Â 16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
2. ONLINEÂ Â 01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
3. ONLINEÂ Â a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
4. ONLINEÂ Â 32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
5. ONLINEÂ Â 1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
Located 5 voting disk(s).
[root]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262120
   Used space (kbytes)      :       5260
   Available space (kbytes) :     256860
   ID                       :  207396515
   Device/File Name         :  +OCR_VOTE
   Device/File integrity check succeeded
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Cluster registry integrity check succeeded
   Logical corruption check succeeded
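The final ocrcheck transcript can also be verified programmatically. A hedged sketch that scrapes an ocrcheck transcript for its "check succeeded" lines; the here-document is a trimmed sample of the output above, where live use would pipe in `./ocrcheck` instead:

```shell
#!/bin/sh
# Sketch: count the "check succeeded" lines in an ocrcheck transcript.
# A healthy 11.2 run shows device/file integrity, cluster registry
# integrity, and logical corruption checks all succeeding.
checks=$(grep -c 'check succeeded' <<'EOF'
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
EOF
)
echo "success lines: $checks"
```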
oracle +ASM1 > cluvfy comp ocr -n all -verbose
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+OCR_VOTE" available on all the nodes
NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.
oracle +ASM2 > crsstat | grep OFFL
ora.gsd                        OFFLINE, OFFLINE, OFFLINE
OK