Steps required to repair voting disks
An alternative suggestion – undrop ASM disk
Although this document covers the steps followed at the time for this case, another suggestion came in once the work was completed:
alter diskgroup OCR_VOTE undrop disks;
This seems a sensible approach, given an error received earlier in the day when trying to add the erroneous disk back without clearing it out first:
ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"
Had this been attempted and succeeded, it would probably have been enough once the final checks had been carried out. I say this because ocrcheck had shown a clean bill of health for the OCR throughout this process.
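A minimal sketch of what that alternative might have looked like. Note the caveat: UNDROP DISKS only cancels a drop that is still pending; it cannot resurrect a disk once the drop has completed. The script below only builds and prints the SQL rather than executing it, so it can be shown safely here:

```shell
#!/bin/sh
# Sketch only: emit the UNDROP statement for a diskgroup. In anger this
# would be piped into "sqlplus / as sysasm" on the ASM node, e.g.:
#   build_undrop_sql OCR_VOTE | sqlplus -s / as sysasm
build_undrop_sql() {
    dg="$1"
    printf "ALTER DISKGROUP %s UNDROP DISKS;\n" "$dg"
}

build_undrop_sql OCR_VOTE
```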
Reference
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/votocr.htm#CHDHBBIJ
Steps required to restore a voting disk
Make a note of the current Voting disk details.
[root]# /oracle/dbadmin/scripts/multipath_l.ksh -a
RAW Device      Size   ASM Disk    Based on     Minor,Major
==========      ====   ========    ========     ===========
:
VOTE1_01        2.0G   OCR_VOTE1   /dev/dm-54   [253,54]
VOTE2_01        2.0G   OCR_VOTE2   /dev/dm-56   [253,56]
VOTE3_01        2.0G   OCR_VOTE3   /dev/dm-58   [253,58]
VOTE4_01        2.0G   OCR_VOTE4   /dev/dm-59   [253,59]
VOTE5_01        2.0G   OCR_VOTE5   /dev/dm-60   [253,60]
Take a Manual Backup (just in case)
As root on one node
cd /oracle/GRID/11203/bin
./ocrconfig -manualbackup
:
2012/06/21 15:30:34    /oracle/GRID/11203/cdata/clustername/backup_20120621_153034.ocr
./ocrconfig -showbackup
:
wyclorah011    2012/06/21 15:30:34    /oracle/GRID/11203/cdata/racsaplp1a/backup_20120621_153034.ocr
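It is also worth confirming that the backup actually landed and is non-empty before touching anything else. A small illustrative check along these lines (not part of the original procedure; the file used below is a throwaway stand-in for the real .ocr path reported above):

```shell
#!/bin/sh
# Illustrative pre-flight check: refuse to continue unless the OCR
# manual backup exists and has a non-zero size. The real argument is
# whatever path "ocrconfig -manualbackup" reports.
check_backup() {
    if [ -s "$1" ]; then
        echo "backup OK: $1"
    else
        echo "backup MISSING or empty: $1" >&2
        return 1
    fi
}

# Demonstrated against a throwaway file rather than a real .ocr backup.
tmp=$(mktemp)
echo "dummy ocr payload" > "$tmp"
check_backup "$tmp"
```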
Shut down CRS and restart on one node in exclusive mode.
[root]# pwd
/oracle/GRID/11203/bin
[root]# ./crsctl stop crs
[root]# ./crsctl stop crs
[root]# ./crsctl stop crs
CRS-4000: Command Stop failed, or completed with errors.
So I forced the issue
[root]# ./crsctl stop crs -f
This hung trying to stop the ASM instance (the ASM alert log showed this). So I killed the ASM pmon process, which immediately freed up the stop crs, which in turn completed successfully.
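Finding the right process to kill can be scripted rather than eyeballed. A sketch that pulls the ASM pmon PID out of ps output; it is demonstrated here against a canned sample line (PID and instance name invented), since the live command would be `ps -ef | awk '/asm_pmon/ && !/awk/ {print $2}'`:

```shell
#!/bin/sh
# Sketch: extract the PID of the ASM pmon background process from
# "ps -ef"-style output, where the PID is the second column.
# The sample line below stands in for real ps output.
sample="oracle    4711     1  0 10:02 ?        00:00:01 asm_pmon_+ASM1"
pid=$(printf '%s\n' "$sample" | awk '/asm_pmon/ {print $2}')
echo "$pid"
# Live use would then be: kill -9 "$pid"  (the step taken in this case)
```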
Then restart on one node in exclusive mode.
[root]# ./crsctl start crs -excl -nocrs
Ensure that the crsd process did not start.
[root]# ./crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME                             TARGET    STATE     SERVER    STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm                        1    ONLINE    ONLINE    node1    Started
ora.cluster_interconnect.haip  1    ONLINE    ONLINE    node1
ora.crf                        1    OFFLINE   OFFLINE
ora.crsd                       1    OFFLINE   OFFLINE
ora.cssd                       1    ONLINE    ONLINE    node1
ora.cssdmonitor                1    ONLINE    ONLINE    node1
ora.ctssd                      1    ONLINE    ONLINE    node1    OBSERVER
ora.diskmon                    1    OFFLINE   OFFLINE
ora.drivers.acfs               1    ONLINE    ONLINE    node1
ora.evmd                       1    OFFLINE   OFFLINE
ora.gipcd                      1    ONLINE    ONLINE    node1
ora.gpnpd                      1    ONLINE    ONLINE    node1
ora.mdnsd                      1    ONLINE    ONLINE    node1
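The "crsd must not be running" condition can be checked mechanically rather than by eye. A hedged sketch that parses the ora.crsd state out of `crsctl stat res -init -t`-style output; the here-document below is a canned two-line sample, where live use would pipe in the real command instead:

```shell
#!/bin/sh
# Sketch: confirm ora.crsd shows OFFLINE before doing any OCR surgery.
# Live use would replace the here-doc with:
#   /oracle/GRID/11203/bin/crsctl stat res -init -t
crsd_state=$(awk '/^ora.crsd/ {print $3}' <<'EOF'
ora.cssd 1 ONLINE ONLINE node1
ora.crsd 1 OFFLINE OFFLINE
EOF
)
echo "ora.crsd: $crsd_state"
```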
Re-create the errant OCR Disk.
We can see from this query that the disk is still a valid ASM disk and marked as a Voting disk.
oracle wyclorah010> . ./crs_env
wyclorah010[+ASM1]>sqlplus / as sysasm
SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';
GROUP_NUMBER NAME            FAILGROUP       PATH
           0                                 /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1
Earlier in the day I had tried to add it back into the diskgroup and was given short shrift.
From the ASM alert log :
ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"
ERROR: ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M /* ASMCA */
So I deleted it, followed by a scandisks on the other nodes.
[root]# oracleasm querydisk '/dev/oracleasm/disks/OCR_VOTE5'
Device "/dev/oracleasm/disks/OCR_VOTE5" is marked an ASM disk with the label "OCR_VOTE5"
[root]# oracleasm deletedisk OCR_VOTE5
Clearing disk header: done
Dropping disk: done
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...
And then re-created the ASM disk. Good job I made a note of this earlier.
[root]# oracleasm createdisk OCR_VOTE5 /dev/mapper/VOTE5_01
Writing disk header: done
Instantiating disk: done
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"
[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"
Add the disk to the diskgroup.
[root]# su – oracle
Emergency Local Admin Environment configured
oracle > . ./crs_env
[+ASM1]>sqlplus / as sysasm
SQL> ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M;
Diskgroup altered.
I'm not sure that I needed to do this; ocrcheck always returned a valid status when run before attempting this fix. I wish I had run another ocrcheck and crsctl query css votedisk before doing this restore.
Anyway, the restore was run as follows:
[root]# ./ocrconfig -restore /oracle/GRID/11203/cdata/clustername/day.ocr
The note I was following suggested that I should run the following on the other nodes:
ocrconfig -repair -replace
but I missed this; it doesn't seem to have mattered.
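Had that step not been missed, it would have been run on each of the remaining nodes. A hypothetical dry-run loop (node names invented; it only prints what it would execute, which is the safe way to sketch a destructive cluster-wide step):

```shell
#!/bin/sh
# Hypothetical dry-run of the per-node repair step from the note.
# Node names below are stand-ins; the loop prints rather than runs.
NODES="node2 node3 node4"
out=$(for n in $NODES; do
    echo "ssh root@$n /oracle/GRID/11203/bin/ocrconfig -repair -replace"
done)
echo "$out"
```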
Check Voting Diskgroup and OCR Integrity.
[+ASM1]>sqlplus / as sysasm
SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';
GROUP_NUMBER NAME            FAILGROUP       PATH
          16 OCR_VOTE_0004   OCR_VOTE_0004   /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1
[root]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262120
   Used space (kbytes)      :       5260
   Available space (kbytes) :     256860
   ID                       :  207396515
   Device/File Name         :  +OCR_VOTE
   Device/File integrity check succeeded
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Cluster registry integrity check succeeded
   Logical corruption check succeeded
[root]# ./crsctl query css votedisk
## STATE    File Universal Id                File Name                          Disk group
1. ONLINEÂ Â 16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
2. ONLINEÂ Â 01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
3. ONLINEÂ Â a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
4. ONLINEÂ Â 32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
5. ONLINEÂ Â 1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
Located 5 (yes five) voting disk(s).
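That "yes five" count is exactly the kind of thing worth asserting in a script rather than counting by hand. A sketch that tallies ONLINE voting disks from `crsctl query css votedisk`-style output; the here-document is a canned sample (FUIDs truncated), where live use would pipe in the real command:

```shell
#!/bin/sh
# Sketch: count ONLINE voting disks. Live use would replace the
# here-doc with: /oracle/GRID/11203/bin/crsctl query css votedisk
online=$(grep -c 'ONLINE' <<'EOF'
 1. ONLINE  16ab (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
 2. ONLINE  01d6 (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
 3. ONLINE  a06e (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
 4. ONLINE  32b3 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
 5. ONLINE  1ff5 (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
EOF
)
echo "online voting disks: $online"
```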
Stop CRS on the exclusive node and restart on the other three.
[root]# ./crsctl stop crs
And then restart
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Check everything comes up on all nodes.
[root]# ./crsctl stat res -init -t
On all nodes
[root]# ./crsctl query css votedisk
## STATE    File Universal Id                File Name                          Disk group
1. ONLINEÂ Â 16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
2. ONLINEÂ Â 01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
3. ONLINEÂ Â a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
4. ONLINEÂ Â 32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
5. ONLINEÂ Â 1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]
Located 5 voting disk(s).
[root]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262120
   Used space (kbytes)      :       5260
   Available space (kbytes) :     256860
   ID                       :  207396515
   Device/File Name         :  +OCR_VOTE
   Device/File integrity check succeeded
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Device/File not configured
   Cluster registry integrity check succeeded
   Logical corruption check succeeded
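The final ocrcheck transcript can also be verified programmatically. A hedged sketch that scrapes an ocrcheck transcript for its "check succeeded" lines; the here-document is a trimmed sample of the output above, where live use would pipe in `./ocrcheck` instead:

```shell
#!/bin/sh
# Sketch: count the "check succeeded" lines in an ocrcheck transcript.
# A healthy 11.2 run shows device/file integrity, cluster registry
# integrity, and logical corruption checks all succeeding.
checks=$(grep -c 'check succeeded' <<'EOF'
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
EOF
)
echo "success lines: $checks"
```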
oracle +ASM1 > cluvfy comp ocr -n all -verbose
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
Disk group for ocr location "+OCR_VOTE" available on all the nodes
NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Verification of OCR integrity was successful.
oracle +ASM2 > crsstat | grep OFFL
ora.gsd                        OFFLINE, OFFLINE, OFFLINE
OK