Thursday, February 4, 2010

Information on Hot Spare

HOT SPARE:
1. The hot spare facility included with Disk Suite allows automatic replacement of failed sub-mirror/RAID-5 components, provided spare components are available & reserved.
2. Component replacement & resyncing of failed components are automatic.
3. A hot spare is a component that is running (but not being used) and can be substituted for a broken component in a sub-mirror of a two- or three-way meta mirror or in a RAID-5 device.

Note:
4. Failed components in a one-way meta mirror cannot be replaced by a hot spare.
5. Components designated as hot spares cannot be used in sub-mirrors or in another meta device in the 'md.tab' file. They must remain ready for immediate use in the event of a component failure.


Hot spare states:
1. A hot spare has 3 states:
a. Available
b. In-use
c. Broken

a. Available:
'Available' hot spares are running and ready to accept data, but are not currently being written to or read from.

b. In-use:
'In-use' hot spares are currently being written to and read from.

c. Broken:
1. 'Broken' hot spares are out of service.
2. A hot spare is placed in the broken state when an I/O error occurs.

2. The number of hot spare pools is limited to 1000.


Defining Hot spare:
1. Hot spare pools are named as 'hspnnn'
where 'nnn' is a number in the range 000-999
2. A metadevice cannot be configured as a hot spare.
3. Once the hot spare pools are defined and associated with a sub-mirror, the hot spares are "available" for use. If a component failure occurs, Disk Suite searches through the list of hot spares in the assigned pool and selects the first "available" component that is equal or greater in disk capacity.
4. If a hot spare of adequate size is found, the hot spare state changes to "in-use" and a resync operation is automatically performed. The resync operation brings the hot spare into sync with the other sub-mirror or RAID-5 components.
5. If a component of adequate size is not found in the list of hot spares, the sub-mirror that failed is considered "erred" and that portion of the sub-mirror no longer replicates the data.
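The selection rule in points 3 to 5 can be sketched in a few lines of shell. This is only an illustration of the rule as described above, not how Disk Suite implements it. The function name 'pick_hot_spare', the three-column input format (device, state, blocks) and the sample sizes are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the selection rule: walk the pool in order and take the first
# spare that is Available and at least as large as the failed component.
# Input on stdin (hypothetical format): "<device> <state> <blocks>"
pick_hot_spare() {
    needed_blocks=$1
    awk -v need="$needed_blocks" '
        $2 == "Available" && $3 + 0 >= need + 0 { print $1; found = 1; exit }
        END { if (!found) exit 1 }   # nothing suitable: caller sees failure
    '
}

# Hypothetical pool contents, listed in pool order.
pool='c0t9d0s0 In-use 1027216
c0t9d0s1 Available 512000
c0t9d0s3 Available 1027216
c0t9d0s4 Available 2054432'

# A failed component of 1015808 blocks skips the in-use spare and the
# spare that is too small, and selects c0t9d0s3.
printf '%s\n' "$pool" | pick_hot_spare 1015808
```

When no spare qualifies, the function prints nothing and returns a non-zero status, which corresponds to the "erred" case in point 5.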


Hot spare conditions to avoid:
1. Associating hot spares of the wrong size with a sub-mirror. This condition occurs when hot spare pools are defined and associated with a sub-mirror & none of the hot spares in the hot spare pool are equal to or greater than the smallest component in the sub-mirror.
2. Having all the hot spares within the hot spare pool in use.
In this case immediate action is required:
a. 2 possible actions can be taken
i. Add additional hot spares
ii. Repair some of the components that have been replaced by hot spares
Note:
If all hot spares are in-use and a sub-mirror fails due to errors, that portion of the mirror will no longer be replicated.
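To catch the second condition early, the status listing can be checked for spares that are still 'Available'. The sketch below counts them from text laid out like the 'metahs -i' output shown in the Outputs section. The function name 'count_available' and the sample listing (with its In-use and Broken rows) are made up for the example, and the column layout on a real system may differ.

```shell
#!/bin/sh
# Count spares whose Status column (second field) reads "Available".
# Reads `metahs -i`-style text on stdin.
count_available() {
    awk '$2 == "Available" { n++ } END { print n + 0 }'
}

# Hypothetical listing in the same layout as the `metahs -i` output.
sample='hsp001: 4 hot spares
Device Status Length Reloc
c0t9d0s0 In-use 1027216 blocks Yes
c0t9d0s1 In-use 1027216 blocks Yes
c0t9d0s3 Available 1027216 blocks Yes
c0t9d0s4 Broken 1027216 blocks Yes'

available=$(printf '%s\n' "$sample" | count_available)
if [ "$available" -eq 0 ]; then
    echo "WARNING: no available hot spares"
else
    echo "available hot spares: $available"
fi
```

On a live system the same function could presumably be fed directly, for example 'metahs -i | count_available'.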

Manipulating hot spare pools:
1. # metahs
= adding hot spares to hot spare pools
= deleting hot spares from hot spare pool
= replacing hot spares in hot spare pools
= enabling hot spare
= checking the status of the hot spare

Adding a hot spare:
Creating a hot spare pool:
1. # metainit hsp000 c0t2d0s5
Creates a hot spare pool named 'hsp000' containing the slice c0t2d0s5

2. # metainit
# metainit hsp001 c0t1d0s4 c0t11d0s4
(or)
# metahs -a hsp001 c0t1d0s4 c0t11d0s4
-a = to add a hot spare
-i = to obtain status information


Deleting hot spare:
1. Hot spares can be deleted from any or all of the hot spare pools with which they are associated.
2. When a hot spare is deleted from a hot spare pool, the positions of the remaining hot spares change to reflect the new order. For example, if the second of 3 hot spares in a hot spare pool is deleted, the 3rd hot spare moves to the second position.
3. # metahs -d hsp000 c0t11d0s4
Removes the slice from the hot spare pool
-d = to delete

4. Removing hot spare pool:
Note:
Before removing a hot spare pool, remove all the hot spares from the pool using 'metahs' with the -d option and the hot spare name.

# metahs -d hspnnn component
Deletes only the named hot spare

# metahs -d hspnnn
Deletes the hot spare pool itself, once it is empty

Replacing hot spare:
Note:
1. Hot spares that are in the 'In-use' state cannot be replaced by another hot spare.
2. The order of hot spares in the hot spare pool is NOT CHANGED when a replacement occurs.
3. # metahs -r
# metahs -r hsp000 c0t10d0s4 c0t11d0s4
c0t11d0s4 replaces c0t10d0s4
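
The difference in ordering between a delete and a replace can be shown with a toy model. Here a pool is just a space-separated list of slices. The functions 'delete_spare' and 'replace_spare' are hypothetical helpers for illustration only and do not touch any real hot spare pool.

```shell
#!/bin/sh
# Toy model: a pool is a space-separated, ordered list of slices.

delete_spare() {   # usage: delete_spare "<pool>" <slice>
    out=
    for s in $1; do
        # Skip the deleted slice; later entries move up one position.
        [ "$s" = "$2" ] || out="${out:+$out }$s"
    done
    printf '%s\n' "$out"
}

replace_spare() {  # usage: replace_spare "<pool>" <old> <new>
    out=
    for s in $1; do
        # Swap in place, so every position stays the same.
        [ "$s" = "$2" ] && s=$3
        out="${out:+$out }$s"
    done
    printf '%s\n' "$out"
}

pool="c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4"
delete_spare "$pool" c0t9d0s1             # c0t9d0s3 moves to second place
replace_spare "$pool" c0t9d0s3 c0t9d0s5   # c0t9d0s5 takes the third place
```

The second call gives the same order as the 'metahs -i' listing in the Outputs section after 'metahs -r hsp001 c0t9d0s3 c0t9d0s5'.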

Associating the hot spare pool with sub-mirror/RAID-5 metadevice:
1. # metaparam
modifies the parameters of the meta devices.

# metaparam -h

# metaparam -h hsp000 d101
# metaparam -h hsp000 d102


Note:
d101 and d102 are sub-mirrors of the mirror d103.
where
-h = specifies the hot spare pool to be used by a meta device

Disassociating the hot spare pool from sub-mirror/RAID-5 metadevice:
# metaparam -h none
# metaparam -h none d101
# metaparam -h none d102

where,
'none' specifies that the meta device is disassociated from the hot spare pool currently associated with it.

# metahs -d hsp000 c0t2d0s5 c0t2d0s6
# metahs -d hsp000
# metaclear d100
# metadetach d15 d12
# metaclear d12
# metaclear -r d15


To view the status of hot spare pool:
# metahs -i

Note:
Suppose the failed disk is going to be replaced to free up the hot spare.
# metadevadm
updates the meta device information
-u = obtain the device ID associated with the disk specifier.
This option is used when a disk drive has had its device ID changed during a firmware upgrade or due to a change of storage controller.
-v = execution in verbose mode. Has no effect when used with the -u option. Verbose is the default.

# metadevadm -v -u
Updating the device information.

# metadevadm -v -u c0t11d0s4

# metareplace -e d103 c0t10d0s3
To replace in the same location
1. Now the hot spare will be available
2. Status of the spare disk will change from 'in-use' to 'available'


Outputs:

bash-3.00# metahs -a hsp001 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4
hsp001: Hotspares are added
bash-3.00# metahs -i
hsp001: 4 hot spares
Device Status Length Reloc
c0t9d0s0 Available 1027216 blocks Yes
c0t9d0s1 Available 1027216 blocks Yes
c0t9d0s3 Available 1027216 blocks Yes
c0t9d0s4 Available 1027216 blocks Yes

Device Relocation Information:
Device Reloc Device ID
c0t9d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_01534930____
bash-3.00#
bash-3.00# metastat -p
d5 -m d0 d10 1
d0 1 1 c0t8d0s0
d10 1 1 c0t10d0s0
d15 1 1 c0t12d0s0
hsp001 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4
hsp001: 4 hot spares
Device Status Length Reloc
c0t9d0s0 Available 1027216 blocks Yes
c0t9d0s1 Available 1027216 blocks Yes
c0t9d0s3 Available 1027216 blocks Yes
c0t9d0s4 Available 1027216 blocks Yes
bash-3.00# metahs -a hsp001 c0t9d0s5
hsp001: Hotspare is added
bash-3.00# metastat -p
d5 -m d0 d10 1
d0 1 1 c0t8d0s0
d10 1 1 c0t10d0s0
d15 1 1 c0t12d0s0
hsp001 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4 c0t9d0s5
bash-3.00# metahs -d hsp001 c0t9d0s5
hsp001: Hotspare is deleted
bash-3.00# metahs -i
hsp001: 4 hot spares
Device Status Length Reloc
c0t9d0s0 Available 1027216 blocks Yes
c0t9d0s1 Available 1027216 blocks Yes
c0t9d0s3 Available 1027216 blocks Yes
c0t9d0s4 Available 1027216 blocks Yes

Device Relocation Information:
Device Reloc Device ID
c0t9d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_01534930_
bash-3.00# metahs -r hsp001 c0t9d0s3 c0t9d0s5
hsp001: Hotspare c0t9d0s3 is replaced with c0t9d0s5
bash-3.00# metahs -i
hsp001: 4 hot spares
Device Status Length Reloc
c0t9d0s0 Available 1027216 blocks Yes
c0t9d0s1 Available 1027216 blocks Yes
c0t9d0s5 Available 1027216 blocks Yes
c0t9d0s4 Available 1027216 blocks Yes

Device Relocation Information:
Device Reloc Device ID
c0t9d0 Yes id1,sd@SFUJITSU_MAG3182L_SUN18G_01534930____

bash-3.00# metahs -d hsp001
metahs: ent250: hsp001: hotspare pool is busy

bash-3.00# metahs -d hsp001 c0t9d0s0 c0t9d0s1 c0t9d0s5 c0t9d0s4
hsp001: Hotspares are deleted
bash-3.00# metahs -d hsp001
hsp001: Hotspare pool is cleared
bash-3.00# metahs -i
metahs: ent250: no hotspare pools found

bash-3.00# metaparam -h hsp005 d0
bash-3.00# metaparam -h hsp005 d10
bash-3.00# metastat -p
d5 -m d0 d10 1
d0 1 1 c0t8d0s0 -h hsp005
d10 1 1 c0t10d0s0 -h hsp005
d15 1 1 c0t12d0s0
hsp005 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4
bash-3.00# metainit d100 -r c0t8d0s1 c0t10d0s1 c0t12d0s1
d100: RAID is setup
bash-3.00# metaparam -h hsp005 d100
bash-3.00# metastat -p
d5 -m d0 d10 1
d0 1 1 c0t8d0s0 -h hsp005
d10 1 1 c0t10d0s0 -h hsp005
d100 -r c0t8d0s1 c0t10d0s1 c0t12d0s1 -k -i 32b -h hsp005
d15 1 1 c0t12d0s0
hsp005 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4


bash-3.00# metastat | more
d5: Mirror
Submirror 0: d0
State: Okay
Submirror 1: d10
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 1015808 blocks (496 MB)

d0: Submirror of d5
State: Okay
Hot spare pool: hsp005
Size: 1015808 blocks (496 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t8d0s0 0 No Okay Yes


d10: Submirror of d5
State: Okay
Hot spare pool: hsp005
Size: 1015808 blocks (496 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t10d0s0 0 No Okay Yes


d100: RAID
State: Okay
Hot spare pool: hsp005
Interlace: 32 blocks
Size: 2031616 blocks (992 MB)
(Output truncated)


bash-3.00# metaparam -h none d100
bash-3.00# metastat -p
d5 -m d0 d10 1
d0 1 1 c0t8d0s0 -h hsp005
d10 1 1 c0t10d0s0 -h hsp005
d100 -r c0t8d0s1 c0t10d0s1 c0t12d0s1 -k -i 32b
d15 1 1 c0t12d0s0
hsp005 c0t9d0s0 c0t9d0s1 c0t9d0s3 c0t9d0s4


Output - truncated:
# metastat
d0: Submirror of d5
State: Resyncing
Hot spare pool: hsp005
Size: 1015808 blocks (496 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t8d0s0 0 No Resyncing Yes c0t9d0s1


d10: Submirror of d5
State: Okay
Hot spare pool: hsp005
Size: 1015808 blocks (496 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c0t10d0s0 0 No Okay Yes
