============================================================================================================================================================
In this exercise, a temporary disk failure is simulated. The goal is to recover all of the redundant and nonredundant volumes that were on the failed drive.
============================================================================================================================================================
Check the current disk availability on the system:
==================================================
[root@server1 /]# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
emc0_dd1 auto:cdsdisk - - online
emc0_dd2 auto:cdsdisk - - online
emc0_dd3 auto:cdsdisk - - online
emc0_dd4 auto:cdsdisk - - online
emc0_dd5 auto:none - - online invalid
emc0_dd6 auto:none - - online invalid
emc0_dd7 auto:none - - online invalid
emc0_dd8 auto:none - - online invalid
emc0_dd9 auto:none - - online invalid
emc0_d10 auto:none - - online invalid
emc0_d11 auto:none - - online invalid
emc0_d12 auto:none - - online invalid
sda auto:none - - online invalid
sdb auto:none - - online invalid
[root@server1 /]#
Create a DG:
============
[root@server1 /]# vxdg init testdg testdg01=emc0_dd1 testdg02=emc0_dd2 testdg03=emc0_dd3
[root@server1 /]#
[root@server1 /]#
List the devices and the DG:
============================
[root@server1 /]# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
emc0_dd1 auto:cdsdisk testdg01 testdg online
emc0_dd2 auto:cdsdisk testdg02 testdg online
emc0_dd3 auto:cdsdisk testdg03 testdg online
emc0_dd4 auto:cdsdisk - - online
emc0_dd5 auto:none - - online invalid
emc0_dd6 auto:none - - online invalid
emc0_dd7 auto:none - - online invalid
emc0_dd8 auto:none - - online invalid
emc0_dd9 auto:none - - online invalid
emc0_d10 auto:none - - online invalid
emc0_d11 auto:none - - online invalid
emc0_d12 auto:none - - online invalid
sda auto:none - - online invalid
sdb auto:none - - online invalid
[root@server1 /]#
___________________________________________________________________________
Now create the setup and the disk failure as follows. First, disable the hot-relocation daemon.
Then create two volumes (test1 and test2), create a VxFS file system on each,
mount both file systems, and copy the same files to each one:
* test1 with a mirrored layout
* test2 with a concatenated layout
Finally, introduce the temporary disk failure and follow the procedure described here to fix it.
___________________________________________________________________________
Killing the hot relocation daemon:
==================================
* Kill the hot-relocation daemon "vxrelocd".
vxrelocd: the hot-relocation daemon monitors events that affect data
redundancy. If it detects a redundancy failure, vxrelocd automatically
relocates the affected data from mirrored or RAID-5 subdisks to spare disks
or other free space within the disk group.
* Disabling vxrelocd:
If you do not want automatic subdisk relocation, you can disable the
hot-relocation feature by killing vxrelocd and preventing it from restarting.
Do not kill the daemon while a relocation is in progress.
To find the daemon, run:
ps -ef
and look for the two vxrelocd entries. Then kill both processes:
kill -9 PID1 PID2
(replace PID1 and PID2 with the process IDs of the two vxrelocd processes).
To prevent vxrelocd from starting again, comment out the line that starts
vxrelocd in the startup script /etc/init.d/vxvm-recover.
* Note: to start the relocation daemon again later, run: vxrelocd root &
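As a rough illustration of the lookup step, the sketch below pulls the PIDs of the vxrelocd processes out of `ps -ef` output. The function name vxrelocd_pids is hypothetical, and the sketch assumes the usual `ps -ef` column layout (PID in column 2, command from column 8 onward).

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every process whose command line names vxrelocd.
# Assumes the usual `ps -ef` layout: PID in column 2, command from column 8.
vxrelocd_pids() {
    awk '$8 != "grep" && $8 != "awk" {
        for (i = 8; i <= NF; i++) {
            if ($i == "vxrelocd" || $i ~ /\/vxrelocd$/) { print $2; break }
        }
    }'
}

# On a live system (not run here):
#   ps -ef | vxrelocd_pids
```

The printed PIDs are the values to substitute for PID1 and PID2 in the kill command. Skipping lines whose command is grep or awk keeps the search itself out of the result.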
______________________________________________________
Create 2 volumes as discussed above:
/usr/sbin/vxassist -g testdg make test1 102400 layout=mirror testdg01 testdg02
The test1 volume was created successfully.
/usr/sbin/vxassist -g testdg make test2 102400 layout=concat testdg02
The test2 volume was created successfully.
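The length argument 102400 carries no unit suffix. Assuming vxassist treats a bare number as 512-byte sectors (an assumption about this system's defaults, although it agrees with the 50.00m length that vxprint reports later), the arithmetic can be checked with a small sketch. The name sectors_to_mib is a hypothetical helper.

```shell
# Hypothetical helper: convert a length in 512-byte sectors to MiB.
# The 512-byte sector size is an assumption about this system.
sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

sectors_to_mib 102400   # prints 50
```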
______________________________________________________
Create 2 file systems and mount them:
/sbin/mkfs -t vxfs /dev/vx/rdsk/testdg/test1
mkdir /test1
/bin/mount -t vxfs /dev/vx/dsk/testdg/test1 /test1
/sbin/mkfs -t vxfs /dev/vx/rdsk/testdg/test2
mkdir /test2
/bin/mount -t vxfs /dev/vx/dsk/testdg/test2 /test2
______________________________________________________
Copy some files inside both the file systems:
/bin/cp /etc/default/* /test1
/bin/cp /etc/default/* /test2
______________________________________________________
Fail the device as follows:
The device to fail is emc0_dd2.
/usr/lib/vxvm/bin/vxpartinfo /dev/vx/rdmp/emc0_dd2 8 > /tmp/emc0_dd2.part8
/bin/dd if=/dev/vx/rdmp/emc0_dd2 of=/tmp/emc0_dd2.private bs=128k skip=1 count=256 >/dev/null 2>&1
Overwriting the private region of emc0_dd2. Please wait...
/bin/dd if=/dev/zero of=/dev/vx/rdmp/emc0_dd2 bs=128k seek=1 count=2 >/dev/null 2>&1
Force VxVM to detect the failure.
/sbin/vxdctl disable
/sbin/vxdctl enable
/sbin/vxdctl enable
Restoring the private region of emc0_dd2. Please wait...
fmthard -s /tmp/emc0_dd2.vtoc /dev/vx/rdmp/emc0_dd2
/bin/dd if=/tmp/emc0_dd2.private of=/dev/vx/rdmp/emc0_dd2 bs=128k seek=1 count=256 >/dev/null 2>&1
/sbin/vxdisk scandisks
A temporary disk failure has now occurred in the testdg disk group.
Troubleshoot and repair the failure. Ensure all volumes are started!
NOTE - for this failure you are using the same disk because this was only
a temporary failure.
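The dd arguments above decide which part of the disk is saved, zeroed, and restored. The sketch below works out those ranges from the block size, the skip/seek value, and the block count; dd_span_kib is a hypothetical helper. With bs=128k, the backup and restore cover 32 MiB starting 128 KiB into the device, which agrees with the 32.00m private region length (PRIVLEN) that vxprint reports for these disks, while the overwrite zeroes only the first 256 KiB of that range.

```shell
# Hypothetical helper: given a dd block size in KiB, a skip/seek value and a
# block count, print the start offset and the length of the range in KiB.
dd_span_kib() {
    echo "$(( $1 * $2 )) $(( $1 * $3 ))"
}

dd_span_kib 128 1 256   # prints "128 32768" (backup and restore: 32 MiB)
dd_span_kib 128 1 2     # prints "128 256"   (overwrite with zeros: 256 KiB)
```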
______________________________________________________
Verify the disk has failed:
[root@server1 /]# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
emc0_dd1 auto:cdsdisk testdg01 testdg online
emc0_dd2 auto:cdsdisk - (testdg) online
emc0_dd3 auto:cdsdisk testdg03 testdg online
emc0_dd4 auto:cdsdisk - - online
emc0_dd5 auto:none - - online invalid
emc0_dd6 auto:none - - online invalid
emc0_dd7 auto:none - - online invalid
emc0_dd8 auto:none - - online invalid
emc0_dd9 auto:none - - online invalid
emc0_d10 auto:none - - online invalid
emc0_d11 auto:none - - online invalid
emc0_d12 auto:none - - online invalid
sda auto:none - - online invalid
sdb auto:none - - online invalid
- - testdg02 testdg failed was:emc0_dd2
[root@server1 /]# vxprint -g testdg -htu h
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg testdg default default 1000 1422525023.21.sym1
dm testdg01 emc0_dd1 auto 32.00m 1.96g -
dm testdg02 - - - - NODEVICE
dm testdg03 emc0_dd3 auto 32.00m 1.96g -
v test1 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test1-01 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg01-01 test1-01 testdg01 0.00 50.00m 0.00 emc0_dd1 ENA
pl test1-02 test1 DISABLED NODEVICE 50.00m CONCAT - RW
sd testdg02-01 test1-02 testdg02 0.00 50.00m 0.00 - NDEV
v test2 - DISABLED ACTIVE 50.00m SELECT - fsgen
pl test2-01 test2 DISABLED NODEVICE 50.00m CONCAT - RW
sd testdg02-02 test2-01 testdg02 50.00m 50.00m 0.00 - NDEV
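In a larger disk group the affected records are harder to spot by eye. A minimal sketch, assuming the `vxprint -ht` column order shown in the listing above (KSTATE in column 4, STATE in column 5 for v and pl records), filters out everything that is not ENABLED. The name not_enabled_records is hypothetical.

```shell
# Hypothetical helper: read `vxprint -ht` output on stdin and print the type,
# name, kernel state and state of every volume (v) and plex (pl) record whose
# kernel state is not ENABLED.
not_enabled_records() {
    awk '($1 == "v" || $1 == "pl") && $4 != "ENABLED" { print $1, $2, $4, $5 }'
}

# Sample records copied from the listing above.
not_enabled_records <<'EOF'
v test1 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test1-01 test1 ENABLED ACTIVE 50.00m CONCAT - RW
pl test1-02 test1 DISABLED NODEVICE 50.00m CONCAT - RW
v test2 - DISABLED ACTIVE 50.00m SELECT - fsgen
pl test2-01 test2 DISABLED NODEVICE 50.00m CONCAT - RW
EOF
```

For the sample records this reports the plex test1-02, the volume test2, and the plex test2-01, which are exactly the objects that depend on testdg02.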
______________________________________________________
Attempt to view the files that were copied to mount points /test1 and /test2.
Because the test1 volume is mirrored, the files in the /test1 mount point are
still accessible. When trying to view the files in /test2, you should see the
following error:
/test2: I/O error
[root@server1 /]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 21225712 4709932 15420152 24% /
tmpfs 1029756 0 1029756 0% /dev/shm
tmpfs 4 0 4 0% /dev/vx
mgt:/student 21225728 6176640 13953472 31% /student
/dev/vx/dsk/testdg/test1
51200 3177 45028 7% /test1
df: `/test2': Input/output error
[root@server1 /]# ls -l /test1
total 4
drwxr-xr-x 2 root root 96 Jan 29 01:53 lost+found
-rw-r--r-- 1 root root 1302 Jan 29 01:53 nss
-rw-r----- 1 root root 525 Jan 29 01:53 sfm_resolv.conf
-rw------- 1 root root 119 Jan 29 01:53 useradd
[root@server1 /]# ls -l /test2
ls: /test2: Input/output error
====================================
Recover from the failure as follows:
====================================
a) If you are using enclosure-based naming, identify the OS native name of
the disk that has temporarily failed. You will use this OS disk name when
verifying that the operating system recognizes the device.
[root@server1 /]# vxdisk -e list emc0_dd2
DEVICE TYPE DISK GROUP STATUS OS_NATIVE_NAME ATTR
emc0_dd2 auto:cdsdisk - - online sde lun
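To reuse the OS name in later commands, it can be extracted from the listing. This is only a sketch: os_native_name is a hypothetical helper, and it assumes OS_NATIVE_NAME is the second-to-last column, as in the output above.

```shell
# Hypothetical helper: read `vxdisk -e list` output on stdin and print the
# OS_NATIVE_NAME column for the given VxVM device name.
# Assumes OS_NATIVE_NAME is the second-to-last column, as in the listing above.
os_native_name() {
    awk -v dev="$1" '$1 == dev { print $(NF - 1) }'
}

# On a live system (not run here):
#   vxdisk -e list | os_native_name emc0_dd2
```

Counting from the end of the line keeps the lookup working when the STATUS column contains more than one word, for example "online invalid".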
b) Ensure that the operating system recognizes the device using the
appropriate OS commands.
[root@server1 /]# partprobe /dev/sde
Warning: The disk CHS geometry (261,255,63) reported by the operating system does not match the geometry stored on the disk label (1024,128,32).
**Note: partprobe informs the operating system kernel of partition table changes by requesting that it re-read the partition table.
c) Verify that the operating system recognizes the device using the
appropriate OS commands.
[root@server1 /]# fdisk -l /dev/sde
Disk /dev/sde (Sun disk label): 128 heads, 32 sectors, 1022 cylinders
Units = cylinders of 4096 * 512 bytes
Device Flag Start End Blocks Id System
/dev/sde3 u 0 1022 2093056 5 Whole disk
/dev/sde8 u 0 1022 2093056 f Unknown
d) Force the VxVM configuration daemon to reread all of the drives in the system.
[root@server1 /]# vxdctl enable
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
emc0_dd1 auto:cdsdisk testdg01 testdg online
emc0_dd2 auto:cdsdisk - (testdg) online
emc0_dd3 auto:cdsdisk testdg03 testdg online
emc0_dd4 auto:cdsdisk - - online
emc0_dd5 auto:none - - online invalid
emc0_dd6 auto:none - - online invalid
emc0_dd7 auto:none - - online invalid
emc0_dd8 auto:none - - online invalid
emc0_dd9 auto:none - - online invalid
emc0_d10 auto:none - - online invalid
emc0_d11 auto:none - - online invalid
emc0_d12 auto:none - - online invalid
sda auto:none - - online invalid
sdb auto:none - - online invalid
- - testdg02 testdg failed was:emc0_dd2
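The last line of the listing is the disk media record that has lost its device. A minimal sketch for picking such records out of the output, assuming the line layout shown above, follows; failed_disk_records is a hypothetical name.

```shell
# Hypothetical helper: read `vxdisk -o alldgs list` output on stdin and print
# "disk-media-name disk-group last-device" for each failed disk media record.
failed_disk_records() {
    awk '$1 == "-" && $2 == "-" && $5 == "failed" {
        sub(/^was:/, "", $6)
        print $3, $4, $6
    }'
}

# On a live system (not run here):
#   vxdisk -o alldgs list | failed_disk_records
```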
e) Reattach the device to the disk media record using the vxreattach command.
[root@server1 /]# vxreattach
** The following command shows that the disk is back online:
[root@server1 /]# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
emc0_dd1 auto:cdsdisk testdg01 testdg online
emc0_dd2 auto:cdsdisk testdg02 testdg online
emc0_dd3 auto:cdsdisk testdg03 testdg online
emc0_dd4 auto:cdsdisk - - online
emc0_dd5 auto:none - - online invalid
emc0_dd6 auto:none - - online invalid
emc0_dd7 auto:none - - online invalid
emc0_dd8 auto:none - - online invalid
emc0_dd9 auto:none - - online invalid
emc0_d10 auto:none - - online invalid
emc0_d11 auto:none - - online invalid
emc0_d12 auto:none - - online invalid
sda auto:none - - online invalid
sdb auto:none - - online invalid
** Check whether the volumes have been recovered. The plexes on testdg02 are still DISABLED with state IOFAIL, so they are not recovered yet:
[root@server1 /]# vxprint -g testdg -htu h
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg testdg default default 1000 1422525023.21.sym1
dm testdg01 emc0_dd1 auto 32.00m 1.96g -
dm testdg02 emc0_dd2 auto 32.00m 1.96g -
dm testdg03 emc0_dd3 auto 32.00m 1.96g -
v test1 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test1-01 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg01-01 test1-01 testdg01 0.00 50.00m 0.00 emc0_dd1 ENA
pl test1-02 test1 DISABLED IOFAIL 50.00m CONCAT - RW
sd testdg02-01 test1-02 testdg02 0.00 50.00m 0.00 emc0_dd2 ENA
v test2 - DISABLED ACTIVE 50.00m SELECT - fsgen
pl test2-01 test2 DISABLED IOFAIL 50.00m CONCAT - RW
sd testdg02-02 test2-01 testdg02 50.00m 50.00m 0.00 emc0_dd2 ENA
f) Recover the volumes using the vxrecover command.
[root@server1 /]# vxrecover
[root@server1 /]# vxprint -g testdg -htu h
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg testdg default default 1000 1422525023.21.sym1
dm testdg01 emc0_dd1 auto 32.00m 1.96g -
dm testdg02 emc0_dd2 auto 32.00m 1.96g -
dm testdg03 emc0_dd3 auto 32.00m 1.96g -
v test1 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test1-01 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg01-01 test1-01 testdg01 0.00 50.00m 0.00 emc0_dd1 ENA
pl test1-02 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg02-01 test1-02 testdg02 0.00 50.00m 0.00 emc0_dd2 ENA
v test2 - DISABLED ACTIVE 50.00m SELECT - fsgen
pl test2-01 test2 DISABLED IOFAIL 50.00m CONCAT - RW
sd testdg02-02 test2-01 testdg02 50.00m 50.00m 0.00 emc0_dd2 ENA
[root@server1 /]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 21225712 4709968 15420116 24% /
tmpfs 1029756 0 1029756 0% /dev/shm
tmpfs 4 0 4 0% /dev/vx
mgt:/student 21225728 6176640 13953472 31% /student
/dev/vx/dsk/testdg/test1
51200 3177 45028 7% /test1
df: `/test2': Input/output error
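g) vxrecover has recovered the mirrored volume test1, but test2 is still
DISABLED because a nonredundant volume has no second plex to recover from.
Use the vxvol command to force-start the nonredundant volume.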
[root@server1 /]# vxvol -g testdg -f start test2
[root@server1 /]#
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# vxprint -g testdg -htu h
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg testdg default default 1000 1422525023.21.sym1
dm testdg01 emc0_dd1 auto 32.00m 1.96g -
dm testdg02 emc0_dd2 auto 32.00m 1.96g -
dm testdg03 emc0_dd3 auto 32.00m 1.96g -
v test1 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test1-01 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg01-01 test1-01 testdg01 0.00 50.00m 0.00 emc0_dd1 ENA
pl test1-02 test1 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg02-01 test1-02 testdg02 0.00 50.00m 0.00 emc0_dd2 ENA
v test2 - ENABLED ACTIVE 50.00m SELECT - fsgen
pl test2-01 test2 ENABLED ACTIVE 50.00m CONCAT - RW
sd testdg02-02 test2-01 testdg02 50.00m 50.00m 0.00 emc0_dd2 ENA
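The exercise asks that all volumes end up started. A minimal sketch of that final check, assuming the same `vxprint -ht` column order as above, succeeds only when every volume record is ENABLED and ACTIVE. The name all_volumes_started is hypothetical.

```shell
# Hypothetical helper: read `vxprint -ht` output on stdin; exit 0 only if at
# least one volume record exists and every volume is ENABLED and ACTIVE.
all_volumes_started() {
    awk '$1 == "v" { n++; if ($4 != "ENABLED" || $5 != "ACTIVE") bad++ }
         END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# On a live system (not run here):
#   vxprint -g testdg -ht | all_volumes_started && echo "all volumes started"
```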
[root@server1 /]#
[root@server1 /]#
[root@server1 /]#
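h) Check the mount points. The volume is started again, but /test2 still returns an I/O error because its file system has not been checked and remounted yet: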
[root@server1 /]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 21225712 4709976 15420108 24% /
tmpfs 1029756 0 1029756 0% /dev/shm
tmpfs 4 0 4 0% /dev/vx
mgt:/student 21225728 6176640 13953472 31% /student
/dev/vx/dsk/testdg/test1
51200 3177 45028 7% /test1
df: `/test2': Input/output error
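When several file systems are mounted, the ones that still need attention can be picked out of the df output. The sketch below is an illustration only: df_io_errors is a hypothetical helper, and it assumes the plain ASCII message format shown above (other df versions or locales may quote the path differently).

```shell
# Hypothetical helper: read the combined stdout/stderr of `df -k` on stdin and
# print each mount point that df reported with "Input/output error".
df_io_errors() {
    awk -v q="'" '/^df: .*: Input\/output error$/ {
        sub(/^df: /, "")
        sub(/: Input\/output error$/, "")
        gsub("[`" q "]", "")   # strip the quote characters around the path
        print
    }'
}

# On a live system (not run here):
#   df -k 2>&1 | df_io_errors
```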
[root@server1 /]#
[root@server1 /]#
[root@server1 /]#
i) Because this is a temporary failure, the files in the test2 volume (and file
system) are still available. Recover the mount point by performing the
following:
a> Unmount the /test2 mount point.
b> Perform an fsck on the file system.
c> Mount the test2 volume to /test2.
[root@server1 /]# umount /test2
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# fsck -t vxfs /dev/vx/rdsk/testdg/test2
fsck 1.39 (29-May-2006)
log replay in progress
replay complete - marking super-block as CLEAN
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# mount -t vxfs /dev/vx/dsk/testdg/test2 /test2
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 21225712 4709980 15420104 24% /
tmpfs 1029756 0 1029756 0% /dev/shm
tmpfs 4 0 4 0% /dev/vx
mgt:/student 21225728 6176640 13953472 31% /student
/dev/vx/dsk/testdg/test1
51200 3177 45028 7% /test1
/dev/vx/dsk/testdg/test2
51200 3173 45032 7% /test2
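The three sub-steps of step i can be collected into one function. This is a sketch under stated assumptions, not part of the original lab: run and remount_vxfs are hypothetical names, and the DRY_RUN switch is added here so the sequence can be previewed without touching any file system.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: unmount, check and remount a VxFS file system.
# Arguments: disk group, volume name, mount point.
# Each step runs only if the previous one succeeded.
remount_vxfs() {
    run umount "$3" &&
    run fsck -t vxfs "/dev/vx/rdsk/$1/$2" &&
    run mount -t vxfs "/dev/vx/dsk/$1/$2" "$3"
}

# Dry run: only prints the three commands used in step i.
DRY_RUN=1 remount_vxfs testdg test2 /test2
```

Chaining the commands with && stops the sequence if the unmount or the fsck fails, so the file system is never remounted unchecked.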
j) Unmount the file systems and delete the test1 and test2 volumes:
[root@server1 /]# umount /test1
[root@server1 /]# umount /test2
[root@server1 /]#
[root@server1 /]#
[root@server1 /]# vxassist -g testdg remove volume test1
[root@server1 /]# vxassist -g testdg remove volume test2
[root@server1 /]#