Author: mxin · 2017-11-03 09:07
Senior Engineer · Shanghai Baosight Software Co., Ltd.

Detailed GPFS Steps for the Scheduled Maintenance of a Securities Company's Core Trading System


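Scheduled Maintenance of a Securities Company's Core Trading System: GPFS Part
Contents

  1. Preparation (6-8 hours)
    1.1 Confirm the current state of the seven running environments
    1.2 Confirm the GPFS 3.3 upgrade media
    1.3 Confirm which unused NSDs need to be deleted
    1.4 Confirm the SAN environment
    1.5 Confirm the allocation of the new disks for the expansion
    1.6 Collect and back up the existing GPFS data
    1.7 Confirm other files needed during the GPFS maintenance
    1.8 Confirm the scripts for deleting NSD disks
  2. GPFS cluster implementation steps during the maintenance window (2-4.5 hours)
    2.1 GPFS upgrade (45-75 minutes)
    2.1.1 Preparation for the GPFS upgrade (15 minutes)
    2.1.2 GPFS upgrade steps (30 minutes)
    2.1.3 Emergency handling of GPFS upgrade failures (20 minutes)
    2.2 Remove the p55ADRgpfs node from GPFS (10 minutes)
    2.3 GPFS adjustment (60-80 minutes)
    2.3.1 DS8700 disk preparation (5 minutes)
    2.3.2 Add the DS8700 disks (15 minutes)
    2.3.3 Remove the DS6800 mirror and replace it with a DS8700 mirror (40 minutes)
    2.4 GPFS expansion (20-100 minutes)
    2.4.1 Preparation for the GPFS expansion (20 minutes)
    2.4.2 GPFS expansion steps (0-80 minutes)
    2.5 Add the p55ADRgpfs node back to GPFS (20 minutes)
    2.5.1 Preparation
    2.5.2 Steps for adding the node
    2.6 Delete unused NSD disks (as circumstances allow, 30 minutes)
    2.6.1 Preparation
  3. Emergency handling of GPFS cluster failures (3 hours)
    3.1 Export the data disks and re-import them (30 minutes)
    3.2 Build a new GPFS cluster and import the data disks (60 minutes)
    3.3 Build a new GPFS cluster and restore data files from backup (60-180 minutes) (led by the securities company's staff)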

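1. Preparation (6-8 hours):
1.1 Confirm the current state of the seven running environments

Collect /etc/hosts, /.rhosts, and the scripts and configuration files under /home/scripts/gpfs/ and /tmp/gpfs/ (run_cmd.sh, rcp_file.sh, /tmp/gpfs/nodefile, /tmp/gpfs/nsdfile, etc.). Check that they conform to the standard and that rsh and rcp both work. (Collect with the showconf script; must be confirmed before January 24.)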

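1.2 Confirm the GPFS 3.3 upgrade media

Place the two gpfs3.3.0.11 media packages under /worktmp/gpfs3.3/ in all seven running environments, and put a simple install script under /home/scripts/gpfs/ to make installation easier during the maintenance window. (Can be confirmed on January 24.)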
p59021[/home/scripts/gpfs]#vi install_gpfs3.3.sh
cd /worktmp/gpfs3.3/GPFS3.3_aix
installp -acYd . all

cd /worktmp/gpfs3.3/gpfs3.3.0.11
installp -acYd . all

1.3 Confirm which unused NSDs need to be deleted (optional)

root@p59011:[/]mmlsnsd

File system Disk name Primary node Backup node

gpfs_erpHome gpfs3nsd p59022gpfs p59012gpfs
gpfs_erpHome gpfs4nsd p59022gpfs p59012gpfs
gpfs_erpHome gpfs14nsd p59021gpfs p59011gpfs
gpfs_erpHome gpfs15nsd p59021gpfs p59011gpfs
(free disk) gpfs10nsd p59011gpfs
(free disk) gpfs11nsd p59011gpfs
(free disk) gpfs18nsd p55ADRgpfs
(free disk) gpfs19nsd p55ADRgpfs
(free disk) gpfs1nsd p59022gpfs p59012gpfs
(free disk) gpfs2nsd p59022gpfs p59012gpfs
(free disk) gpfs5nsd p55ADRgpfs p59023gpfs
(free disk) gpfs6nsd p55ADRgpfs p59023gpfs
(free disk) gpfs9nsd p59011gpfs

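The GPFS cluster actually uses only four disks; the other nine are in the free disk state. Of these, gpfs1nsd and gpfs2nsd are kept as the disks for the DS8100 expansion. The rest, gpfs9nsd and gpfs10nsd (DS8100) and gpfs5nsd, gpfs6nsd, gpfs11nsd, gpfs18nsd and gpfs19nsd (DS6800), can be deleted at the end of Tuesday's maintenance if time permits.
sh run_cmd.sh "sh /home/scripts/gpfs/cleargpfs6800.sh"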

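1.4 Confirm the SAN environment

Alias naming and zone layout: existing zones stay unchanged; new zones must follow the one-HBA-per-zone convention. (Confirm with Xie Lei on January 23 and 24; after the equipment move on January 25, confirm again that the disks are detected.)

1.5 Confirm the allocation of the new disks for the expansion

The two DS8100 disks can be created before January 24, assigned to each partition, given pvids, and defined as NSDs (record the vpath and NSD numbers):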
p59011[/home/mxin]#grep -E "p[0-9]|gpfs2nsd" vpaths.list
p55ADR
vpath35 8100 0107 hdisk72 51.2GB hdisk79 00cf49dc4e5e93c1 gpfs2nsd
p59011
vpath16 8100 0107 hdisk20 51.2GB hdisk93 00cf49dc4e5e93c1 gpfs2nsd
p59012
vpath1 8100 0107 hdisk3 51.2GB hdisk23 00cf49dc4e5e93c1 gpfs2nsd
p59021
vpath16 8100 0107 hdisk18 51.2GB hdisk91 00cf49dc4e5e93c1 gpfs2nsd
p59022
vpath1 8100 0107 hdisk3 51.2GB hdisk23 00cf49dc4e5e93c1 gpfs2nsd
p59023
vpath2 8100 0107 hdisk4 51.2GB hdisk16 00cf49dc4e5e93c1 gpfs2nsd
p59024
p59011[/home/mxin]#grep -E "p[0-9]|gpfs1nsd" vpaths.list
p55ADR
vpath33 8100 0007 hdisk70 51.2GB hdisk77 00cf49dc4e5e8e68 gpfs1nsd
p59011
vpath7 8100 0007 hdisk11 51.2GB hdisk84 00cf49dc4e5e8e68 gpfs1nsd
p59012
vpath0 8100 0007 hdisk2 51.2GB hdisk21 00cf49dc4e5e8e68 gpfs1nsd
p59021
vpath7 8100 0007 hdisk9 51.2GB hdisk82 00cf49dc4e5e8e68 gpfs1nsd
p59022
vpath0 8100 0007 hdisk2 51.2GB hdisk21 00cf49dc4e5e8e68 gpfs1nsd
p59023
vpath0 8100 0007 hdisk2 51.2GB hdisk14 00cf49dc4e5e8e68 gpfs1nsd
p59024

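On the DS8700 array, mkvolgrp, mkhostconnect, mkfbvol and chvolgrp must be done in advance (complete before January 24).
During the maintenance window, once the SAN zoning has been fully adjusted, rescan the disks and assign pvids.

1.6 Collect and back up the existing GPFS data
Back up GPFS to TSM; complete by January 24.

1.7 Confirm other files needed during the GPFS maintenance

Confirm that the files needed on January 25 (/tmp/gpfs/nsdfile1, /tmp/gpfs/nsdfile2) have been placed on every partition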
root@p59011:[/]more /tmp/gpfs/nsdfile1
xvpath4:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath5:p59021gpfs:p59011gpfs:dataAndMetadata:4003

root@p59021:[/]more /tmp/gpfs/nsdfile2
xvpath6:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath7:p59021gpfs:p59011gpfs:dataAndMetadata:4003
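
The descriptor lines above use a five-field form: disk, primary server, backup server, disk usage, failure group. Below is a minimal sketch of a sanity check that could be run on such a file before mmcrnsd. The name check_nsdfile is hypothetical (it is not part of GPFS or of the original scripts), and the accepted disk-usage values are an assumption based on the usual GPFS ones.

```shell
# check_nsdfile: hypothetical helper, not part of GPFS.
# Validates descriptor lines of the form
#   disk:primary:backup:usage:failuregroup
# read from the named files (or stdin). Returns non-zero on any problem.
check_nsdfile() {
    awk -F: '
        /^[ \t]*$/ { next }                      # skip blank lines
        NF < 5 || $1 == "" {
            print "line " NR ": expected disk:primary:backup:usage:failuregroup"
            bad = 1; next
        }
        $4 !~ /^(dataAndMetadata|dataOnly|metadataOnly|descOnly)$/ {
            print "line " NR ": unknown disk usage: " $4; bad = 1
        }
        $5 !~ /^-?[0-9]+$/ {
            print "line " NR ": failure group is not a number: " $5; bad = 1
        }
        END { exit bad + 0 }' "$@"
}
```

For example, `check_nsdfile /tmp/gpfs/nsdfile1 /tmp/gpfs/nsdfile2` prints one message per suspicious line and returns non-zero if any were found. It only checks the shape of each line, not whether the vpath or the servers exist.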

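1.8 Confirm the scripts for deleting NSD disks
root@p59021:[/]more /home/scripts/gpfs/cleargpfs6800.sh
# Save the lspv entries of the DS6800 disks, matched by pvid
lspv|egrep "00cf49dc549a3bc3|00cf49dc549a4e5d|00cf49dc549a5744|000b0482373186db|000b0482373197e0|00cf49dc549a3a94|00cf49dc549a4555" > /tmp/6800a.list
# Print (do not run) an rmdev command for each matched disk
cat /tmp/6800a.list|awk '{print "rmdev -Rdl "$1}'
# Print an rmdev command for the second column of each essmap row that mentions one of those disks
datapath query essmap|egrep "$(cat /tmp/6800a.list|awk '{printf $1"|";} END{ printf "xxxx"}')"|awk '{print "rmdev -Rdl "$2}'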


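2. GPFS cluster implementation steps during the maintenance window (2-4.5 hours):

Shut down the GPFS cluster (responsibility of the securities company's staff)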
p59021[/home/scripts/gpfs]#mmunmount /erpHome
p59021[/home/scripts/gpfs]#mmshutdown -a

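Move the switches, arrays and servers and adjust them to their planned positions.

Start p59011 and the other partitions (seven in total) and confirm that lspv shows the original DS8100 and DS6800 disks.
On partition p59024 in particular, carefully verify the number of vpaths and the pvids.
sh "/home/scripts/gpfs/lsvp.sh"
Rescan the disks; the four DS8700 disks should be found. Then assign pvids.
p59021[/]#lsvpcfg | awk '{print "chdev -l "$1,"-a pv=yes"}'|sh
Confirm that the vpath pvids are consistent across partitions and that the disks GPFS will operate on are 4 DS8100 disks, 2 DS6800 disks and 4 DS8700 disks. Record the hdisk and vpath information, or simply run showconf.sh.

sh run_cmd.sh "/home/mxin/showconf.sh"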
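
The pvid consistency check can also be done mechanically. The sketch below assumes a listing in the same shape as the vpaths.list excerpt in 1.5, where the last two fields of each disk line are the pvid and the NSD name, and partition-name lines have a single field. The function name pvid_mismatches is hypothetical.

```shell
# pvid_mismatches: hypothetical helper.
# Input: lines whose last two fields are "<16-hex-digit pvid> <nsd-name>";
# other lines (such as partition names) are skipped.
# Prints each NSD name that shows up with more than one distinct pvid.
pvid_mismatches() {
    awk 'NF >= 2 && $(NF-1) ~ /^[0-9a-f]+$/ && length($(NF-1)) == 16 {
             nsd = $NF; pvid = $(NF-1)
             if (!(nsd in first)) first[nsd] = pvid
             else if (first[nsd] != pvid) bad[nsd] = 1
         }
         END { for (n in bad) print n }'
}
```

Running `pvid_mismatches < vpaths.list` should print nothing when every partition sees the same pvid for each NSD.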


2.1 GPFS upgrade (45-75 minutes)
2.1.1 Preparation for the GPFS upgrade (15 minutes)

Confirm the GPFS version
p59021[/home/scripts/gpfs]#sh run_cmd.sh lslpp -l |grep gpfs
gpfs.base 3.1.0.21 COMMITTED GPFS File Manager
gpfs.msg.en_US 3.1.0.12 COMMITTED GPFS Server Messages - U.S. English

Confirm the location of the installation media
p59021[/home/scripts/gpfs]#sh run_cmd.sh ls -l /worktmp/gpfs3.3/GPFS3.3_aix
-rw-r----- 1 root system 3587 Jan 18 10:26 .toc
-rw-r----- 1 root system 97595 Jan 18 10:27 GPFS_GUI_Help_3.3.pdf
-rw-r----- 1 root system 7741 Jan 18 10:27 Readme3.3GUI.txt
-rw-r----- 1 root system 31717376 Jan 18 10:26 gpfs.base
-rw-r----- 1 root system 708608 Jan 18 10:26 gpfs.docs.data
-rw-r----- 1 root system 68166656 Jan 18 10:27 gpfs.gui
-rw-r----- 1 root system 138240 Jan 18 10:27 gpfs.msg.en_US

p59021[/home/scripts/gpfs]#sh run_cmd.sh ls -l /worktmp/gpfs3.3/gpfs3.3.0.11
-rw-r--r-- 1 root system 7501 Jan 18 16:18 .toc
-rwxr-xr-x 1 root system 84152 Jan 18 16:11 GPFS-3.3.0.11-power-AIX.readme
-rwxr-xr-x 1 root system 76215327 Jan 18 16:11 GPFS-3.3.0.11-power-AIX.tar.gz
-rw-r--r-- 1 root system 3628 Dec 16 00:00 README
-rw-r--r-- 1 root system 15360 Aug 21 02:18 U829687.gpfs.docs.data.bff
-rw-r--r-- 1 root system 69010432 Aug 21 02:18 U829690.gpfs.gui.bff
-rw-r--r-- 1 root system 27413504 Dec 16 00:01 U840617.gpfs.base.bff
-rw-r--r-- 1 root system 141312 Dec 16 00:01 U840618.gpfs.msg.en_US.bff
-rw-r--r-- 1 root system 7366 Dec 11 03:19 changelog

Confirm the upgrade script
p59021[/home/scripts/gpfs]# sh run_cmd.sh more /home/scripts/gpfs/install_gpfs3.3.sh
cd /worktmp/gpfs3.3/GPFS3.3_aix
installp -acYd . all

cd /worktmp/gpfs3.3/gpfs3.3.0.11
installp -acYd . all

2.1.2 GPFS upgrade steps (30 minutes)

Run the upgrade script
p59021[/home/scripts/gpfs]# sh run_cmd.sh "/home/scripts/gpfs/install_gpfs3.3.sh &"

Confirm that every partition has been upgraded to 3.3.0.11
p59021[/home/scripts/gpfs]#sh run_cmd.sh lslpp -l |grep gpfs
gpfs.base 3.3.0.11 COMMITTED GPFS File Manager
gpfs.gui 3.3.0.1 COMMITTED GPFS GUI
gpfs.msg.en_US 3.3.0.7 COMMITTED GPFS Server Messages - U.S.
gpfs.base 3.3.0.11 COMMITTED GPFS File Manager

Start the GPFS cluster
p59021[/home/scripts/gpfs]#mmstartup -a
Mon Jan 17 08:20:42 BEIST 2011: 6027-1642 mmstartup: Starting GPFS ...

Mount the file system
p59021[/home/scripts/gpfs]#mmmount all -a
Fri Jan 17 08:25:00 BEIST 2011: 6027-1623 mmmount: Mounting file systems ...

Check the node state (for any node shown as down, see the handling method in 2.1.3)
p59021[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active
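
The same check can be scripted so that a node that is not active stands out. This is a minimal sketch that filters mmgetstate-style output; inactive_nodes is a hypothetical name, and the column layout (node number, node name, state) is taken from the listing above.

```shell
# inactive_nodes: hypothetical helper.
# Reads `mmgetstate -a` style output on stdin and prints the name of
# every node whose state column is anything other than "active".
# Header and separator lines are skipped because their first field
# is not a node number.
inactive_nodes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 && $3 != "active" { print $2 }'
}
```

On a cluster node it would be used as `mmgetstate -a | inactive_nodes`; empty output means every listed node is active.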

Confirm the file system (verification requires the cooperation of the securities company's staff)
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
gpfs14nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
gpfs15nsd  nsd       512    4003 yes      yes   ready  up                 4 system  desc
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2

p59021[/home/scripts/gpfs]#mmlsfs gpfs_erpHome

File system attributes for /dev/gpfs_erpHome:

flag value description


-s roundRobin Stripe method
-f 16384 Minimum fragment size in bytes
-i 512 Inode size in bytes
-I 16384 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 2 Default number of data replicas
-R 2 Maximum number of data replicas
-j cluster Block allocation type
-D posix File locking semantics in effect
-k posix ACL semantics in effect
-a 1048576 Estimated average file size
-n 32 Estimated number of nodes that will mount file system
-B 524288 Block size
-Q none Quotas enforced

 none           Default quotas enabled

-F 1044480 Maximum number of inodes
-V 9.03 File system version. Highest supported version: 9.03
-u yes Support for large LUNs?
-z no Is DMAPI enabled?
-E yes Exact mtime mount option
-S no Suppress atime mount option
-K whenpossible Strict replica allocation option
-P system Disk storage pools in file system
-d gpfs3nsd;gpfs4nsd;gpfs14nsd;gpfs15nsd Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /erpHome Default mount point

2.1.3 Emergency handling of GPFS upgrade failures (20 minutes)

Handling a node in the down state
p59011[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       down
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active

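Shut down the GPFS cluster and clear the old GPFS processes and memory state. Take care with this step: the partition may reboot on its own.
p59011[/home/scripts/gpfs]#mmshutdown -a
p59011[/home/scripts/gpfs]#mmfsenv -u

If that still does not work, reboot the partition and confirm that the state is normal after the reboot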
p59011[/home/scripts/gpfs]#mmstartup -a
p59011[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active

2.2 Remove the p55ADRgpfs node from GPFS (10 minutes)

Confirm that all nodes are active
p59021[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active

Shut down the GPFS file system on p55a
p59021[/]#mmshutdown -N p55ADRgpfs
Sun Jan 23 20:14:47 BEIST 2011: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
p55ADRgpfs: forced unmount of /erpHome
Sun Jan 23 20:14:52 BEIST 2011: 6027-1344 mmshutdown: Shutting down GPFS daemons
p55ADRgpfs: Shutting down!
p55ADRgpfs: 'shutdown' command about to kill process 745498
Sun Jan 23 20:14:58 BEIST 2011: 6027-1345 mmshutdown: Finished

Delete the p55ADRgpfs node
p59021[/]#mmdelnode -N p55ADRgpfs
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed
mmdelnode: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Confirm the current node state
p59021[/erpHome/]#mmlsnode
GPFS nodeset Node list


gpfs_erp p59011gpfs p59012gpfs p59021gpfs p59022gpfs p59023gpfs p59024gpfs

p59021[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   8      p59024gpfs       active

2.3 GPFS adjustment (60-80 minutes)
2.3.1 DS8700 disk preparation (5 minutes)

Confirm the current state of the two 8700 disks
sh lsvp.sh|grep -i 8700|egrep -i "200b|200c|200e|200d"

p59021[/home/scripts/gpfs]#lsvpcfg
xvpath4 (Avail pv ) 75VX7612204 = hdisk8 (Avail )
xvpath5 (Avail pv ) 75VX7612205 = hdisk9 (Avail )

p59021[/home/scripts/gpfs]#lspv
xvpath4 00c4ff169ce1cab9 none
xvpath5 00c4ff169ce1cb81 none

Confirm and edit the nsdfile
p59021[/home/scripts/gpfs]#vi /tmp/gpfs/nsdfile1
xvpath4:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath5:p59021gpfs:p59011gpfs:dataAndMetadata:4003

Confirm that the GPFS cluster state is normal
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
gpfs14nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
gpfs15nsd  nsd       512    4003 yes      yes   ready  up                 4 system  desc
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2

2.3.2 Add the DS8700 disks (15 minutes)

Create the NSD disks
p59021[/home/scripts/gpfs]#mmcrnsd -F /tmp/gpfs/nsdfile1
mmcrnsd: Processing disk xvpath4
mmcrnsd: Processing disk xvpath5
mmcrnsd: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Confirm that the NSD disks are available
p59021[/home/scripts/gpfs]#lspv
xvpath4 00c4ff169ce1cab9 xgpfs5nsd
xvpath5 00c4ff169ce1cb81 xgpfs6nsd
p59021[/home/scripts/gpfs]#mmlsnsd

File system Disk name NSD servers

gpfs_erpHome gpfs3nsd p59022gpfs,p59012gpfs
gpfs_erpHome gpfs4nsd p59022gpfs,p59012gpfs
gpfs_erpHome gpfs14nsd p59021gpfs,p59011gpfs
gpfs_erpHome gpfs15nsd p59021gpfs,p59011gpfs
(free disk) xgpfs5nsd p59021gpfs,p59011gpfs
(free disk) xgpfs6nsd p59021gpfs,p59011gpfs

Add these two disks to the existing GPFS cluster
p59021[/home/scripts/gpfs]#mmadddisk gpfs_erpHome "xgpfs5nsd;xgpfs6nsd"
GPFS: 6027-531 The following disks of gpfs_erpHome will be formatted on node p59021:

xgpfs5nsd: size 52428800 KB
xgpfs6nsd: size 52428800 KB

Extending Allocation Map
Checking Allocation Map for storage pool 'system'
GPFS: 6027-1503 Completed adding disks to file system gpfs_erpHome.
mmadddisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
If the command reports an error because the disks were previously used for testing elsewhere, add the -v no option to force it.

Confirm the current state; it should look similar to the following
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
gpfs14nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
gpfs15nsd  nsd       512    4003 yes      yes   ready  up                 4 system  desc
xgpfs5nsd  nsd       512    4003 yes      yes   ready  up                 5 system
xgpfs6nsd  nsd       512    4003 yes      yes   ready  up                 6 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
GPFS: 6027-739 Attention: Due to an earlier configuration change the file system
is no longer properly balanced.
The imbalance is expected at this point; ignore it for now.

Confirm that the capacity has increased
p59021[/home/scripts/gpfs]#df -g
/dev/gpfs_erpHome 300.00 197.87 34% 9478 5% /erpHome

2.3.3 Remove the DS6800 mirror and replace it with a DS8700 mirror (40 minutes)

Remove the DS6800 disks from the GPFS cluster
p59021[/home/scripts/gpfs]#mmdeldisk gpfs_erpHome "gpfs14nsd;gpfs15nsd"
Deleting disks ...
Scanning system storage pool
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
4.85 % complete on Wed Jan 19 10:12:58 2011 ( 174960 inodes 4601 MB)
10.54 % complete on Wed Jan 19 10:13:18 2011 ( 175252 inodes 10001 MB)
17.34 % complete on Wed Jan 19 10:13:38 2011 ( 188744 inodes 16454 MB)
22.68 % complete on Wed Jan 19 10:13:58 2011 ( 189048 inodes 21524 MB)
29.08 % complete on Wed Jan 19 10:14:19 2011 ( 189299 inodes 27594 MB)
35.60 % complete on Wed Jan 19 10:14:39 2011 ( 189588 inodes 33779 MB)
42.17 % complete on Wed Jan 19 10:14:59 2011 ( 189896 inodes 40019 MB)
49.69 % complete on Wed Jan 19 10:15:20 2011 ( 190223 inodes 47146 MB)
56.13 % complete on Wed Jan 19 10:15:40 2011 ( 190543 inodes 53265 MB)
63.78 % complete on Wed Jan 19 10:16:00 2011 ( 190892 inodes 60522 MB)
69.77 % complete on Wed Jan 19 10:16:24 2011 ( 203322 inodes 66202 MB)
75.71 % complete on Wed Jan 19 10:16:44 2011 ( 203676 inodes 71845 MB)
81.59 % complete on Wed Jan 19 10:17:04 2011 ( 204287 inodes 77419 MB)
85.86 % complete on Wed Jan 19 10:17:25 2011 ( 204532 inodes 81472 MB)
90.24 % complete on Wed Jan 19 10:17:45 2011 ( 204771 inodes 85626 MB)
92.52 % complete on Wed Jan 19 10:18:10 2011 ( 204920 inodes 87787 MB)
93.41 % complete on Wed Jan 19 10:18:30 2011 ( 205002 inodes 88637 MB)
93.89 % complete on Wed Jan 19 10:18:51 2011 ( 205046 inodes 89087 MB)
94.37 % complete on Wed Jan 19 10:19:11 2011 ( 205092 inodes 89548 MB)
94.88 % complete on Wed Jan 19 10:19:32 2011 ( 205140 inodes 90028 MB)
95.33 % complete on Wed Jan 19 10:19:53 2011 ( 205183 inodes 90459 MB)
95.71 % complete on Wed Jan 19 10:20:15 2011 ( 205219 inodes 90819 MB)
96.08 % complete on Wed Jan 19 10:20:37 2011 ( 205254 inodes 91169 MB)
96.57 % complete on Wed Jan 19 10:20:59 2011 ( 205301 inodes 91639 MB)
100.00 % complete on Wed Jan 19 10:21:07 2011
GPFS: 6027-552 Scan completed successfully.
Checking Allocation Map for storage pool 'system'
GPFS: 6027-370 tsdeldisk64 completed.
mmdeldisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
This step takes longer with large data volumes; expect roughly 30-60 minutes for 100 GB of files.

Check the cluster state
Ideally the file system has rebalanced automatically; if it has not, ignore that for now
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system  desc
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
xgpfs5nsd  nsd       512    4003 yes      yes   ready  up                 5 system  desc
xgpfs6nsd  nsd       512    4003 yes      yes   ready  up                 6 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
GPFS: 6027-740 Attention: Due to an earlier configuration change the file system is no longer properly replicated.

Delete the DS6800 NSDs, including any other disks in the free state (the list must be confirmed during preparation)
p59021[/soft_ins]#mmdelnsd "gpfs14nsd;gpfs15nsd;..."

Confirm that no DS6800 disks remain in the GPFS cluster
p59021[/home/scripts/gpfs]#lspv
p59021[/home/scripts/gpfs]#df -g

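2.4 GPFS expansion (20-100 minutes)
2.4.1 Preparation for the GPFS expansion (20 minutes)

Confirm the current state of the two new disks on each of the 8100 and the 8700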
p59021[/home/scripts/gpfs]#lsvpcfg
vpath7 (Avail pv ) 75DKRK10007 = hdisk9 (Avail ) hdisk82 (Avail )
vpath16 (Avail pv ) 75DKRK10107 = hdisk18 (Avail ) hdisk91 (Avail )
xvpath6 (Avail pv ) 75VX7612206 = hdisk10 (Avail )
xvpath7 (Avail pv ) 75VX7612207 = hdisk11 (Avail )
p59021[/home/scripts/gpfs]#lspv
vpath7 00cf49dc4e5e8e68 gpfs1nsd
vpath16 00cf49dc4e5e93c1 gpfs2nsd
xvpath6 00c4ff169ce1cc49 none
xvpath7 00c4ff169ce1cd11 none
Confirm the nsdfile
p59021[/home/scripts/gpfs]#more /tmp/gpfs/nsdfile2
xvpath6:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath7:p59021gpfs:p59011gpfs:dataAndMetadata:4003

2.4.2 GPFS expansion steps (0-80 minutes)

Create the NSD disks
p59021[/home/scripts/gpfs]#mmcrnsd -F /tmp/gpfs/nsdfile2

Confirm that the NSD disks are available
p59021[/home/scripts/gpfs]#lspv
vpath7 00cf49dc4e5e8e68 gpfs1nsd
vpath16 00cf49dc4e5e93c1 gpfs2nsd
xvpath6 00c4ff169ce1cc49 xgpfs13nsd
xvpath7 00c4ff169ce1cd11 xgpfs14nsd

Check the NSD state
p59021[/home/scripts/gpfs]#mmlsnsd

File system Disk name NSD servers

gpfs_erpHome gpfs3nsd p59022gpfs,p59012gpfs
gpfs_erpHome gpfs4nsd p59022gpfs,p59012gpfs
gpfs_erpHome xgpfs5nsd p59021gpfs,p59011gpfs
gpfs_erpHome xgpfs6nsd p59021gpfs,p59011gpfs
(free disk) gpfs1nsd p59022gpfs,p59012gpfs
(free disk) gpfs2nsd p59022gpfs,p59012gpfs
(free disk) xgpfs13nsd p59021gpfs,p59011gpfs
(free disk) xgpfs14nsd p59021gpfs,p59011gpfs

Add these four disks to the existing GPFS cluster
p59021[/home/scripts/gpfs]#mmadddisk gpfs_erpHome "gpfs1nsd;gpfs2nsd;xgpfs13nsd;xgpfs14nsd"

GPFS: 6027-531 The following disks of gpfs_erpHome will be formatted on node p59011:

gpfs1nsd: size 52428800 KB
gpfs2nsd: size 52428800 KB
xgpfs13nsd: size 52428800 KB
xgpfs14nsd: size 52428800 KB

Extending Allocation Map
Checking Allocation Map for storage pool 'system'
GPFS: 6027-1503 Completed adding disks to file system gpfs_erpHome.
mmadddisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Check the disk state; it may look like this
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system  desc
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
xgpfs5nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
xgpfs6nsd  nsd       512    4003 yes      yes   ready  up                 4 system
gpfs1nsd   nsd       512    4003 yes      yes   ready  up                 5 system
gpfs2nsd   nsd       512    4003 yes      yes   ready  up                 6 system
xgpfs13nsd nsd       512    4003 yes      yes   ready  up                 7 system
xgpfs14nsd nsd       512    4003 yes      yes   ready  up                 8 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
GPFS: 6027-740 Attention: Due to an earlier configuration change the file system
is no longer properly replicated.

The failure group of the newly added NSDs may be wrong; manually change the failure group of the two DS8100 NSDs to 1
p59021[/home/scripts/gpfs]#mmchdisk gpfs_erpHome change -d "gpfs1nsd::::1;gpfs2nsd::::1"
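
Since the file system keeps two replicas (-m 2 -r 2), the disks should end up split evenly between failure group 1 and failure group 4003. A minimal sketch for tallying disks per failure group from mmlsdisk-style output follows; failure_group_counts is a hypothetical name, and it assumes the failure group is the fourth field of each disk row, as in the listings in this section.

```shell
# failure_group_counts: hypothetical helper.
# Reads `mmlsdisk <fs> -L` style output on stdin, picks the disk rows
# (second field "nsd") and prints "<failure group> <disk count>" per
# group, sorted numerically by group.
failure_group_counts() {
    awk '$2 == "nsd" { n[$4]++ } END { for (g in n) print g, n[g] }' | sort -n
}
```

Used as `mmlsdisk gpfs_erpHome -L | failure_group_counts`, the output before the mmchdisk change would show two disks in group 1 and six in group 4003; after it, four in each.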

Finally, rebalance the GPFS cluster
p59021[/home/scripts/gpfs]#mmrestripefs gpfs_erpHome -b
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
63 % complete on Wed Jan 19 11:07:22 2011
100 % complete on Wed Jan 19 11:07:24 2011
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
0.54 % complete on Wed Jan 19 11:07:45 2011 ( 41867 inodes 708 MB)
2.16 % complete on Wed Jan 19 11:08:05 2011 ( 79783 inodes 2842 MB)
3.76 % complete on Wed Jan 19 11:08:27 2011 ( 101985 inodes 4954 MB)
……
83.00 % complete on Wed Jan 19 11:23:20 2011 ( 626057 inodes 109343 MB)
83.69 % complete on Wed Jan 19 11:23:40 2011 ( 626147 inodes 110254 MB)
84.49 % complete on Wed Jan 19 11:24:00 2011 ( 626252 inodes 111305 MB)
85.35 % complete on Wed Jan 19 11:24:21 2011 ( 626365 inodes 112436 MB)
86.15 % complete on Wed Jan 19 11:24:41 2011 ( 626471 inodes 113497 MB)
87.00 % complete on Wed Jan 19 11:25:01 2011 ( 626583 inodes 114617 MB)
87.90 % complete on Wed Jan 19 11:25:21 2011 ( 626701 inodes 115798 MB)
88.75 % complete on Wed Jan 19 11:25:42 2011 ( 626813 inodes 116919 MB)
89.40 % complete on Wed Jan 19 11:26:02 2011 ( 626895 inodes 117770 MB)
90.37 % complete on Wed Jan 19 11:26:23 2011 ( 626959 inodes 119051 MB)
91.23 % complete on Wed Jan 19 11:26:43 2011 ( 627016 inodes 120192 MB)
92.30 % complete on Wed Jan 19 11:27:04 2011 ( 627086 inodes 121593 MB)
93.42 % complete on Wed Jan 19 11:27:24 2011 ( 627160 inodes 123074 MB)
94.62 % complete on Wed Jan 19 11:27:44 2011 ( 627239 inodes 124656 MB)
95.38 % complete on Wed Jan 19 11:28:04 2011 ( 627289 inodes 125656 MB)
96.32 % complete on Wed Jan 19 11:28:24 2011 ( 627351 inodes 126897 MB)
97.40 % complete on Wed Jan 19 11:28:45 2011 ( 627421 inodes 128318 MB)
97.42 % complete on Wed Jan 19 11:29:05 2011 ( 627491 inodes 129680 MB)
97.44 % complete on Wed Jan 19 11:29:36 2011 ( 627599 inodes 131861 MB)
97.46 % complete on Wed Jan 19 11:30:17 2011 ( 629598 inodes 134238 MB)
100.00 % complete on Wed Jan 19 11:30:20 2011
GPFS: 6027-552 Scan completed successfully.
This step takes close to an hour.

Final state check; it should look similar to the following
p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system  desc
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
xgpfs5nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
xgpfs6nsd  nsd       512    4003 yes      yes   ready  up                 4 system
gpfs1nsd   nsd       512       1 yes      yes   ready  up                 5 system
gpfs2nsd   nsd       512       1 yes      yes   ready  up                 6 system
xgpfs13nsd nsd       512    4003 yes      yes   ready  up                 7 system
xgpfs14nsd nsd       512    4003 yes      yes   ready  up                 8 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2

p59021[/home/scripts/gpfs]#df -g
/dev/gpfs_erpHome 400.00 213.87 43% 1578586 16% /erpHome

Run some simple read/write tests to confirm there are no problems (with the cooperation of the securities company's staff)

2.5 Add the p55ADRgpfs node back to GPFS (20 minutes)

2.5.1 Preparation

On the p55ADR partition, rescan the disks and confirm that the pvids of the new DS8700 disks are consistent
p55ADRgpfs[/]#cfgmgr -v
p59021[/]#lsvpcfg | awk '{print "chdev -l "$1,"-a pv=yes"}'|sh

Edit /etc/hosts on the six p590 nodes, and on the primary node p59021 update the p55a entry in /home/scripts/gpfs/machines.list
p59021[/]#vi /etc/hosts
10.0.14.10 p55ADR      change to   10.0.12.31 p55ADR
10.0.15.10 p55ADRgpfs  change to   10.0.13.31 p55ADRgpfs

p59021[/]#vi /home/scripts/gpfs/machines.list
10.0.15.10 p55ADRgpfs  change to   10.0.13.31 p55ADRgpfs

Test the rsh connection
p59021[/]#rsh p55ADRgpfs date

2.5.2 Steps for adding the node

Add the p55ADRgpfs node
p59021[/home/scripts/gpfs]#mmaddnode -N p55ADRgpfs:quorum
Sun Jan 23 21:05:31 BEIST 2011: 6027-1664 mmaddnode: Processing node p55ADRgpfs
mmaddnode: Command successfully completed
mmaddnode: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Start GPFS on the p55ADRgpfs node
p59021[/home/scripts/gpfs]#mmstartup -a
Sun Jan 23 21:08:17 BEIST 2011: 6027-1642 mmstartup: Starting GPFS ...
p59011gpfs: 6027-2114 The GPFS subsystem is already active.
p59022gpfs: 6027-2114 The GPFS subsystem is already active.
p59021gpfs: 6027-2114 The GPFS subsystem is already active.
p59012gpfs: 6027-2114 The GPFS subsystem is already active.
p59023gpfs: 6027-2114 The GPFS subsystem is already active.
p59024gpfs: 6027-2114 The GPFS subsystem is already active.

Check the node state
p59021[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active

Run simple read/write tests with dd and cp
p59021[/home/scripts/gpfs]#dd if=/dev/zero of=/erpHome/test1 bs=1024k count=5000

2.6 Delete unused NSD disks (as circumstances allow, 30 minutes)
2.6.1 Preparation

Confirm that the NSD disks are in the free state
root@p59011:[/]mmlsnsd

File system Disk name Primary node Backup node

(free disk) gpfs10nsd p59011gpfs
(free disk) gpfs11nsd p59011gpfs
(free disk) gpfs18nsd p55ADRgpfs
(free disk) gpfs19nsd p55ADRgpfs
(free disk) gpfs14nsd p59021gpfs p59012gpfs
(free disk) gpfs15nsd p59021gpfs p59012gpfs
(free disk) gpfs5nsd p55ADRgpfs p59023gpfs
(free disk) gpfs6nsd p55ADRgpfs p59023gpfs
(free disk) gpfs9nsd p59011gpfs
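
The semicolon-separated NSD list for the delete step can be derived from this output instead of being typed by hand. The sketch below assumes the "(free disk)" marker and column order shown above; free_nsd_list is a hypothetical name. It only prints the list, and the result should still be reviewed against the list agreed during preparation before anything is deleted.

```shell
# free_nsd_list: hypothetical helper.
# Reads `mmlsnsd` style output on stdin and prints the names of all
# NSDs marked "(free disk)", joined with ";" on a single line.
free_nsd_list() {
    awk '$1 == "(free" && $2 == "disk)" { print $3 }' | paste -s -d ';' -
}
```

For example, `mmlsnsd | free_nsd_list` would print a string such as gpfs10nsd;gpfs11nsd that can be pasted, quoted, into the mmdelnsd command.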

Delete the NSDs
p59021[/home/scripts/gpfs]#mmdelnsd "gpfs5nsd;gpfs6nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs18nsd;gpfs19nsd;gpfs14nsd;gpfs15nsd"

Delete the disks on each partition
sh run_cmd.sh "sh /home/scripts/gpfs/cleargpfs6800.sh"


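3. Emergency handling of GPFS cluster failures (3 hours):

If an anomaly causes errors during the GPFS upgrade, try the following two methods (3.3 is the last resort if both fail):

3.1 Export the data disks and re-import them (30 minutes)
Precondition: the data disks are not lost or damaged

Unmount the file system
p59021[/home/scripts/gpfs]#mmunmount all -a

Export the GPFS information to the file gpfs.exp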
p59021[/home/scripts/gpfs]#mmexportfs all -o gpfs.exp

mmexportfs: Processing file system gpfs_erpHome ...
mmexportfs: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

If necessary, delete the original NSDs
p59021[/home/scripts/gpfs]#mmdelnsd -a

Start the GPFS cluster
p59021[/home/scripts/gpfs]#mmstartup -a

Import gpfs.exp into the GPFS cluster
p59021[/home/scripts/gpfs]#mmimportfs all -i gpfs.exp

mmimportfs: Processing file system gpfs_erpHome ...
mmimportfs: Processing disk gpfs3nsd
mmimportfs: Processing disk gpfs4nsd
mmimportfs: Processing disk gpfs14nsd
mmimportfs: Processing disk gpfs15nsd

mmimportfs: Committing the changes ...

mmimportfs: The following file systems were successfully imported:

    gpfs_erpHome

mmimportfs: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

Confirm the state
p59021[/home/mxin]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system  desc
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
gpfs14nsd  nsd       512    4003 yes      yes   ready  up                 3 system  desc
gpfs15nsd  nsd       512    4003 yes      yes   ready  up                 4 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2

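3.2 Build a new GPFS cluster and import the data disks (60 minutes)
Precondition: 3.1 has failed but the data disks are not damaged. The existing GPFS cluster must be cleared, a new cluster built, and the data disks imported.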
mmunmount /erpHome
mmdelfs gpfs_erpHome
mmdelnsd -F /tmp/gpfs/nsdfile
mmshutdown -a
mmdelnode -a

Uninstall GPFS 3.1 and install the GPFS software directly
p59021[/home/scripts/gpfs]# sh run_cmd.sh "/home/scripts/gpfs/install_gpfs3.3.sh &"

Rebuild the GPFS cluster
p59011[/home/scripts/gpfs]#vi /tmp/gpfs/nodefile
p59011gpfs:quorum-manager
p59012gpfs:client
p59021gpfs:quorum-manager
p59022gpfs:client
p59023gpfs:client
p55ADRgpfs:quorum
p59024gpfs:client
p59011[/home/scripts/gpfs]#mmcrcluster -C gpfs_erp.p59023gpfs -U p59023gpfs -N /tmp/gpfs/nodefile -p p59021gpfs -s p59011gpfs

Configure the NSD disk information
p59021[/home/scripts/gpfs]#vi /tmp/gpfs/nsdfile
xvpath0:p59022gpfs:p59012gpfs:dataAndMetadata:1
xvpath1:p59022gpfs:p59012gpfs:dataAndMetadata:1
xvpath2:p59022gpfs:p59012gpfs:dataAndMetadata:1
xvpath3:p59022gpfs:p59012gpfs:dataAndMetadata:1
xvpath4:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath5:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath6:p59021gpfs:p59011gpfs:dataAndMetadata:4003
xvpath7:p59021gpfs:p59011gpfs:dataAndMetadata:4003

xvpath0-3 are the 8100 disks; xvpath4-7 are the 8700 disks

Create the NSD disks
p59011[/home/scripts/gpfs]#mmcrnsd -F /tmp/gpfs/nsdfile

Start the GPFS cluster
p59011[/]#mmstartup -a

Create the shared file system
p59021[/home/scripts/gpfs]#mmcrfs /erpHome gpfs_erpHome -F /tmp/gpfs/nsdfile -A yes -B 64K -N 10241024 -m 2 -r 2 -n 30 -v no

Confirm that the state is normal
p59021[/home/scripts/gpfs]#mmgetstate -a

Node number Node name GPFS state

   1      p59011gpfs       active
   2      p59012gpfs       active
   3      p59021gpfs       active
   4      p59022gpfs       active
   5      p59023gpfs       active
   7      p55ADRgpfs       active
   8      p59024gpfs       active

p59021[/home/scripts/gpfs]#mmlsdisk gpfs_erpHome -L
disk       driver sector failure holds    holds                             storage
name       type     size   group metadata data  status availability disk id pool    remarks
gpfs3nsd   nsd       512       1 yes      yes   ready  up                 1 system  desc
gpfs4nsd   nsd       512       1 yes      yes   ready  up                 2 system  desc
xgpfs11nsd nsd       512       1 yes      yes   ready  up                 3 system
xgpfs12nsd nsd       512       1 yes      yes   ready  up                 4 system
xgpfs5nsd  nsd       512    4003 yes      yes   ready  up                 5 system  desc
xgpfs6nsd  nsd       512    4003 yes      yes   ready  up                 6 system
xgpfs13nsd nsd       512    4003 yes      yes   ready  up                 7 system
xgpfs14nsd nsd       512    4003 yes      yes   ready  up                 8 system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2

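3.3 Build a new GPFS cluster and restore data files from backup (60-180 minutes) (led by the securities company's staff)
Precondition: 3.2 has also failed and the data disks are damaged as well.
The only option is to rebuild GPFS and then restore the previously backed-up data with TSM.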
