OS:7100-03-04-1441
HACMP:7.1.3.3
# errpt -aj A8576C0D |more
---------------------------------------------------------------------------
LABEL: STORAGERM_STOPPED_S
IDENTIFIER: A8576C0D
Date/Time: Tue Oct 25 23:34:01 CST 2016
Sequence Number: 466
Machine Id: 00F9AA764C00
Node Id: HISDB1
Class: O
Type: INFO
WPAR: Global
Resource Name: StorageRM
Description
IBM.StorageRM daemon has been stopped.
Probable Causes
The RSCT Configuration Manager daemon(IBM.StorageRMd) has been stopped.
User Causes
The stopsrc -s IBM.StorageRM command has been executed.
Recommended Actions
Confirm that the daemon should be stopped. Normally, this daemon should
not be stopped explicitly by the user.
Detail Data
DETECTING MODULE
RSCT,StorageRMDaemon.C,1.65,362
ERROR ID
REFERENCE CODE
# errpt -aj 28854E81 |more
---------------------------------------------------------------------------
LABEL: GS_STOP_ST
IDENTIFIER: 28854E81
Date/Time: Tue Oct 25 23:34:07 CST 2016
Sequence Number: 467
Machine Id: 00F9AA764C00
Node Id: HISDB1
Class: O
Type: INFO
WPAR: Global
Resource Name: cthags
Description
Group Services daemon stopped
Probable Causes
Daemon stopped by SRC
Daemon stopped by signal
User Causes
Daemon stopped manually by user
Recommended Actions
Check that Group Services daemon is stopped
Detail Data
DETECTING MODULE
RSCT,SRCSocket.C,1.94,423
ERROR ID
6/uIVc.jhr1M/KUY1/2.2.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
Exiting for STOP NORMAL request from SRC.
# errpt -aj DB14100E |more
---------------------------------------------------------------------------
LABEL: LVM_GS_CONNECTIVITY
IDENTIFIER: DB14100E
Date/Time: Tue Oct 25 23:34:08 CST 2016
Sequence Number: 469
Machine Id: 00F9AA764C00
Node Id: HISDB1
Class: U
Type: PERM
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Group Services detected a failure
Probable Causes
Unable to establish communication with Cluster daemons
Failure Causes
Concurrent Volume Group forced offline
Recommended Actions
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
Volume Group ID
00F9 AA76 0000 4C00 0000 014E 67C0 79F4
MAJOR/MINOR DEVICE NUMBER
003C 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_GS_CONNECTIVITY
IDENTIFIER: DB14100E
Date/Time: Tue Oct 25 23:34:08 CST 2016
Sequence Number: 468
Machine Id: 00F9AA764C00
Node Id: HISDB1
Class: U
Type: PERM
WPAR: Global
Resource Name: LIBLVM
Resource Class: NONE
Resource Type: NONE
Location:
Description
Group Services detected a failure
Probable Causes
Unable to establish communication with Cluster daemons
Failure Causes
Concurrent Volume Group forced offline
Recommended Actions
CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES
IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
Volume Group ID
00F9 AA76 0000 4C00 0000 014E 67BF 7264
MAJOR/MINOR DEVICE NUMBER
0032 0000
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
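For anyone comparing several entries like these, here is a minimal Python sketch that splits detailed `errpt -a` text into records and orders them by sequence number. The names `parse_errpt`, `FIELDS`, and `SAMPLE` are made up for illustration, and the sample is trimmed to a few header fields from the output above. It assumes entries are separated by a line of dashes, as they are in this output.

```python
import re

# Header fields to keep, spelled as they appear in the errpt output above.
FIELDS = ("LABEL", "IDENTIFIER", "Date/Time", "Sequence Number", "Resource Name")

def parse_errpt(text):
    """Return one dict per errpt entry, sorted by sequence number."""
    entries = []
    # Assumption: a line made only of dashes starts each entry.
    for chunk in re.split(r"^-{10,}\s*$", text, flags=re.M):
        entry = {}
        for line in chunk.splitlines():
            # Split at the first colon only, so times like 23:34:01 stay intact.
            key, sep, value = line.partition(":")
            if sep and key.strip() in FIELDS:
                entry[key.strip()] = value.strip()
        if "LABEL" in entry:
            entries.append(entry)
    return sorted(entries, key=lambda e: int(e.get("Sequence Number", 0)))

# Header fields copied from the entries above; detail sections omitted.
SAMPLE = """\
---------------------------------------------------------------------------
LABEL: STORAGERM_STOPPED_S
IDENTIFIER: A8576C0D
Date/Time: Tue Oct 25 23:34:01 CST 2016
Sequence Number: 466
Resource Name: StorageRM
---------------------------------------------------------------------------
LABEL: GS_STOP_ST
IDENTIFIER: 28854E81
Date/Time: Tue Oct 25 23:34:07 CST 2016
Sequence Number: 467
Resource Name: cthags
---------------------------------------------------------------------------
LABEL: LVM_GS_CONNECTIVITY
IDENTIFIER: DB14100E
Date/Time: Tue Oct 25 23:34:08 CST 2016
Sequence Number: 469
Resource Name: LIBLVM
---------------------------------------------------------------------------
LABEL: LVM_GS_CONNECTIVITY
IDENTIFIER: DB14100E
Date/Time: Tue Oct 25 23:34:08 CST 2016
Sequence Number: 468
Resource Name: LIBLVM
"""

for e in parse_errpt(SAMPLE):
    print(e["Sequence Number"], e["Date/Time"], e["LABEL"], e["Resource Name"])
```

Sorting by sequence number rather than timestamp keeps the two LVM_GS_CONNECTIVITY entries, which share the same second, in the order they were logged.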
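At 23:34 last night, node A suddenly went down and the resource group failed over to node B. This morning I booted node A and restarted HACMP. Checking errpt now, I see that StorageRM simply stopped, and hacmp.out has no entries from around 23:00. I don't know the cause. Could anyone help take a look?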
Oct 25 23:34:01 HISDB1 daemon:notice StorageRM[8061148]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: a8576c0d:::Details File: :::Location: RSCT,StorageRMDaemon.C,1.65,362 :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
Oct 25 23:34:07 HISDB1 daemon:notice cthags[7209198]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6/uIVc.jhr1M/KUY1/2.2.1...................:::Reference ID: :::Template ID: 28854e81:::Details File: :::Location: RSCT,SRCSocket.C,1.94,423 :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Exiting for STOP NORMAL request from SRC.
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: Called, state=ST_STABLE, provider token 1
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GsToken 3, AdapterToken 4, rm_GsToken 1
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GRPSVCS announcment code=512; exiting
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (cthags)
Oct 25 23:34:08 HISDB1 daemon:notice ConfigRM[6160848]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: 2625c573:::Details File: :::Location: RSCT,PeerDomain.C,1.99.30.1,25415 :::CONFIGRM_OFFLINE_ST The node is offline.
Oct 25 23:34:08 HISDB1 user:alert Cache(CACHE)[9764994]: CACHE JOURNALING SYSTEM: Write to journal file has failed
Oct 25 23:34:09 HISDB1 daemon:notice snmpd[7471306]: NOTICE: lost peer (SMUX ::1+32784+1)
Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!
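To make the order of events in the syslog excerpt above easier to see, here is a small Python sketch that pulls the timestamp and process name out of each line and lists them chronologically. The names `LINE_RE`, `timeline`, and `SAMPLE` are made up for illustration. The year is an assumption taken from the errpt Date/Time fields, since syslog lines do not carry one, and the regular expression is fitted only to the line layout seen in this excerpt.

```python
import re
from datetime import datetime

# Layout seen above: "<Mon> <day> <time> <host> <facility:priority> <tag>[pid]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) (?P<prio>\S+) "
    r"(?P<tag>[^\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)

def timeline(lines, year=2016):
    """Return (timestamp, process tag, message) tuples in time order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. a wrapped continuation line with no timestamp
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["tag"].strip(), m["msg"]))
    # sorted() is stable, so lines sharing a second keep their log order.
    return sorted(events, key=lambda e: e[0])

# A subset of the lines quoted above, with wrapped lines joined.
SAMPLE = """\
Oct 25 23:34:01 HISDB1 daemon:notice StorageRM[8061148]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID: :::Template ID: a8576c0d:::Details File: :::Location: RSCT,StorageRMDaemon.C,1.65,362 :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.
Oct 25 23:34:07 HISDB1 daemon:notice cthags[7209198]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6/uIVc.jhr1M/KUY1/2.2.1...................:::Reference ID: :::Template ID: 28854e81:::Details File: :::Location: RSCT,SRCSocket.C,1.94,423 :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Exiting for STOP NORMAL request from SRC.
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GRPSVCS announcment code=512; exiting
Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (cthags)
Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!
"""

for ts, tag, msg in timeline(SAMPLE.splitlines()):
    print(ts.time(), tag, msg[:60])
```

On these lines the sketch lists StorageRM first, then cthags, then clstrmgrES, and finally the clexit.rc halt message. That matches the order in the log itself and says nothing about what issued the stop request.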
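I reported this to the IBM 800 support line. Based on his experience, the engineer thinks it may be an RSCT bug and that the IV66606 fix needs to be applied. I also collected snap logs and sent them to him, and I am waiting for his final confirmation.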