740 server went down; the IBM.StorageRM daemon stopped by itself.

OS: 7100-03-04-1441

HACMP: 7.1.3.3

# errpt -aj A8576C0D |more

---------------------------------------------------------------------------

LABEL:          STORAGERM_STOPPED_S

IDENTIFIER:     A8576C0D

Date/Time:       Tue Oct 25 23:34:01 CST 2016

Sequence Number: 466

Machine Id:      00F9AA764C00

Node Id:         HISDB1

Class:           O

Type:            INFO

WPAR:            Global

Resource Name:   StorageRM      

Description

IBM.StorageRM daemon has been stopped.

Probable Causes

The RSCT Configuration Manager daemon(IBM.StorageRMd) has been stopped.

User Causes

The stopsrc -s IBM.StorageRM command has been executed.

        Recommended Actions

        Confirm that the daemon should be stopped. Normally, this daemon should

not be stopped explicitly by the user.

Detail Data

DETECTING MODULE

RSCT,StorageRMDaemon.C,1.65,362               

ERROR ID

                                          

REFERENCE CODE


# errpt -aj 28854E81 |more

---------------------------------------------------------------------------

LABEL:          GS_STOP_ST

IDENTIFIER:     28854E81

Date/Time:       Tue Oct 25 23:34:07 CST 2016

Sequence Number: 467

Machine Id:      00F9AA764C00

Node Id:         HISDB1

Class:           O

Type:            INFO

WPAR:            Global

Resource Name:   cthags         

Description

Group Services daemon stopped

Probable Causes

Daemon stopped by SRC

Daemon stopped by signal

User Causes

Daemon stopped manually by user

        Recommended Actions

        Check that Group Services daemon is stopped

Detail Data

DETECTING MODULE

RSCT,SRCSocket.C,1.94,423                     

ERROR ID

6/uIVc.jhr1M/KUY1/2.2.1...................

REFERENCE CODE

                                          

DIAGNOSTIC EXPLANATION

Exiting for STOP NORMAL request from SRC.


# errpt -aj DB14100E |more

---------------------------------------------------------------------------

LABEL:          LVM_GS_CONNECTIVITY

IDENTIFIER:     DB14100E

Date/Time:       Tue Oct 25 23:34:08 CST 2016

Sequence Number: 469

Machine Id:      00F9AA764C00

Node Id:         HISDB1

Class:           U

Type:            PERM

WPAR:            Global

Resource Name:   LIBLVM         

Resource Class:  NONE

Resource Type:   NONE

Location:        

Description

Group Services detected a failure

Probable Causes

Unable to establish communication with Cluster daemons

Failure Causes

Concurrent Volume Group forced offline

        Recommended Actions

        CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES

        IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

Volume Group ID

00F9 AA76 0000 4C00 0000 014E 67C0 79F4

MAJOR/MINOR DEVICE NUMBER

003C 0000

SENSE DATA

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

---------------------------------------------------------------------------

LABEL:          LVM_GS_CONNECTIVITY

IDENTIFIER:     DB14100E

Date/Time:       Tue Oct 25 23:34:08 CST 2016

Sequence Number: 468

Machine Id:      00F9AA764C00

Node Id:         HISDB1

Class:           U

Type:            PERM

WPAR:            Global

Resource Name:   LIBLVM         

Resource Class:  NONE

Resource Type:   NONE

Location:        

Description

Group Services detected a failure

Probable Causes

Unable to establish communication with Cluster daemons

Failure Causes

Concurrent Volume Group forced offline

        Recommended Actions

        CHECK ERROR LOG FOR ADDITIONAL RELATED ENTRIES

        IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

Volume Group ID

00F9 AA76 0000 4C00 0000 014E 67BF 7264

MAJOR/MINOR DEVICE NUMBER

0032 0000

SENSE DATA

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000


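At 23:34 last night, node A suddenly went down and the resource group failed over to node B. This morning I booted node A and restarted HACMP. Looking at errpt now, I see that StorageRM simply stopped, and hacmp.out has no entries from around 23:00. I don't know what caused this. Could anyone help take a look?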
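To see the order in which the subsystems stopped, the `errpt -a` entries above can be sorted by sequence number. The following is a minimal sketch, not an official tool: it assumes the output has been saved as plain text with the dashed separator lines intact, and the names `parse_errpt` and `SAMPLE` are made up for illustration. `SAMPLE` holds only a few header fields copied from the entries above.

```python
import re
from datetime import datetime

# Header fields copied from the errpt entries in the post (other fields omitted).
SAMPLE = """\
---------------------------------------------------------------------------
LABEL:          STORAGERM_STOPPED_S
Date/Time:       Tue Oct 25 23:34:01 CST 2016
Sequence Number: 466
Resource Name:   StorageRM
---------------------------------------------------------------------------
LABEL:          GS_STOP_ST
Date/Time:       Tue Oct 25 23:34:07 CST 2016
Sequence Number: 467
Resource Name:   cthags
---------------------------------------------------------------------------
LABEL:          LVM_GS_CONNECTIVITY
Date/Time:       Tue Oct 25 23:34:08 CST 2016
Sequence Number: 469
Resource Name:   LIBLVM
---------------------------------------------------------------------------
LABEL:          LVM_GS_CONNECTIVITY
Date/Time:       Tue Oct 25 23:34:08 CST 2016
Sequence Number: 468
Resource Name:   LIBLVM
"""

SEPARATOR = re.compile(r"^-{10,}[ \t]*$", re.MULTILINE)
FIELD = re.compile(
    r"^(LABEL|IDENTIFIER|Date/Time|Sequence Number|Resource Name):[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_errpt(text):
    """Split detailed errpt output on its dashed separators and return the
    header fields of each entry, sorted by sequence number."""
    entries = []
    for chunk in SEPARATOR.split(text):
        fields = dict(FIELD.findall(chunk))
        if "LABEL" not in fields:
            continue  # leading empty chunk or unrelated text
        # Date/Time looks like "Tue Oct 25 23:34:01 CST 2016"; the zone name
        # is dropped because strptime's %Z is unreliable for abbreviations.
        parts = fields["Date/Time"].split()
        stamp = " ".join(parts[:4] + parts[5:])
        fields["when"] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        fields["seq"] = int(fields["Sequence Number"])
        entries.append(fields)
    return sorted(entries, key=lambda e: e["seq"])


for entry in parse_errpt(SAMPLE):
    print(entry["seq"], entry["when"].strftime("%H:%M:%S"),
          entry["LABEL"], entry["Resource Name"])
```

Sorted this way, the StorageRM stop (sequence 466) comes before the Group Services stop (467), and both precede the two LVM_GS_CONNECTIVITY entries, which matches the timestamps in the post.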

3 answers

zwz99999 · Systems Engineer, dcits

It probably needs a fix applied. PowerHA 7.1 is not as easy to work with as PowerHA 6.1.

Systems Integration · 2016-10-28
rojack · Systems Engineer, a software company

Oct 25 23:34:01 HISDB1 daemon:notice StorageRM[8061148]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: a8576c0d:::Details File:  :::Location: RSCT,StorageRMDaemon.C,1.65,362               :::STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.

Oct 25 23:34:07 HISDB1 daemon:notice cthags[7209198]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6/uIVc.jhr1M/KUY1/2.2.1...................:::Reference ID:  :::Template ID: 28854e81:::Details File:  :::Location: RSCT,SRCSocket.C,1.94,423                     :::GS_STOP_ST Group Services daemon stopped DIAGNOSTIC EXPLANATION Exiting for STOP NORMAL request from SRC.

Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: Called, state=ST_STABLE, provider token 1

Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GsToken 3, AdapterToken 4, rm_GsToken 1

Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GRPSVCS announcment code=512; exiting

Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08  CHECK FOR FAILURE OF RSCT SUBSYSTEMS (cthags)

Oct 25 23:34:08 HISDB1 daemon:notice ConfigRM[6160848]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 2625c573:::Details File:  :::Location: RSCT,PeerDomain.C,1.99.30.1,25415             :::CONFIGRM_OFFLINE_ST The node is offline.

Oct 25 23:34:08 HISDB1 user:alert Cache(CACHE)[9764994]: CACHE JOURNALING SYSTEM: Write to journal file has failed

Oct 25 23:34:09 HISDB1 daemon:notice snmpd[7471306]: NOTICE: lost peer (SMUX ::1+32784+1)

Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of clstrmgrES.

Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!

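I reported this to the IBM 800 support line. Based on his experience, the engineer thinks it may be an RSCT bug that requires the IV66606 fix. I have also collected snap logs for him and am waiting for his final confirmation.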
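The syslog excerpt above can be reduced to a "who logged first" timeline, which makes the chain easier to follow: StorageRM stops, then cthags, then clstrmgrES exits and clexit.rc halts the node. This is only a rough sketch under a few assumptions: each log record sits on a single line, the name `first_seen` and the regular expression are illustrative and not part of any PowerHA or RSCT tooling, and the sample messages are shortened versions of the lines in the excerpt.

```python
import re

# One record per line: "<Mon DD HH:MM:SS> <host> <facility>:<priority> <source>: <message>"
LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<facility>\w+):(?P<priority>\w+) (?P<source>.*?): (?P<message>.*)$"
)

# Messages abbreviated from the excerpt in the post.
SAMPLE_LINES = [
    "Oct 25 23:34:01 HISDB1 daemon:notice StorageRM[8061148]: STORAGERM_STOPPED_ST IBM.StorageRM daemon has been stopped.",
    "Oct 25 23:34:07 HISDB1 daemon:notice cthags[7209198]: GS_STOP_ST Group Services daemon stopped",
    "Oct 25 23:34:08 HISDB1 local0:crit clstrmgrES[5374388]: Tue Oct 25 23:34:08 announcementCb: GRPSVCS announcment code=512; exiting",
    "Oct 25 23:34:08 HISDB1 daemon:notice ConfigRM[6160848]: CONFIGRM_OFFLINE_ST The node is offline.",
    "Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of clstrmgrES.",
    "Oct 25 23:34:09 HISDB1 user:notice PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!",
]


def first_seen(lines):
    """Return (source, time) pairs in order of each source's first record."""
    seen = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # blank lines or wrapped fragments
        # Drop the trailing "[pid]" so repeated records share one key.
        source = re.sub(r"\[\d+\]$", "", match.group("source"))
        seen.setdefault(source, match.group("stamp").split()[-1])
    return list(seen.items())


for source, time in first_seen(SAMPLE_LINES):
    print(time, source)
```

Keying on the source name without the PID keeps the output short; printing every record instead would show the individual clstrmgrES messages as well.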

Banking · 2016-10-26
zwz99999 · Systems Engineer, dcits

It looks like the process was stopped manually, which left node A unable to access its storage, so node A went down and failed over to node B. Take a look at cluster.log.

Systems Integration · 2016-10-26

Asked by rojack (Systems Engineer, a software company) · Posted 2016-10-26 · Latest answer 2016-10-28