3. Resolution:
Manually starting the two partitions succeeded:
[DWE3:/db2home/db2inst1/sqllib]more db2nodes.cfg
0 DWE3 0
1 DWE3 1
2 DWE4 0
3 DWE4 1
4 DWE1 0
5 DWE1 1
6 DWE2 0
7 DWE2 1
[DWE3:/db2home/db2inst1/sqllib]db2start dbpartitionnum 4
2010-05-06 21:16:20 4 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
[DWE3:/db2home/db2inst1/sqllib]db2start dbpartitionnum 5
2010-05-06 21:16:48 5 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
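To see which physical host the two restarted partitions live on, the db2nodes.cfg listing above can be parsed. The following is only a minimal sketch: the function name `parse_db2nodes` is hypothetical, and it assumes each line has the form "partition number, hostname, logical port", as in the listing.

```python
def parse_db2nodes(text):
    """Return {partition_number: (hostname, logical_port)} from db2nodes.cfg text."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a partition number.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Assumption: a missing logical port is treated as 0.
        port = int(fields[2]) if len(fields) > 2 else 0
        nodes[int(fields[0])] = (fields[1], port)
    return nodes


# Contents copied from the db2nodes.cfg shown in this post.
CFG = """\
0 DWE3 0
1 DWE3 1
2 DWE4 0
3 DWE4 1
4 DWE1 0
5 DWE1 1
6 DWE2 0
7 DWE2 1
"""

nodes = parse_db2nodes(CFG)
for part in (4, 5):
    host, port = nodes[part]
    print(f"partition {part}: host {host}, logical port {port}")
```

With this listing, both partitions that were started by hand map to the same host, DWE1.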
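Check the diagnostic log: node 5 completed recovery successfully, and the other partitions re-established their FCM inter-node communication connections to node 5.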
2010-05-06-21.16.48.175650+480 I427300366A292 LEVEL: Severe
PID : 254236 TID : 1 PROC : db2pdbc 0
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, base sys utilities, sqleExecuteNodeRecovery, probe:200
MESSAGE : NODE RECOVERY COMPLETED FOR NODE 5
2010-05-06-21.16.57.147051+480 I427300659A286 LEVEL: Error
PID : 1778550 TID : 1 PROC : db2fcmr 1
INSTANCE: db2inst1 NODE : 001
FUNCTION: DB2 UDB, fast comm manager, sqkfRecvConduit::HandleAuthentEvent, probe:95
MESSAGE : Re-conn n:5; ls:219
2010-05-06-21.16.57.140333+480 I427300946A285 LEVEL: Error
PID : 1884500 TID : 1 PROC : db2fcmr 7
INSTANCE: db2inst1 NODE : 007
FUNCTION: DB2 UDB, fast comm manager, sqkfRecvConduit::HandleAuthentEvent, probe:95
MESSAGE : Re-conn n:5; ls:22
2010-05-06-21.16.57.140347+480 I427301232A285 LEVEL: Error
PID : 2212140 TID : 1 PROC : db2fcmr 6
INSTANCE: db2inst1 NODE : 006
FUNCTION: DB2 UDB, fast comm manager, sqkfRecvConduit::HandleAuthentEvent, probe:95
MESSAGE : Re-conn n:5; ls:21
2010-05-06-21.16.57.155593+480 I427301518A285 LEVEL: Error
PID : 152472 TID : 1 PROC : db2fcmr 3
INSTANCE: db2inst1 NODE : 003
FUNCTION: DB2 UDB, fast comm manager, sqkfRecvConduit::HandleAuthentEvent, probe:95
MESSAGE : Re-conn n:5; ls:83
2010-05-06-21.16.57.162432+480 I427301804A286 LEVEL: Error
PID : 176512 TID : 1 PROC : db2fcmr 2
INSTANCE: db2inst1 NODE : 002
FUNCTION: DB2 UDB, fast comm manager, sqkfRecvConduit::HandleAuthentEvent, probe:95
MESSAGE : Re-conn n:5; ls:210
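Rather than reading the log by eye, the partitions that reported a reconnect to node 5 can be collected with a small script. This is a sketch under assumptions: the helper name `reconnected_nodes` is hypothetical, and it relies only on the `INSTANCE: ... NODE : nnn` and `MESSAGE : Re-conn n:<node>;` line shapes visible in the excerpt above, not on any documented db2diag.log grammar.

```python
import re

NODE_RE = re.compile(r"^INSTANCE:.*\bNODE\s*:\s*(\d+)")
RECONN_RE = re.compile(r"Re-conn n:(\d+);")


def reconnected_nodes(log_text, target):
    """Return sorted partition numbers that logged 'Re-conn n:<target>'."""
    found = set()
    current_node = None
    for line in log_text.splitlines():
        m = NODE_RE.search(line)
        if m:
            # Remember which partition wrote the current log entry.
            current_node = int(m.group(1))
            continue
        m = RECONN_RE.search(line)
        if m and int(m.group(1)) == target and current_node is not None:
            found.add(current_node)
    return sorted(found)


# Three entries copied from the excerpt above (recovery message plus two reconnects).
SAMPLE = """\
INSTANCE: db2inst1 NODE : 000
MESSAGE : NODE RECOVERY COMPLETED FOR NODE 5
INSTANCE: db2inst1 NODE : 001
MESSAGE : Re-conn n:5; ls:219
INSTANCE: db2inst1 NODE : 007
MESSAGE : Re-conn n:5; ls:22
"""

print(reconnected_nodes(SAMPLE, 5))
```

Run against the full excerpt, the same function would report nodes 1, 2, 3, 6 and 7, which are the partitions whose `db2fcmr` processes logged the reconnect.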
Query the table data to verify; every partition can be accessed normally:
db2 => select dbpartitionnum(CITY_CDE) as dbpartition_num,count(*) as rows from MARTRPT.TB_RPT_MART_USER_M group by dbpartitionnum(CITY_CDE) order by dbpartitionnum(CITY_CDE) desc
DBPARTITION_NUM ROWS
--------------- -----------
7 3741122
6 4233836
5 2676695
4 3701567
3 2940725
2 4241913
1 3541748
7 record(s) selected.
db2 => select count(*) from pdw.t_dwu_user_stat_m1004
1
-----------
8483178
1 record(s) selected.
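The output of the first query can also be checked mechanically, to confirm that partitions 4 and 5 return rows again. A minimal sketch follows; the function name `partition_row_counts` is hypothetical, and the expectation that data lives on partitions 1 to 7 (with none on partition 0) is an assumption taken from this output, not from the table's definition.

```python
def partition_row_counts(output):
    """Parse 'partition rows' pairs from db2 CLP query output."""
    counts = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows are exactly two integers; headers and the summary line are skipped.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            counts[int(fields[0])] = int(fields[1])
    return counts


# Output copied from the first verification query in this post.
OUTPUT = """\
DBPARTITION_NUM ROWS
--------------- -----------
7 3741122
6 4233836
5 2676695
4 3701567
3 2940725
2 4241913
1 3541748
7 record(s) selected.
"""

counts = partition_row_counts(OUTPUT)
# Assumption: the table is expected to have rows on partitions 1 through 7.
missing = sorted(set(range(1, 8)) - set(counts))
print("partitions without rows:", missing)
print("rows on partition 4:", counts.get(4), "rows on partition 5:", counts.get(5))
```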
Check the database manager snapshot: all 8 partition nodes are in a normal state.
[DWE3:/db2home/db2inst1/sqllib]db2 get snapshot for dbm
Database Manager Snapshot
Node type = Enterprise Server Edition with local and remote clients
Instance name = db2inst1
Number of database partitions in DB2 instance = 8
Database manager status = Active
Product name = DB2 v9.1.0.5
Service level = s080512 (U815922)
Private Sort heap allocated = 0
Private Sort heap high water mark = 0
Post threshold sorts = Not Collected
Piped sorts requested = 15982939
Piped sorts accepted = 15982939
Start Database Manager timestamp = 2009-12-26 10:20:08.156485
Last reset timestamp =
Snapshot timestamp = 2010-05-06 21:38:43.115373
Remote connections to db manager = 14
Remote connections executing in db manager = 4
Local connections = 3
Local connections executing in db manager = 1
Active local databases = 1
High water mark for agents registered = 651
High water mark for agents waiting for a token = 0
Agents registered = 300
Agents waiting for a token = 0
Idle agents = 232
Committed private Memory (Bytes) = 2293760
Switch list for db partition number 0
Buffer Pool Activity Information (BUFFERPOOL) = OFF
Lock Information (LOCK) = ON 2009-12-26 10:20:08.156485
Sorting Information (SORT) = OFF
SQL Statement Information (STATEMENT) = ON 2009-12-26 10:20:08.156485
Table Activity Information (TABLE) = OFF
Take Timestamp Information (TIMESTAMP) = ON 2009-12-26 10:20:08.156485
Unit of Work Information (UOW) = ON 2009-12-26 10:20:08.156485
Agents assigned from pool = 22430770
Agents created from empty pool = 17392
Agents stolen from another application = 25957
High water mark for coordinating agents = 463
Max agents overflow = 0
Hash joins after heap threshold exceeded = 18
Total number of gateway connections = 76193
Current number of gateway connections = 0
Gateway connections waiting for host reply = 0
Gateway connections waiting for client request = 0
Gateway connection pool agents stolen = 0
Node FCM information corresponds to = 0
Free FCM buffers = 2670
Free FCM buffers low water mark = 533
Free FCM channels = 5330
Free FCM channels low water mark = 3563
Number of FCM nodes = 8
Node Total Buffers Total Buffers Connection
Number Sent Received Status
----------- ------------------ ------------------ -----------------
0 88048129 75347987 Active
1 1171406049 2886023437 Active
2 1158587140 2812298403 Active
3 1138563265 2796705837 Active
4 1174359057 2890568985 Active
5 1200402254 2976149013 Active
6 1167367047 2885679453 Active
7 1197547312 2974899195 Active
Memory usage for database manager:
Node number = 0
Memory Pool Type = Database Monitor Heap
Current size (bytes) = 131072
High water mark (bytes) = 655360
Configured size (bytes) = 393216
Node number = 0
Memory Pool Type = Other Memory
Current size (bytes) = 34734080
High water mark (bytes) = 46727168
Configured size (bytes) = 108462080
Node number = 0
Memory Pool Type = FCMBP Heap
Current size (bytes) = 19529728
High water mark (bytes) = 34209792
Configured size (bytes) = 27918336
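The FCM connection table in the snapshot above is the part that shows every partition is reachable. As a sketch, it can be scanned for any node whose status is not `Active`. The function name `inactive_fcm_nodes` is hypothetical, and the parser assumes each table row is "node number, buffers sent, buffers received, status", as printed here.

```python
def inactive_fcm_nodes(snapshot_text):
    """Return node numbers whose FCM connection status is not 'Active'."""
    bad = []
    for line in snapshot_text.splitlines():
        fields = line.split()
        # Table rows: three integers followed by a one-word status.
        if len(fields) == 4 and all(f.isdigit() for f in fields[:3]):
            if fields[3] != "Active":
                bad.append(int(fields[0]))
    return bad


# Rows copied from the snapshot in this post.
FCM_TABLE = """\
Number of FCM nodes = 8
0 88048129 75347987 Active
1 1171406049 2886023437 Active
2 1158587140 2812298403 Active
3 1138563265 2796705837 Active
4 1174359057 2890568985 Active
5 1200402254 2976149013 Active
6 1167367047 2885679453 Active
7 1197547312 2974899195 Active
"""

print("inactive FCM nodes:", inactive_fcm_nodes(FCM_TABLE))

# Hypothetical row, for illustration only: a node whose status is not Active.
print(inactive_fcm_nodes("5 100 200 Inactive\n"))
```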
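Verification from the application side found no problems.
The cause of the abnormal crash of the EDU dispatcher process is unknown and still to be investigated.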