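Operating system: Windows
Database: DB2 V9.7.0.11
Problem: The database restarts abnormally at irregular intervals. I checked the logs and found no bad pages, but a large number of FODC files were generated. I could not find a matching known bug, and I cannot decode the dump files. Could anyone help me work out what the problem is?
My rough analysis: an internal error is raised when an internal function is called during a connection. It looks like a bug, but I am not sure.
Attachment: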
db2diag.txt (7.61 MB)
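1. First, run a memory test. Judging from db2diag.log, this may be related to a memory problem; check the Windows event log to confirm. You can use a third-party tool or the memory diagnostic tool built into Windows.
2. If no hardware problem turns up, consider patching to DB2 v9.7.6; this could also be a DB2 bug.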
2020-08-19-10.07.54.297000+480 I78827388F1686 LEVEL: Severe
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : Possible memory corruption detected.
DATA #1 : ZRC, PD_TYPE_ZRC, 4 bytes
0x820F0002
DATA #2 : Corrupt block address, PD_TYPE_CORRUPT_BLK_PTR, 8 bytes
0x0000001226c3eac0
DATA #3 : Block header, PD_TYPE_BLK_HEADER, 24 bytes
0x0000001226C3EAA8 : E8AF 2CDB A100 0000 20C6 F734 1200 0000 ..,..... ..4....
0x0000001226C3EAB8 : 50C6 F734 1200 0000 P..4....
DATA #4 : Data header, PD_TYPE_BLK_DATA_HEAD, 48 bytes
0x0000001226C3EAC0 : 5359 5349 424D 2020 5359 5346 554E 2020 SYSIBM SYSFUN
0x0000001226C3EAD0 : 80C6 F734 1200 0000 B0C6 F734 1200 0000 ...4.......4....
0x0000001226C3EAE0 : 4D45 474F 5650 4953 2000 0000 0000 0000 MEGOVPIS .......
CALLSTCK:
[0] 0x0000000180110116 pdLog + 0x350
[1] 0x000000018005567D sqloDiagnoseFreeBlockFailure + 0xFB
[2] 0x000000018005503F sqlofmblkEx + 0x871
[3] 0x0000000001B12605 sqlra_sqlC_mem_free_block + 0x43
[4] 0x0000000001A933EC sqlra_fp_dealloc + 0x120
[5] 0x0000000001B04953 sqlra_free_section + 0x227
[6] 0x0000000001B13C4A sqlra_sqlC_free_section + 0x90
[7] 0x0000000001B13419 sqlra_sqlC_free_sibling + 0x95
[8] 0x0000000001B12F71 sqlra_sqlC_rmv_invoc_siblings + 0xD7
[9] 0x00000000019EBA4F sqlrr_destroy_invocation_cb + 0x1E7
2020-08-19-10.07.54.391000+480 E78829076F982 LEVEL: Critical
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred: "Panic".
The instance may have been shutdown as a result. "Automatic" FODC
(First Occurrence Data Capture) has been invoked and diagnostic
information has been recorded in directory
"C:\PROGRAMDATA\IBM\DB2\DB2COPY1\DB2\FODC_Panic_2020-08-19-10.07.54.3
91000_0000\". Please look in this directory for detailed evidence
about what happened and contact IBM support if necessary to diagnose
the problem.
2020-08-19-10.07.54.406000+480 E78830060F1189 LEVEL: Severe
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:999
MESSAGE : Memory validation failure, diagnostic file dumped.
DATA #1 : String, 21 bytes
Invalid block header.
DATA #2 : File name, 31 bytes
12360.15640.mem_diagnostics.txt
CALLSTCK:
[0] 0x0000000180110116 pdLog + 0x350
[1] 0x000000018005A3E9 SQLO_MEM_POOL::diagnoseMemoryCorruptionAndCrash + 0x2A1
[2] 0x00000001800558AB sqloDiagnoseFreeBlockFailure + 0x329
[3] 0x000000018005503F sqlofmblkEx + 0x871
[4] 0x0000000001B12605 sqlra_sqlC_mem_free_block + 0x43
[5] 0x0000000001A933EC sqlra_fp_dealloc + 0x120
[6] 0x0000000001B04953 sqlra_free_section + 0x227
[7] 0x0000000001B13C4A sqlra_sqlC_free_section + 0x90
[8] 0x0000000001B13419 sqlra_sqlC_free_sibling + 0x95
[9] 0x0000000001B12F71 sqlra_sqlC_rmv_invoc_siblings + 0xD7
2020-08-19-10.07.54.484000+480 I78831251F498 LEVEL: Warning
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, RAS/PD component, pdEDUIsInDB2KernelOperation, probe:600
DATA #1 : String, 11 bytes
sqlofmblkEx
DATA #2 : String, 4 bytes
sqlo
2020-08-19-10.07.54.484000+480 I78831751F577 LEVEL: Severe
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, RAS/PD component, pdResilienceIsSafeToSustain, probe:800
DATA #1 : String, 37 bytes
Trap Sustainability Criteria Checking
DATA #2 : Hex integer, 8 bytes
0x0000000400003802
DATA #3 : Boolean, 1 bytes
false
2020-08-19-10.07.54.484000+480 I78832330F516 LEVEL: Event
PID : 12360 TID : 15640 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : PISDB
APPHDL : 0-26954 APPID: C0A6F90B.E30E.200819020012
AUTHID : EGOVPIS
EDUID : 15640 EDUNAME: db2agent (PISDB) 0
FUNCTION: DB2 UDB, trace services, pdInvokeCalloutScript, probe:10
START : Invoking D:\IBM\SQLLIB\bin\db2cos_trap.bat from oper system services sqloEDUExceptionFilter
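Please describe the database's runtime environment in more detail so that we can help: for example, the OS version, whether DB2 is running in an HADR environment, and the load at the time of the problem (host memory, CPU, network, database connections and sessions).
Based on your diag log, here are a few links for reference: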
https://www.ibm.com/support/pages/fedstart-failed-message-appears-db2diaglog-periodically
https://www.ibm.com/support/pages/drda-wrapper-reports-sql1776nsql30108n-intermittently-when-accessing-read-only-hadr-standby-server