A routine change, a strange decision, a disastrous result
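Following vendor advice, the customer's storage array needed a microcode upgrade, a maintenance task that can be performed online. We told the customer that the upgrade would run online with no impact on front-end services, but after an internal discussion they decided a shutdown was safer; after all, they worried that work on the storage side might affect the data. So we shut down as planned: applications, database, RAC/HA/OS and so on. Once the storage change was complete, we brought up the OS, HA, RAC and the database. Then the unexpected happened: HA would not start. I was crushed. We went looking for the cause and found that the reserve_policy attribute on the shared storage disks had changed. It had reverted to its default value.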
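All we did was shut the system down once, so how could this value have changed? We then checked the other attributes and found they had all changed too. Good grief, can a microcode upgrade really affect this? The storage engineer was dumbfounded as well: a once-in-a-century event, and he felt he was being blamed unfairly. There was no time to chase the root cause. Since the values had changed, we set them back and got the business running first. Once services were up, we collected the AIX logs and sent them to IBM to see how the vendor would analyze the problem. IBM was quick this time: within 2 days they replied, apologized, and told us we had hit a bug.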
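The check-and-restore step described above can be sketched as a small script. This is only a minimal illustration, assuming the output of `lsattr -El hdiskN` was saved before the shutdown; the disk name `hdisk2`, the sample attribute values, and the helper names `parse_lsattr` and `restore_commands` are hypothetical and were not captured from this incident.

```python
# Illustrative sketch: compare a saved `lsattr -El hdiskN` snapshot with the
# current one and print the chdev commands that would restore drifted values.
# Sample text, disk name and helper names are assumptions, not incident data.

def parse_lsattr(text):
    """Parse `lsattr -El <disk>` output into {attribute: value}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Column 1 is the attribute name, column 2 its current value.
            attrs[fields[0]] = fields[1]
    return attrs


def restore_commands(disk, baseline, current, watched):
    """Return chdev commands for watched attributes that differ from the baseline."""
    commands = []
    for name in watched:
        want = baseline.get(name)
        have = current.get(name)
        if want is not None and have != want:
            commands.append(f"chdev -l {disk} -a {name}={want}")
    return commands


WATCHED = ("algorithm", "reserve_policy", "queue_depth")

# Hypothetical snapshot taken before the reboot.
BEFORE = """\
algorithm       round_robin  Algorithm       True
queue_depth     32           Queue DEPTH     True
reserve_policy  no_reserve   Reserve Policy  True
"""

# Hypothetical snapshot taken after the reboot.
AFTER = """\
algorithm       fail_over    Algorithm       True
queue_depth     1            Queue DEPTH     True
reserve_policy  single_path  Reserve Policy  True
"""

cmds = restore_commands("hdisk2", parse_lsattr(BEFORE), parse_lsattr(AFTER), WATCHED)
print("\n".join(cmds))
```

The script only prints the commands instead of running them, so they can be reviewed first. On a disk that is in use, `chdev` generally needs the device to be closed, or the `-P` flag to defer the change to the next restart.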
IZ92384: DISK ATTRIBUTES CHANGING TO DEFAULTS AFTER REBOOT APPLIES TO AIX 7100-00
A fix is available
Obtain the fix for this APAR.
APAR status
Closed as program error.
Error description
XIV hdisk attributes (like algorithm, reserve_policy and
queue_depth) are changing from custom values back to
defaults
(round_robin, no_reserve and 1) after reboot.
This defect is not limited to XIV only.
Local fix
Problem summary
Some disk attributes like algorithm, reserve_policy etc may
revert back to their default values upon rebooting the system.
So changes made by chdev command can be undone when on reboot.
Problem conclusion
Disk configuration code to be modified so that attributes are
not reverted to their default values upon reboot if they had
been changed.
Temporary fix
Comments
5300-10 - use AIX APAR IZ98324
5300-11 - use AIX APAR IZ98304
5300-12 - use AIX APAR IZ98065
6100-03 - use AIX APAR IZ97856
6100-04 - use AIX APAR IZ97578
6100-05 - use AIX APAR IZ97474
6100-06 - use AIX APAR IZ92228
7100-00 - use AIX APAR IZ92384
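
The level-to-APAR list above lends itself to a simple lookup. The following is a rough sketch, assuming the input is the text printed by `oslevel -r` (or `oslevel -s`); the function names `fix_apar_for` and `instfix_command` are hypothetical. A level missing from the table only means the APAR record does not list it, not that the system is unaffected.

```python
# Mapping copied from the Comments section of the APAR record:
# AIX technology level -> APAR that carries the fix.
FIX_APAR_BY_LEVEL = {
    "5300-10": "IZ98324",
    "5300-11": "IZ98304",
    "5300-12": "IZ98065",
    "6100-03": "IZ97856",
    "6100-04": "IZ97578",
    "6100-05": "IZ97474",
    "6100-06": "IZ92228",
    "7100-00": "IZ92384",
}


def fix_apar_for(oslevel_output):
    """Return the APAR listed for this level, or None if the level is not in the table."""
    parts = oslevel_output.strip().split("-")
    # Keep only release and technology level, e.g. "6100-06-03-1048" -> "6100-06".
    level = "-".join(parts[:2])
    return FIX_APAR_BY_LEVEL.get(level)


def instfix_command(apar):
    """Build the command that checks whether the given APAR is installed."""
    return f"instfix -ik {apar}"


apar = fix_apar_for("7100-00\n")
if apar is not None:
    print(instfix_command(apar))
```

Running the printed `instfix -ik` command on the AIX host reports whether the filesets for that APAR are installed.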