作者fengsh·2011-05-09 14:17

系统工程师·电信行业

IBM DB2高可用性实施—高可用性灾难恢复（HADR）

字数 35902阅读 8761评论 4赞 1

1 引言

IBM DB2数据库的高可用性及灾难恢复主要有几种方式实现：

1，自带高可用性组件实现高可用性（db2 v9.5,9.7里面的db2haicu工具配置）；

2，第三方集群软件实现高可用性（如HACMP等）

3，配置DB2的HADR，实现高可用性灾难恢复，类似oracle的dataguard；

4，集群软件＋HADR 实现的高可用性及灾难恢复综合方案；

本文通过实例完整介绍了使用DB2 V9.7进行DB2高可用性灾难恢复的安装、配置、实施；以及客户端自动路由的配置，接管测试过程。

2 系统环境 2.1 实验环境

数据库主机：使用VMare workstation 6.5虚拟两台机器

操作系统：Red Hat EL 5.3

数据库软件：DB2 V9.7

虚拟机，操作系统环境安装过程省去。

2.2 逻辑拓扑结构

2.3 网络配置 2.3.1 主机的IP配置

Linux1 130.30.3.252（255.255.255.0）

Linux2 130.30.3.136（255.255.255.0）

2.3.2 /etc/hosts文件配置

Linux1的/etc/hosts

130.30.3.252 linux1

130.30.3.136 linux2

Linux2的/etc/hosts

130.30.3.252 linux1

130.30.3.136 linux2

2.3.3 验证网络连通性

验证，确保网络连通性是否正常及配置正确

Ping linux1

Ping linux2

Ping 130.30.3.252

Ping 130.30.3.136

3 DB2软件安装 3.1 系统内核参数调整

用root用户登录，编辑/etc/sysctl.conf文件，修改需调整的内核参数，增加如下：

kernel.sem=250 256000 32 1024

kernel.shmmax=268435456

kernel.shmall=8388608

kernel.msgmax=65535

kernel.msgmnb=65535

kernel.shmall = 2097152
       kernel.shmmax = 2147483648
       kernel.shmmni = 4096
       # semaphores: semmsl, semmns, semopm, semmni
       kernel.sem = 250 32000 100 128
       fs.file-max = 65536
       net.ipv4.ip_local_port_range = 1024 65000
       net.core.rmem_default=262144
       net.core.rmem_max=262144
       net.core.wmem_default=262144
       net.core.wmem_max=262144

执行sysctl –p，从/etc/sysctl.conf文件装入sysctl设置。

运行ipcs –l命令显示验证当前的内核参数设置

# ipcs -l

------ Shared Memory Limits --------

max number of segments = 4096 // SHMMNI

max seg size (kbytes) = 32768 // SHMMAX

max total shared memory (kbytes) = 8388608 // SHMALL

min seg size (bytes) = 1

------ Semaphore Limits --------

max number of arrays = 1024 // SEMMNI

max semaphores per array = 250 // SEMMSL

max semaphores system wide = 256000 // SEMMNS

max ops per semop call = 32 // SEMOPM

semaphore max value = 32767

------ Messages: Limits --------

max queues system wide = 1024 // MSGMNI

max size of message (bytes) = 65536 // MSGMAX

default max size of queue (bytes) = 65536 // MSGMNB

3.2 用户资源限制调整

设置DB2实例用户Data，nofiles，fsize资源的操作系统硬限制为无限制。

可通过修改文件/etc/security/limits.conf 设置

* soft nproc 3000

* hard nproc 16384

* soft nofile 65536

* hard nofile 65536

可通过ulimit –Hdnf命令查询值的限制

3.3 创建文件系统

在主节点创建以下文件系统：

创建PV，卷组

[root@linux1 db2]# pvcreate /dev/sdb1

[root@linux1 db2]# vgcreate datavg /dev/sdb1

新建LV，文件系统

[root@linux1 ~]# lvcreate -n db2homelv -L 1G /dev/datavg

Logical volume "db2homelv" created

[root@linux1 ~]# lvcreate -n hafs01lv -L 2G /dev/datavg

Logical volume "hafs01lv" created

# mkfs.ext3 /dev/datavg/db2homelv

# mkfs.ext3 /dev/datavg/hafs01lv

# mkdir /db2home

# mkdir /db2bak

挂载文件系统

# mount /dev/datavg/db2homelv /db2home

# mount /dev/datavg/hafs01lv /db2bak

增加挂载点条目

# vi /etc/fstab

/dev/datavg/db2homelv /db2home ext3 defaults 1 2

/dev/datavg/hafs01lv /db2bak ext3 defaults 1 2

在备用主机节点创建以下文件系统

[root@linux2 ~]# vgcreate datavg /dev/sdb1

Volume group "datavg" successfully created

[root@linux2 ~]# lvcreate -n db2homelv -L 1G /dev/datavg

Logical volume "db2homelv" created

[root@linux2 ~]# mkfs.ext3 /dev/datavg/db2homelv

[root@linux2 ~]# mkdir /db2home

[root@linux2 ~]# mount /dev/datavg/db2homelv /db2home

[root@linux2 ~]# vi /etc/fstab

/dev/datavg/db2homelv /db2home ext3 defaults 1 2

3.4 创建用户及组

分别在主、备节点机器上创建以下用户组及用户，确保两台机器使用完全相同的UID,GID。

新建用户组

通过输入下列命令，为实例所有者创建一个组（例如，db2iadm1），为将要执行 UDF 或存储过程的用户创建一个组（例如，db2fadm1），并为管理服务器创建一个组（例如，dasadm1）：

groupadd -g 999 db2iadm1

groupadd -g 998 db2fadm1

groupadd -g 997 dasadm1

新建用户

通过使用下列命令，为前一步骤中创建的每个组创建一个用户。每个用户的主目录将是先前创建的共享DB2主目录（db2home）。

useradd -u 999 -g db2iadm1 -m -d /db2home/db2inst1 db2inst1

useradd -u 998 -g db2fadm1 -m -d /db2home/db2fenc1 db2fenc1

useradd -u 997 -g dasadm1 -m -d /db2home/dasusr1 dasusr1

设置新建用户的初始密码

# passwd db2inst1

# passwd db2fenc1

# passwd dasusr1

修改文件系统属主

# chown db2inst1:db2iadm1 /db2home

# chown db2inst1:db2iadm1 /db2bak

3.5 安装DB2软件

依次在主、备节点分别完成DB2数据库软件的安装，一般建议使用root安装，非root用户安装会有一些限制。

解压安装下载的安装介质

[root@linux1 db2]# tar zxvf v9.7_linuxia32_server.tar.gz

使用db2_install命令执行安装

[root@linux1 server]# ./db2_install

选择DB2安装目录，可以更改安装目录，linux下缺省安装目录为/opt/ibm/db2/V9.7

用于安装产品的缺省目录－ /opt/ibm/db2/V9.7

***********************************************************

要选择另一个目录用于安装吗？[是/否]

否

安装DB2产品选择ESE

指定下列其中一个关键字以安装 DB2 产品。

ESE

CONSV

WSE

EXP

CLIENT

RTCL

按“帮助”以重新显示产品名称。

按“退出”以退出。

***********************************************************

ESE

等待安装执行过程，完成后会提示安装成功信息

正在初始化 DB2 安装。

要执行的任务总数为：46

要执行的所有任务的总估计时间为：1876

任务 #1 启动

描述：正在检查许可协议的接受情况

估计时间 1 秒

任务 #1 结束

…………………………………

…………………………………S

已成功完成执行。

有关更多信息，请参阅 "/tmp/db2_install.log.18940" 上的 DB2安装日志。

3.6 配置NTP网络时间同步

建议同步集群节点上的时间和日期（但不是必须的）。

4 配置NFS文件系统 4.1 配置NFS 服务器

在主服务器上通过在 /etc/exports 文件中添加以下条目，可以在启动时通过一个 NFS 服务导出这个文件系统

[root@linux1 db2bak]# vi /etc/exports

/db2bak *(rw,no_root_squash)

启动NFS服务

[root@linux1 db2bak]#service nfs start

[root@linux1 db2bak]#service portmap start

执行 exportfs 命令，使将挂载的 NFS 客户机能使用配置的/db2bak共享目录

/usr/sbin/exportfs -a

5 L: c' E+ `+ `% Z6 t2 l5 T( }其中选项 a 用于导出 /etc/exports 文件中列出的所有目录。

4.2 配置NFS 客户机

用以下命令在在备机上创建共享目录：

[root@linux2 ~]# mkdir /db2bak

添加一个条目到 /etc/fstab 文件，使 NFS 在启动时自动挂载文件系统：

130.30.3.252:/db2bak /db2bak nfs rw,timeo=300,retrans=5,hard,intr,bg,suid

用以下命令在备机数据库服务器上挂载导出的文件系统：

[root@linux2 ~]# mount -t nfs 130.30.3.252:/db2bak /db2bak

或[root@linux2 server]# mount /db2bak

5 创建示例数据库 5.1 创建数据库实例

分别在主、备节点完成数据库实例创建，修改参数

[root@linux1 instance]cd /opt/ibm/db2/V9.7/instance

[root@linux1 instance]# ./db2icrt -u db2fenc1 db2inst1

DBI1070I Program db2icrt completed successfully.

[root@linux1 instance]# ./db2ilist

db2inst1

[root@linux1 instance]# su - db2inst1

[db2inst1@linux1 ~]$ db2start

01/27/2011 17:37:46 0 0 SQL1063N DB2START processing was successful.

SQL1063N DB2START processing was successful.

[db2inst1@linux1 ~]$ db2set db2comm=tcpip

[db2inst1@linux1 ~]$ db2 update dbm cfg using svcename DB2_db2inst1

DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully.

[db2inst1@linux1 ~]$ cat /etc/services |grep DB2_

DB2_db2inst1 60000/tcp

DB2_db2inst1_1 60001/tcp

DB2_db2inst1_2 60002/tcp

DB2_db2inst1_END 60003/tcp

5.2 创建管理服务器

分别在主、备节点创建数据库管理服务器

[root@linux1 ~]# cd /opt/ibm/db2/V9.7/instance

[root@linux1 instance]# ./dascrt -u dasusr1

SQL4406W The DB2 Administration Server was started successfully.

DBI1070I Program dascrt completed successfully.

[root@linux1 instance]# su - dasusr1

[dasusr1@linux1 ~]$ db2admin start

SQL4409W The DB2 Administration Server is already active.

5.3 创建示例数据库

在主节点创建sample数据库

[db2inst1@linux1 ~]$ db2sampl -dbpath /db2home/db2inst1/sample

6 HADR数据库配置 6.1 修改主数据库参数

修改以下数据库参数：

db2 UPDATE DB CFG FOR sample USING INDEXREC RESTART LOGINDEXBUILD ON LOGARCHMETH1 DISK:/db2bak/logs LOGPRIMARY 20 LOGSECOND 20 AUTORESTART OFF TRACKMOD ON

[db2inst1@linux1 ~]$ db2 UPDATE DB CFG FOR sample USING INDEXREC RESTART LOGINDEXBUILD ON LOGARCHMETH1 DISK:/db2bak/logs LOGPRIMARY 20 LOGSECOND 20 AUTORESTART OFF

DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

6.2 在主服务器上备份数据库

在主机上执行脱机备份数据库

[db2inst1@linux1 db2bak]$ db2 deactivate db sample

DB20000I The DEACTIVATE DATABASE command completed successfully.

[db2inst1@linux1 db2bak]$ db2 backup db sample to /db2bak/ with 2 buffers buffer 1024 parallelism 4 without prompting

Backup successful. The timestamp for this backup image is : 20110309104324

[db2inst1@linux1 db2bak]$ ls

logs lost+found SAMPLE.0.db2inst1.NODE0000.CATN0000.20110309104324.001

[db2inst1@linux1 db2bak]$ db2ckbkp SAMPLE.0.db2inst1.NODE0000.CATN0000.20110309104324.001

[1] Buffers processed: ###################################

Image Verification Complete - successful.

6.3 在备用服务器上恢复数据库

通过主数据库备份进行完全恢复，完成备用数据库的创建

[root@linux2 instance]# su - db2inst1

[db2inst1@linux2 ~]$ cd /db2bak

[db2inst1@linux2 ~]$ mkdir /db2home/db2inst1/sample

[db2inst1@linux2 ~]$ ls

sample sqllib

[db2inst1@linux2 ~]$ cd /db2bak

[db2inst1@linux2 db2bak]$ ls

logs lost+found SAMPLE.0.db2inst1.NODE0000.CATN0000.20110309104324.001 SAMPLE.0.db2inst1.NODE0000.CATN0000.20110310105829.001

[db2inst1@linux2 db2bak]$ db2 restore db sample from /db2bak taken at 20110310105829 replace history file

DB20000I The RESTORE DATABASE command completed successfully.

6.4 修改/etc/services配置文件

分别在主、备服务器上修改服务配置文件，增加用于HADR通讯使用的服务名称和端口。

– Service name: DB2_HADR_1

– Port number: 55001

– Service name: DB2_HADR_2

– Port number: 55002

在主服务器linux1上添加如下服务

[root@linux1 ~]# cat /etc/services|grep DB2_

DB2_db2inst1 60000/tcp

DB2_db2inst1_1 60001/tcp

DB2_db2inst1_2 60002/tcp

DB2_db2inst1_END 60003/tcp

DB2_HADR_1 55001/tcp

DB2_HADR_2 55002/tcp

在备服务器linux2上添加如下服务

[root@linux2 ~]# cat /etc/services|grep DB2_

DB2_db2inst1 60000/tcp

DB2_db2inst1_1 60001/tcp

DB2_db2inst1_2 60002/tcp

DB2_db2inst1_END 60003/tcp

DB2_HADR_1 55001/tcp

DB2_HADR_2 55002/tcp

注：这一步不是必须的，因为在下面配置HADR_LOCAL_SVC和HADR_REMOTE_SVC数据库参数的时候您可以直接使用端口号来替代服务名。

6.5 修改主服务器的HADR参数

db2 update db cfg for sample using HADR_LOCAL_HOST 130.30.3.252

db2 update db cfg for sample using HADR_LOCAL_SVC DB2_HADR_1

db2 update db cfg for sample using HADR_REMOTE_HOST 130.30.3.136

db2 update db cfg for sample using HADR_REMOTE_SVC DB2_HADR_2

db2 update db cfg for sample using HADR_REMOTE_INST DB2INST1

db2 update db cfg for sample using HADR_SYNCMODE NEARSYNC

db2 update db cfg for sample using HADR_TIMEOUT 10

db2 update db cfg for sample using HADR_PEER_WINDOW 120

db2 connect to sample

db2 quiesce database immediate force connections

db2 unquiesce database

db2 connect reset

6.6 修改备用服务器的HADR参数

db2 update db cfg for sample using HADR_LOCAL_HOST 130.30.3.136

db2 update db cfg for sample using HADR_LOCAL_SVC DB2_HADR_2

db2 update db cfg for sample using HADR_REMOTE_HOST 130.30.3.252

db2 update db cfg for sample using HADR_REMOTE_SVC DB2_HADR_1

db2 update db cfg for sample using HADR_REMOTE_INST db2inst1

db2 update db cfg for sample using HADR_SYNCMODE NEARSYNC

db2 update db cfg for sample using HADR_TIMEOUT 10

db2 update db cfg for sample using HADR_PEER_WINDOW 120

6.7 启动HADR

在备用服务器上启动HADR备用数据库

db2 deactivate database sample

db2 start hadr on database sample as standby

[db2inst1@linux2 db2bak]$ db2 deactivate database sample

SQL1496W Deactivate database is successful, but the database was not activated.

[db2inst1@linux2 db2bak]$ db2 start hadr on database sample as standby

DB20000I The START HADR ON DATABASE command completed successfully.

在主服务器上启动HADR主数据库

db2 deactivate database sample

db2 start hadr on database sample as primary

[db2inst1@linux1 db2bak]$ db2 deactivate database sample

SQL1496W Deactivate database is successful, but the database was not activated.

[db2inst1@linux1 db2bak]$ db2 start hadr on database sample as primary

DB20000I The START HADR ON DATABASE command completed successfully.

注意：

在启动 HADR 时，一定要先在备用服务器上启动 HADR 服务，然后在主服务器上启动。同样，在停止 HADR 时，要先在主服务器上停止服务，然后是备用服务器。

如果你先启动主数据库服务器HADR，那么你必须保证在HADR_TIMEOUT参数指定的时间内（单位为秒）启动备用数据库服务器HADR。否则将启动失败。

6.8 查看HADR运行状态

现在，已经完成了 HADR 设置，就应该检查它是否能够正常工作。

在主服务器linux1上执行以下命令：

db2 get snapshot for db on sample

输出如下：

HADR Status

Role = Primary

State = Peer

Synchronization mode = Nearsync

Connection status = Connected, 2011-03-09 11:01:55.144160

Peer window end = 2011-03-09 11:06:34.000000 (1299639994)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.252

Local service = DB2_HADR_1

Remote host = 130.30.3.136

Remote service = DB2_HADR_2

Remote instance = DB2INST1

timeout(seconds) = 3

Primary log position(file, page, LSN) = S0000000.LOG, 0, 00000000045F81C1

Standby log position(file, page, LSN) = S0000000.LOG, 0, 00000000045F81C1

Log gap running average(bytes) = 0

在主服务器linux1上执行以下命令：

db2 get snapshot for db on sample

输出如下：

HADR Status

Role = Standby

State = Peer

Synchronization mode = Nearsync

Connection status = Connected, 2011-03-09 11:01:54.978888

Peer window end = 2011-03-09 11:08:24.000000 (1299640104)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.136

Local service = DB2_HADR_2

Remote host = 130.30.3.252

Remote service = DB2_HADR_1

Remote instance = db2inst1

timeout(seconds) = 3

Primary log position(file, page, LSN) = S0000000.LOG, 0, 00000000045F81C1

Standby log position(file, page, LSN) = S0000000.LOG, 0, 00000000045F81C1

Log gap running average(bytes) = 0

也可以通过db2pd –db sample –hadr命令来查看HADR运行状态。

7 HADR接管测试

设置过程的最后一步是测试 HADR 的故障恢复功能。可以手工在备用服务器上执行普通接管命令

db2 takeover hadr on database sample

如果一般的接管不起作用，就需要指定 BY FORCE 选项，强迫 db2 切换到备用服务器上的HADR（如主数据库主机异常情况下启用备用服务器接管数据库）。

7.1 配置客户端重新路由

在主服务器linux1上，执行以下命令启用 HADR 的自动客户机重路由特性：

db2 update alternate server for database sample using hostname 130.30.3.136 port 60000

[db2inst1@linux1 db2bak]$ db2 update alternate server for database sample using hostname 130.30.3.136 port 60000

DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed

successfully.

DB21056W Directory changes may not be effective until the directory cache is

refreshed.

[db2inst1@linux1 db2bak]$ db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:

Database alias = SAMPLE

Database name = SAMPLE

Local database directory = /db2home/db2inst1/sample

Database release level = d.00

Comment =

Directory entry type = Indirect

Catalog database partition number = 0

Alternate server hostname = 130.30.3.136

Alternate server port number = 60000

在备用服务器linux2上，执行以下命令启用 HADR 的自动客户机重路由特性：

db2 update alternate server for database sample using hostname 130.30.3.252 port 60000

[db2inst1@linux2 db2bak]$ db2 update alternate server for database sample using hostname 130.30.3.252 port 60000

DB20000I The UPDATE ALTERNATE SERVER FOR DATABASE command completed

successfully.

DB21056W Directory changes may not be effective until the directory cache is

refreshed.

[db2inst1@linux2 db2bak]$ db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:

Database alias = SAMPLE

Database name = SAMPLE

Local database directory = /db2home/db2inst1

Database release level = d.00

Comment =

Directory entry type = Indirect

Catalog database partition number = 0

Alternate server hostname = 130.30.3.252

Alternate server port number = 60000

7.2 在客户端机器编目

db2 catalog tcpip node testhadr remote 130.30.3.252 server 60000

db2 catalog db sample as testhadr at node testhadr

7.3 备用服务器正常接管

1. 在主数据库上创建测试表

C:Documents and Settingsfengsh>db2 connect to testhadr user db2inst1 using db2inst1

db2 => CREATE TABLE HADRTEST(ID INTEGER NOT NULL WITH DEFAULT,NAME VARCHAR(10),PRIMARY KEY (ID))

DB20000I SQL命令成功完成。

db2 => INSERT INTO HADRTEST (ID,NAME) VALUES (1,'张三')

DB20000I SQL命令成功完成。

db2 => INSERT INTO HADRTEST (ID,NAME) VALUES (2,'李四')

DB20000I SQL命令成功完成。

db2 => select * from hadrtest

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

db2 =>

2. 在备用服务器上接管主数据库

[db2inst1@linux2 db2bak]$ db2 takeover hadr on database sample

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

获取快照发现，备用服务器上HADR已经接管，成为primay数据库

[db2inst1@linux2 db2bak]$ db2 get snapshot for db on sample|more

HADR Status

Role = Primary

State = Peer

Synchronization mode = Nearsync

Connection status = Connected, 2011-03-09 14:55:55.852957

Peer window end = 2011-03-09 15:01:08.000000 (1299654068)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.136

Local service = DB2_HADR_2

Remote host = 130.30.3.252

Remote service = DB2_HADR_1

Remote instance = db2inst1

timeout(seconds) = 3

Primary log position(file, page, LSN) = S0000001.LOG, 0, 00000000049DB3A1

Standby log position(file, page, LSN) = S0000001.LOG, 0, 00000000049DB3A1

Log gap running average(bytes) = 0

而对应原来主服务器上数据库成为standby数据库

[db2inst1@linux1 ~]$ db2 get snapshot for database on sample

HADR Status

Role = Standby

State = Peer

Synchronization mode = Nearsync

Connection status = Connected, 2011-03-09 14:55:56.222980

Peer window end = 2011-03-09 15:03:16.000000 (1299654196)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.252

Local service = DB2_HADR_1

Remote host = 130.30.3.136

Remote service = DB2_HADR_2

Remote instance = DB2INST1

timeout(seconds) = 3

Primary log position(file, page, LSN) = S0000001.LOG, 0, 00000000049DB3A1

Standby log position(file, page, LSN) = S0000001.LOG, 0, 00000000049DB3A1

Log gap running average(bytes) = 0

3. 客户端应用程序验证

客户端程序无法连原主数据库后，会自动重新路由连接到备用数据库，不用客户端重新连库。

C:Documents and Settingsfengsh>db2 "select * from hadrtest"

SQL30108N 连接失败，但是已经重新建立了连接。主机名或 IP 地址为

"130.30.3.136"，服务名称或端口号为

"60000"。可能会也可能不会重新尝试专用寄存器（原因码 = "1"）。 SQLSTATE=08506

C:Documents and Settingsfengsh>db2 "select * from hadrtest"

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

重新在原主服务器上接管数据库，恢复最初主备数据库服务器状态，验证客户程序的连接，也自动完成重新路由。

C:Documents and Settingsfengsh>db2 "select * from hadrtest"

SQL30108N 连接失败，但是已经重新建立了连接。主机名或 IP 地址为

"130.30.3.252"，服务名称或端口号为

"60000"。可能会也可能不会重新尝试专用寄存器（原因码 = "1"）。 SQLSTATE=08506

C:Documents and Settingsfengsh>db2 "select * from hadrtest"

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

7.4 主数据库异常时接管

在主数据库宕库，主数据库主机故障，网络故障等异常情况下导致主数据库问题是的接管测试。

1. 用 db2_kill 命令手工关闭主服务器

[db2inst1@linux1 db2dump]$ db2_kill

ipclean: Removing DB2 engine and client's IPC resources for db2inst1.

在备库服务器上执行db2 get snapshot for db on sample获取快照信息

HADR Status

Role = Standby

State = Disconnected peer

Synchronization mode = Nearsync

Connection status = Disconnected, 2011-03-10 16:54:28.400277

Peer window end = 2011-03-10 17:00:29.000000 (1299747629)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.136

Local service = DB2_HADR_2

Remote host = 130.30.3.252

Remote service = DB2_HADR_1

Remote instance = db2inst1

timeout(seconds) = 10

Primary log position(file, page, LSN) = S0000006.LOG, 59, 0000000005D860C4

Standby log position(file, page, LSN) = S0000006.LOG, 59, 0000000005D860C4

Log gap running average(bytes) = 0

2. 在备用服务器上接管HADR主数据库

[db2inst1@linux2 sample]$ db2 takeover hadr on database sample

SQL1770N Takeover HADR cannot complete. Reason code = "1".

[db2inst1@linux2 sample]$ db2 takeover hadr on database sample by force

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

备用机器HADR接管为primary数据库

HADR Status

Role = Primary

State = Disconnected

Synchronization mode = Nearsync

Connection status = Disconnected, 2011-03-10 16:54:28.400277

Peer window end = Null (0)

Peer window (seconds) = 120

Heartbeats missed = 0

Local host = 130.30.3.136

Local service = DB2_HADR_2

Remote host = 130.30.3.252

Remote service = DB2_HADR_1

Remote instance = db2inst1

timeout(seconds) = 10

Primary log position(file, page, LSN) = S0000006.LOG, 59, 0000000005D860C4

Standby log position(file, page, LSN) = S0000000.LOG, 0, 0000000000000000

Log gap running average(bytes) = 0

3. 客户端应用程序验证

客户端程序无法连原主数据库后，会自动重新路由连接到备用数据库，不用客户端重新连库。

C:Documents and Settingsfengsh>db2 select * from hadrtest

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

C:Documents and Settingsfengsh>db2 select * from hadrtest

SQL30108N 连接失败，但是已经重新建立了连接。主机名或 IP 地址为

"130.30.3.136"，服务名称或端口号为

"60000"。可能会也可能不会重新尝试专用寄存器（原因码 = "1"）。 SQLSTATE=08506

C:Documents and Settingsfengsh>db2 select * from hadrtest

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

C:Documents and Settingsfengsh>db2 select * from hadrtest

SQL30108N 连接失败，但是已经重新建立了连接。主机名或 IP 地址为

"130.30.3.252"，服务名称或端口号为

"60000"。可能会也可能不会重新尝试专用寄存器（原因码 = "1"）。 SQLSTATE=08506

C:Documents and Settingsfengsh>db2 select * from hadrtest

ID NAME

----------- ----------

1 张三

2 李四

2 条记录已选择。

7.5 原主服务器恢复接管

在主数据库宕库，主数据库主机故障，网络故障等异常情况下导致HADR数据库故障接管；待主服务器恢复后，重新进行hadr主数据库接管，恢复最初主备状态。

1. 启动实例

[db2inst1@linux1 db2dump]$ db2start

03/10/2011 17:05:54 0 0 SQL1063N DB2START processing was successful.

SQL1063N DB2START processing was successful.

2. 启动hadr到standby数据库

[db2inst1@linux1 db2dump]$ db2 start hadr on database sample as standby

DB20000I The START HADR ON DATABASE command completed successfully.

3. 重新接管恢复为primary数据库

[db2inst1@linux1 db2dump]$ db2 takeover hadr on db sample

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

4．客户端应用程序验证

客户端程序自动重新路由连接到主服务器的primary数据库。

8 其他问题 8.1 HADR设计限制

1. 仅在 DB2 UDB 企业服务器版本（ESE）上支持 HADR。但是，当 ESE 上有多个数据库分区时，不支持 HADR。

2. 主数据库和备用数据库主机的操作系统、db2软件的版本，级别必须相同，交替卷动升级过程中较短时间除外。

3. 主数据库和备用数据库上的 DB2 UDB 发行版必须具有相同的位大小（32 位或 64 位）。

4. 日志要使用独立的、高性能的磁盘或文件系统归档日志的位置，主从节点必须都能访问的到

5. HADR通讯最好使用独立的网络

6. 考虑使用多块网卡

7. 考虑提供一个虚拟IP访问数据库服务器

8. 考虑使用客户端自动路由――虚拟IP和客户端自动路由二选一即可，不可同时使用。

客户端自动路由与虚拟IP相比，它是DB2软件集成的，不需其他硬件或软件支持。

9. 选择合适的hadr_syncmode模式

10. 在V9.7前版本不支持备用数据库上的读操作，客户机无法与备用数据库连接，V9.7支持备用数据库查询模式打开。

8.2 基本维护操作 8.2.1 启动HADR

1. 在备用服务器上启动备用数据库上的HADR

db2 deactivate database sample

db2 start hadr on database sample as standby

2. 在主服务器上启动主数据库上的HADR

db2 deactivate database sample

db2 start hadr on database sample as primary

8.2.2 停止HADR

1. 在主服务器上停止主数据库的HADR

db2 deactivate database sample

db2 stop hadr on database sample

2. 在备用服务器上停止备用数据库的HADR

db2 deactivate database sample

db2 stop hadr on database sample

8.2.3 HADR接管

在备用服务器上执行数据库普通接管命令

db2 takeover hadr on database sample

如果普通接管失败，或者主数据库故障时，需执行强制接管命令

db2 takeover hadr on database sample by force

8.3 HADR复制的操作

以下操作将会通过HADR从主数据库复制到standby库

Data Definition Language (DDL):

– Includes create and drop table, index, and more

Data Manipulation Language (DML):

– Insert, update, or delete

Buffer pool:

– Create and drop

Table space:

– Create, drop, or alter.

– Container sizes, file types (raw device or file system), and paths must be identical on the primary and the standby.

– Table space types (DMS or SMS) must be identical on both the servers.

– If the database is enabled for automatic storage, then the storage paths must be identical.

Online REORG:

– Replicated. The standby’s Peer state with the primary is rarely impacted

by this, although it can impact the system because of the increase in the number of log records generated.

Offline REORG:

- Replicated.

Stored procedures and user defined functions (UDF):

– Creation statements are replicated.

– HADR does not replicate the external code for stored procedures or UDF object library files. In order to keep these in synchronization between the primary and the standby, it is the user’s responsibility to set these up on identical paths on both the primary and the standby servers. If the paths are not identical, invocation fails on the standby.

8.4 HADR不复制的操作

以下操作HADR不会复制standby数据库：

_ NOT LOGGED INITIALLY tables:

– Tables created or altered with the NOT LOGGED INITIALLY option

specified are not replicated. The tables are invalid on the standby and

after a takeover, attempts to access these tables result in an error.

_ BLOBs and CLOBs:

– Non-logged Large Objects (LOBs) are not replicated, but the space is

allocated and LOB data is set to binary zeroes on the standby.

_ Data links:

– NOT SUPPORTED in HADR.

_ Database configuration changes:

– UPDATE DB CFG is not replicated.

– Dynamic database configuration parameters can be updated to both the

primary and the standby without interruption. For the database

configuration parameters that need a recycle of the database we

recommend you follow the rolling upgrade steps. See 7.1, “DB2 fix pack

rolling upgrades” on page 177.

_ Recovery history file:

– NOT automatically shipped to standby.

– You can place an initial copy of the history file by using a primary backup

image to restore the history file to the standby using the RESTORE

command with the REPLACE HISTORY FILE option for example:

db2 restore database sample replace history file

– You can also update the history file on the standby from a primary backup

image by the RESTORE command for example:

db2 restore database sample history file

– If a takeover operation occurs and the standby database has an

up-to-date history file, backup and restore operations on the new primary

generate new records in the history file and blend seamlessly with the

records generated on the original primary. If the history file is out-of-date

or has missing entries, an automatic incremental restore might not be

possible; instead, a manual incremental restore operation is required.

We think that the recovery history file for the primary database is not

relevant to the standby database, because it contains DB2 recovery

information specific to that primary server; DB2 keeps records of which

backup file to use for recovery to prior points in time, among other things.

102 High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows

In most HADR implementations, the standby system is only used in the

primary mode as a temporary measure, while the old primary server is

being fixed, making the primary version of the recovery history file

irrelevant; also, because of potentially different log file numbers and

locations between the primary and the standby, incorrect for use on the

standby server.

8.5 参考资料

1. IBM 红皮书：Linux、UNIX 和 Windows 上的 DB2 高可用性和灾难恢复选项

http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg247363.html

2. 针对 Linux、UNIX 和 Windows 的 IBM DB2 9.5 和 DB2 9.7 网上信息中心

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp

http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

style office

著作权归作者所有

如果觉得我的文章对您有用，请点赞。您的支持将鼓励我继续创作！

添加新评论4 条评论

cherry_zhao数据库管理员pingtech
2013-05-31 15:23

咨询一个问题，看到您用的是nfs文件系统共享的目录存放归档文件。但如果当252这台服务器坏掉后，此目录不能访问，那日志归档怎么办？

cherry_zhao数据库管理员pingtech
2013-05-30 15:25

非常详细，辛苦了！

2011-11-09 18:15

非常好的资料！

study123系统架构师ERICSSON
2011-05-11 12:23

谢谢分享！
很值得学习！

Ctrl+Enter 发表

匿名评论

IBM DB2高可用性实施—高可用性灾难恢复（HADR）

添加新评论4 条评论

作者其他文章

相关文章

相关问题