Oracle DataGuard 学习之 DataGuard FailOver案例
日期:2015-03-10点击次数:1511
Oracle DG(Dataguard)是目前比较常见的数据库HA配置策略。通过实现Physical Standby和Logical Standby,可以实现数据冗余容错机制。防止在主库出现严重故障,不能支持服务的时候,没有快速的后备支持环境。
在DG中,switchover和failover是两个重要的概念,也是DG实现的核心。两者共同点都是Primary和Standby角色切换,差异在于Planned和UnPlanned之分。Switchover关键点在于Planned,这个切换动作是在运维机构规划范围内的动作。比如,进行定期系统软硬件升级、设备维修等动作。而Failover是真正出现严重系统故障,如数据库宕机、软硬件故障导致的Primary不能支持服务,从而进行的切换动作。
根据不同的DG配置,switchover和failover也是有差异的。理论上,Switchover是不会造成数据丢失的,Primary在切换之后也是在DG配置环境中,作为Standby存在的。但是Failover则不同,除了运行在最大保护(Maximum Protection)模式下,Primary突发的故障可能引起一部分Redo Log不能及时的传递到Standby端,切换之后很可能有数据损失的情况。更重要的是,Primary端在发生Failover之后,是不能够直接加入回DG配置的!也就是说,Failover之后,Primary实际上就是被“抛出”了DG环境。
那么,有什么方法实现Primary回到原有的环境呢?这个问题的困难在于保持Primary和Standby一致。在正常情况下,Primary和Standby之间是关联同步的,即使发生了Switchover,也在可控情况下。Failover过程中有数据的缺失,还有Primary修复问题。在目前流行版本(11g)中,有三个方法:
-
环境重建:一种最简单的方法就是直接删除原来的Primary库,引用DG重建方法,重新搭建Standby端;
-
RMAN备份恢复:如果Primary端保留过一份Failover之前的备份,则可以强制原来的Primary端恢复到进行Failover的时间点,之后作为Standby接收当前Primary的redo log传递,应用后可以跟上进度;
-
Flashback Database恢复:Flashback技术是作为传统备份还原技术的补充,提供了更加便捷的恢复策略。使用flashback,可以将数据库恢复到failover之前的时间点。之后的过程和RMAN备份恢复策略相同;
案例分析:
一、在主库端模拟数据库意外宕机
-
7scott@bjdb>conn /as sysdba
-
Connected.
-
sys@bjdb>alter system switch logfile;
-
System altered.
-
sys@bjdb>shutdown abort
-
ORACLE instance shut down.
二、在备库端
1、查看切换信息
-
5sys@shdb>select name,database_role,switchover_status from v$database;
-
NAME DATABASE_ROLE SWITCHOVER_STATUS
-
--------- ---------------- --------------------
-
TESTDB12 PHYSICAL STANDBY NOT ALLOWED
-
可以看到此时备库处于无法切换状态
2、直接切换
-
sys@shdb>alter database commit to switchover to primary;
-
alert_log:(告警日志)
-
Fatal NI connect error 12514, connecting to:
-
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=shsrv)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=shdb)(CID=(PROGRAM=oracle)(HOST=bjsrv)(USER=oracle))))
-
VERSION INFORMATION:
-
TNS for Linux: Version 11.2.0.3.0 - Production
-
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
-
Time: 04-MAR-2015 21:25:13
-
Tracing not turned on.
-
Tns error struct:
-
ns main err code: 12564
-
TNS-12564: TNS:connection refused
-
ns secondary err code: 0
-
nt main err code: 0
-
nt secondary err code: 0
-
nt OS err code: 0
-
Error 12514 received logging on to the standby
-
FAL[client, MRP0]: Error 12514 connecting to shdb for fetching gap sequence
-
Wed Mar 04 21:26:00 2015
-
alter database commit to switchover to primary
-
ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
-
Maximum wait for role transition is 15 minutes.
-
Switchover: Media recovery is still active
-
Database not available for switchover
-
End-Of-REDO archived log file has not been recovered
-
Database not available for switchover
-
End-Of-REDO archived log file has not been recovered
-
Database not available for switchover
3、关闭standby MPR进程
-
35sys@shdb>ALTER DATABASE RECOVER managed standby database finish;
-
ALTER DATABASE RECOVER managed standby database finish
-
Terminal Recovery: request posted (TestDB12)
-
Wed Mar 04 21:34:34 2015
-
Begin: Standby Redo Logfile archival
-
End: Standby Redo Logfile archival
-
Terminal Recovery timestamp is '03/04/2015 21:34:34'
-
Terminal Recovery: applying standby redo logs.
-
Terminal Recovery: thread 1 seq# 34 redo required
-
Media Recovery Waiting for thread 1 sequence 34
-
Terminal Recovery: End-Of-Redo log allocation
-
Terminal Recovery: standby redo logfile 4 created '/dsk4/arch_bj/arch_1_0_820054583.log'
-
This standby redo logfile is being created as part of the
-
failover operation. This standby redo logfile should be
-
deleted after the switchover to primary operation completes.
-
Media Recovery Log /dsk4/arch_bj/arch_1_0_820054583.log
-
Terminal Recovery: log 4 reserved for thread 1 sequence 34
-
Recovery of Online Redo Log: Thread 1 Group 4 Seq 34 Reading mem 0
-
Mem# 0: /dsk4/arch_bj/arch_1_0_820054583.log
-
Identified End-Of-Redo (failover) for thread 1 sequence 34 at SCN 0xffff.ffffffff
-
Incomplete Recovery applied until change 1234252 time 03/04/2015 21:23:43
-
MRP0: Media Recovery Complete (TestDB12)
-
Terminal Recovery: successful completion
-
Wed Mar 04 21:34:35 2015
-
ARCH: Archival stopped, error occurred. Will continue retrying
-
ORACLE Instance TestDB12 - Archival Error
-
ORA-16014: log 4 sequence# 34 not archived, no available destinations
-
ORA-00312: online log 4 thread 1: '/dsk4/arch_bj/arch_1_0_820054583.log'
-
Forcing ARSCN to IRSCN for TR 0:1234252
-
Attempt to set limbo arscn 0:1234252 irscn 0:1234252
-
Resetting standby activation ID 2865247982 (0xaac836ee)
-
MRP0: Background Media Recovery process shutdown (TestDB12)
-
Terminal Recovery: completion detected (TestDB12)
-
Completed: ALTER DATABASE RECOVER managed standby database finish
4、切换数据库到Primary
-
sys@shdb>select status from v$instance;
-
STATUS
-
------------
-
OPEN
-
sys@shdb>select name,database_role,switchover_status from v$database;
-
NAME DATABASE_ROLE SWITCHOVER_STATUS
-
--------- ---------------- --------------------
-
TESTDB12 PHYSICAL STANDBY TO PRIMARY
-
sys@shdb>alter database commit to switchover to primary;
-
Database altered.
-
sys@shdb>alter database open;
-
Database altered.
-
告警日志:
-
alter database commit to switchover to primary
-
ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
-
Maximum wait for role transition is 15 minutes.
-
All dispatchers and shared servers shutdown
-
CLOSE: killing server sessions.
-
CLOSE: all sessions shutdown successfully.
-
Wed Mar 04 21:35:47 2015
-
SMON: disabling cache recovery
-
Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/bjdb/TestDB12/trace/TestDB12_ora_3146.trc
-
Standby terminal recovery start SCN: 1234251
-
RESETLOGS after incomplete recovery UNTIL CHANGE 1234252
-
Online log /dsk2/oradata/bjdb/redo01b.log: Thread 1 Group 1 was previously cleared
-
Online log /dsk1/oradata/bjdb/redo01a.log: Thread 1 Group 1 was previously cleared
-
Online log /dsk2/oradata/bjdb/redo02b.log: Thread 1 Group 2 was previously cleared
-
Online log /dsk1/oradata/bjdb/redo02a.log: Thread 1 Group 2 was previously cleared
-
Online log /dsk2/oradata/bjdb/redo03b.log: Thread 1 Group 3 was previously cleared
-
Online log /dsk1/oradata/bjdb/redo03a.log: Thread 1 Group 3 was previously cleared
-
Standby became primary SCN: 1234250
-
Wed Mar 04 21:35:47 2015
-
Setting recovery target incarnation to 3
-
AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
-
Switchover: Complete - Database mounted as primary
-
Completed: alter database commit to switchover to primary
三、原主库修复后,开机
-
sys@bjdb>startup
-
ORACLE instance started.
-
Total System Global Area 442601472 bytes
-
Fixed Size 2229184 bytes
-
Variable Size 281021504 bytes
-
Database Buffers 155189248 bytes
-
Redo Buffers 4161536 bytes
-
Database mounted.
-
Database opened.
-
sys@bjdb>select name,database_role,switchover_status from v$database;
-
NAME DATABASE_ROLE SWITCHOVER_STATUS
-
--------- ---------------- --------------------
-
TESTDB12 PRIMARY FAILED DESTINATION
现在原来的主库被修复后,整个DataGuara架构已经被破坏了,所以必须把原来的主库构建成新的备库,重新恢复DataGuard的环境。
四、重新构建DataGuard
-
1sys@bjdb>select name,database_role from v$database;
NAME DATABASE_ROLE
-------------------------------------------------- ----------------
TESTDB12 PHYSICAL STANDBY (文/天涯客的blog 51CTO)