
The new CRS node isolation mechanism (I/O fencing) in 11.2.0.2 and later

2015-11-06 16:47

An important service provided by Oracle Clusterware is node fencing. Node fencing is a technique used by clustered environments to evict nonresponsive or malfunctioning hosts from the cluster. Allowing affected nodes to remain in the cluster increases the probability of data corruption due to unsynchronized database writes.

Traditionally, Oracle Clusterware uses a fencing algorithm comparable to STONITH (Shoot The Other Node In The Head) to ensure data integrity in cases where cluster integrity is endangered and split-brain scenarios need to be prevented. For Oracle Clusterware this means that a local process enforces the removal of one or more nodes from the cluster (fencing). This approach traditionally involved a forced "fast" reboot of the offending node. A fast reboot is a shutdown and restart procedure that does not wait for any I/O to finish or for file systems to synchronize on shutdown. Starting with Oracle Clusterware 11g Release 2 (11.2.0.2), this mechanism has been changed to prevent such a reboot as much as possible by introducing rebootless node fencing.

Now, when a decision is made to evict a node from the cluster, Oracle Clusterware will first attempt to shut down all resources on the machine that was chosen to be the subject of an eviction. Specifically, I/O generating processes are killed and Oracle Clusterware ensures that those processes are completely stopped before continuing. If all resources can be stopped and all I/O generating processes can be killed, Oracle Clusterware will shut itself down on the respective node, but will attempt to restart after the stack has been stopped.

If, for some reason, not all resources can be stopped or I/O generating processes cannot be stopped completely, Oracle Clusterware will still perform a reboot.
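The decision flow described above can be sketched in a few lines of Python. This is only an illustration of the logic in the quoted text, not Oracle's implementation: the function name `fence_node`, the two callbacks, and the returned strings are all hypothetical.

```python
from typing import Callable


def fence_node(stop_resources: Callable[[], bool],
               kill_io_processes: Callable[[], bool]) -> str:
    """Return the action a rebootless fencing scheme would take.

    Both callbacks are hypothetical stand-ins: each returns True only
    if the corresponding step completed fully on the evicted node.
    """
    resources_stopped = stop_resources()
    io_stopped = kill_io_processes()

    if resources_stopped and io_stopped:
        # Clean path: stop the clusterware stack, then try to restart it.
        return "restart_stack"

    # Fallback path: something could not be stopped, so reboot the node.
    return "reboot"


if __name__ == "__main__":
    print(fence_node(lambda: True, lambda: True))
    print(fence_node(lambda: True, lambda: False))
```

The point of the sketch is that the reboot is no longer the first step but the fallback, taken only when a clean stop cannot be guaranteed.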

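In short, compared with classic STONITH: Oracle Clusterware first tries to shut down the cluster stack on the node. If an error prevents the cluster resources from being stopped, the Oracle CRS node fencing mechanism escalates to a direct node reboot. The core of the new algorithm is that it is rebootless.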

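In my opinion, rebooting a node directly when a cluster node misbehaves may lead to data loss and broken data consistency. While tracing the Oracle cssdmonitor process, I found that the node reboot is carried out through /proc/sysrq-trigger.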

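SysRq, often called Magic System Request, is defined as a set of key combinations. It is called "magic" because it can still carry out a set of predefined system operations when the system is hung and most services no longer respond. With it you can reboot a hung server while keeping the data on disk safe, avoiding data loss and a long file system check after the restart. You can also collect runtime information such as memory usage, CPU task handling, and process state, and sometimes even recover a server that has stopped responding without rebooting it at all.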

However, I do not know exactly which SysRq command Oracle CRS uses to bring the node down.

B - Reboot the system immediately

SysRq: Resetting

This operation reboots the system immediately, faster than you might expect.
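As a rough illustration of how a process can issue a SysRq command through /proc/sysrq-trigger, here is a minimal Python sketch. The helper `sysrq` and the `SYSRQ_COMMANDS` table are hypothetical names introduced here. The command letters are the standard Linux ones, but which of them Oracle CRS actually writes is not known from the trace above. The helper defaults to a dry run, because a real write needs root and `b` reboots the machine at once.

```python
from pathlib import Path

# A few standard Linux SysRq command letters (not an exhaustive list).
SYSRQ_COMMANDS = {
    "s": "sync all mounted filesystems",
    "u": "remount all filesystems read-only",
    "b": "reboot immediately, without syncing or unmounting disks",
    "o": "power off",
}


def sysrq(command: str,
          trigger: str = "/proc/sysrq-trigger",
          dry_run: bool = True) -> str:
    """Describe a SysRq command and, if dry_run is False, send it.

    Hypothetical helper for illustration only. Writing to the real
    trigger file requires root privileges.
    """
    if command not in SYSRQ_COMMANDS:
        raise ValueError(f"unsupported SysRq command: {command!r}")
    if not dry_run:
        # Equivalent to: echo <command> > /proc/sysrq-trigger
        Path(trigger).write_text(command)
    return SYSRQ_COMMANDS[command]


if __name__ == "__main__":
    # Dry run only: prints what each command would do, sends nothing.
    for key in ("s", "u", "b"):
        print(key, "->", sysrq(key))
```

Note that `b` on its own does not flush anything to disk. The "safe" reboot of a hung server mentioned above is usually done by sending `s` and `u` before `b`.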
