
RabbitMQ (5) High Availability -- Highly Available Queues

2014-04-08 11:06

To make this easier to use in day-to-day work, I have translated the RabbitMQ documentation on high availability. Given my limited ability, the translation inevitably contains flaws; if anything looks doubtful, you are welcome to point it out and discuss.

Official original: http://www.rabbitmq.com/ha.html

Reposted from: http://my.oschina.net/moooofly/blog/94113

============================================


Highly Available Queues


If your RabbitMQ broker consists of a single node, then a failure of that node will cause downtime, temporary unavailability of service, and potentially loss of messages (especially non-persistent messages held by non-durable queues). You could publish all messages persistent, to durable queues, but even then, due to buffering there is an amount of time between the message being sent and the message being written to disk and fsync'd. Using publisher confirms is one means to ensure the client understands which messages have been written to disk, but even so, you may not wish to suffer the downtime and inconvenience of the unavailability of service caused by a node failure, or the performance degradation of having to write every message to disk.
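The buffering window described above can be illustrated with a toy model (all names here are invented for illustration; this is not the RabbitMQ implementation): a message is only safe once it has been fsync'd, and anything still in the buffer is lost on a crash.

```python
class ToyBroker:
    """Toy model: a message is only durable once it has been fsync'd."""

    def __init__(self):
        self.buffer = []   # received but not yet on disk
        self.disk = []     # fsync'd messages

    def publish(self, msg):
        self.buffer.append(msg)

    def fsync(self):
        # Flush the buffer to disk; only now can these messages be confirmed.
        self.disk.extend(self.buffer)
        self.buffer.clear()
        return list(self.disk)

    def crash(self):
        # Buffered (not yet fsync'd) messages are lost on a node failure.
        self.buffer.clear()
        return list(self.disk)

broker = ToyBroker()
broker.publish("m1")
confirmed = broker.fsync()   # "m1" is now on disk and confirmable
broker.publish("m2")         # still inside the buffering window
survivors = broker.crash()   # "m2" is lost; only "m1" survives
print(confirmed, survivors)
```

This is exactly the window that publisher confirms make visible to the client: a confirm is only issued for messages that have reached the disk.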

You could use a cluster of RabbitMQ nodes to construct your RabbitMQ broker. This will be resilient to the loss of individual nodes in terms of the overall availability of service, but some important caveats apply: whilst exchanges and bindings survive the loss of individual nodes, queues and their messages do not. This is because a queue and its contents reside on exactly one node, thus the loss of a node will render its queues unavailable.

You could use an active/passive pair of nodes such that should one node fail, the passive node will be able to come up and take over from the failed node. This can even be combined with clustering. Whilst this approach ensures that failures are quickly detected and recovered from, there can be reasons why the passive node can take a long time to start up, or potentially even fail to start. This can cause, at best, temporary unavailability of queues which were located on the failed node.

To solve these various problems, we have developed active/active high availability for queues. This works by allowing queues to be mirrored on other nodes within a RabbitMQ cluster. The result is that should one node of a cluster fail, the queue can automatically switch to one of the mirrors and continue to operate, with no unavailability of service. This solution still requires a RabbitMQ cluster, which means that it will not cope seamlessly with network partitions within the cluster and, for that reason, is not recommended for use across a WAN (though of course, clients can still connect from as near and as far as needed).


Mirrored Queue Behaviour


In normal operation, for each mirrored-queue, there is one master and several slaves, each on a different node. The slaves apply the operations that occur to the master in exactly the same order as the master and thus maintain the same state. All actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves. Thus clients consuming from a mirrored queue are in fact consuming from the master.

Should a slave fail, there is little to be done other than some bookkeeping: the master remains the master and no client need take any action or be informed of the failure. Note that slave failures may not be detected immediately and the interruption of the per-connection flow control mechanism can delay message publication. The details are described here.

If the master fails, then one of the slaves must be promoted. At this point, the following happens:

A slave is promoted to become the new master. The slave chosen for promotion is the eldest slave. As such, it has the best chance of being synchronised with the master. However, note that should there be no slave that is synchronised with the master, messages that only the master held will be lost.
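The promotion rule can be sketched as follows (a simplified toy model; the tuple layout and function name are invented for illustration): the eldest slave wins, and any message held only by the failed master is lost.

```python
def promote(slaves, master_messages):
    """Pick the eldest slave as the new master.

    slaves: list of (name, age, synchronised_messages); eldest = max age.
    Returns the new master's name and the messages that survive.
    """
    if not slaves:
        return None, []
    name, _, synced = max(slaves, key=lambda s: s[1])
    # Messages the chosen slave never synchronised are gone with the master.
    surviving = [m for m in master_messages if m in synced]
    return name, surviving

slaves = [
    ("slave_a", 10, ["m1", "m2"]),  # eldest, best-synchronised
    ("slave_b", 3,  ["m1"]),
]
new_master, kept = promote(slaves, ["m1", "m2", "m3"])
print(new_master, kept)  # slave_a keeps m1 and m2; master-only m3 is lost
```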

The slave considers all previous consumers to have been abruptly disconnected. As such, it requeues all messages that have been delivered to clients but are pending acknowledgement. This can include messages for which a client has issued acknowledgements: either the acknowledgement was lost on the wire before reaching the master, or it was lost during broadcast from the master to the slaves. In either case, the new master has no choice but to requeue all messages it thinks have not been acknowledged.
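The requeue-on-promotion behaviour can be sketched like this (a toy model, not RabbitMQ code; the function and argument names are invented): the new master cannot tell a lost ack from a missing ack, so both kinds of pending messages go back on the queue and may be delivered twice.

```python
def promote_requeue(queue, delivered_unacked, acked_but_ack_lost):
    """Build the new master's backlog after promotion.

    The new master cannot distinguish 'ack lost in transit' from
    'never acked', so it conservatively requeues both sets.
    """
    return queue + delivered_unacked + acked_but_ack_lost

backlog = promote_requeue(
    queue=["m4"],                   # never delivered
    delivered_unacked=["m2", "m3"], # delivered, client has not acked yet
    acked_but_ack_lost=["m1"],      # client acked, but the ack never arrived
)
print(backlog)  # m1 will be delivered a second time
```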

Clients that were consuming from the mirrored-queue and support our Consumer Cancellation Notifications extension will receive a notification that their subscription to the mirrored-queue has been abruptly cancelled. At this point they should re-consume from the queue, which will pick up the new master. The reason for sending this notification is that informing clients of the loss of the master is essential: otherwise the client may continue to issue acknowledgements for messages they were sent by the old, failed master, and not expect that they might be about to see the same messages again, this time sent by the new master. Of course, clients that were connected to the failed node will find their connections failed, and will need to reconnect to a surviving node of the cluster.

As a result of the requeuing, clients that re-consume from the queue must be aware that they are likely to subsequently receive messages that they have seen previously.
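One common way to cope with such redeliveries is to make the consumer idempotent, for example by keying work on a message ID and skipping duplicates (a sketch; deduplication is the application's responsibility, not something RabbitMQ provides):

```python
def consume(deliveries, seen=None):
    """Process deliveries idempotently, skipping already-seen message IDs.

    deliveries: iterable of (msg_id, body) pairs, possibly with duplicates
    caused by requeuing after a master failover.
    """
    seen = set() if seen is None else seen
    processed = []
    for msg_id, body in deliveries:
        if msg_id in seen:
            continue  # duplicate redelivery after failover: skip it
        seen.add(msg_id)
        processed.append(body)
    return processed

deliveries = [(1, "pay"), (2, "ship"), (1, "pay")]  # id 1 redelivered
print(consume(deliveries))  # each message is processed exactly once
```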

As the chosen slave becomes the master, no messages that are published to the mirrored-queue during this time will be lost: messages published to a mirrored-queue are always published directly to the master and all slaves. Thus should the master fail, the messages continue to be sent to the slaves and will be added to the queue once the promotion of a slave to the master completes.

Similarly, messages published by clients using publisher confirms will still be confirmed correctly even if the master (or any slaves) fail between the message being published and the message being able to be confirmed to the publisher. Thus from the point of view of the publisher, publishing to a mirrored-queue is no different from publishing to any other sort of queue. It is only consumers that need to be aware of the possibility of needing to re-consume from a mirrored-queue upon receipt of a Consumer Cancellation Notification.

If you are consuming from a mirrored-queue with noAck=true (i.e. the client is not sending message acknowledgements) then messages can be lost. This is no different from the norm of course: the broker considers a message acknowledged as soon as it has been sent to a noAck=true consumer, and should the client disconnect abruptly, the message may never be received. In the case of a mirrored-queue, should the master die, messages that are in-flight on their way to noAck=true consumers may never be received by those clients, and will not be requeued by the new master. Because of the possibility that the consuming client is connected to a node that survives, the Consumer Cancellation Notification is useful in identifying when such events may have occurred. Of course, in practice, if you care about not losing messages then you are advised to consume with noAck=false.
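The difference between the two ack modes can be sketched as follows (a toy model; names invented for illustration): with noAck=true the broker treats a message as acknowledged the moment it is sent, so an abrupt disconnect loses it, while in manual-ack mode the unacknowledged message is requeued.

```python
def deliver(queue, no_ack, client_receives):
    """Deliver the head of the queue to a consumer and model the outcome."""
    msg = queue.pop(0)
    if client_receives:
        return queue, msg        # happy path in either mode
    if no_ack:
        return queue, None       # broker already considered it acked: lost
    queue.insert(0, msg)         # manual-ack mode: requeued for redelivery
    return queue, None

q, got = deliver(["m1"], no_ack=True, client_receives=False)
print(q, got)    # m1 is gone for good
q, got = deliver(["m1"], no_ack=False, client_receives=False)
print(q, got)    # m1 survives and will be redelivered
```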


Publisher Confirms and Transactions


Mirrored queues support both Publisher Confirms and Transactions. The semantics chosen are that in the case of both confirms and transactions, the action spans all mirrors of the queue. So in the case of a transaction, a tx.commit-ok will only be returned to a client when the transaction has been applied across all mirrors of the queue. Equally, in the case of publisher confirms, a message will only be confirmed to the publisher when it has been accepted by all of the mirrors. It is correct to think of the semantics as being the same as a message being routed to multiple normal queues, and of a transaction with publications within that similarly are routed to multiple queues.



Flow Control


RabbitMQ uses a credit-based algorithm to limit the rate of message publication. Publishers are permitted to publish when they receive credit from all mirrors of a queue. Credit in this context means permission to publish. Slaves that fail to issue credit can cause publishers to stall. Publishers will remain stalled until all slaves issue credit or until the remaining nodes consider the slave to be disconnected from the cluster. Erlang detects such disconnections by periodically sending a tick to all nodes. The tick interval can be controlled with the net_ticktime configuration setting.
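The credit rule can be sketched as follows (a toy model of the algorithm's observable behaviour, not the actual Erlang implementation): publishing is allowed only while every live mirror has granted credit, and one slow slave stalls the publisher until it grants credit or is deemed disconnected.

```python
def may_publish(credits, disconnected=frozenset()):
    """Return True if the publisher currently holds credit from every mirror
    that is still considered part of the cluster.

    credits: {mirror_name: has_granted_credit}
    disconnected: mirrors the cluster has given up on (ignored for credit)
    """
    return all(ok for name, ok in credits.items() if name not in disconnected)

credits = {"master": True, "slave1": True, "slave2": False}
print(may_publish(credits))                           # stalled by slave2
print(may_publish(credits, disconnected={"slave2"}))  # unblocked once slave2
                                                      # is deemed disconnected
```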


Unsynchronised Slaves


A node may join a cluster at any time. Depending on the configuration of a queue, when a node joins a cluster, queues may add a slave on the new node. At this point, the new slave will be empty: it will not contain any existing contents of the queue, and currently, there is no synchronisation protocol. Such a slave will receive new messages published to the queue, and thus over time will accurately represent the tail of the mirrored-queue. As messages are drained from the mirrored-queue, the size of the head of the queue for which the new slave is missing messages will shrink until eventually the slave's contents precisely match the master's contents. At this point, the slave can be considered fully synchronised, but it is important to note that this has occurred because of actions of clients in terms of draining the pre-existing head of the queue.

Thus a newly added slave provides no additional form of redundancy or availability of the queue's contents until the contents of the queue that existed before the slave was added have been removed, or the queue has been explicitly synchronised (see below). As a result of this, it is preferable to bring up all nodes on which slaves will exist prior to creating mirrored queues, or even better to ensure that your use of messaging generally results in very short or empty queues that rapidly drain. Since the queue becomes unresponsive while synchronisation is occurring, it is preferable to allow active queues from which messages are being drained to synchronise naturally, and only explicitly synchronise inactive queues.
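The catch-up process can be sketched as follows (a toy model; the queue contents are invented for illustration): a freshly added slave starts empty and mirrors only the tail, becoming synchronised once the pre-existing head has been drained by consumers.

```python
def drain(master, slave_missing_head):
    """Consume one message from the master's head.

    While the pre-existing head is being drained, each consumed message is
    also one the new slave never had, so its 'missing' set shrinks too.
    """
    master.pop(0)
    if slave_missing_head:
        slave_missing_head.pop(0)
    return master, slave_missing_head

master = ["old1", "old2", "new1"]  # "new1" arrived after the slave joined
missing = ["old1", "old2"]         # pre-existing head the new slave lacks
while missing:
    master, missing = drain(master, missing)
print(master, missing == [])       # only the mirrored tail remains: synchronised
```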

You can determine which slaves have synchronised with the following rabbitmqctl invocation, or through the management plugin:


rabbitmqctl list_queues name slave_pids synchronised_slave_pids
Explicit synchronisation can be triggered in two ways: manually or automatically. If a queue is set to automatically synchronise it will synchronise whenever a new slave joins - becoming unresponsive until it has done so.



Starting and Stopping Nodes


If you stop a RabbitMQ node which contains the master of a mirrored-queue, some slave on some other node will be promoted to the master (assuming there is one). If you continue to stop nodes then you will reach a point where a mirrored-queue has no more slaves: it exists only on one node, which is now its master. If the mirrored-queue was declared durable then, if its last remaining node is shut down, durable messages in the queue will survive the restart of that node. In general, as you restart other nodes, if they were previously part of a mirrored-queue then they will rejoin the mirrored queue.

However, there is currently no way for a slave to know whether or not its queue contents have diverged from the master to which it is rejoining (this could happen during a network partition, for example). As such, when a slave rejoins a mirrored-queue, it throws away any durable local contents it already has and starts empty. Its behaviour is at this point the same as if it were a new node joining the cluster.


Configuring Mirroring


Queues have mirroring enabled via policy. Policies can change at any time; it is valid to create a non-mirrored queue, and then make it mirrored at some later point (and vice versa). There is a difference between a non-mirrored queue and a mirrored queue which does not have any slaves - the former lacks the extra mirroring infrastructure and will run faster.

You should be aware of the behaviour of adding mirrors to a queue.

To cause queues to become mirrored, you should create a policy which matches them and sets policy keys ha-mode and (optionally) ha-params. The following table explains the options for these keys:

ha-mode: all
ha-params: (absent)
Result: Queue is mirrored across all nodes in the cluster. When a new node is added to the cluster, the queue will be mirrored to that node.

ha-mode: exactly
ha-params: count
Result: Queue is mirrored to count nodes in the cluster. If there are less than count nodes in the cluster, the queue is mirrored to all nodes. If there are more than count nodes in the cluster, and a node containing a mirror goes down, then a new mirror will not be created on another node. (This is to prevent queues migrating across a cluster as it is brought down.)

ha-mode: nodes
ha-params: node names
Result: Queue is mirrored to the nodes listed in node names. If any of those node names are not a part of the cluster, this does not constitute an error. If none of the nodes in the list are online at the time when the queue is declared then the queue will be created on the node that the declaring client is connected to.


Whenever the HA policy for a queue changes it will endeavour to keep its existing mirrors as far as this fits with the new policy.


"nodes" policy and migrating masters


Note that setting or modifying a "nodes" policy will never cause an existing master to go away, even if you ask it to. For example, if a queue is on [A], and you give it a nodes policy telling it to be on [B C], it will end up on [A B C]. If node A then fails or is shut down, the mirror on that node will not come back and the queue will stay on [B C].

Note that setting or modifying a "nodes" policy can cause the existing master to go away if it is not listed in the new policy. In order to prevent message loss, RabbitMQ will keep the existing master around until at least one other slave has synchronised (even if this is a long time). However, once synchronisation has occurred things will proceed just as if the node had failed: consumers will be disconnected from the master and will need to reconnect.


For example, if a queue is on [A B] (with A the master), and you give it a nodes policy telling it to be on [C D], it will initially end up on [A C D]. As soon as the queue synchronises on its new mirrors [C D], the master on A will shut down.



Synchronising Queues


Queues can be set to automatically synchronise by setting the ha-sync-mode policy key to automatic. ha-sync-mode can also be set to manual. If it is not set then manual is assumed.

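For illustration, a policy definition combining these keys might look like the following JSON fragment (a sketch using the policy keys described above; the combination of values is hypothetical):

```json
{
  "ha-mode": "all",
  "ha-sync-mode": "automatic"
}
```

Such a definition would mirror matching queues to all nodes and synchronise new slaves automatically, at the cost of the queue being unresponsive while synchronisation runs.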

You can determine which slaves are synchronised with the following rabbitmqctl invocation:


rabbitmqctl list_queues name slave_pids synchronised_slave_pids
You can manually synchronise a queue with:


rabbitmqctl sync_queue name
And you can cancel synchronisation with:


rabbitmqctl cancel_sync_queue name
These features are also available through the management plugin.



Some examples


Policy where queues whose names begin with "ha." are mirrored to all nodes in the cluster:

Using rabbitmqctl:

rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'

Using the HTTP API:

PUT /api/parameters/policy/%2f/ha-all {"pattern":"^ha\.", "definition":{"ha-mode":"all"}}

Using the web UI:

Navigate to Admin > Policies > Add / update a policy.
Enter "ha-all" next to Name and "^ha\." next to Pattern, and set "ha-mode" = "all" in the first line next to Policy.
Click Add policy.

Policy where queues whose names begin with "two." are mirrored to any two nodes in the cluster:

Using rabbitmqctl:

rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":2}'

Using the HTTP API:

PUT /api/parameters/policy/%2f/ha-two {"pattern":"^two\.", "definition":{"ha-mode":"exactly", "ha-params":2}}

Using the web UI:

Navigate to Admin > Policies > Add / update a policy.
Enter "ha-two" next to Name and "^two\." next to Pattern. In the lines next to Policy, set "ha-mode" = "exactly" on the first line and "ha-params" = 2 on the second line, setting the second line's type to "Number".
Click Add policy.

============================================

Release RabbitMQ 3.0.0 made the following change to Mirrored-Queues:

24908 allow queue mirroring to be defined by broker-wide policy, not queue declaration, and add "exactly" mode

So from version 3.0.0 onwards, a Mirrored-Queue can no longer be declared via the arguments of Queue.Declare; instead, mirroring is configured through broker-wide policies.

============================================

Inspect the configuration of the queues that already exist in the cluster:

[root@Betty ~]# rabbitmqctl list_queues name durable auto_delete arguments policy pid slave_pids status
Listing queues ...
test_queue      false   false   []              <rabbit@Betty.1.724.0>          running
...done.
[root@Betty ~]#
Set a policy so that queues whose names begin with test are mirrored to all nodes in the cluster:


[root@Betty ~]# rabbitmqctl set_policy ha-all "^test" '{"ha-mode":"all"}'
Setting policy "ha-all" for pattern "^test" to "{\"ha-mode\":\"all\"}" ...
...done.
[root@Betty ~]#
Inspect the queue configuration again after the policy has been set; the mirrored queue has now been created:

[root@Betty ~]# rabbitmqctl list_queues name durable auto_delete arguments policy pid slave_pids status
Listing queues ...
test_queue      false   false   []      ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]#
View the queue information as reported by each node in the cluster:

[root@Betty ~]# rabbitmqctl -n rabbit2 list_queues name durable auto_delete policy pid slave_pids status
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]# rabbitmqctl -n rabbit1 list_queues name durable auto_delete policy pid slave_pids status
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]# rabbitmqctl -n rabbit list_queues name durable auto_delete policy pid slave_pids status
Listing queues ...
haha_queue      false   false           <rabbit@Betty.1.861.0>          running
test_queue      false   false   ha-all  <rabbit@Betty.1.724.0>  [<rabbit2@Betty.1.891.0>, <rabbit1@Betty.1.941.0>]      running
...done.
[root@Betty ~]#
Tags: RabbitMQ HA Mirrored Queue