

Enabling HDFS HA Using Cloudera Manager

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

You can use Cloudera Manager to configure your CDH 4 or CDH 5 cluster for HDFS HA and automatic failover. In Cloudera Manager 5, HA is implemented using Quorum-based storage. Quorum-based storage relies upon a set of JournalNodes, each of which maintains a local edits directory that logs the modifications to the namespace metadata. Enabling HA enables automatic failover as part of the same command.
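For reference, with Quorum-based storage the NameNodes write their shared edits to the JournalNode ensemble through the dfs.namenode.shared.edits.dir property. A minimal sketch of what the generated configuration looks like, with placeholder hostnames and the default nameservice name:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <!-- Three JournalNodes on the default port 8485; hostnames are placeholders -->
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/nameservice1</value>
</property>

You do not set this by hand; Cloudera Manager maintains it once HA is enabled.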
Important:

Enabling or disabling HA causes the previous monitoring history to become unavailable.

The following parameters are automatically set as shown once you have enabled JobTracker HA. If you want to change any of these values from its default, use an advanced configuration snippet (see the example after this list).

mapred.jobtracker.restart.recover: true

mapred.job.tracker.persist.jobstatus.active: true

mapred.ha.automatic-failover.enabled: true

mapred.ha.fencing.methods: shell(/bin/true)
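For example, to point the fencing method at a custom script, you could paste a snippet like the following into the MapReduce advanced configuration snippet (safety valve) for mapred-site.xml in Cloudera Manager. This is a sketch only: the property name comes from the list above, but the script path is a placeholder you would supply:

<property>
  <name>mapred.ha.fencing.methods</name>
  <!-- Overrides the default shell(/bin/true); the script path is hypothetical -->
  <value>shell(/path/to/my_fencer.sh)</value>
</property>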


Enabling High Availability and Automatic Failover

The Enable High Availability workflow leads you through adding a second (standby) NameNode and configuring JournalNodes. During the workflow, Cloudera Manager creates a federated namespace.

Perform all the configuration and setup tasks described under Configuring Hardware for HDFS HA.

Ensure that you have a ZooKeeper service.
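Optionally, before continuing, you can check from any cluster host that a ZooKeeper server responds; the hostname below is a placeholder for one of your ZooKeeper servers:

echo ruok | nc zk1.example.com 2181    # a healthy server answers "imok"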

Go to the HDFS service.

Select Actions > Enable High Availability. A screen showing the hosts that are eligible to run a standby NameNode and the JournalNodes displays.

Specify a name for the nameservice or accept the default name nameservice1 and click Continue.

In the NameNode Hosts field, click Select a host. The host selection dialog box displays.

Check the checkbox next to the hosts where you want the standby NameNode to be set up and click OK. The standby NameNode cannot be on the same host as the active NameNode, and the host that is chosen should have the same hardware configuration (RAM, disk space, number of cores, and so on) as the active NameNode.

In the JournalNode Hosts field, click Select hosts. The host selection dialog box displays.

Check the checkboxes next to an odd number of hosts (a minimum of three) to act as JournalNodes and click OK. JournalNodes should run on hosts with hardware specifications similar to those of the NameNodes. Cloudera recommends placing one JournalNode on each of the hosts running the active and standby NameNodes, and the third JournalNode on a host with similar hardware, such as the JobTracker host.

Click Continue.

In the JournalNode Edits Directory property, enter a directory location for the JournalNode edits directory into the fields for each JournalNode host.

You may enter only one directory for each JournalNode. The paths do not need to be the same on every JournalNode.

The directories you specify should be empty and must have the appropriate permissions; a sketch of creating one by hand follows.
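If you are creating the directories ahead of time, a minimal sketch looks like the following. The path is illustrative, and the hdfs user and group assume a default CDH installation:

sudo mkdir -p /data/1/dfs/jn          # placeholder path for the JournalNode edits dir
sudo chown -R hdfs:hdfs /data/1/dfs/jn
sudo chmod 700 /data/1/dfs/jn         # owner-only access, as the NameNode dirs use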

Extra Options: Decide whether Cloudera Manager should clear existing data in ZooKeeper, the standby NameNode, and the JournalNodes. If the directories are not empty (for example, you are re-enabling a previous HA configuration), Cloudera Manager does not delete the contents unless you tell it to; keeping the default checkbox selection clears them. The recommended default is to clear the directories. If you choose not to do so, the data must be in sync across the edits directories of the JournalNodes and must have the same version data as the NameNodes (one way to compare is sketched below).
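A hedged way to compare version data by hand, assuming the illustrative directories used above and the default nameservice name (the exact on-disk layout can differ between CDH releases):

cat /data/1/dfs/jn/nameservice1/current/VERSION   # on each JournalNode
cat /data/1/dfs/nn/current/VERSION                # on each NameNode; compare namespaceID and layoutVersion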

Click Continue.

Cloudera Manager executes a set of commands that stop the dependent services; delete, create, and configure roles and directories as appropriate; create a nameservice and failover controller; restart the dependent services; and deploy the new client configuration.
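Once the workflow completes, you can verify the result from any host with an HDFS gateway. These are standard HDFS commands; the nameservice name assumes the default nameservice1, and the NameNode ID placeholder is whatever the first command prints:

hdfs getconf -confKey dfs.ha.namenodes.nameservice1   # prints the two NameNode IDs
hdfs haadmin -getServiceState <namenode-id>           # reports active or standby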

If you want to use other services in a cluster with HA configured, follow the procedures in Configuring Other CDH Components to Use HDFS HA.

If you are running CDH 4.0 or 4.1, the standby NameNode may fail at the bootstrapStandby command with the error Unable to read transaction ids 1-7 from the configured shared edits storage. Use rsync or a similar tool to copy the contents of the dfs.name.dir directory from the active NameNode to the standby NameNode and start the standby NameNode.
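A sketch of that copy, with placeholder hostname and path (check the value of dfs.name.dir in your configuration for the real location, and make sure the standby NameNode role is stopped first):

rsync -a /data/1/dfs/nn/ standby-nn.example.com:/data/1/dfs/nn/   # run on the active NameNode host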
Important: If you change the NameNode Service RPC Port (dfs.namenode.servicerpc-address) while automatic failover is enabled, this causes a mismatch between the NameNode address saved in the ZooKeeper /hadoop-ha znode and the NameNode address that the Failover Controller is configured with, which prevents the Failover Controllers from restarting. If you need to change the NameNode Service RPC Port after Auto Failover has been enabled, you must do the following to re-initialize the znode:

Stop the HDFS service.

Configure the service RPC port:

Go to the HDFS service.

Click the Configuration tab.

Select Scope > NameNode.

Select Category > Ports and Addresses.

Locate the NameNode Service RPC Port property or search for it by typing its name in the Search box.

Change the port value as needed. If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

On a ZooKeeper server host, run zookeeper-client.

Execute the following to remove the configured nameservice. This example assumes the name of the nameservice is nameservice1. You can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab:

rmr /hadoop-ha/nameservice1


Click the Instances tab.

Select Actions > Initialize High Availability State in ZooKeeper.

Start the HDFS service.
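To confirm the znode was re-created with the new address, you can list it from a ZooKeeper server host. The server address is a placeholder, and on newer ZooKeeper releases the client is invoked as zkCli.sh rather than zookeeper-client:

zookeeper-client -server zk1.example.com:2181 ls /hadoop-ha   # should list nameservice1 again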


Fencing Methods

In order to ensure that only one NameNode is active at a time, a fencing method is required for the shared edits directory. During a failover, the fencing method is responsible for ensuring that the previous active NameNode no longer has access to the shared edits directory, so that the new active NameNode can safely proceed writing to it.

By default, Cloudera Manager configures HDFS to use a shell fencing method (shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager Agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell fencing scripts, instead of or in addition to the one Cloudera Manager provides.

The fencing parameters are found in the Service-Wide > High Availability category under the configuration properties for your HDFS service. For details of the fencing methods supplied with CDH 5, and how fencing is configured, see Fencing Configuration.
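For orientation, the underlying HDFS properties accept multiple fencing methods, newline-separated and tried in order. The following is a sketch only, shown as raw Hadoop configuration rather than the Cloudera Manager fields; the script path mirrors the default named above, and the SSH key path is a placeholder:

<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- Methods are newline-separated and tried in order:
       SSH-based fencing first, then the CM agent fencer -->
  <value>sshfence
shell(./cloudera_manager_agent_fencer.py)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <!-- Placeholder path to the private key that sshfence uses -->
  <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>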