
After adding an NS to HDFS, restarting the DNs fails with a clusterID mismatch error

2014-12-09 22:23
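While preparing to test FastCopy in the test environment, I decided to add a second NS, since there had been only one NS so far and a second one would make testing easier. With everything in place I restarted the DNs, but they simply would not connect to the new NN and kept reporting the following error: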
java.io.IOException: Incompatible clusterIDs in /data0/hadoop/dfs/data: namenode clusterID = CID-79c6e55b-5897-4a30-b278-149827ac200f; datanode clusterID = CID-1561e550-a7b9-4886-8a9a-cc2328b82912
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:944)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:915)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815)
    at java.lang.Thread.run(Thread.java:745)

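The error says that the DN's clusterID does not match the NN's clusterID. A colleague pointed out that the fix is to format the newly added NN with the clusterID the DNs already have (CID-1561e550-a7b9-4886-8a9a-cc2328b82912). On one of the new NN nodes, run: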

hdfs namenode -format -clusterid CID-1561e550-a7b9-4886-8a9a-cc2328b82912
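If all you have is the DN log, the value to pass to `-clusterid` is the `datanode clusterID` named in the exception message. The following is a minimal sketch that pulls both IDs out of that message and builds the format command. It assumes the message uses exactly the `namenode clusterID = ...; datanode clusterID = ...` wording from the stack trace above (other Hadoop versions may phrase it differently). `ClusterIdFromError`, `parse` and `formatCommand` are hypothetical helper names, not part of Hadoop:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterIdFromError {
    // Assumed message shape (copied from the stack trace above):
    // "... namenode clusterID = CID-...; datanode clusterID = CID-..."
    private static final Pattern IDS = Pattern.compile(
            "namenode clusterID = ([^;\\s]+); datanode clusterID = (\\S+)");

    /** Returns {namenodeClusterId, datanodeClusterId}, or null if the message does not match. */
    static String[] parse(String message) {
        Matcher m = IDS.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    /** Builds the format command that reuses the DataNode's existing clusterID. */
    static String formatCommand(String datanodeClusterId) {
        return "hdfs namenode -format -clusterid " + datanodeClusterId;
    }

    public static void main(String[] args) {
        String msg = "Incompatible clusterIDs in /data0/hadoop/dfs/data: "
                + "namenode clusterID = CID-79c6e55b-5897-4a30-b278-149827ac200f; "
                + "datanode clusterID = CID-1561e550-a7b9-4886-8a9a-cc2328b82912";
        String[] ids = parse(msg);
        if (ids == null) {
            System.out.println("no clusterIDs found in message");
            return;
        }
        System.out.println("NN clusterID: " + ids[0]);
        System.out.println("DN clusterID: " + ids[1]);
        System.out.println(formatCommand(ids[1]));
    }
}
```

Run against the message from this article, `main` prints the same `hdfs namenode -format -clusterid CID-1561e550-a7b9-4886-8a9a-cc2328b82912` command shown above.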

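After formatting the NN and the JNs as suggested, start that NN. The other newly added NN does not need to be formatted. Running the following command on it copies all of the metadata from the NN that was just started into its own directories: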

hdfs namenode -bootstrapStandby
Once the sync finishes, start this NN as well, then restart all of the DNs. All DNs now show up on the NNs of both NS1 and NS2.

Now let's look at what the clusterID is and what it is for:

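The clusterID is the cluster's unique ID. Its purpose is to ensure that only trusted DNs connect to the cluster. A DN obtains its clusterID from the NN the first time it starts: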

// BPServiceActor.java: the DN-side handshake with one NameNode
private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with NN - get the namespace
    // info.
    NamespaceInfo nsInfo = retrieveNamespaceInfo();

    // Verify that this matches the other NN in this HA pair.
    // This also initializes our block pool in the DN if we are
    // the first NN connection for this BP.
    bpos.verifyAndSetNamespaceInfo(nsInfo);

    // Second phase of the handshake with the NN.
    register();
}

// Asks the NN for its NamespaceInfo (which carries the clusterID),
// retrying until the NN answers or the DN shuts down.
NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
        try {
            nsInfo = bpNamenode.versionRequest();
            LOG.debug(this + " received versionRequest response: " + nsInfo);
            break;
        } catch (SocketTimeoutException e) {  // namenode is busy
            LOG.warn("Problem connecting to server: " + nnAddr);
        } catch (IOException e) {  // namenode is not available
            LOG.warn("Problem connecting to server: " + nnAddr);
        }

        // try again after 5 seconds (5000 ms)
        sleepAndLogInterrupts(5000, "requesting version info from NN");
    }

    if (nsInfo != null) {
        checkNNVersion(nsInfo);
    } else {
        throw new IOException("DN shut down before block pool connected");
    }
    return nsInfo;
}
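
// DataNode.java: reached via BPOfferService.verifyAndSetNamespaceInfo when the
// first NN connection for a block pool completes phase one of the handshake
void initBlockPool(BPOfferService bpos) throws IOException {
    NamespaceInfo nsInfo = bpos.getNamespaceInfo();
    if (nsInfo == null) {
        throw new IOException("NamespaceInfo not found: Block pool " + bpos
            + " should have retrieved namespace info before initBlockPool.");
    }

    // Register the new block pool with the BP manager.
    blockPoolManager.addBlockPool(bpos);

    // Record the clusterID reported by the NN.
    setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());

    // In the case that this is the first block pool to connect, initialize
    // the dataset, block scanners, etc.
    // initStorage -> DataStorage.recoverTransitionRead -> doTransition is the
    // path on which the "Incompatible clusterIDs" check failed in the stack trace.
    initStorage(nsInfo);
    initPeriodicScanners(conf);

    data.addBlockPool(nsInfo.getBlockPoolID(), conf);
}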

The DN then persists the clusterID to the VERSION file in each of its local storage directories:
cat /data0/hadoop/dfs/data/current/VERSION

#Thu Oct 23 14:06:21 CST 2014
storageID=DS-35e3967e-51e4-4a6c-a3da-d2be044c8522
clusterID=CID-1561e550-a7b9-4886-8a9a-cc2328b82912
cTime=0
datanodeUuid=1327c11f-984c-4c07-a44a-70ba5e84621c
storageType=DATA_NODE
layoutVersion=-55
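The VERSION file above appears to be a plain `java.util.Properties` file: its first line, `#Thu Oct 23 14:06:21 CST 2014`, has the same shape as the timestamp comment that `Properties.store` writes. The following is a minimal sketch, assuming that format and the `<storage dir>/current/VERSION` layout from the path above, of persisting a clusterID received from the NN into every storage directory and reading it back. It is illustrative only, not Hadoop's storage code. It writes just `clusterID` and `storageType` instead of all the fields shown, and the `data0`/`data1` directory names are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class VersionFileSketch {
    // Assumption: each storage directory keeps its metadata in <dir>/current/VERSION,
    // mirroring /data0/hadoop/dfs/data/current/VERSION from the article.
    static Path versionFile(Path storageDir) {
        return storageDir.resolve("current").resolve("VERSION");
    }

    /** Writes the clusterID (plus storageType) into the VERSION file of every storage directory. */
    static void persistClusterId(List<Path> storageDirs, String clusterId) throws IOException {
        for (Path dir : storageDirs) {
            Path file = versionFile(dir);
            Files.createDirectories(file.getParent());
            Properties props = new Properties();
            props.setProperty("clusterID", clusterId);
            props.setProperty("storageType", "DATA_NODE");
            try (Writer out = Files.newBufferedWriter(file, StandardCharsets.ISO_8859_1)) {
                // With a null comment, store() writes only the "#<date>" header line.
                props.store(out, null);
            }
        }
    }

    /** Reads the clusterID back from one storage directory's VERSION file. */
    static String readClusterId(Path storageDir) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(versionFile(storageDir), StandardCharsets.ISO_8859_1)) {
            props.load(in);
        }
        return props.getProperty("clusterID");
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dn-sketch");
        List<Path> dirs = new ArrayList<>();
        dirs.add(root.resolve("data0"));
        dirs.add(root.resolve("data1"));
        persistClusterId(dirs, "CID-1561e550-a7b9-4886-8a9a-cc2328b82912");
        for (Path d : dirs) {
            System.out.println(d.getFileName() + " -> " + readClusterId(d));
        }
    }
}
```

Because each storage directory gets its own copy, the clusterID can be checked in any of them with `cat`, as above.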


So when adding an NS to an HDFS cluster that already has one, the new NN must be formatted with the existing clusterID. Only then can the DNs connect to the new NN.
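The rule can be summarized with a small model: the DN keeps the first clusterID it learns and rejects any NN that reports a different one. This is a hypothetical sketch, not Hadoop's code. In the stack trace above, the real check fails inside `DataStorage.doTransition`. The class name and the `ns1-nn`/`ns2-nn` labels are illustrative, and the two clusterIDs are the ones from the error message:

```java
import java.io.IOException;

/**
 * Simplified, hypothetical model of the DataNode-side clusterID check.
 * It only mirrors the rule described in the text; it is not DataStorage.
 */
public class ClusterIdCheckModel {
    private String storedClusterId; // null until the first successful handshake

    /** Called once per NameNode handshake with the clusterID that NN reports. */
    void handshake(String nnName, String nnClusterId) throws IOException {
        if (storedClusterId == null) {
            // First NN ever contacted: adopt its clusterID (the real DN persists it to VERSION).
            storedClusterId = nnClusterId;
            return;
        }
        if (!storedClusterId.equals(nnClusterId)) {
            throw new IOException("Incompatible clusterIDs: namenode " + nnName
                    + " clusterID = " + nnClusterId
                    + "; datanode clusterID = " + storedClusterId);
        }
    }

    String clusterId() {
        return storedClusterId;
    }

    public static void main(String[] args) {
        String existing = "CID-1561e550-a7b9-4886-8a9a-cc2328b82912";
        String freshlyFormatted = "CID-79c6e55b-5897-4a30-b278-149827ac200f";

        // Scenario 1: NS2's NN was formatted without -clusterid, so it got a new ID.
        ClusterIdCheckModel dn = new ClusterIdCheckModel();
        try {
            dn.handshake("ns1-nn", existing);
            dn.handshake("ns2-nn", freshlyFormatted);
            System.out.println("both NameServices accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Scenario 2: NS2's NN was formatted with -clusterid set to the existing ID.
        ClusterIdCheckModel dn2 = new ClusterIdCheckModel();
        try {
            dn2.handshake("ns1-nn", existing);
            dn2.handshake("ns2-nn", existing);
            System.out.println("both NameServices accepted, clusterID = " + dn2.clusterId());
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the first scenario the second handshake is rejected, much like the real error. In the second, both handshakes succeed, which matches what happened once the new NN was reformatted with the DNs' clusterID.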

Abbreviations:

DN:DataNode

NN:NameNode

JN:JournalNode

NS:NameService