
Hadoop Configuration Parameter Summary and Default Ports

2015-12-03 16:34
core-site.xml

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
fs.defaultFS
Meaning: URI used to access the HDFS distributed file system.
Description: The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.
Example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.100.200:9000</value>
</property>
Default: file:///

hadoop.tmp.dir
Meaning: Root directory for other temporary files.
Description: A base for other temporary directories.
Example:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/tmp</value>
</property>
Default: /tmp/hadoop-${user.name}

io.file.buffer.size
Meaning: Buffer size for read and write operations; generally an integer multiple of the hardware page size.
Description: The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.
Example:
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
</property>
Default: 4096

hadoop.http.staticuser.user
Description: The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI (user to be used for browsing files).
Example:
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>bruce</value>
</property>
Default: dr.who
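The example `<property>` blocks above all share the same XML shape, so reading one of these files programmatically is straightforward. A minimal Python sketch of loading a Hadoop *-site.xml into a dict; it deliberately skips Hadoop's `${var}` expansion and `<final>` layering, so treat it as illustration rather than a replacement for Hadoop's own Configuration class:

```python
import xml.etree.ElementTree as ET

def parse_hadoop_site(xml_text):
    """Parse a Hadoop *-site.xml <configuration> document into a dict.

    Minimal sketch: real Hadoop also expands ${var} references and
    honors <final> across layered resources, both omitted here.
    """
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf

example = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.100.200:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>"""

conf = parse_hadoop_site(example)
print(conf["fs.defaultFS"])  # hdfs://192.168.100.200:9000
```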
Default addresses and ports:

hadoop.registry.zk.quorum (localhost:2181): List of hostname:port pairs defining the zookeeper quorum binding for the registry.
fs.defaultFS (file:///): The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem. Note: this is typically configured as something like hdfs://192.168.100.200:9000, in which case the NameNode starts an ipc.Server listening on port 9000.
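As the note above says, the NameNode's IPC host and port come straight out of the fs.defaultFS URI. A small sketch of extracting them, assuming an hdfs:// URI with an explicit port as in the example (the local file:/// default has neither):

```python
from urllib.parse import urlparse

def namenode_endpoint(fs_default_fs):
    """Derive the NameNode RPC (host, port) from an fs.defaultFS URI.

    Sketch only: returns None for non-hdfs schemes such as the
    file:/// default, which has no host or port.
    """
    uri = urlparse(fs_default_fs)
    if uri.scheme != "hdfs":
        return None
    return uri.hostname, uri.port

print(namenode_endpoint("hdfs://192.168.100.200:9000"))  # ('192.168.100.200', 9000)
print(namenode_endpoint("file:///"))                     # None
```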
hdfs-site.xml:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
dfs.namenode.name.dir
Meaning: Directory where the HDFS NameNode stores the name table (fsimage). If multiple directories are given, a copy of the name table is kept in each.
Description: Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
Example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/name</value>
  <final>true</final>
</property>
Default: file://${hadoop.tmp.dir}/dfs/name

dfs.datanode.data.dir
Meaning: Directory where an HDFS DataNode stores its data blocks. Multiple directories may be configured, in which case blocks are spread across all of them.
Description: Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
Example:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/data</value>
  <final>true</final>
</property>
Default: file://${hadoop.tmp.dir}/dfs/data

dfs.replication
Meaning: Number of replicas per data block.
Description: Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
Example:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Default: 3

dfs.namenode.secondary.http-address
Meaning: Address and port of the secondary NameNode.
Description: The secondary namenode http server address and port.
Example:
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>192.168.100.200:50090</value>
</property>
Default: 0.0.0.0:50090

dfs.namenode.checkpoint.dir
Meaning: Directory where the secondary NameNode stores temporary images to merge. If multiple directories are given, a copy is kept in each.
Description: Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
Example:
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/namesecondary</value>
  <final>true</final>
</property>
Default: file://${hadoop.tmp.dir}/dfs/namesecondary

dfs.permissions.enabled
Meaning: Toggle for HDFS file permission checking.
Description: If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
Example:
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
Default: true

dfs.datanode.address
Meaning: DataNode address and port used for data transfer.
Description: The datanode server address and port for data transfer.
Example:
<property>
  <name>dfs.datanode.address</name>
  <value>192.168.100.200:50010</value>
</property>
Default: 0.0.0.0:50010

dfs.webhdfs.enabled
Meaning: Toggle for the WebHDFS feature.
Description: Enable WebHDFS (REST API) in Namenodes and Datanodes.
Example:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
Default: true

dfs.support.append
Meaning: No such configuration item in the YARN-era defaults.
Example:
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

dfs.permissions.superusergroup
Description: The name of the group of super-users.
Example:
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>oinstall</value>
</property>
Default: supergroup

dfs.block.invalidate.limit
Meaning: Number of blocks deleted per pass; the suggested default is 1000.
Default addresses and ports:

dfs.namenode.rpc-address: RPC address that handles all clients requests. In the case of HA/Federation where multiple namenodes exist, the name service id is added to the name, e.g. dfs.namenode.rpc-address.ns1, dfs.namenode.rpc-address.EXAMPLENAMESERVICE. The value of this property will take the form of nn-host1:rpc-port.
dfs.namenode.rpc-bind-host: The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.
dfs.namenode.servicerpc-address: RPC address for HDFS Services communication. BackupNode, Datanodes and all other services should be connecting to this address if it is configured. In the case of HA/Federation where multiple namenodes exist, the name service id is added to the name, e.g. dfs.namenode.servicerpc-address.ns1, dfs.namenode.rpc-address.EXAMPLENAMESERVICE. The value of this property will take the form of nn-host1:rpc-port. If the value of this property is unset the value of dfs.namenode.rpc-address will be used as the default.
dfs.namenode.servicerpc-bind-host: The actual address the service RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.servicerpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.
dfs.namenode.secondary.http-address (0.0.0.0:50090): The secondary namenode http server address and port.
dfs.namenode.secondary.https-address (0.0.0.0:50091): The secondary namenode HTTPS server address and port.
dfs.namenode.http-address (0.0.0.0:50070): The address and the base port where the dfs namenode web ui will listen on.
dfs.namenode.https-address (0.0.0.0:50470): The namenode secure http server address and port.
dfs.namenode.backup.address (0.0.0.0:50100): The backup node server address and port. If the port is 0 then the server will start on a free port.
dfs.namenode.backup.http-address (0.0.0.0:50105): The backup node http server address and port. If the port is 0 then the server will start on a free port.
dfs.datanode.address (0.0.0.0:50010): The datanode server address and port for data transfer.
dfs.datanode.http.address (0.0.0.0:50075): The datanode http server address and port.
dfs.datanode.ipc.address (0.0.0.0:50020): The datanode ipc server address and port.
dfs.datanode.https.address (0.0.0.0:50475): The datanode secure http server address and port.
dfs.journalnode.rpc-address (0.0.0.0:8485): The JournalNode RPC server address and port.
dfs.journalnode.http-address (0.0.0.0:8480): The address and port the JournalNode HTTP server listens on. If the port is 0 then the server will start on a free port.
dfs.journalnode.https-address (0.0.0.0:8481): The address and port the JournalNode HTTPS server listens on. If the port is 0 then the server will start on a free port.
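A quick way to sanity-check a running cluster against these defaults is a plain TCP probe. A hedged sketch: the port map below is transcribed from the table above, and port_open is only a connectivity check, not a substitute for the daemons' own health endpoints:

```python
import socket

# Default HDFS daemon ports, transcribed from the table above.
HDFS_DEFAULT_PORTS = {
    "namenode.http": 50070,
    "namenode.https": 50470,
    "secondary.namenode.http": 50090,
    "datanode.data": 50010,
    "datanode.http": 50075,
    "datanode.ipc": 50020,
    "journalnode.rpc": 8485,
    "journalnode.http": 8480,
}

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("192.168.100.200", HDFS_DEFAULT_PORTS["namenode.http"])
```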
mapred-site.xml:

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
mapreduce.framework.name
Meaning: The MapReduce runtime framework.
Description: The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
Example:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Default: local

mapreduce.shuffle.port
Meaning: Port the ShuffleHandler runs on.
Description: Default port that the ShuffleHandler will run on. ShuffleHandler is a service run at the NodeManager to facilitate transfers of intermediate Map outputs to requesting Reducers.
Example:
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>
Default: 13562

mapred.system.dir
Meaning: Not supported under YARN.
Example:
<property>
  <name>mapred.system.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/mapred/system</value>
  <final>true</final>
</property>

mapred.local.dir
Meaning: Not supported under YARN.
Example:
<property>
  <name>mapred.local.dir</name>
  <value>file:///uloc/hadoopdata/hadoop-${user.name}/mapred/local</value>
  <final>true</final>
</property>

mapred.child.java.opts
Meaning: Java options for task processes.
Description: Java opts for the task processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc Usage of -Djava.library.path can cause programs to no longer function if hadoop native libraries are used. These values should instead be set as part of LD_LIBRARY_PATH in the map / reduce JVM env using the mapreduce.map.env and mapreduce.reduce.env config settings.
Example:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx3072M</value>
</property>
Default: -Xmx200m
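The @taskid@ interpolation described above amounts to a plain string substitution; a sketch of the documented rule (the task-attempt id below is made up for illustration):

```python
def interpolate_task_opts(java_opts, task_id):
    """Substitute @taskid@ in mapred.child.java.opts.

    Documented rule: only the exact token @taskid@ is replaced;
    any other occurrences of '@' pass through unchanged.
    """
    return java_opts.replace("@taskid@", task_id)

opts = "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc"
print(interpolate_task_opts(opts, "attempt_201512030001_0001_m_000000_0"))
# -Xmx1024m -verbose:gc -Xloggc:/tmp/attempt_201512030001_0001_m_000000_0.gc
```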
mapreduce.reduce.java.opts
Meaning: Not supported under YARN.
Example:
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024M</value>
</property>

mapreduce.map.memory.mb
Description: The amount of memory to request from the scheduler for each map task.
Example:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
Default: 1024

mapreduce.reduce.memory.mb
Description: The amount of memory to request from the scheduler for each reduce task.
Example:
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
Default: 1024

mapreduce.task.io.sort.mb
Description: The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
Example:
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>1024</value>
</property>
Default: 100

mapreduce.task.io.sort.factor
Description: The number of streams to merge at once while sorting files. This determines the number of open file handles.
Example:
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
Default: 10

mapreduce.reduce.shuffle.parallelcopies
Description: The default number of parallel transfers run by reduce during the copy (shuffle) phase.
Example:
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>50</value>
</property>
Default: 5

mapreduce.jobhistory.address
Description: MapReduce JobHistory Server IPC host:port.
Example:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>192.168.100.200:10020</value>
</property>
Default: 0.0.0.0:10020
Default addresses and ports:

mapreduce.jobtracker.http.address (0.0.0.0:50030): The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.
mapreduce.tasktracker.report.address (127.0.0.1:0): The interface and port that task tracker server listens on. Since it is only connected to by the tasks, it uses the local interface. EXPERT ONLY. Should only be changed if your host does not have the loopback interface.
mapreduce.tasktracker.http.address (0.0.0.0:50060): The task tracker http server address and port. If the port is 0 then the server will start on a free port.
mapreduce.jobhistory.address (0.0.0.0:10020): MapReduce JobHistory Server IPC host:port.
mapreduce.jobhistory.webapp.address (0.0.0.0:19888): MapReduce JobHistory Server Web UI host:port.
mapreduce.jobhistory.admin.address (0.0.0.0:10033): The address of the History server admin interface.
yarn-site.xml:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml 
yarn.resourcemanager.address
Description: The address of the applications manager interface in the RM.
Example:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.100.200:8032</value>
</property>
Default: ${yarn.resourcemanager.hostname}:8032

yarn.resourcemanager.scheduler.address
Description: The address of the scheduler interface.
Example:
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>192.168.100.200:8030</value>
</property>
Default: ${yarn.resourcemanager.hostname}:8030

yarn.resourcemanager.resource-tracker.address
Example:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>192.168.100.200:8031</value>
</property>
Default: ${yarn.resourcemanager.hostname}:8031

yarn.resourcemanager.admin.address
Description: The address of the RM admin interface.
Example:
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>192.168.100.200:8033</value>
</property>
Default: ${yarn.resourcemanager.hostname}:8033

yarn.resourcemanager.webapp.address
Description: The http address of the RM web application.
Example:
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>192.168.100.200:8088</value>
</property>
Default: ${yarn.resourcemanager.hostname}:8088

yarn.nodemanager.aux-services
Description: A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers.
Example:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

yarn.nodemanager.aux-services.mapreduce.shuffle.class
Meaning: Not supported under YARN.
Example:
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

yarn.scheduler.maximum-allocation-mb
Description: The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.
Example:
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10000</value>
</property>
Default: 8192

yarn.scheduler.minimum-allocation-mb
Description: The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this will throw an InvalidResourceRequestException.
Example:
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1000</value>
</property>
Default: 1024
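These two scheduler limits bound every container request: the RM normalizes each request before granting it. A sketch of that normalization, assuming the usual behavior of rounding a request up to a multiple of the minimum allocation, with over-limit requests rejected (the InvalidResourceRequestException from the property descriptions above is modeled here as a plain ValueError):

```python
import math

def normalize_container_mb(requested, minimum=1024, maximum=8192):
    """Round a container memory request up to a multiple of the minimum.

    Assumed behavior sketch: requests above
    yarn.scheduler.maximum-allocation-mb are rejected (modeled as
    ValueError standing in for InvalidResourceRequestException).
    """
    if requested > maximum:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    return max(minimum, math.ceil(requested / minimum) * minimum)

print(normalize_container_mb(1500))  # 2048
```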
mapreduce.reduce.memory.mb
Meaning: Not supported under YARN.
Example:
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1000</value>
</property>

yarn.nodemanager.local-dirs
Description: List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
Example:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/nmlocal</value>
</property>
Default: ${hadoop.tmp.dir}/nm-local-dir

yarn.nodemanager.resource.memory-mb
Description: Amount of physical memory, in MB, that can be allocated for containers.
Example:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
Default: 8192
Note: Do not set this too small (e.g. 1024), or jobs can be submitted but will never run. In testing, 2048 was still too small to run a job; 3172 worked.

yarn.nodemanager.remote-app-log-dir
Description: Where to aggregate logs to.
Example:
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/logs</value>
</property>
Default: /tmp/logs

yarn.nodemanager.log-dirs
Description: Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
Example:
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/userlogs</value>
</property>
Default: ${yarn.log.dir}/userlogs

yarn.web-proxy.address
Description: The address for the web proxy as HOST:PORT; if this is not given then the proxy will run as part of the RM.
Example:
<property>
  <name>yarn.web-proxy.address</name>
  <value>192.168.100.200:54315</value>
</property>

yarn.resourcemanager.hostname
Description: The hostname of the RM.
Example:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>robot123</value>
</property>
Default: 0.0.0.0

yarn.nodemanager.address
Description: The address of the container manager in the NM.
Example:
<property>
  <name>yarn.nodemanager.address</name>
  <value>192.168.100.200:11000</value>
</property>
Default: ${yarn.nodemanager.hostname}:0
Default addresses and ports:

yarn.resourcemanager.hostname (0.0.0.0): The hostname of the RM.
yarn.resourcemanager.address (${yarn.resourcemanager.hostname}:8032): The address of the applications manager interface in the RM.
yarn.resourcemanager.scheduler.address (${yarn.resourcemanager.hostname}:8030): The address of the scheduler interface.
yarn.resourcemanager.webapp.address (${yarn.resourcemanager.hostname}:8088): The http address of the RM web application.
yarn.resourcemanager.webapp.https.address (${yarn.resourcemanager.hostname}:8090): The https address of the RM web application.
yarn.resourcemanager.resource-tracker.address (${yarn.resourcemanager.hostname}:8031)
yarn.resourcemanager.admin.address (${yarn.resourcemanager.hostname}:8033): The address of the RM admin interface.
yarn.nodemanager.hostname (0.0.0.0): The hostname of the NM.
yarn.nodemanager.address (${yarn.nodemanager.hostname}:0): The address of the container manager in the NM.
yarn.nodemanager.localizer.address (${yarn.nodemanager.hostname}:8040): Address where the localizer IPC is.
yarn.nodemanager.webapp.address (${yarn.nodemanager.hostname}:8042): NM Webapp address.
yarn.timeline-service.hostname (0.0.0.0): The hostname of the timeline service web application.
yarn.timeline-service.address (${yarn.timeline-service.hostname}:10200): This is default address for the timeline server to start the RPC server.
yarn.timeline-service.webapp.address (${yarn.timeline-service.hostname}:8188): The http address of the timeline service web application.
yarn.timeline-service.webapp.https.address (${yarn.timeline-service.hostname}:8190): The https address of the timeline service web application.
yarn.sharedcache.admin.address (0.0.0.0:8047): The address of the admin interface in the SCM (shared cache manager).
yarn.sharedcache.webapp.address (0.0.0.0:8788): The address of the web application in the SCM (shared cache manager).
yarn.sharedcache.uploader.server.address (0.0.0.0:8046): The address of the node manager interface in the SCM (shared cache manager).
yarn.sharedcache.client-server.address (0.0.0.0:8045): The address of the client interface in the SCM (shared cache manager).
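Many of the defaults above are written in terms of ${...} placeholders such as ${yarn.resourcemanager.hostname}. A simplified sketch of that substitution; Hadoop's real expansion is recursive with a depth limit, while this version just makes a few bounded passes and assumes no cyclic references:

```python
import re

def resolve(value, props):
    """Expand ${var} references in a default value string.

    Simplified sketch: unknown variables are left as-is, and a small
    fixed number of passes stands in for true bounded recursion.
    """
    pattern = re.compile(r"\$\{([^}]+)\}")
    for _ in range(5):  # bounded passes instead of recursion
        expanded = pattern.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value

props = {"yarn.resourcemanager.hostname": "192.168.100.200"}
print(resolve("${yarn.resourcemanager.hostname}:8032", props))
# 192.168.100.200:8032
```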