Hadoop Basics --- Several Run Modes of MapReduce (handy for debugging)

Note: this article follows on from the previous one: Hadoop Basics --- MapReduce Implementation.

1: The YARN framework: resource scheduling

(1) YARN framework flow diagram

[Figure: YARN framework flow diagram]

Note: the YARN framework only manages resources. When a program is to be run, YARN allocates nodes, memory, CPU, and other resources for it; how the program itself runs is not YARN's concern, so YARN knows nothing about MapReduce's execution logic. Precisely because of this loose coupling, YARN is applicable far beyond MapReduce and can host other kinds of programs as well.

Supplement: the MapReduce framework, by contrast, does know the execution logic of the map-reduce program we write. Our own map-reduce code contains no management-level task-scheduling logic; that logic is encapsulated inside the MapReduce framework as the MRAppMaster class, which manages the execution of the entire map-reduce job (the "manager" of the map-reduce program).

Key point: in step 6, the NodeManager takes the initiative, sending heartbeats to the ResourceManager to check whether a job task is waiting; a NodeManager (i.e. a DataNode) only picks the job up when it has the required resources. (The heartbeat interval appears in the job.xml dump below as yarn.resourcemanager.nodemanagers.heartbeat-interval-ms, defaulting to 1000 ms.)

MRAppMaster is launched by the YARN framework (launched dynamically, on a node chosen at random).
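To make this flow concrete, here is a minimal sketch of the kind of driver whose submission kicks it off. The mapper class, input path, and key/value classes are taken from the job.xml dump in section (2) below; WCRunner, WCReducer, and the output path are assumed names for illustration, not confirmed by the original post.

package cn.hadoop.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(WCRunner.class);      // packaged as job.jar in the staging dir
        job.setMapperClass(WCMapper.class);     // cn.hadoop.mr.wc.WCMapper in job.xml
        job.setReducerClass(WCReducer.class);   // assumed reducer from the previous post

        job.setMapOutputKeyClass(Text.class);           // matches mapreduce.map.output.key.class
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);              // matches mapreduce.job.output.key.class
        job.setOutputValueClass(LongWritable.class);    // matches mapreduce.job.output.value.class

        FileInputFormat.setInputPaths(job, new Path("hdfs://hadoopH1:9000/wc/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoopH1:9000/wc/output")); // assumed

        // Submitting triggers RunJar to upload job.jar, job.split,
        // job.splitmetainfo, and job.xml to the YARN staging directory.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}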

(2) Supplement: the resources RunJar uploads to HDFS:

1. job.jar is the wc.jar package we built.

2. job.split contains data like the following:

SPL/org.apache.hadoop.mapreduce.lib.input.FileSplit(hdfs://hadoopH1:9000/wc/input/wcdata.txt;

It records the HDFS path and file name of the input data.

3. job.splitmetainfo contains data like the following:

META-SPhadoopH1;

This is the hostname of the node.

4. job.xml holds all of the cluster's configuration settings; a full dump from one submission follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>dfs.journalnode.rpc-address</name><value>0.0.0.0:8485</value><source>hdfs-default.xml</source></property>
<property><name>yarn.ipc.rpc.class</name><value>org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.maxtaskfailures.per.tracker</name><value>3</value><source>mapred-default.xml</source></property>
<property><name>yarn.client.max-cached-nodemanagers-proxies</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.speculative.retry-after-speculate</name><value>15000</value><source>mapred-default.xml</source></property>
<property><name>ha.health-monitor.connect-retry-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.work-preserving-recovery.enabled</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.mmap.cache.size</name><value>256</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.markreset.buffer.percent</name><value>0.0</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.data.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/data</value><source>hdfs-site.xml</source></property>
<property><name>mapreduce.jobhistory.max-age-ms</name><value>604800000</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.lazypersist.file.scrub.interval.sec</name><value>300</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.ubertask.enable</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.delegation.token.renew-interval</name><value>86400000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.input.fileinputformat.numinputfiles</name><value>1</value><source>programatically</source></property>
<property><name>yarn.nodemanager.log-aggregation.compression-type</name><value>none</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.replication.considerLoad</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.complete.cancel.delegation.tokens</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.datestring.cache.size</name><value>200000</value><source>mapred-default.xml</source></property>
<property><name>hadoop.security.kms.client.authentication.retry-count</name><value>1</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled.protocols</name><value>TLSv1</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.retrycache.heap.percent</name><value>0.03f</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.top.window.num.buckets</name><value>10</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.scheduler.address</name><value>${yarn.resourcemanager.hostname}:8030</value><source>yarn-default.xml</source></property>
<property><name>fs.s3a.fast.buffer.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>dfs.client.file-block-storage-locations.num-threads</name><value>10</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.balance.bandwidthPerSec</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.proxy-user-privileges.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.decommission.max.concurrent.tracked.nodes</name><value>100</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.fetch.retry.enabled</name><value>${yarn.nodemanager.recovery.enabled}</value><source>mapred-default.xml</source></property>
<property><name>io.mapfile.bloom.error.rate</name><value>0.005</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.resourcemanager.minimum.version</name><value>NONE</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name><value>1000</value><source>yarn-default.xml</source></property>
<property><name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name><value>${dfs.web.authentication.kerberos.principal}</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.delete.debug-delay-sec</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.read.shortcircuit.streams.cache.size</name><value>256</value><source>hdfs-default.xml</source></property>
<property><name>dfs.image.transfer.bandwidthPerSec</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>yarn.scheduler.maximum-allocation-vcores</name><value>32</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.address</name><value>${yarn.timeline-service.hostname}:10200</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.hdfs-servers</name><value>${fs.defaultFS}</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.task.profile.reduce.params</name><value>${mapreduce.task.profile.params}</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.min-block-size</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>ftp.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>dfs.client.use.legacy.blockreader.local</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.short.circuit.shared.memory.watcher.interrupt.check.ms</name><value>60000</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.directoryscan.threads</name><value>1</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3a.buffer.dir</name><value>${hadoop.tmp.dir}/s3a</value><source>core-default.xml</source></property>
<property><name>yarn.client.application-client-protocol.poll-interval-ms</name><value>200</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.path</name><value>${hadoop.tmp.dir}/yarn/timeline</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.split.metainfo.maxsize</name><value>10000000</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.edits.noeditlogchannelflush</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>s3native.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>yarn.client.failover-retries-on-socket-timeouts</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.startup.delay.block.deletion.sec</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>dfs.webhdfs.user.provider.user.pattern</name><value>^[A-Za-z_][A-Za-z0-9._-]*[$]?$</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.tasktracker.tasks.sleeptimebeforesigkill</name><value>5000</value><source>mapred-default.xml</source></property>
<property><name>yarn.timeline-service.client.retry-interval-ms</name><value>1000</value><source>yarn-default.xml</source></property>
<property><name>dfs.encrypt.data.transfer.cipher.key.bitlength</name><value>128</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.http.authentication.type</name><value>simple</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.path.based.cache.refresh.interval.ms</name><value>30000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.local.clientfactory.class.name</name><value>org.apache.hadoop.mapred.LocalClientFactory</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.cache.revocation.timeout.ms</name><value>900000</value><source>hdfs-default.xml</source></property>
<property><name>ipc.client.connection.maxidletime</name><value>10000</value><source>core-default.xml</source></property>
<property><name>ipc.server.max.connections</name><value>0</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.recovery.store.leveldb.path</name><value>${hadoop.tmp.dir}/mapred/history/recoverystore</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.safemode.threshold-pct</name><value>0.999f</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3a.multipart.purge.age</name><value>86400</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.num.checkpoints.retained</name><value>2</value><source>hdfs-default.xml</source></property>
<property><name>yarn.timeline-service.client.best-effort</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.ubertask.maxmaps</name><value>9</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.stale.datanode.interval</name><value>30000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name><value>90.0</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.tasktracker.http.address</name><value>0.0.0.0:50060</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.ifile.readahead.bytes</name><value>4194304</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.admin.address</name><value>0.0.0.0:10033</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.uploader.server.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>s3.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>dfs.block.access.token.lifetime</name><value>600</value><source>hdfs-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.input.lineinputformat.linespermap</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.num.extra.edits.retained</name><value>1000000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.input.buffer.percent</name><value>0.70</value><source>mapred-default.xml</source></property>
<property><name>hadoop.http.staticuser.user</name><value>dr.who</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.maxattempts</name><value>4</value><source>mapred-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.user</name><value>(&amp;(objectClass=user)(sAMAccountName={0}))</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.admin.acl</name><value>*</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.context</name><value>default</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.map.maxattempts</name><value>4</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.zk-retry-interval-ms</name><value>1000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobhistory.cleaner.interval-ms</name><value>86400000</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.drop.cache.behind.reads</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.permissions.superusergroup</name><value>supergroup</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3n.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.registry.system.acls</name><value>sasl:yarn@, sasl:mapred@, sasl:hdfs@</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.list.cache.pools.num.responses</name><value>100</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.slow.io.warning.threshold.ms</name><value>300</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.store.in-memory.check-period-mins</name><value>720</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.max-blocks-per-file</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.vmem-check-enabled</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.map.class</name><value>cn.hadoop.mr.wc.WCMapper</value><source>programatically</source></property>
<property><name>hadoop.security.authentication</name><value>simple</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.cpu.vcores</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>net.topology.node.switch.mapping.impl</name><value>org.apache.hadoop.net.ScriptBasedMapping</value><source>core-default.xml</source></property>
<property><name>fs.s3.sleepTimeSeconds</name><value>10</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.ttl-ms</name><value>604800000</value><source>yarn-default.xml</source></property>
<property><name>yarn.sharedcache.root-dir</name><value>/sharedcache</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.keytab</name><value>/etc/krb5.keytab</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name><value>600000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.heartbeats.in.second</name><value>100</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts</name><value>3</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name><value>/hadoop-yarn</value><source>yarn-default.xml</source></property>
<property><name>s3.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.require.client.cert</name><value>false</value><source>core-default.xml</source></property>
<property><name>dfs.journalnode.http-address</name><value>0.0.0.0:8480</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.output.fileoutputformat.compress</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.shuffle.max.threads</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.invalidate.work.pct.per.iteration</name><value>0.32f</value><source>hdfs-default.xml</source></property>
<property><name>s3native.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>dfs.client.block.write.replace-datanode-on-failure.policy</name><value>DEFAULT</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.client.submit.file.replication</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.job.committer.commit-window</name><value>10000</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name><value>250</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.env-whitelist</name><value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.acls.enabled</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.secondary.http-address</name><value>0.0.0.0:50090</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.map.speculative</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.speculative.slowtaskthreshold</name><value>1.0</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.cgroups.mount</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.tasktracker.http.threads</name><value>40</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.http.policy</name><value>HTTP_ONLY</value><source>mapred-default.xml</source></property>
<property><name>fs.s3a.paging.maximum</name><value>5000</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.nodemanager-connect-retries</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>fs.s3.buffer.dir</name><value>${hadoop.tmp.dir}/s3</value><source>core-default.xml</source></property>
<property><name>io.native.lib.available</name><value>true</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.heartbeat.recheck-interval</name><value>300000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobhistory.done-dir</name><value>${yarn.app.mapreduce.am.staging-dir}/history/done</value><source>mapred-default.xml</source></property>
<property><name>hadoop.registry.zk.retry.interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.threads.core</name><value>15</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.avoid.write.stale.datanode</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.txns</name><value>1000000</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.ssl.hostname.verifier</name><value>DEFAULT</value><source>core-default.xml</source></property>
<property><name>mapreduce.task.timeout</name><value>600000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.jar</name><value>/tmp/hadoop-yarn/staging/hadoop/.staging/job_1582165983362_0009/job.jar</value><source>programatically</source></property>
<property><name>yarn.nodemanager.disk-health-checker.interval-ms</name><value>120000</value><source>yarn-default.xml</source></property>
<property><name>dfs.journalnode.https-address</name><value>0.0.0.0:8481</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.groups.cache.secs</name><value>300</value><source>core-default.xml</source></property>
<property><name>mapreduce.input.fileinputformat.split.minsize</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.sync.behind.writes</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.shuffle.port</name><value>13562</value><source>mapred-default.xml</source></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value><source>core-default.xml</source></property>
<property><name>dfs.client.https.keystore.resource</name><value>ssl-client.xml</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.list.encryption.zones.num.responses</name><value>100</value><source>hdfs-default.xml</source></property>
<property><name>yarn.client.failover-proxy-provider</name><value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.recovery.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.retiredjobs.cache.size</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>dfs.ha.tail-edits.period</name><value>60</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.drop.cache.behind.writes</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3.maxRetries</name><value>4</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobtracker.address</name><value>local</value><source>mapred-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>HTTP/_HOST@LOCALHOST</value><source>core-default.xml</source></property>
<property><name>nfs.server.port</name><value>2049</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.webapp.address</name><value>${yarn.resourcemanager.hostname}:8088</value><source>yarn-default.xml</source></property>
<property><name>mapred.mapper.new-api</name><value>true</value><source>programatically</source></property>
<property><name>mapreduce.task.profile.reduces</name><value>0-2</value><source>mapred-default.xml</source></property>
<property><name>yarn.timeline-service.client.max-retries</name><value>30</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.am.max-attempts</name><value>2</value><source>yarn-default.xml</source></property>
<property><name>nfs.dump.dir</name><value>/tmp/.hdfs-nfs</value><source>hdfs-default.xml</source></property>
<property><name>dfs.bytes-per-checksum</name><value>512</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.end-notification.max.retry.interval</name><value>5000</value><source>mapred-default.xml</source></property>
<property><name>ipc.client.connect.retry.interval</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.size</name><value>104857600</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx1024m</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.process-kill-wait.ms</name><value>2000</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.state-store-class</name><value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.safemode.min.datanodes</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.speculative.minimum-allowed-tasks</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.write.stale.datanode.ratio</name><value>0.5f</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.jetty.logs.serve.aliases</name><value>true</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.fetch.retry.timeout-ms</name><value>30000</value><source>mapred-default.xml</source></property>
<property><name>fs.du.interval</name><value>600000</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.dns.nameserver</name><value>default</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.admin.address</name><value>0.0.0.0:8047</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.random.device.file.path</name><value>/dev/urandom</value><source>core-default.xml</source></property>
<property><name>mapreduce.task.merge.progress.records</name><value>10000</value><source>mapred-default.xml</source></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.registry.secure</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.counters.max</name><value>120</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.localizer.fetch.thread-count</name><value>4</value><source>yarn-default.xml</source></property>
<property><name>io.mapfile.bloom.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.localizer.client.thread-count</name><value>5</value><source>yarn-default.xml</source></property>
<property><name>fs.automatic.close</name><value>true</value><source>core-default.xml</source></property>
<property><name>mapreduce.task.profile</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.edit.log.autoroll.multiplier.threshold</name><value>2.0</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.task.combine.progress.records</name><value>10000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.shuffle.ssl.file.buffer.size</name><value>65536</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.client.job.max-retries</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>fs.swift.impl</name><value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.container.log.backups</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name><value>0.75f</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.backup.address</name><value>0.0.0.0:50100</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.https.need-auth</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.app-submission.cross-platform</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.name</name><value>wc.jar</value><source>programatically</source></property>
<property><name>yarn.timeline-service.ttl-enable</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>dfs.user.home.dir.prefix</name><value>/user</value><source>hdfs-default.xml</source></property>
<property><name>mapred.reducer.new-api</name><value>true</value><source>programatically</source></property>
<property><name>yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.keytab</name><value>/etc/krb5.keytab</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.xattrs.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.write.exclude.nodes.cache.expiry.interval.millis</name><value>600000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobtracker.restart.recover</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.client-server.address</name><value>0.0.0.0:8045</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.map.skip.proc.count.autoincr</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>dfs.image.transfer.chunksize</name><value>65536</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value><source>core-default.xml</source></property>
<property><name>io.compression.codec.bzip2.library</name><value>system-native</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.name.dir.restore</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.resource.checked.volumes.minimum</name><value>1</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.list.cache.directives.num.responses</name><value>100</value><source>hdfs-default.xml</source></property>
<property><name>fs.ftp.host</name><value>0.0.0.0</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.containerlauncher.threadpool-initial-size</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>s3.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>s3native.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobtracker.taskscheduler</name><value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.dns.nameserver</name><value>default</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.task.userlog.limit.kb</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name><value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.speculative</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.container-monitor.interval-ms</name><value>3000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.submithostname</name><value>hadoopH1</value><source>programatically</source></property>
<property><name>dfs.replication.max</name><value>512</value><source>hdfs-default.xml</source></property>
<property><name>dfs.replication</name><value>1</value><source>hdfs-site.xml</source></property>
<property><name>yarn.client.failover-retries</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>8</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobhistory.recovery.enable</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>nfs.exports.allowed.hosts</name><value>* rw</value><source>core-default.xml</source></property>
<property><name>yarn.sharedcache.checksum.algo.impl</name><value>org.apache.hadoop.yarn.sharedcache.ChecksumSHA256Impl</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.memory.limit.percent</name><value>0.25</value><source>mapred-default.xml</source></property>
<property><name>file.replication</name><value>1</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name><value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.jvm.numtasks</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name><value>4</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.am.max-attempts</name><value>2</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.shuffle.connection-keep-alive.timeout</name><value>5</value><source>mapred-default.xml</source></property>
<property><name>hadoop.fuse.timer.period</name><value>5</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.reduces</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.job.task.listener.thread-count</name><value>30</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value><source>yarn-default.xml</source></property>
<property><name>s3native.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.reduce.tasks.maximum</name><value>2</value><source>mapred-default.xml</source></property>
<property><name>fs.permissions.umask-mode</name><value>022</value><source>core-default.xml</source></property>
<property><name>mapreduce.cluster.local.dir</name><value>${hadoop.tmp.dir}/mapred/local</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.client.output.filter</name><value>FAILED</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.pmem-check-enabled</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>mapred.queue.default.acl-administer-jobs</name><value>*</value><source>programatically</source></property>
<property><name>dfs.client.failover.connection.retries.on.timeouts</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobtracker.instrumentation</name><value>org.apache.hadoop.mapred.JobTrackerMetricsInst</value><source>mapred-default.xml</source></property>
<property><name>ftp.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>mapreduce.map.output.key.class</name><value>org.apache.hadoop.io.Text</value><source>programatically</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.member</name><value>member</value><source>core-default.xml</source></property>
<property><name>fs.s3a.max.total.tasks</name><value>1000</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.replication.work.multiplier.per.iteration</name><value>2</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.fs.state-store.num-retries</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-state-store.path</name><value>${hadoop.tmp.dir}/yarn/timeline</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.resource-tracker.address</name><value>${yarn.resourcemanager.hostname}:8031</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.tasktracker.outofband.heartbeat</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.edits.dir</name><value>${dfs.namenode.name.dir}</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.scheduler.monitor.enable</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>fs.trash.checkpoint.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>hadoop.registry.zk.retry.times</name><value>5</value><source>core-default.xml</source></property>
<property><name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name><value>300000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>s3.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.maximum</name><value>15</value><source>core-default.xml</source></property>
<property><name>file.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.healthchecker.script.timeout</name><value>600000</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.max-directory-items</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.tasktracker.taskcontroller</name><value>org.apache.hadoop.mapred.DefaultTaskController</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.path.based.cache.block.map.allocation.percent</name><value>0.25</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3a.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.windows-container.memory-limit.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.dir</name><value>file://${hadoop.tmp.dir}/dfs/namesecondary</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>/tmp/logs</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.retry-delay.max.ms</name><value>60000</value><source>mapred-default.xml</source></property>
<property><name>io.map.index.interval</name><value>128</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.replication.interval</name><value>3</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.block.write.replace-datanode-on-failure.enable</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.StandardSocketFactory</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.client.max-retries</name><value>3</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.address</name><value>${yarn.nodemanager.hostname}:0</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.max.transfer.threads</name><value>4096</value><source>hdfs-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:50020</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.delayed.delegation-token.removal-interval-ms</name><value>30000</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.kerberos.principal.pattern</name><value>*</value><source>hdfs-default.xml</source></property>
<property><name>yarn.timeline-service.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.cached.conn.retry</name><value>3</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.backup.http-address</name><value>0.0.0.0:50105</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.tasktracker.report.address</name><value>127.0.0.1:0</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.period</name><value>3600</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.shared.file.descriptor.paths</name><value>/dev/shm,/tmp</value><source>hdfs-default.xml</source></property>
<property><name>dfs.http.policy</name><value>HTTP_ONLY</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.groups.cache.warn.after.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.fs.state-store.retry-interval-ms</name><value>1000</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.max-xattrs-per-inode</name><value>32</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.zk-acl</name><value>world:anyone:rwcda</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.support.allow.format</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.app-checker.class</name><value>org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.max-retries</name><value>3</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name><value>2000, 500</value><source>yarn-default.xml</source></property>
<property><name>fs.s3a.fast.upload</name><value>false</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.committer.setup.cleanup.needed</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.cache.revocation.polling.ms</name><value>500</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.end-notification.retry.attempts</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.state-store.max-completed-applications</name><value>${yarn.resourcemanager.max-completed-applications}</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.map.output.compress</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.cleaner.enable</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.running.reduce.limit</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>io.seqfile.local.dir</name><value>${hadoop.tmp.dir}/io/local</value><source>core-default.xml</source></property>
<property><name>dfs.blockreport.split.threshold</name><value>1000000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.read.timeout</name><value>180000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.queuename</name><value>default</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.scan.period.hours</name><value>504</value><source>hdfs-default.xml</source></property>
<property><name>ipc.client.connect.max.retries</name><value>10</value><source>core-default.xml</source></property>
<property><name>io.seqfile.lazydecompress</name><value>true</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.staging-dir</name><value>/tmp/hadoop-yarn/staging</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.resources-handler.class</name><value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value><source>yarn-default.xml</source></property>
<property><name>yarn.app.mapreduce.client.job.retry-interval</name><value>2000</value><source>mapred-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name><value>104857600</value><source>yarn-default.xml</source></property>
<property><name>io.file.buffer.size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs</name><value>86400</value><source>yarn-default.xml</source></property>
<property><name>ha.zookeeper.parent-znode</name><value>/hadoop-ha</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.indexcache.mb</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>tfile.io.chunk.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.keytab</name><value>/etc/krb5.keytab</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.submithostaddress</name><value>192.168.58.100</value><source>programatically</source></property>
<property><name>yarn.acl.enable</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>rpc.engine.org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB</name><value>org.apache.hadoop.ipc.ProtobufRpcEngine</value><source>programatically</source></property>
<property><name>hadoop.security.group.mapping.ldap.directory.search.timeout</name><value>10000</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.token.tracking.ids.enabled</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.block-pinning.enabled</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.map.output.compress.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>s3.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>hadoop.registry.zk.root</name><value>/registry</value><source>core-default.xml</source></property>
<property><name>tfile.fs.input.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.http-authentication.type</name><value>simple</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.user.name</name><value>hadoop</value><source>programatically</source></property>
<property><name>ha.failover-controller.graceful-fence.connection.retries</name><value>1</value><source>core-default.xml</source></property>
<property><name>net.topology.script.number.args</name><value>100</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>yarn.sharedcache.admin.thread-count</name><value>1</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.recovery.dir</name><value>${hadoop.tmp.dir}/yarn-nm-recovery</value><source>yarn-default.xml</source></property>
<property><name>hadoop.ssl.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.ftp.impl</name><value>org.apache.hadoop.fs.ftp.FtpFs</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.handler-thread-count</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.reject-unresolved-dn-topology-mapping</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobhistory.recovery.store.class</name><value>org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.log.retain-seconds</name><value>10800</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.admin.address</name><value>${yarn.resourcemanager.hostname}:8033</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.recovery.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.slow.io.warning.threshold.ms</name><value>30000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name><value>/yarn-leader-election</value><source>yarn-default.xml</source></property>
<property><name>fs.AbstractFileSystem.viewfs.impl</name><value>org.apache.hadoop.fs.viewfs.ViewFs</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.dns.interface</name><value>default</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobtracker.handler.count</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>dfs.blockreport.initialDelay</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>fs.AbstractFileSystem.hdfs.impl</name><value>org.apache.hadoop.fs.Hdfs</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.top.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.retrycache.expirytime.millis</name><value>600000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.speculative.speculative-cap-total-tasks</name><value>0.01</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.failover.sleep.max.millis</name><value>15000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value><source>programatically</source></property>
<property><name>yarn.sharedcache.nm.uploader.thread-count</name><value>20</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.blocks.per.postponedblocks.rescan</name><value>10000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.max-completed-applications</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.log-dirs</name><value>${yarn.log.dir}/userlogs</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.failover.sleep.base.millis</name><value>500</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.nonsecure-mode.user-pattern</name><value>^[_.A-Za-z0-9][-@_.A-Za-z0-9]{0,255}?[$]?$</value><source>yarn-default.xml</source></property>
<property><name>dfs.default.chunk.view.size</name><value>32768</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.read.shortcircuit</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>ftp.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.acl-modify-job</name><value> </value><source>mapred-default.xml</source></property>
<property><name>fs.defaultFS</name><value>hdfs://hadoopH1:9000/</value><source>programatically</source></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer</value><source>programatically</source></property>
<property><name>fs.s3n.multipart.copy.block.size</name><value>5368709120</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.connect.max-wait.ms</name><value>900000</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.ssl</name><value>false</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.max.extra.edits.segments.retained</name><value>10000</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.https-address</name><value>0.0.0.0:50470</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value><source>yarn-site.xml</source></property>
<property><name>dfs.block.scanner.volume.bytes.per.second</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.store.class</name><value>org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.decommission.blocks.per.interval</name><value>500000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.admin.client.thread-count</name><value>1</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.size</name><value>500</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.shuffle.log.separate</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>ipc.client.kill.max</name><value>10</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.group</name><value>(objectClass=group)</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.file.impl</name><value>org.apache.hadoop.fs.local.LocalFs</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>${user.home}/hadoop.keytab</value><source>core-default.xml</source></property>
<property><name>yarn.client.nodemanager-connect.max-wait-ms</name><value>180000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.map.output.collector.class</name><value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.path.based.cache.retry.interval.ms</name><value>30000</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.uid.cache.secs</name><value>14400</value><source>core-default.xml</source></property>
<property><name>mapreduce.map.cpu.vcores</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>-1</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.map.log.level</name><value>INFO</value><source>mapred-default.xml</source></property>
<property><name>mapred.child.java.opts</name><value>-Xmx200m</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.hard-kill-timeout-ms</name><value>10000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value><source>programatically</source></property>
<property><name>hadoop.registry.zk.session.timeout.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.running.map.limit</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.store.in-memory.initial-delay-mins</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>yarn.sharedcache.client-server.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.local-cache.max-files-per-directory</name><value>8192</value><source>yarn-default.xml</source></property>
<property><name>dfs.https.server.keystore.resource</name><value>ssl-server.xml</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobtracker.taskcache.levels</name><value>2</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.handler.count</name><value>10</value><source>hdfs-default.xml</source></property>
<property><name>s3native.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>mapreduce.client.completion.pollinterval</name><value>5000</value><source>mapred-default.xml</source></property>
<property><name>dfs.stream-buffer-size</name><value>4096</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.delegation.key.update-interval</name><value>86400000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.maps</name><value>1</value><source>programatically</source></property>
<property><name>mapreduce.job.acl-view-job</name><value> </value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.working.dir</name><value>hdfs://hadoopH1:9000/user/hadoop</value><source>programatically</source></property>
<property><name>dfs.namenode.enable.retrycache</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>30000</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name><value>300000</value><source>yarn-default.xml</source></property>
<property><name>fs.s3a.multipart.threshold</name><value>2147483647</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.decommission.interval</name><value>30</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.shuffle.max.connections</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.input.fileinputformat.inputdir</name><value>hdfs://hadoopH1:9000/wc/input</value><source>programatically</source></property>
<property><name>yarn.log-aggregation-enable</name><value>true</value><source>yarn-site.xml</source></property>
<property><name>dfs.client-write-packet-size</name><value>65536</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.file-block-storage-locations.timeout.millis</name><value>1000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobtracker.expire.trackers.interval</name><value>600000</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.block.write.retries</name><value>3</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.task.io.sort.factor</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>ha.health-monitor.sleep-after-disconnect.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.session-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.input.fileinputformat.list-status.num-threads</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>io.skip.checksum.errors</name><value>false</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.safemode.extension</name><value>30000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobhistory.move.thread-count</name><value>3</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.zk-state-store.parent-path</name><value>/rmstore</value><source>yarn-default.xml</source></property>
<property><name>ipc.client.idlethreshold</name><value>4000</value><source>core-default.xml</source></property>
<property><name>yarn.sharedcache.cleaner.initial-delay-mins</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.accesstime.precision</name><value>3600000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.task.profile.params</name><value>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.keytab</name><value>/etc/security/keytab/jhs.service.keytab</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.hdfs-blocks-metadata.enabled</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs</name><value>86400</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.fetch.retry.interval-ms</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>hadoop.user.group.static.mapping.overrides</name><value>dr.who=;</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.low-watermark</name><value>0.3f</value><source>core-default.xml</source></property>
<property><name>dfs.datanode.directoryscan.interval</name><value>21600</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3a.connection.ssl.enabled</name><value>true</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.scheduler.monitor.policies</name><value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.output.fileoutputformat.outputdir</name><value>hdfs://hadoopH1:9000/wc/output</value><source>programatically</source></property>
<property><name>ipc.server.listen.queue.size</name><value>128</value><source>core-default.xml</source></property>
<property><name>rpc.metrics.quantile.enable</name><value>false</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name><value>-1</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.persist.jobstatus.dir</name><value>/jobtracker/jobsInfo</value><source>mapred-default.xml</source></property>
<property><name>yarn.client.nodemanager-client-async.thread-pool-max-size</name><value>500</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.system-metrics-publisher.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.name.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/name</value><source>hdfs-site.xml</source></property>
<property><name>yarn.am.liveness-monitor.expiry-interval-ms</name><value>600000</value><source>yarn-default.xml</source></property>
<property><name>yarn.nm.liveness-monitor.expiry-interval-ms</name><value>600000</value><source>yarn-default.xml</source></property>
<property><name>ftp.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>yarn.sharedcache.nested-level</name><value>3</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.max.objects</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.emit-timeline-data</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.map.memory.mb</name><value>1024</value><source>mapred-default.xml</source></property>
<property><name>yarn.client.nodemanager-connect.retry-interval-ms</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.edits.journal-plugin.qjournal</name><value>org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.tasktracker.healthchecker.interval</name><value>60000</value><source>mapred-default.xml</source></property>
<property><name>nfs.wtmax</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.speculative.retry-after-no-speculate</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>hadoop.registry.zk.connection.timeout.ms</name><value>15000</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.address</name><value>${yarn.resourcemanager.hostname}:8032</value><source>yarn-default.xml</source></property>
<property><name>dfs.cachereport.intervalMsec</name><value>10000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.task.skip.start.attempts</name><value>2</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.zk-timeout-ms</name><value>10000</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.edits.dir</name><value>${dfs.namenode.checkpoint.dir}</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.hdfs.configuration.version</name><value>1</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.cleaner.resource-sleep-ms</name><value>0</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.map.skip.maxrecords</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name><value>10737418240</value><source>hdfs-default.xml</source></property>
<property><name>nfs.allow.insecure.ports</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobtracker.system.dir</name><value>${hadoop.tmp.dir}/mapred/system</value><source>mapred-default.xml</source></property>
<property><name>yarn.timeline-service.hostname</name><value>0.0.0.0</value><source>yarn-default.xml</source></property>
<property><name>hadoop.registry.rm.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.reducer.preempt.delay.sec</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.shuffle.ssl.enabled</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.container-manager.thread-count</name><value>20</value><source>yarn-default.xml</source></property>
<property><name>dfs.encrypt.data.transfer</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.block.access.key.update.interval</name><value>600</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.tmp.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/tmp</value><source>core-site.xml</source></property>
<property><name>dfs.namenode.audit.loggers</name><value>default</value><source>hdfs-default.xml</source></property>
<property><name>fs.AbstractFileSystem.har.impl</name><value>org.apache.hadoop.fs.HarFs</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.localizer.cache.target-size-mb</name><value>10240</value><source>yarn-default.xml</source></property>
<property><name>yarn.app.mapreduce.shuffle.log.backups</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>yarn.http.policy</name><value>HTTP_ONLY</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.short.circuit.replica.stale.threshold.ms</name><value>1800000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.timeline-service.webapp.https.address</name><value>${yarn.timeline-service.hostname}:8190</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.amlauncher.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.persist.jobstatus.hours</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>tfile.fs.output.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.checkpoint.check.period</name><value>60</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.dns.interface</name><value>default</value><source>hdfs-default.xml</source></property>
<property><name>fs.ftp.host.port</name><value>21</value><source>core-default.xml</source></property>
<property><name>mapreduce.task.io.sort.mb</name><value>100</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.inotify.max.events.per.rpc</name><value>1000</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.group.name</name><value>cn</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.avoid.read.stale.datanode</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.output.fileoutputformat.compress.type</name><value>RECORD</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.reduce.skip.proc.count.autoincr</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>file.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>mapreduce.job.userlog.retain.hours</name><value>24</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.http.address</name><value>0.0.0.0:50075</value><source>hdfs-default.xml</source></property>
<property><name>dfs.image.compress</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>ha.health-monitor.check-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>dfs.permissions.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.domain.socket.data.traffic</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.image.compression.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.address</name><value>0.0.0.0:50010</value><source>hdfs-default.xml</source></property>
<property><name>dfs.block.access.token.enable</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.input.buffer.percent</name><value>0.0</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.tasktracker.local.dir.minspacestart</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>dfs.blockreport.intervalMsec</name><value>21600000</value><source>hdfs-default.xml</source></property>
<property><name>ha.health-monitor.rpc-timeout.ms</name><value>45000</value><source>core-default.xml</source></property>
<property><name>dfs.client.failover.connection.retries</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.kerberos.internal.spnego.principal</name><value>${dfs.web.authentication.kerberos.principal}</value><source>hdfs-default.xml</source></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>8192</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.leveldb-state-store.path</name><value>${hadoop.tmp.dir}/yarn/system/rmstore</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.task.files.preserve.failedtasks</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.delete.thread-count</name><value>4</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.output.fileoutputformat.compress.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value><source>mapred-default.xml</source></property>
<property><name>map.sort.class</name><value>org.apache.hadoop.util.QuickSort</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.classloader</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>hadoop.registry.zk.retry.ceiling.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobtracker.tasktracker.maxblacklists</name><value>4</value><source>mapred-default.xml</source></property>
<property><name>io.seqfile.compress.blocksize</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>dfs.blocksize</name><value>134217728</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.task.profile.maps</name><value>0-2</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobtracker.staging.root.dir</name><value>${hadoop.tmp.dir}/mapred/staging</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name><value>600000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.http.address</name><value>0.0.0.0:50030</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.reduce.class</name><value>cn.hadoop.mr.wc.WCReducer</value><source>programatically</source></property>
<property><name>mapreduce.job.dir</name><value>/tmp/hadoop-yarn/staging/hadoop/.staging/job_1582165983362_0009</value><source>programatically</source></property>
<property><name>dfs.client.mmap.cache.timeout.ms</name><value>3600000</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.security.java.secure.random.algorithm</name><value>SHA1PRNG</value><source>core-default.xml</source></property>
<property><name>fs.client.resolve.remote.symlinks</name><value>true</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.local.dir.minspacekill</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>nfs.mountd.port</name><value>4242</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name><value>0.25</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</name><value>5000</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.resource.du.reserved</name><value>104857600</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.end-notification.retry.interval</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.loadedjobs.cache.size</name><value>5</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.datanode-restart.timeout</name><value>30</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.local-dirs</name><value>${hadoop.tmp.dir}/nm-local-dir</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.block.id.layout.upgrade.threads</name><value>12</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.registry.jaas.context</name><value>Client</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.webapp.address</name><value>${yarn.timeline-service.hostname}:8188</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobhistory.address</name><value>0.0.0.0:10020</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobtracker.persist.jobstatus.active</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>file.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>dfs.datanode.readahead.bytes</name><value>4194304</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.cleaner.period-mins</name><value>1440</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.http-address</name><value>0.0.0.0:50070</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.work.around.non.threadsafe.getpwuid</name><value>false</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.LocalConfigurationProvider</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.recovery.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.hostname</name><value>hadoopH1</value><source>yarn-site.xml</source></property>
<property><name>fs.s3n.multipart.uploads.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.max-component-length</name><value>255</value><source>hdfs-default.xml</source></property>
<property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>20000</value><source>core-default.xml</source></property>
<property><name>ftp.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.parallelcopies</name><value>5</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobhistory.principal</name><value>jhs/_HOST@REALM.TLD</value><source>mapred-default.xml</source></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>true</value><source>core-default.xml</source></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.windows-container.cpu-limit.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.secondary.https-address</name><value>0.0.0.0:50091</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.job.ubertask.maxreduces</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>fs.s3a.connection.establish.timeout</name><value>5000</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.health-checker.interval-ms</name><value>600000</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.fs-limits.max-xattr-size</name><value>16384</value><source>hdfs-default.xml</source></property>
<property><name>fs.s3a.multipart.purge</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.num.refill.threads</name><value>2</value><source>core-default.xml</source></property>
<property><name>yarn.timeline-service.store-class</name><value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.shuffle.transfer.buffer.size</name><value>131072</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.zk-num-retries</name><value>1000</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobtracker.jobhistory.task.numberprogresssplits</name><value>12</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.store.in-memory.staleness-period-mins</name><value>10080</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.webapp.address</name><value>${yarn.nodemanager.hostname}:8042</value><source>yarn-default.xml</source></property>
<property><name>yarn.app.mapreduce.client-am.ipc.max-retries</name><value>3</value><source>mapred-default.xml</source></property>
<property><name>ha.failover-controller.new-active.rpc-timeout.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.client.thread-count</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>mapreduce.fileoutputcommitter.algorithm.version</name><value>1</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.reduce.skip.maxgroups</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.map.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value><source>programatically</source></property>
<property><name>dfs.namenode.top.windows.minutes</name><value>1,5,25</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.memory.mb</name><value>1024</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.health-checker.script.timeout-ms</name><value>1200000</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.du.reserved</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.resource.check.interval</name><value>5000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.client.progressmonitor.pollinterval</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.hostname</name><value>0.0.0.0</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.ha.enabled</name><value>false</value><source>yarn-default.xml</source></property>
<property><name>dfs.ha.log-roll.period</name><value>120</value><source>hdfs-default.xml</source></property>
<property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value><source>yarn-default.xml</source></property>
<property><name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.container.log.limit.kb</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>hadoop.http.authentication.signature.secret.file</name><value>${user.home}/hadoop-http-auth-signature-secret</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.move.interval-ms</name><value>180000</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.container-executor.class</name><value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.authorization</name><value>false</value><source>core-default.xml</source></property>
<property><name>dfs.storage.policy.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>dfs.datanode.https.address</name><value>0.0.0.0:50475</value><source>hdfs-default.xml</source></property>
<property><name>yarn.nodemanager.localizer.address</name><value>${yarn.nodemanager.hostname}:8040</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.jobhistory.recovery.store.fs.uri</name><value>${hadoop.tmp.dir}/mapred/history/recoverystore</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.replication.min</name><value>1</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.shuffle.connection-keep-alive.enable</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>dfs.namenode.top.num.users</name><value>10</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.common.configuration.version</name><value>0.23.0</value><source>core-default.xml</source></property>
<property><name>yarn.app.mapreduce.task.container.log.backups</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>hadoop.security.groups.negative-cache.secs</name><value>30</value><source>core-default.xml</source></property>
<property><name>mapreduce.ifile.readahead</name><value>true</value><source>mapred-default.xml</source></property>
<property><name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name><value>100</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.max.split.locations</name><value>10</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.max.locked.memory</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.registry.zk.quorum</name><value>localhost:2181</value><source>core-default.xml</source></property>
<property><name>fs.s3a.threads.keepalivetime</name><value>60</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.joblist.cache.size</name><value>20000</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.end-notification.max.attempts</name><value>5</value><source>mapred-default.xml</source></property>
<property><name>dfs.image.transfer.timeout</name><value>60000</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.read.shortcircuit.skip.checksum</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>nfs.rtmax</name><value>1048576</value><source>hdfs-default.xml</source></property>
<property><name>dfs.namenode.edit.log.autoroll.check.interval.ms</name><value>300000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.connect.timeout</name><value>180000</value><source>mapred-default.xml</source></property>
<property><name>dfs.datanode.failed.volumes.tolerated</name><value>0</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>0.0.0.0:19888</value><source>mapred-default.xml</source></property>
<property><name>fs.s3a.connection.timeout</name><value>50000</value><source>core-default.xml</source></property>
<property><name>dfs.client.mmap.retry.timeout.ms</name><value>300000</value><source>hdfs-default.xml</source></property>
<property><name>yarn.sharedcache.nm.uploader.replication.factor</name><value>10</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.data.dir.perm</name><value>700</value><source>hdfs-default.xml</source></property>
<property><name>hadoop.http.authentication.token.validity</name><value>36000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries.on.timeouts</name><value>45</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.docker-container-executor.exec-name</name><value>/usr/bin/docker</value><source>yarn-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.job.committer.cancel-timeout</name><value>60000</value><source>mapred-default.xml</source></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><source>core-default.xml</source></property>
<property><name>mapreduce.reduce.log.level</name><value>INFO</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.reduce.shuffle.merge.percent</name><value>0.66</value><source>mapred-default.xml</source></property>
<property><name>ipc.client.fallback-to-simple-auth-allowed</name><value>false</value><source>core-default.xml</source></property>
<property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value><source>core-default.xml</source></property>
<property><name>fs.s3.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name><value>nobody</value><source>yarn-default.xml</source></property>
<property><name>hadoop.kerberos.kinit.command</name><value>kinit</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.expiry</name><value>43200000</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.fs.state-store.uri</name><value>${hadoop.tmp.dir}/yarn/system/rmstore</value><source>yarn-default.xml</source></property>
<property><name>yarn.admin.acl</name><value>*</value><source>yarn-default.xml</source></property>
<property><name>dfs.namenode.delegation.token.max-lifetime</name><value>604800000</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.reduce.merge.inmem.threshold</name><value>1000</value><source>mapred-default.xml</source></property>
<property><name>net.topology.impl</name><value>org.apache.hadoop.net.NetworkTopology</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>dfs.heartbeat.interval</name><value>3</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value><source>yarn-default.xml</source></property>
<property><name>io.map.index.skip</name><value>0</value><source>core-default.xml</source></property>
<property><name>dfs.namenode.handler.count</name><value>10</value><source>hdfs-default.xml</source></property>
<property><name>yarn.resourcemanager.webapp.https.address</name><value>${yarn.resourcemanager.hostname}:8090</value><source>yarn-default.xml</source></property>
<property><name>yarn.nodemanager.admin-env</name><value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value><source>yarn-default.xml</source></property>
<property><name>hadoop.security.crypto.cipher.suite</name><value>AES/CTR/NoPadding</value><source>core-default.xml</source></property>
<property><name>mapreduce.task.profile.map.params</name><value>${mapreduce.task.profile.params}</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobtracker.jobhistory.block.size</name><value>3145728</value><source>mapred-default.xml</source></property>
<property><name>hadoop.security.crypto.buffer.size</name><value>8192</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.cluster.acls.enabled</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>yarn.sharedcache.uploader.server.address</name><value>0.0.0.0:8046</value><source>yarn-default.xml</source></property>
<property><name>fs.s3a.threads.max</name><value>256</value><source>core-default.xml</source></property>
<property><name>fs.har.impl.disable.cache</name><value>true</value><source>core-default.xml</source></property>
<property><name>mapreduce.tasktracker.map.tasks.maximum</name><value>2</value><source>mapred-default.xml</source></property>
<property><name>ipc.client.connect.timeout</name><value>20000</value><source>core-default.xml</source></property>
<property><name>yarn.nodemanager.remote-app-log-dir-suffix</name><value>logs</value><source>yarn-default.xml</source></property>
<property><name>fs.df.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>hadoop.util.hash.type</name><value>murmur</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobhistory.minicluster.fixed.ports</name><value>false</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.jobtracker.jobhistory.lru.cache.size</name><value>5</value><source>mapred-default.xml</source></property>
<property><name>yarn.app.mapreduce.shuffle.log.limit.kb</name><value>0</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.failover.max.attempts</name><value>15</value><source>hdfs-default.xml</source></property>
<property><name>dfs.client.use.datanode.hostname</name><value>false</value><source>hdfs-default.xml</source></property>
<property><name>ha.zookeeper.acl</name><value>world:anyone:rwcda</value><source>core-default.xml</source></property>
<property><name>mapreduce.jobtracker.maxtasks.perjob</name><value>-1</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.job.speculative.speculative-cap-running-tasks</name><value>0.1</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.map.sort.spill.percent</name><value>0.80</value><source>mapred-default.xml</source></property>
<property><name>file.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value><source>yarn-default.xml</source></property>
<property><name>yarn.resourcemanager.nodemanager.minimum.version</name><value>NONE</value><source>yarn-default.xml</source></property>
<property><name>hadoop.fuse.connection.timeout</name><value>300</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.tasktracker.instrumentation</name><value>org.apache.hadoop.mapred.TaskTrackerMetricsInst</value><source>mapred-default.xml</source></property>
<property><name>io.seqfile.sorter.recordlimit</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>yarn.sharedcache.webapp.address</name><value>0.0.0.0:8788</value><source>yarn-default.xml</source></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>1536</value><source>mapred-default.xml</source></property>
<property><name>mapreduce.framework.name</name><value>yarn</value><source>mapred-site.xml</source></property>
<property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.05</value><source>mapred-default.xml</source></property>
<property><name>yarn.resourcemanager.client.thread-count</name><value>50</value><source>yarn-default.xml</source></property>
<property><name>mapreduce.cluster.temp.dir</name><value>${hadoop.tmp.dir}/mapred/temp</value><source>mapred-default.xml</source></property>
<property><name>dfs.client.mmap.enabled</name><value>true</value><source>hdfs-default.xml</source></property>
<property><name>mapreduce.jobhistory.intermediate-done-dir</name><value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value><source>mapred-default.xml</source></property>
<property><name>fs.s3a.attempts.maximum</name><value>10</value><source>core-default.xml</source></property>
</configuration>
job.xml

2: The MapReduce framework (in conjunction with YARN)


(1) Framework flow diagram

Note: MRAppMaster and YarnChild (the processes that host the map tasks and reduce tasks) are both created dynamically.

3: Local run mode

Whether a job executes locally or on the cluster is decided by the configuration file mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

When mapreduce.framework.name is not configured (it defaults to local), the job is handed to the local executor; when it is set to yarn, the job is submitted to the cluster's YARN framework for scheduling.
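To check which executor a given classpath configuration would choose, you can query the property directly; it falls back to local when nothing is configured. A minimal fragment, runnable wherever a Hadoop Configuration is available:

    Configuration conf = new Configuration();
    //"local" -> LocalJobRunner on this machine; "yarn" -> submit to the cluster's ResourceManager
    System.out.println(conf.get("mapreduce.framework.name", "local"));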

(1) Running the main method directly from Eclipse on Windows submits the job to the local executor, LocalJobRunner (the Windows-side setup was covered in the previous article)

1. Input and output data on local paths

WCMapper:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

//Of the 4 type parameters, the first two give the mapper's input data types and the last two give the mapper's output data types
//Both map and reduce take their input and produce their output as <k,v> key-value pairs
//By default the mapper's input key is the starting byte offset of a line within the text being processed, and the value is that line's content
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable>{
    @Override    //Hadoop's Writable types add no redundant bytes in their network encoding, which improves transfer efficiency and the cluster's data-communication capacity
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        //Get the content of this line
        String lineval = value.toString();
        
        //Split the line's text on the delimiter
        String[] words = StringUtils.split(lineval, " ");
        
        //Iterate over the words, emitting each in the output <k,v> form
        for(String word:words) {
            //Write the output pair into the context
            context.write(new Text(word), new LongWritable(1));
        }
        
        //Pairs are not shipped to the cluster one at a time; output is buffered, and values sharing a key are grouped before being sent out
        //They reach the reducer as <k,<v1,v2,v3,...,vn>>, which here is <k,<1,1,1,1,1,....,1>>
    }
}
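An aside on the allocation pattern above: creating a new Text and a new LongWritable for every word works, but on large inputs it churns the heap. A common variant (a sketch, not part of the original code) reuses writable instances; this is safe because context.write serializes the pair into the output buffer immediately:

    //fields on the mapper, reused across calls to map()
    private final Text word = new Text();
    private final LongWritable one = new LongWritable(1);
    
    //inside map(), replacing the loop body above
    for (String w : words) {
        word.set(w);
        context.write(word, one);
    }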

WCReducer:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable>{
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Reducer<Text, LongWritable, Text, LongWritable>.Context context) throws IOException, InterruptedException {
        long count = 0;
        //Iterate over the values, totalling how many times this key occurred
        for(LongWritable value:values) {
            count+=value.get();
        }
        //Emit the count for this word
        context.write(key, new LongWritable(count));
    }
}
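Because WCReducer just sums longs, an operation that is associative and commutative, it can also serve as a combiner, pre-aggregating map output on the map side before the shuffle and cutting network traffic. One extra line in the runner enables this (a sketch; the runners below do not set it):

    wcjob.setCombinerClass(WCReducer.class);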

WCRunner:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //System.setProperty("hadoop.home.dir", "E:\\Hadoop\\hadoop-2.7.1");
        
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);
        
        //Tell the framework which jar holds the classes this job uses (located via this class)
        wcjob.setJarByClass(WCRunner.class);
        
        //Set the mapper and reducer classes the job uses
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);
        
        //Set the output key/value classes for map and reduce
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        
        //The map output k,v types can also be set separately; this only matters when they differ from the job-level types above
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);
        
        //Specify the path where the input data to process is stored
        FileInputFormat.setInputPaths(wcjob, new Path("E:/Hadoop/hadoop001/input/"));
        //Specify the path where the output results are stored
        FileOutputFormat.setOutputPath(wcjob, new Path("E:/Hadoop/hadoop001/output/"));
        
        //Submit the job and wait for it to finish
        wcjob.waitForCompletion(true);    //the argument controls whether status and progress are printed
    }
}

Here, the input and output data both live on the local E: drive (HDFS is not involved).

2. Input and output data in HDFS (start HDFS first)

The input data file must already exist in HDFS:

Note: handle permission problems up front, e.g. hadoop fs -chmod 777 /wc/
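The same permission change can be made from code instead of the shell; a hedged sketch using the FileSystem API (assuming conf has fs.defaultFS pointed at hdfs://hadoopH1:9000/, and with org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.permission.FsPermission imported):

    FileSystem fs = FileSystem.get(conf);
    //0777 mirrors: hadoop fs -chmod 777 /wc/
    fs.setPermission(new Path("/wc"), new FsPermission((short) 0777));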

Only the WCRunner file needs to change here: point the input and output locations at the HDFS filesystem:

WCRunner:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //System.setProperty("hadoop.home.dir", "E:\\Hadoop\\hadoop-2.7.1");
        
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoopH1:9000/");
        
        Job wcjob = Job.getInstance(conf);
        
        //Tell the framework which jar holds the classes this job uses (located via this class)
        wcjob.setJarByClass(WCRunner.class);
        
        //Set the mapper and reducer classes the job uses
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);
        
        //Set the output key/value classes for map and reduce
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        
        //The map output k,v types can also be set separately; this only matters when they differ from the job-level types above
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);
        
        //Specify the path where the input data to process is stored
        FileInputFormat.setInputPaths(wcjob, new Path("hdfs://hadoopH1:9000/wc/input/"));
        //Specify the path where the output results are stored
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hadoopH1:9000/wc/output/"));
        
        //Submit the job and wait for it to finish
        wcjob.waitForCompletion(true);    //the argument controls whether status and progress are printed
    }
}
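One pitfall when rerunning against HDFS: FileOutputFormat refuses to start if the output directory already exists. Either remove it first with hadoop fs -rm -r /wc/output, or clear it from code before submitting; a small sketch (placed in main before waitForCompletion, with org.apache.hadoop.fs.FileSystem imported):

    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("hdfs://hadoopH1:9000/wc/output/");
    if (fs.exists(out)) {
        fs.delete(out, true);    //true = delete recursively
    }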

(2) Running the main method directly from Eclipse on Linux, without adding any YARN-related configuration, likewise submits the job to LocalJobRunner

1. Input and output data on local paths

Just change the file paths to Linux paths:

WCRunner:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //System.setProperty("hadoop.home.dir", "E:\\Hadoop\\hadoop-2.7.1");
        
        Configuration conf = new Configuration();
        
        Job wcjob = Job.getInstance(conf);
        
        //Tell the framework which jar holds the classes this job uses (located via this class)
        wcjob.setJarByClass(WCRunner.class);
        
        //Set the mapper and reducer classes the job uses
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);
        
        //Set the output key/value classes for map and reduce
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        
        //The map output k,v types can also be set separately; this only matters when they differ from the job-level types above
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);
        
        //Specify the path where the input data to process is stored
        FileInputFormat.setInputPaths(wcjob, new Path("/home/hadoop/workspace/Hadoop/input"));
        //Specify the path where the output results are stored
        FileOutputFormat.setOutputPath(wcjob, new Path("/home/hadoop/workspace/Hadoop/output"));
        
        //Submit the job and wait for it to finish
        wcjob.waitForCompletion(true);    //the argument controls whether status and progress are printed
    }
}

2. Input and output data in HDFS

Again, only the WCRunner file changes:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        //System.setProperty("hadoop.home.dir", "E:\\Hadoop\\hadoop-2.7.1");
        
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoopH1:9000/");
        
        Job wcjob = Job.getInstance(conf);
        
        //Tell the framework which jar holds the classes this job uses (located via this class)
        wcjob.setJarByClass(WCRunner.class);
        
        //Set the mapper and reducer classes the job uses
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);
        
        //Set the output key/value classes for map and reduce
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        
        //The map output k,v types can also be set separately; this only matters when they differ from the job-level types above
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);
        
        //Specify the path where the input data to process is stored
        FileInputFormat.setInputPaths(wcjob, new Path("hdfs://hadoopH1:9000/wc/input/"));
        //Specify the path where the output results are stored
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hadoopH1:9000/wc/output/"));
        
        //Submit the job and wait for it to finish
        wcjob.waitForCompletion(true);    //the argument controls whether status and progress are printed
    }
}

4: Running in cluster mode

Remember to start the YARN services first:

(1) Package the project into a jar, upload it to the server, and submit it with the hadoop command:

hadoop jar wc.jar cn.hadoop.mr.wc.WCRunner

wc.jar is the jar we exported, and cn.hadoop.mr.wc.WCRunner is the main class to run.
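A common refinement when submitting this way is to implement the Tool interface, so the hadoop command can parse generic options such as -D key=value before your code runs. A sketch under that assumption (the class name WCRunnerTool is illustrative, not from the original project; the job setup mirrors WCRunner):

package cn.hadoop.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCRunnerTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        //getConf() already reflects any -D overrides parsed by ToolRunner
        Job wcjob = Job.getInstance(getConf());
        wcjob.setJarByClass(WCRunnerTool.class);
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(wcjob, new Path("hdfs://hadoopH1:9000/wc/input"));
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hadoopH1:9000/wc/output"));
        return wcjob.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WCRunnerTool(), args));
    }
}

It could then be submitted with, for example: hadoop jar wc.jar cn.hadoop.mr.wc.WCRunnerTool -D mapreduce.job.reduces=2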

Tip: the cluster's job and node information can be viewed in the ResourceManager web UI (it listens on port 8088 by default).

Viewing tasks:

 

Note: only jobs that run on the cluster show up here; jobs run locally cannot be looked up this way.

(2) Run the main method directly from Eclipse on Linux and have the job submitted to the cluster

This requires the following steps:

1. Add mapred-site.xml and yarn-site.xml to the project's src directory

The job reads mapreduce.framework.name from these configuration files to decide to use the cluster, and reads the ResourceManager address from yarn-site.xml.
[hadoop@hadoopH1 hadoop]$ cat mapred-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
       </property>
</configuration>
[hadoop@hadoopH1 hadoop]$ cat yarn-site.xml 
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoopH1</value>
    </property> 
    <!-- Have YARN aggregate job logs -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
That is the full configuration of both files.
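If you prefer not to copy the XML files into src, the same two decisions can be made programmatically; a hedged sketch, equivalent in effect to the files above for this cluster:

    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.hostname", "hadoopH1");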

2. Package the project into a jar and put it in the project's root directory

3. Add the following line in the main method. (When running from Eclipse the classes are not packaged, so setJarByClass cannot locate a jar; this property explicitly names the jar to ship to the cluster.)

conf.set("mapreduce.job.jar", "wc.jar");

4. The resulting WCRunner:

package cn.hadoop.mr.wc;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
                //System.setProperty("hadoop.home.dir", "E:\\Hadoop\\hadoop-2.7.1");

                Configuration conf = new Configuration();
                conf.set("fs.defaultFS", "hdfs://hadoopH1:9000");
                conf.set("mapreduce.job.jar", "wc.jar");
                Job wcjob = Job.getInstance(conf);

                //Tell the framework which jar holds the classes this job uses (located via this class)
                wcjob.setJarByClass(WCRunner.class);

                //Set the mapper and reducer classes the job uses
                wcjob.setMapperClass(WCMapper.class);
                wcjob.setReducerClass(WCReducer.class);

                //Set the output key/value classes for map and reduce
                wcjob.setOutputKeyClass(Text.class);
                wcjob.setOutputValueClass(LongWritable.class);

                //The map output k,v types can also be set separately; this only matters when they differ from the job-level types above
                wcjob.setMapOutputKeyClass(Text.class);
                wcjob.setMapOutputValueClass(LongWritable.class);

                //Specify the path where the input data to process is stored
                FileInputFormat.setInputPaths(wcjob, new Path("hdfs://hadoopH1:9000/wc/input"));
                //Specify the path where the output results are stored
                FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hadoopH1:9000/wc/output"));

                //Submit the job to the cluster and wait for it to finish
                wcjob.waitForCompletion(true);  //the argument controls whether status and progress are printed
        }
}

(3) Running the main method directly from Eclipse on Windows and submitting to the cluster is also possible, but it is fiddly; look it up when you actually need it.
