Hadoop shell operations on CDH and a wordcount test
1. After installing Hadoop from the CDH page, running a command fails:
[root@node1 bin]# hadoop fs -ls
/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/bin/../lib/hadoop/bin/hadoop: line 144: /usr/java/jdk1.7.0_67-clouderaexport/bin/java: No such file or directory
/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/bin/../lib/hadoop/bin/hadoop: line 144: exec: /usr/java/jdk1.7.0_67-clouderaexport/bin/java: cannot execute: No such file or directory
Cause: even after HDFS is installed through CDH, the matching configuration files and /etc/profile still need to be updated (adding HADOOP_HOME and so on). This machine had an older Hadoop installed and HADOOP_HOME had never been changed, hence the error. (The jdk1.7.0_67-clouderaexport path in the message also suggests a mangled export line in the old profile, with the JDK path fused to the following export.)
The command works after modifying /etc/profile as follows:
#java
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
#export JAVA_HOME=/usr/local/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#export HADOOP_HOME=/usr/local/hadoop-2.6.0-cdh5.7.0
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5
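After saving the changes, reload the profile and sanity-check the environment (a quick verification sketch; the output will vary by machine):
source /etc/profile
echo $JAVA_HOME
echo $HADOOP_HOME
hadoop version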
2. Common operations:
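For reference, a few HDFS shell commands that cover most day-to-day needs (the paths below are illustrative):
hadoop fs -ls /tmp                       # list a directory
hadoop fs -mkdir -p /tmp/demo            # create a directory, including parents
hadoop fs -put /home/test/test1.sh /tmp  # upload a local file
hadoop fs -get /tmp/test1.sh /home/test  # download to the local filesystem
hadoop fs -cat /tmp/test1.sh             # print a file's contents
hadoop fs -du -h /tmp                    # show space usage, human-readable
hadoop fs -rm -r /tmp/demo               # remove a file or directory recursively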
3. Using the wordcount example to count words
We'll upload a file to HDFS. The file's contents:
[root@node1 examples]# more /home/test/test1.sh
#!/bin/bash
#edate=$(chage -l $USER|grep "Password expires" |awk '{print $4,$5,$6,$7}')
edate=$(chage -l test|grep "Password expires" |awk '{print $4,$5,$6,$7}')
date3=$(date -d "+3 day"|awk '{print $2,$3,$6}')
if [[ $edate = "never" ]]; then
echo "never expired"
elif [[ $date3 = $edate ]]; then
echo "3 days"
else
echo "unexpired"
fi
Upload the file:
[root@node1 test]# hadoop fs -put /home/test/test1.sh /tmp
Verify:
[root@node1 test]# hadoop fs -ls /tmp
Found 2 items
drwxrwxrwx - hdfs supergroup 0 2019-12-24 16:35 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 root supergroup 346 2019-12-24 16:35 /tmp/test1.sh
[root@node1 test]#
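To double-check the upload, the file can be printed straight back from HDFS:
hadoop fs -cat /tmp/test1.sh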
The wordcount MapReduce job needs to run as the hdfs user; otherwise it fails with a permission error when creating the output path:
hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/share/doc/hadoop-0.20-mapreduce/examples/hadoop-examples-2.6.0-mr1-cdh5.10.2.jar wordcount /tmp/test1.sh /output1
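If you would rather not switch shells, sudo works as well (assuming the hdfs system account exists, as it does on a standard CDH install):
sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/share/doc/hadoop-0.20-mapreduce/examples/hadoop-examples-2.6.0-mr1-cdh5.10.2.jar wordcount /tmp/test1.sh /output1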
The job's log output follows:
19/12/24 16:45:15 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/12/24 16:45:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/12/24 16:45:15 INFO input.FileInputFormat: Total input paths to process : 1
19/12/24 16:45:15 INFO mapreduce.JobSubmitter: number of splits:1
19/12/24 16:45:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local704420616_0001
19/12/24 16:45:16 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/12/24 16:45:16 INFO mapreduce.Job: Running job: job_local704420616_0001
19/12/24 16:45:16 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/12/24 16:45:16 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/12/24 16:45:16 INFO mapred.LocalJobRunner: Waiting for map tasks
19/12/24 16:45:16 INFO mapred.LocalJobRunner: Starting task: attempt_local704420616_0001_m_000000_0
19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/12/24 16:45:16 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/12/24 16:45:16 INFO mapred.MapTask: Processing split: hdfs://node1:8020/tmp/test1.sh:0+346
19/12/24 16:45:16 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/12/24 16:45:16 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/12/24 16:45:16 INFO mapred.MapTask: soft limit at 83886080
19/12/24 16:45:16 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/12/24 16:45:16 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/12/24 16:45:16 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/12/24 16:45:16 INFO mapred.LocalJobRunner:
19/12/24 16:45:16 INFO mapred.MapTask: Starting flush of map output
19/12/24 16:45:16 INFO mapred.MapTask: Spilling map output
19/12/24 16:45:16 INFO mapred.MapTask: bufstart = 0; bufend = 524; bufvoid = 104857600
19/12/24 16:45:16 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
19/12/24 16:45:16 INFO mapred.MapTask: Finished spill 0
19/12/24 16:45:16 INFO mapred.Task: Task:attempt_local704420616_0001_m_000000_0 is done. And is in the process of committing
19/12/24 16:45:16 INFO mapred.LocalJobRunner: map
19/12/24 16:45:16 INFO mapred.Task: Task 'attempt_local704420616_0001_m_000000_0' done.
19/12/24 16:45:16 INFO mapred.LocalJobRunner: Finishing task: attempt_local704420616_0001_m_000000_0
19/12/24 16:45:16 INFO mapred.LocalJobRunner: map task executor complete.
19/12/24 16:45:16 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/12/24 16:45:16 INFO mapred.LocalJobRunner: Starting task: attempt_local704420616_0001_r_000000_0
19/12/24 16:45:16 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/12/24 16:45:16 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/12/24 16:45:16 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@720cd60d
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=175793760, maxSingleShuffleLimit=43948440, mergeThreshold=116023888, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/12/24 16:45:16 INFO reduce.EventFetcher: attempt_local704420616_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/12/24 16:45:16 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local704420616_0001_m_000000_0 decomp: 447 len: 451 to MEMORY
19/12/24 16:45:16 INFO reduce.InMemoryMapOutput: Read 447 bytes from map-output for attempt_local704420616_0001_m_000000_0
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 447, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->447
19/12/24 16:45:16 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/12/24 16:45:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
19/12/24 16:45:16 INFO mapred.Merger: Merging 1 sorted segments
19/12/24 16:45:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 441 bytes
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merged 1 segments, 447 bytes to disk to satisfy reduce memory limit
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merging 1 files, 451 bytes from disk
19/12/24 16:45:16 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/12/24 16:45:16 INFO mapred.Merger: Merging 1 sorted segments
19/12/24 16:45:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 441 bytes
19/12/24 16:45:16 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/12/24 16:45:17 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
19/12/24 16:45:17 INFO mapred.Task: Task:attempt_local704420616_0001_r_000000_0 is done. And is in the process of committing
19/12/24 16:45:17 INFO mapred.LocalJobRunner: 1 / 1 copied.
19/12/24 16:45:17 INFO mapred.Task: Task attempt_local704420616_0001_r_000000_0 is allowed to commit now
19/12/24 16:45:17 INFO output.FileOutputCommitter: Saved output of task 'attempt_local704420616_0001_r_000000_0' to hdfs://node1:8020/output1/_temporary/0/task_local704420616_0001_r_000000
19/12/24 16:45:17 INFO mapred.LocalJobRunner: reduce > reduce
19/12/24 16:45:17 INFO mapred.Task: Task 'attempt_local704420616_0001_r_000000_0' done.
19/12/24 16:45:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local704420616_0001_r_000000_0
19/12/24 16:45:17 INFO mapred.LocalJobRunner: reduce task executor complete.
19/12/24 16:45:17 INFO mapreduce.Job: Job job_local704420616_0001 running in uber mode : false
19/12/24 16:45:17 INFO mapreduce.Job: map 100% reduce 100%
19/12/24 16:45:17 INFO mapreduce.Job: Job job_local704420616_0001 completed successfully
19/12/24 16:45:17 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=553634
        FILE: Number of bytes written=1137145
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=692
        HDFS: Number of bytes written=313
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=11
        Map output records=47
        Map output bytes=524
        Map output materialized bytes=451
        Input split bytes=95
        Combine input records=47
        Combine output records=33
        Reduce input groups=33
        Reduce shuffle bytes=451
        Reduce input records=33
        Reduce output records=33
        Spilled Records=66
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=16
        Total committed heap usage (bytes)=504365056
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=346
    File Output Format Counters
        Bytes Written=313
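Before reading the results, it is worth listing the output directory; a successful job leaves a _SUCCESS marker plus one part-r-* file per reducer:
hadoop fs -ls /output1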
View the word counts in the /output1 directory:
[hdfs@node1 examples]$ hadoop fs -cat /output1/*
"+3 1
"3 1
"Password 2
"never 1
"never" 1
"unexpired" 1
#!/bin/bash 1
#edate=$(chage 1
$2,$3,$6}') 1
$4,$5,$6,$7}') 2
$USER|grep 1
$date3 1
$edate 2
'{print 3
-d 1
-l 2
= 2
[[ 2
]]; 2
date3=$(date 1
day"|awk 1
days" 1
echo 3
edate=$(chage 1
elif 1
else 1
expired" 1
expires" 2
fi 1
if 1
test|grep 1
then 2
|awk 2
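One caveat for re-runs: MapReduce refuses to start if the output directory already exists, so delete it first:
hadoop fs -rm -r /output1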
Done.
