Asked by: Mohamed MOUHNARI  Asked: 10/17/2023  Updated: 10/17/2023  Views: 33
MapReduce Troubleshoot with python script as mapper and reducer using hadoop-streaming-3.3.6.jar
Question:
core-site.xml configuration:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/mohamed/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/mohamed/datanode/</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Exports in my ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_HOME=/home/mohamed/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PATH=$PATH:/home/mohamed/spark-3.5.0
So $HADOOP_HOME is /home/mohamed/hadoop-3.3.6.
**mapper.py script:**
#!/usr/bin/env python3
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%d" % (word, 1))
**reducer.py script:**
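#!/usr/bin/env python3
import sys

total = 0
lastword = None

# Hadoop sorts mapper output by key, so all counts for one word arrive together.
for line in sys.stdin:
    line = line.strip()
    word, count = line.split("\t", 1)
    count = int(count)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word

# Emit the last word, which the loop above never prints.
if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))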
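You can check the map, shuffle-and-sort, and reduce flow on one machine before submitting to the cluster. The sketch below is a minimal in-process simulation. It assumes the mapper.py logic and the reducer.py logic, including a final flush of the last word. `map_lines`, `reduce_sorted` and `run_local` are hypothetical names. The plain `sort()` stands in for Hadoop's shuffle, which delivers reducer input grouped by key.

```python
import io

def map_lines(lines):
    """Same logic as mapper.py: emit 'word\t1' for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%d" % (word, 1)

def reduce_sorted(lines):
    """Same logic as reducer.py with a final flush: sum counts of consecutive equal keys."""
    total = 0
    lastword = None
    for line in lines:
        word, count = line.strip().split("\t", 1)
        count = int(count)
        if lastword is None:
            lastword = word
        if word == lastword:
            total += count
        else:
            yield "%s\t%d occurrences" % (lastword, total)
            total = count
            lastword = word
    if lastword is not None:  # the last group is only emitted after the loop
        yield "%s\t%d occurrences" % (lastword, total)

def run_local(text):
    """Simulate map -> shuffle/sort -> reduce on a string, without Hadoop."""
    mapped = list(map_lines(io.StringIO(text)))
    mapped.sort()  # stands in for Hadoop's shuffle-and-sort phase
    return list(reduce_sorted(mapped))

if __name__ == "__main__":
    for out in run_local("the cat saw the dog\nthe end\n"):
        print(out)
```

The same check can be run against the real scripts from a shell, for example `hdfs dfs -cat /MMdata/Overview.txt | python3 /home/mohamed/mapper.py | sort | python3 /home/mohamed/reducer.py`. If that pipeline fails locally, the streaming job is unlikely to succeed on the cluster.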
HDFS and YARN are running fine; their web UIs are reachable on ports 9870 and 8088 respectively.
**The command I run for my MapReduce job:**
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -input /MMdata/Overview.txt -output /results -mapper /home/mohamed/mapper.py -reducer /home/mohamed/reducer.py
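One point worth checking: `-mapper /home/mohamed/mapper.py` names a path on the submitting machine's local disk. Map tasks run in YARN containers that may be on other nodes, and the log lists DataNodes 192.168.144.232, 192.168.144.92 and 192.168.144.74 besides Master at 192.168.144.41. That path may not exist on those nodes.

Hadoop streaming's generic `-files` option copies local files into each task's working directory, so the scripts can then be referenced by their base names. The sketch below only assembles such an argument list in Python and does not run it. `build_streaming_argv` is a hypothetical helper, and the paths are the ones from the question.

```python
import shlex

def build_streaming_argv(hadoop_home, input_path, output_path, mapper, reducer):
    """Assemble a hadoop-streaming command that ships the scripts with -files.

    Generic options such as -files must come before streaming options
    such as -input and -mapper.
    """
    jar = hadoop_home + "/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar"
    mapper_name = mapper.rsplit("/", 1)[-1]    # e.g. "mapper.py"
    reducer_name = reducer.rsplit("/", 1)[-1]  # e.g. "reducer.py"
    return [
        "hadoop", "jar", jar,
        "-files", mapper + "," + reducer,  # copied into each task's working directory
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper_name,            # referenced by base name inside the task
        "-reducer", reducer_name,
    ]

if __name__ == "__main__":
    argv = build_streaming_argv(
        "/home/mohamed/hadoop-3.3.6",
        "/MMdata/Overview.txt",
        "/results",
        "/home/mohamed/mapper.py",
        "/home/mohamed/reducer.py",
    )
    print(shlex.join(argv))  # shell-quoted form of the command
```

If the execute bit is not preserved on the task nodes, a common alternative is to name the interpreter explicitly, for example `-mapper "python3 mapper.py"`.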
As soon as I run this command, my MapReduce job produces these logs:
2023-10-17 12:04:57,865 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/tmp/hadoop-unjar1033840378945881812/] [] /tmp/streamjob8466353576267893322.jar tmpDir=null
2023-10-17 12:04:59,228 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032
2023-10-17 12:04:59,755 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032
2023-10-17 12:05:00,296 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027
2023-10-17 12:05:00,969 INFO mapred.FileInputFormat: Total input files to process : 1
2023-10-17 12:05:01,204 INFO mapreduce.JobSubmitter: number of splits:2
2023-10-17 12:05:54,790 WARN hdfs.DataStreamer: Slow waitForAckedSeqno took 53218ms (threshold=30000ms). File being written: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027/job.xml, block: BP-1651669171-192.168.162.41-1697114500534:blk_1073755253_14430, Write pipeline datanodes: [DatanodeInfoWithStorage[192.168.144.232:9866,DS-9a5dac38-b0e3-4530-a67c-b52419a0ca9f,DISK], DatanodeInfoWithStorage[192.168.144.92:9866,DS-6837ad2a-8cd2-40cf-94ad-b76aecc76d4d,DISK], DatanodeInfoWithStorage[192.168.144.74:9866,DS-71881df1-f738-449a-bb3a-9fe2bf0f75d1,DISK]].
2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1697530620860_0027
2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-17 12:05:55,263 INFO conf.Configuration: found resource resource-types.xml at file:/home/mohamed/hadoop-3.3.6/etc/hadoop/resource-types.xml
2023-10-17 12:05:55,438 INFO impl.YarnClientImpl: Submitted application application_1697530620860_0027
2023-10-17 12:05:55,520 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1697530620860_0027/
2023-10-17 12:05:55,533 INFO mapreduce.Job: Running job: job_1697530620860_0027
2023-10-17 12:06:06,781 INFO mapreduce.Job: Job job_1697530620860_0027 running in uber mode : false
2023-10-17 12:06:06,784 INFO mapreduce.Job: map 0% reduce 0%
2023-10-17 12:06:25,228 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_0, Status : FAILED
2023-10-17 12:06:25,255 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_0, Status : FAILED
2023-10-17 12:06:33,508 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:463)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/home/mohamed/mapper.py": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
... 25 more
2023-10-17 12:06:40,636 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_1, Status : FAILED
2023-10-17 12:06:47,750 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_2, Status : FAILED
2023-10-17 12:06:48,789 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_2, Status : FAILED
2023-10-17 12:07:02,022 INFO mapreduce.Job: map 50% reduce 100%
2023-10-17 12:07:03,050 INFO mapreduce.Job: map 100% reduce 100%
2023-10-17 12:07:03,093 INFO mapreduce.Job: Job job_1697530620860_0027 failed with state FAILED due to: Task failed task_1697530620860_0027_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
2023-10-17 12:07:03,233 INFO mapreduce.Job: Counters: 14
Job Counters
Failed map tasks=7
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=94672
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=94672
Total vcore-milliseconds taken by all map tasks=94672
Total megabyte-milliseconds taken by all map tasks=96944128
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
2023-10-17 12:07:03,235 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
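It is as if my directory cannot be found, even though I ran chmod 777 on it and removed all permission restrictions. I am using Ubuntu 22.04 and hadoop-3.3.6. By the way, I asked ChatGPT, and its answer was that the paths to my mapper and reducer files might be incorrect. But the paths are correct, and both files exist in /home/mohamed.
Any help is appreciated.
Thank you all.
I am a new user of the Hadoop distribution and I am working through a simple MapReduce example, but as soon as I run the command it fails. To show you what I have done, I have included all of my configuration above, along with the Python mapper and reducer scripts. I would be grateful if someone could help me solve this.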
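`error=2, No such file or directory` from `ProcessBuilder.start` is the Linux `ENOENT` error returned by exec. For a script it means that either the file itself or the interpreter named on its `#!` line does not exist on the node where the task runs. `chmod 777` does not change either condition.

The sketch below is a hypothetical helper, `check_script`, meant to be run on each worker node. It only inspects the file and does not reproduce Hadoop's own launch logic. It also flags two related problems that produce different errors: a missing shebang, and Windows CRLF line endings.

```python
import os
import shutil
import sys

def check_script(path):
    """Return a list of problems that could make exec() of `path` fail on this node."""
    if not os.path.isfile(path):
        return ["%s does not exist on this node" % path]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("%s is not executable" % path)
    with open(path, "rb") as f:
        first = f.readline()
    if not first.startswith(b"#!"):
        problems.append("first line is not a #! shebang: %r" % first)
        return problems
    if first.endswith(b"\r\n"):
        problems.append("shebang line ends with CRLF; a trailing \\r becomes part of it")
    parts = first[2:].strip().split()
    if not parts:
        problems.append("empty shebang")
        return problems
    interpreter = parts[0].decode()
    if not os.path.isfile(interpreter):
        problems.append("interpreter %s not found" % interpreter)
    elif os.path.basename(interpreter) == "env" and len(parts) > 1:
        # "#!/usr/bin/env python3" looks python3 up on PATH at run time.
        prog = parts[1].decode()
        if shutil.which(prog) is None:
            problems.append("%s not found on PATH" % prog)
    return problems

if __name__ == "__main__":
    for script in sys.argv[1:]:
        issues = check_script(script)
        print(script, "OK" if not issues else issues)
```

Running it with the two script paths as arguments on each node shows whether the files are present there. On the reducer.py exactly as posted, it would also flag the first line `!/usr/bin/env python3`, which is missing the leading `#`.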
Answers: no answers yet.