MapReduce troubleshooting with Python scripts as mapper and reducer using hadoop-streaming-3.3.6.jar

Asked by: Mohamed MOUHNARI  Asked: 10/17/2023  Updated: 10/17/2023  Views: 33

Q:

core-site.xml configuration:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
</configuration>

hdfs-site.xml :

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/mohamed/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/mohamed/datanode/</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

mapred-site.xml :

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Exports in my ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_HOME=/home/mohamed/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PATH=$PATH:/home/mohamed/spark-3.5.0

$HADOOP_HOME=/home/mohamed/hadoop-3.3.6

**mapper.py script:**

#!/usr/bin/env python3

import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        print("%s\t%d" % (word, 1))

reducer.py script:

#!/usr/bin/env python3

import sys

total = 0
lastword = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split()
    count = int(count)

    if lastword is None:
        lastword = word

    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word
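
A side note on the reducer logic, separate from the job failure reported below: the loop prints a word's total only when the next, different word arrives, so the total for the final word is never printed. The following is a minimal sketch of the same logic with a final flush, written as a function so it can be checked locally without a cluster. The function name `reduce_lines` and the sample input are hypothetical and not part of the original scripts.

```python
def reduce_lines(lines):
    """Sum counts for runs of identical words.

    Assumes the input lines look like "word<TAB>count" and arrive
    sorted by word, as they do after the MapReduce shuffle/sort step.
    """
    results = []
    total = 0
    lastword = None

    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines instead of failing on unpacking

        word, count = line.split()
        count = int(count)

        if lastword is None:
            lastword = word

        if word == lastword:
            total += count
        else:
            results.append("%s\t%d occurrences" % (lastword, total))
            total = count
            lastword = word

    # Flush the last group; without this the final word is dropped.
    if lastword is not None:
        results.append("%s\t%d occurrences" % (lastword, total))

    return results


if __name__ == "__main__":
    # Hypothetical, already-sorted mapper output.
    sample = ["data\t1", "data\t1", "hadoop\t1"]
    for out in reduce_lines(sample):
        print(out)
```

In an actual streaming reducer, the same function could be fed `sys.stdin` and each returned line printed.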

HDFS and YARN are running fine on their respective ports, 9870 and 8088.

**The command I run for my MapReduce job:**

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -input /MMdata/Overview.txt -output /results -mapper /home/mohamed/mapper.py -reducer /home/mohamed/reducer.py

Once I run this command, these logs appear for my MapReduce job:

2023-10-17 12:04:57,865 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

packageJobJar: [/tmp/hadoop-unjar1033840378945881812/] [] /tmp/streamjob8466353576267893322.jar tmpDir=null

2023-10-17 12:04:59,228 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032

2023-10-17 12:04:59,755 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032

2023-10-17 12:05:00,296 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027

2023-10-17 12:05:00,969 INFO mapred.FileInputFormat: Total input files to process : 1

2023-10-17 12:05:01,204 INFO mapreduce.JobSubmitter: number of splits:2



2023-10-17 12:05:54,790 WARN hdfs.DataStreamer: Slow waitForAckedSeqno took 53218ms (threshold=30000ms). File being written: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027/job.xml, block: BP-1651669171-192.168.162.41-1697114500534:blk_1073755253_14430, Write pipeline datanodes: [DatanodeInfoWithStorage[192.168.144.232:9866,DS-9a5dac38-b0e3-4530-a67c-b52419a0ca9f,DISK], DatanodeInfoWithStorage[192.168.144.92:9866,DS-6837ad2a-8cd2-40cf-94ad-b76aecc76d4d,DISK], DatanodeInfoWithStorage[192.168.144.74:9866,DS-71881df1-f738-449a-bb3a-9fe2bf0f75d1,DISK]].

2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1697530620860_0027

2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Executing with tokens: []

2023-10-17 12:05:55,263 INFO conf.Configuration: found resource resource-types.xml at file:/home/mohamed/hadoop-3.3.6/etc/hadoop/resource-types.xml

2023-10-17 12:05:55,438 INFO impl.YarnClientImpl: Submitted application application_1697530620860_0027

2023-10-17 12:05:55,520 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1697530620860_0027/

2023-10-17 12:05:55,533 INFO mapreduce.Job: Running job: job_1697530620860_0027

2023-10-17 12:06:06,781 INFO mapreduce.Job: Job job_1697530620860_0027 running in uber mode : false

2023-10-17 12:06:06,784 INFO mapreduce.Job:  map 0% reduce 0%

2023-10-17 12:06:25,228 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_0, Status : FAILED

2023-10-17 12:06:25,255 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_0, Status : FAILED

2023-10-17 12:06:33,508 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_1, Status : FAILED

Error: java.lang.RuntimeException: Error in configuring object

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)

    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)

    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)

    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:463)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)

    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)

    at java.base/java.security.AccessController.doPrivileged(Native Method)

    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)

    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)

Caused by: java.lang.reflect.InvocationTargetException

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.base/java.lang.reflect.Method.invoke(Method.java:566)

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)

    ... 9 more

Caused by: java.lang.RuntimeException: Error in configuring object

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)

    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)

    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)

    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)

    ... 14 more

Caused by: java.lang.reflect.InvocationTargetException

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.base/java.lang.reflect.Method.invoke(Method.java:566)

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)

    ... 17 more

Caused by: java.lang.RuntimeException: configuration exception

    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)

    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)

    ... 22 more

Caused by: java.io.IOException: Cannot run program "/home/mohamed/mapper.py": error=2, No such file or directory

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)

    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)

    ... 23 more

Caused by: java.io.IOException: error=2, No such file or directory

    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)

    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)

    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)

    ... 25 more

2023-10-17 12:06:40,636 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_1, Status : FAILED

2023-10-17 12:06:47,750 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_2, Status : FAILED

Error: java.lang.RuntimeException: Error in configuring object

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)

    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)

    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)

    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:463)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)

    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)

    at java.base/java.security.AccessController.doPrivileged(Native Method)

    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)

    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)

Caused by: java.lang.reflect.InvocationTargetException

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.base/java.lang.reflect.Method.invoke(Method.java:566)

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)

    ... 9 more

Caused by: java.lang.RuntimeException: Error in configuring object

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)

    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)

    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)

    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)

    ... 14 more

Caused by: java.lang.reflect.InvocationTargetException

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.base/java.lang.reflect.Method.invoke(Method.java:566)

    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)

    ... 17 more

Caused by: java.lang.RuntimeException: configuration exception

    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)

    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)

    ... 22 more

Caused by: java.io.IOException: Cannot run program "/home/mohamed/mapper.py": error=2, No such file or directory

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)

    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)

    ... 23 more

Caused by: java.io.IOException: error=2, No such file or directory

    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)

    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)

    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)

    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)

    ... 25 more

2023-10-17 12:06:48,789 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_2, Status : FAILED

2023-10-17 12:07:02,022 INFO mapreduce.Job:  map 50% reduce 100%

2023-10-17 12:07:03,050 INFO mapreduce.Job:  map 100% reduce 100%

2023-10-17 12:07:03,093 INFO mapreduce.Job: Job job_1697530620860_0027 failed with state FAILED due to: Task failed task_1697530620860_0027_m_000000

Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0

2023-10-17 12:07:03,233 INFO mapreduce.Job: Counters: 14

    Job Counters 

        Failed map tasks=7

        Killed map tasks=1

        Killed reduce tasks=1

        Launched map tasks=8

        Other local map tasks=6

        Data-local map tasks=2

        Total time spent by all maps in occupied slots (ms)=94672

        Total time spent by all reduces in occupied slots (ms)=0

        Total time spent by all map tasks (ms)=94672

        Total vcore-milliseconds taken by all map tasks=94672

        Total megabyte-milliseconds taken by all map tasks=96944128

    Map-Reduce Framework

        CPU time spent (ms)=0

        Physical memory (bytes) snapshot=0

        Virtual memory (bytes) snapshot=0

2023-10-17 12:07:03,235 ERROR streaming.StreamJob: Job not successful!

Streaming Command Failed!

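It is as if my directory cannot be found, even though I ran chmod 777 on it and lifted all permission restrictions. I am using Ubuntu 22.04 and hadoop-3.3.6. By the way, I did some research with ChatGPT, and the answer was that the paths to my mapper and reducer files might be incorrect, but they are correct and the files exist in /home/mohamed.

Any help, please.

Thank you all.

I am a new user of the Hadoop distribution, and I am working through a simple example of map and reduce jobs. But once I execute the command, it does not work. To give you an idea of what I did, all the configuration files and the Python scripts for the mapper and reducer are shown above. If anyone can help me solve this problem, I would appreciate it.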
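
One possible reading of the `error=2, No such file or directory` in the log above, offered as an assumption rather than a confirmed diagnosis: map tasks run on the worker nodes, and `/home/mohamed/mapper.py` may exist only on the machine that submits the job. Hadoop Streaming accepts the generic `-files` option, which copies the listed local files into each task's working directory so they can be referenced by base name. The sketch below only builds such a command as an argument list; the helper name `build_streaming_command` is hypothetical, and the paths are the ones given in the question.

```python
import posixpath


def build_streaming_command(hadoop_home, input_path, output_path, mapper, reducer):
    """Return an argv list for a streaming job that ships both scripts."""
    jar = posixpath.join(
        hadoop_home, "share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar"
    )
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files go before the streaming options.
        "-files", "%s,%s" % (mapper, reducer),
        "-input", input_path,
        "-output", output_path,
        # Shipped files land in the task's working directory,
        # so the scripts are referenced by base name here.
        "-mapper", posixpath.basename(mapper),
        "-reducer", posixpath.basename(reducer),
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/home/mohamed/hadoop-3.3.6",
        "/MMdata/Overview.txt",
        "/results",
        "/home/mohamed/mapper.py",
        "/home/mohamed/reducer.py",
    )
    print(" ".join(cmd))
```

The same error can also come from the interpreter named in the script's first line being unavailable on a worker, so it may be worth checking that `python3` is installed on every node and that both scripts start with a valid `#!` line.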

python dictionary bigdata reduce hadoop-streaming

A: No answers yet