
    paulwong

    Submitting a Hadoop MapReduce job to a remote JobTracker

    While messing around with MapReduce code, I’ve found it to be a bit tedious having to generate the jarfile, copy it to the machine running the JobTracker, and then run the job every time the job has been altered. I should be able to run my jobs directly from my development environment, as illustrated in the figure below. This post explains how I’ve “solved” this problem. It may also help when integrating Hadoop with other applications. I by no means claim that this is the proper way to do it, but it does the trick for me.

    [Figure: My Hadoop infrastructure]


    I assume that you have a (single-node) Hadoop 1.0.3 cluster properly installed on a dedicated or virtual machine. In this example, the JobTracker and HDFS reside at IP address 192.168.102.131. Let’s start out with a simple job that does nothing except start up and terminate:

    package com.pcbje.hadoopjobs;

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class MyFirstJob {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration();

            JobConf job = new JobConf(config);
            job.setJarByClass(MyFirstJob.class);
            job.setJobName("My first job");

            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MyFirstJob.MyFirstMapper.class);
            job.setReducerClass(MyFirstJob.MyFirstReducer.class);

            JobClient.runJob(job);
        }

        // Intentionally empty: the job only needs to start up and terminate.
        private static class MyFirstMapper extends MapReduceBase implements Mapper {
            public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
            }
        }

        // Intentionally empty: the mapper never collects any output.
        private static class MyFirstReducer extends MapReduceBase implements Reducer {
            public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {
            }
        }
    }


    Now, most of the examples you find online show a local-mode setup where all the components of Hadoop (HDFS, the JobTracker, etc.) run on the same machine. A typical mapred-site.xml configuration might look like this:

    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:9001</value>
        </property>
    </configuration>

    As far as I can tell, such a configuration requires that jobs be submitted from the same node as the JobTracker, which is exactly what I want to avoid. The first thing to do is to point the fs.default.name property at the IP address of my NameNode.

    Configuration conf = new Configuration();
    conf.set("fs.default.name", "192.168.102.131:9000");

    And in core-site.xml:

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>192.168.102.131:9000</value>
        </property>
    </configuration>

    This tells the job to connect to the HDFS instance residing on a different machine. Running the job with this configuration will read from and write to the remote HDFS correctly, but the JobTracker at 192.168.102.131:9001 will not notice it. This means that the admin panel at 192.168.102.131:50030 won’t list the job either. So the next thing to do is to tell the job configuration to submit the job to the appropriate JobTracker, like this:

    config.set("mapred.job.tracker", "192.168.102.131:9001");

    You also need to change mapred-site.xml on the cluster to allow external connections; this can be done by replacing “localhost” with the JobTracker’s IP address:

    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>192.168.102.131:9001</value>
        </property>
    </configuration>

    Restart Hadoop. Upon trying to run your job, you may get an exception like this:
    SEVERE: PriviledgedActionException as:[user] cause:org.apache.hadoop.security.AccessControlException:
    org.apache.hadoop.security.AccessControlException: Permission denied: user=[user], access=WRITE, inode="mapred":root:supergroup:rwxr-xr-x
    
    If you do, this may be solved by adding the following to mapred-site.xml:

    <configuration>
        <property>
            <name>mapreduce.jobtracker.staging.root.dir</name>
            <value>/user</value>
        </property>
    </configuration>

    And then execute the following commands:
    stop-mapred.sh
    start-mapred.sh
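
    Depending on how the permissions on /user are set up, you may also need to make sure that the submitting user owns a home directory on HDFS, since the staging directory is created beneath it. A sketch, assuming you run these as the user that started HDFS and substituting your own username for [user]:

    hadoop fs -mkdir /user/[user]
    hadoop fs -chown [user] /user/[user]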
    
    When you now submit your job, it should be picked up by the admin page over at :50030. However, it will most probably fail, and the log will tell you something like:
    java.lang.ClassNotFoundException: com.pcbje.hadoopjobs.MyFirstJob$MyFirstMapper
    
    To fix this, you have to ensure that all dependencies of the submitted job are available to the JobTracker. This can be achieved by exporting the project as a runnable jar and then executing something like:
    java -jar myfirstjob-jar-with-dependencies.jar /input/path /output/path
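
    As an alternative sketch (not what I describe above), if you prefer to keep submitting from the IDE rather than running the jar itself, you can build the jar-with-dependencies first and point the JobConf at it with setJar(), so the client ships it to the JobTracker along with the job. The jar path below is an assumption; use wherever your build writes the artifact:

    // Sketch: ship a prebuilt jar with the job instead of launching it via java -jar.
    JobConf job = new JobConf(config);
    job.setJar("target/myfirstjob-jar-with-dependencies.jar"); // assumed build output path
    // ... set job name, input/output paths, mapper and reducer as before ...
    JobClient.runJob(job);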
    
    If your user has the appropriate permissions on the input and output directories on HDFS, the job should now run successfully. This can be verified in the console and on the administration panel.

    Manually exporting runnable jars requires a lot of clicks in IDEs such as Eclipse. If you are using Maven, you can tell it to build the jar with its dependencies instead (see this answer for details; a possible plugin configuration is sketched at the end of this post), which makes the process a whole lot easier. Finally, to make it even easier, place a tiny bash script in the same folder as pom.xml that builds the Maven project and executes the jar:
    #!/bin/sh
    mvn assembly:assembly
    java -jar $1 $2 $3
    
    After making the script executable, you can build and submit the job with the following command:
    ./build-and-run-job target/myfirstjob-jar-with-dependencies.jar /input/path /output/path
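
    For completeness: the mvn assembly:assembly call above assumes the maven-assembly-plugin is configured in pom.xml (inside <build><plugins>) with the jar-with-dependencies descriptor and the job’s main class. A minimal sketch of such a configuration:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
            <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
            <archive>
                <manifest>
                    <mainClass>com.pcbje.hadoopjobs.MyFirstJob</mainClass>
                </manifest>
            </archive>
        </configuration>
    </plugin>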

    posted on 2012-10-03 15:06 by paulwong, filed under: HADOOP, Cloud Computing, HBASE
