    paulwong

    Install hadoop+hbase+nutch+elasticsearch

    This document is for Anyela Chavarro.
    Only these versions of each framework work together:
    Hadoop 1.2.1
    HBase 0.90.4
    Nutch 2.2.1
    Elasticsearch 0.19.4
    Linux version: Ubuntu 12.04.2 LTS

    Hadoop cluster environment:
    Name node/Job tracker
    192.168.1.100 master

    Data node/Task tracker
    192.168.1.101 slave1
    192.168.1.102 slave2
    192.168.1.103 slave3

    Install Hadoop (pseudo-distributed mode)
    1. add user hadoop
      useradd  -s /bin/bash -d /home/hadoop -m hadoop
    2. set password
      passwd hadoop
    3. login as hadoop
      su hadoop
    4. add a data folder
      mkdir data
    5. uninstall openjdk (only needed if an old OpenJDK/GCJ java is already installed, e.g. on CentOS)
      [hadoop@netfox ~]$ rpm -qa | grep java
      java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
      java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
      [hadoop@netfox ~]$ rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
      [hadoop@netfox ~]$ rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
    6. install JDK 1.6
      apt-get update
      apt-get install python-software-properties
      add-apt-repository ppa:webupd8team/java
      apt-get update
      apt-get install oracle-java6-installer
    7. get hadoop tar file (see the download sketch after this list)
    8. untar tar file
      [hadoop@netfox hadoop]$ tar -vxf hadoop-1.2.1.tar.gz
    9. install ssh-server
      apt-get install openssh-server
    10. set up ssh key (ssh-keygen is a built-in tool in Linux)
      [hadoop@netfox hadoop]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    11. make public key file
      [hadoop@netfox hadoop]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    12. change the public key file permissions
      [hadoop@netfox hadoop]$ chmod 600 ~/.ssh/authorized_keys
    13. find the ip of local machine
      [hadoop@netfox hadoop]$ ifconfig
      the ip can be found in this string:
      inet addr:192.168.1.100
    14. add to /etc/hosts (this entry should be the first line)
      [hadoop@netfox hadoop]$ vi /etc/hosts
      192.168.1.100 master
    15. add to /etc/profile
      export JAVA_HOME=/usr/lib/jvm/java-6-oracle
      export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
      export HBASE_HOME=/home/hadoop/hbase-0.90.4
      export PATH=$HADOOP_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$PATH
    16. source it
      [hadoop@netfox hadoop]$ source /etc/profile
    17. create folder
      hadoop@netfox:~$ mkdir /home/hadoop/data
    18. edit /home/hadoop/hadoop-1.2.1/conf/hdfs-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <!-- Put site-specific property overrides in this file. -->

      <configuration>

        <property>
          <name>dfs.replication</name>
          <value>1</value>
          <description>Default block replication.
            The actual number of replications can be specified when the file is created.
            The default is used if replication is not specified in create time.
          </description>
        </property>

        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>

      </configuration>
    19. edit /home/hadoop/hadoop-1.2.1/conf/mapred-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <!-- Put site-specific property overrides in this file. -->

      <configuration>

        <property>
          <name>mapred.job.tracker</name>
          <value>master:9002</value>
          <description>The host and port that the MapReduce job tracker runs
            at. If "local", then jobs are run in-process as a single map
            and reduce task.
          </description>
        </property>

      </configuration>
    20. edit /home/hadoop/hadoop-1.2.1/conf/core-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <!-- Put site-specific property overrides in this file. -->

      <configuration>

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/hadoop/data</value>
          <description>A base for other temporary directories.</description>
        </property>

        <property>
          <name>fs.default.name</name>
          <value>hdfs://master:9001</value>
          <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation.  The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class.  The uri's authority is used to
            determine the host, port, etc. for a filesystem.
          </description>
        </property>

      </configuration>
    21. add to /home/hadoop/hadoop-1.2.1/conf/hadoop-env.sh
      export JAVA_HOME=/usr/lib/jvm/java-6-oracle
    22. add to /home/hadoop/hadoop-1.2.1/conf/slaves and masters
      master
    23. format the hadoop namenode
      [hadoop@netfox ~]$ hadoop namenode -format
    24. start hadoop
      [hadoop@netfox hadoop]$ start-all.sh 
    25. check if hadoop installed correctly
      [hadoop@netfox hadoop]$ hadoop dfs -ls / 
      for example, it should show output like the following, with no error messages:
      Found 4 items
      drwxr-xr-x   - hadoop supergroup          0 2013-08-28 14:02 /chukwa
      drwxr-xr-x   - hadoop supergroup          0 2013-08-29 09:53 /hbase
      drwxr-xr-x   - hadoop supergroup          0 2013-08-27 10:36 /opt
      drwxr-xr-x   - hadoop supergroup          0 2013-09-01 15:22 /tmp
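
    For step 7, the tarball can be fetched directly. A minimal download sketch, assuming the release is still hosted under the standard Apache archive layout:
      [hadoop@netfox ~]$ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    After step 24, jps can also confirm the daemons are up; in pseudo-distributed mode it should list NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker:
      [hadoop@netfox ~]$ jps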

    Install Hadoop (fully-distributed mode)
    repeat steps 1-23 on slave1-3, but some steps will be different:
    1. change step 11 as below:
      do not make a new public key on each slave; just transfer the master's public key to each slave and append it to authorized_keys there (see the sketch after this list).
      [hadoop@netfox hadoop]$ scp ~/.ssh/id_dsa.pub hadoop@slave1:/home/hadoop
    2. change step 14 as below:
      add all cluster nodes to /etc/hosts
      [hadoop@netfox hadoop]$ vi /etc/hosts
      192.168.1.100 master
      192.168.1.101 slave1
      192.168.1.102 slave2
      192.168.1.103 slave3
    3. change step 22 as below: add to /home/hadoop/hadoop-1.2.1/conf/masters
      master
      add to /home/hadoop/hadoop-1.2.1/conf/slaves
      slave1
      slave2
      slave3
    4. for step 24, start hadoop only on the master
      [hadoop@netfox hadoop]$ start-all.sh 
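
    A minimal sketch of finishing the key setup on each slave (assuming the master's id_dsa.pub was copied to /home/hadoop as above; the slave prompt is illustrative):
      [hadoop@slave1 ~]$ mkdir -p ~/.ssh
      [hadoop@slave1 ~]$ cat /home/hadoop/id_dsa.pub >> ~/.ssh/authorized_keys
      [hadoop@slave1 ~]$ chmod 600 ~/.ssh/authorized_keys
    Back on the master, passwordless login to each slave should now work:
      [hadoop@netfox ~]$ ssh slave1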


    Install HBase
    1. get hbase tar file (see the download sketch after this list)
    2. untar the file
      [hadoop@netfox ~]$ tar -vxf hbase-0.90.4.tar.gz
    3. change /home/hadoop/hbase-0.90.4/conf/hbase-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!--
      /**
       * Copyright 2010 The Apache Software Foundation
       *
       * Licensed to the Apache Software Foundation (ASF) under one
       * or more contributor license agreements.  See the NOTICE file
       * distributed with this work for additional information
       * regarding copyright ownership.  The ASF licenses this file
       * to you under the Apache License, Version 2.0 (the
       * "License"); you may not use this file except in compliance
       * with the License.  You may obtain a copy of the License at
       *
       *     http://www.apache.org/licenses/LICENSE-2.0
       *
       * Unless required by applicable law or agreed to in writing, software
       * distributed under the License is distributed on an "AS IS" BASIS,
       * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       * See the License for the specific language governing permissions and
       * limitations under the License.
       */
      -->
      <configuration>

        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://master:9001/hbase</value>
        </property>

        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>

        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>localhost</value>
        </property>

      </configuration>
    4. change /home/hadoop/hbase-0.90.4/conf/regionservers as below
      master
    5. add JAVA_HOME to /home/hadoop/hbase-0.90.4/conf/hbase-env.sh
      export JAVA_HOME=/usr/lib/jvm/java-6-oracle
    6. replace the bundled hadoop jar with the new one (and copy the needed commons jars)
      [hadoop@netfox ~]$ rm /home/hadoop/hbase-0.90.4/lib/hadoop-core-0.20-append-r1056497.jar
      [hadoop@netfox ~]$ cp /home/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar /home/hadoop/hbase-0.90.4/lib
      [hadoop@netfox ~]$ cp /home/hadoop/hadoop-1.2.1/lib/commons-collections-3.2.1.jar /home/hadoop/hbase-0.90.4/lib
      [hadoop@netfox ~]$ cp /home/hadoop/hadoop-1.2.1/lib/commons-configuration-1.6.jar /home/hadoop/hbase-0.90.4/lib
    7. start hbase
      [hadoop@netfox ~]$ start-hbase.sh  
    8. check if hbase installed correctly
      [hadoop@netfox ~]$ hbase shell
      HBase Shell; enter 'help<RETURN>' for list of supported commands.
      Type "exit<RETURN>" to leave the HBase Shell
      Version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011

      hbase(main):001:0> list
      TABLE
      webpage
      1 row(s) in 0.5270 seconds
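
    For step 1, a minimal download sketch, assuming the release is still available from the Apache archive:
      [hadoop@netfox ~]$ wget https://archive.apache.org/dist/hbase/hbase-0.90.4/hbase-0.90.4.tar.gz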


    Install Nutch
    1. install ant
      [root@netfox ~]# apt-get install ant
    2. switch user and folder
      [root@netfox ~]# su hadoop          
      [hadoop@netfox root]$ cd ~
    3. get nutch source tar file (see the download sketch after this list)
    4. untar this file
      [hadoop@netfox webcrawer]$ tar -vxf apache-nutch-2.2.1-src.tar.gz
    5. add to /etc/profile
      export NUTCH_HOME=/home/hadoop/webcrawer/apache-nutch-2.2.1
      export PATH=$NUTCH_HOME/runtime/deploy/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$PATH
    6. change /home/hadoop/webcrawer/apache-nutch-2.2.1/conf/hbase-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!--
      /**
       * Copyright 2009 The Apache Software Foundation
       *
       * Licensed to the Apache Software Foundation (ASF) under one
       * or more contributor license agreements.  See the NOTICE file
       * distributed with this work for additional information
       * regarding copyright ownership.  The ASF licenses this file
       * to you under the Apache License, Version 2.0 (the
       * "License"); you may not use this file except in compliance
       * with the License.  You may obtain a copy of the License at
       *
       *     http://www.apache.org/licenses/LICENSE-2.0
       *
       * Unless required by applicable law or agreed to in writing, software
       * distributed under the License is distributed on an "AS IS" BASIS,
       * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       * See the License for the specific language governing permissions and
       * limitations under the License.
       */
      -->
      <configuration>

        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://master:9001/hbase</value>
        </property>

        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>

        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>localhost</value>
        </property>

      </configuration>
    7. change /home/hadoop/webcrawer/apache-nutch-2.2.1/conf/nutch-site.xml as below
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <!-- Put site-specific property overrides in this file. -->

      <configuration>

        <property>
          <name>storage.data.store.class</name>
          <value>org.apache.gora.hbase.store.HBaseStore</value>
          <description>Default class for storing data</description>
        </property>

        <property>
          <name>http.agent.name</name>
          <value>NutchCrawler</value>
        </property>

        <property>
          <name>http.robots.agents</name>
          <value>NutchCrawler,*</value>
        </property>

      </configuration>
    8. Uncomment the following in the /home/hadoop/webcrawer/apache-nutch-2.2.1/ivy/ivy.xml file   
      <dependency org="org.apache.gora" name="gora-hbase" rev="0.2" conf="*->default" />
    9. add to /home/hadoop/webcrawer/apache-nutch-2.2.1/conf/gora.properties file
      gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
    10. go to nutch installation folder(/home/hadoop/webcrawer/apache-nutch-2.2.1) and run
      ant clean
      ant runtime
    11. Create a directory in HDFS to upload the seed urls.
      [hadoop@netfox ~]$ hadoop dfs -mkdir urls
    12. Create a text file with the seed URLs for the crawl (see the sketch after this list for an example), then upload the seed URLs file to the directory created in the above step
      [hadoop@netfox ~]$ hadoop dfs -put seed.txt urls
    13. Issue the following command from inside the copied deploy directory in the
      JobTracker node to inject the seed URLs to the Nutch database and to generate the
      initial fetch list (-topN <N>: the number of top URLs to select; the default is Long.MAX_VALUE)
      [hadoop@netfox ~]$ nutch inject urls
      [hadoop@netfox ~]$ nutch generate  -topN 3
    14. Issue the following commands from inside the copied deploy directory in the
      JobTracker node
      [hadoop@netfox ~]$ nutch fetch -all
      [hadoop@netfox ~]$ nutch parse -all
      [hadoop@netfox ~]$ nutch updatedb
      [hadoop@netfox ~]$ nutch generate -topN 10
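
    For steps 3 and 12, a minimal sketch; the download URL assumes the standard Apache archive layout, and http://nutch.apache.org/ is just an example seed URL:
      [hadoop@netfox ~]$ wget https://archive.apache.org/dist/nutch/2.2.1/apache-nutch-2.2.1-src.tar.gz
      [hadoop@netfox ~]$ echo "http://nutch.apache.org/" > seed.txt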



    Install ElasticSearch
    1. get the tar file (see the download sketch after this list)
    2. untar file
      [hadoop@netfox ~]$ tar -vxf elasticsearch-0.19.4.tar.gz
    3. add to /etc/profile
      export ELAST_HOME=/home/hadoop/webcrawer/elasticsearch-0.19.4

      export PATH=$ELAST_HOME/bin:$NUTCH_HOME/runtime/deploy/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$PATH
    4. Go to the extracted ElasticSearch directory and execute the following command to
      start the ElasticSearch server in the foreground
      > bin/elasticsearch -f
    5. Go to the $NUTCH_HOME/runtime/deploy (or $NUTCH_HOME/runtime/local
      in case you are running Nutch in the local mode) directory. Execute the following
      command to index the data crawled by Nutch into the ElasticSearch server.  
      > bin/nutch elasticindex elasticsearch -all
    6. install curl 
      [hadoop@netfox ~]$ sudo apt-get install curl
    7. check if the elasticsearch installation is correct
      [hadoop@netfox ~]$ curl master:9200
    8. check query 
      [hadoop@netfox ~]$ curl -XGET 'http://master:9200/_search?q=hadoop'
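
    For step 1, a minimal download sketch; the URL assumes the historical elasticsearch.org download location for 0.19.x releases and may no longer be available:
      [hadoop@netfox ~]$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.19.4.tar.gz
    The running node can also be checked with the cluster health API:
      [hadoop@netfox ~]$ curl 'http://master:9200/_cluster/health?pretty=true'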

    posted on 2013-08-31 01:17 paulwong

    Feedback

    # re: Install hadoop+hbase+nutch+elasticsearch 2013-09-23 14:19 ap

    Nutch 2.2.1 supports HBase 0.90.4 and Elasticsearch 0.19.4 by default. Is there any way to make it work with Elasticsearch 0.90.x or later? (I tried replacing the elasticsearch-0.19.4 jar in the Nutch 2.2.1 lib directory with an elasticsearch-0.90.x jar, but nutch elasticindex then reports errors.)
    Nutch 1.7 supports Elasticsearch 0.90.1 by default.

    # re: Install hadoop+hbase+nutch+elasticsearch 2013-09-24 18:27 paulwong

    @ap
    I tried swapping in a version above 0.90 and it did not work.
    Nutch 1.7 does not integrate with HBase, so I did not try it.

    # re: Install hadoop+hbase+nutch+elasticsearch 2013-09-25 15:34 ap

    @paulwong
    Indeed, I could not find an HBase jar in the Nutch 1.7 lib directory; it would be nice if it were integrated. Thanks.

