
    How to install a Hadoop cluster (2-node cluster) and HBase on VMware Workstation. Installing Pig and Hive is covered in the appendix.

    By Tzu-Cheng Chuang 1-28-2011


    Requires: Ubuntu 10.04, Hadoop 0.20.2, ZooKeeper 3.3.2, HBase 0.90.0

    1. Download Ubuntu 10.04 desktop 32-bit from the Ubuntu website.

    2. Install Ubuntu 10.04 with username: hadoop, password: password,  disk size: 20GB, memory: 2048MB, 1 processor, 2 cores

    3. Install build-essential (for the GNU C/C++ compilers)
    $ sudo apt-get install build-essential

    4. Install sun-java-6-jdk
        (1) Add the Canonical Partner Repository to your apt repositories
        $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
         (2) Update the source list
        $ sudo apt-get update
         (3) Install sun-java-6-jdk and make sure Sun’s java is the default jvm
        $ sudo apt-get install sun-java6-jdk
         (4) Set environment variables by modifying the ~/.bashrc file; put the following two lines at the end of the file
        export JAVA_HOME=/usr/lib/jvm/java-6-sun
        export PATH=$PATH:$JAVA_HOME/bin 
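        As a quick check (not part of the original steps), the new environment variables can be loaded and the JVM verified; the exact version string will vary:
        $ source ~/.bashrc
        $ java -version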

    5. Configure SSH server so that ssh to localhost doesn’t need a passphrase
        (1) Install openssh server
        $ sudo apt-get install openssh-server
         (2) Generate RSA pair key
        $ ssh-keygen -t rsa -P ""
         (3) Enable SSH access to local machine
        $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
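        As a quick sanity check (an addition, not in the original steps), the key-based login can be verified; the first connection only asks to confirm the host fingerprint:
        $ ssh localhost
        $ exit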

    6. Disable IPv6 by modifying the /etc/sysctl.conf file; put the following lines at the end of the file
    # disable ipv6
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
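    As an optional check (not in the original steps), after the reboot in step 8 the setting can be verified; a value of 1 means IPv6 is disabled:
    $ cat /proc/sys/net/ipv6/conf/all/disable_ipv6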

    7. Install hadoop
        (1) Download hadoop-0.20.2.tar.gz (stable release as of 1/25/2011) from the Apache Hadoop website
        (2) Extract the hadoop archive file to /usr/local/
        (3) Make a symbolic link /usr/local/hadoop (example commands for (2) and (3) are sketched at the end of this step)
        (4) Modify /usr/local/hadoop/conf/hadoop-env.sh
    Change
        # The java implementation to use. Required.
        # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    to
        # The java implementation to use. Required.
        export JAVA_HOME=/usr/lib/jvm/java-6-sun
         (5) Create the /usr/local/hadoop-datastore folder
    $ sudo mkdir /usr/local/hadoop-datastore
    $ sudo chown hadoop:hadoop /usr/local/hadoop-datastore
    $ sudo chmod 750 /usr/local/hadoop-datastore
         (6) Put the following code in /usr/local/hadoop/conf/core-site.xml
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp/dir/hadoop-${user.name}</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
    </property>
        (7) Put the following code in /usr/local/hadoop/conf/mapred-site.xml
    <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
    </property>
         (8) Put the following code in /usr/local/hadoop/conf/hdfs-site.xml
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
    </property>
         (9) Add hadoop to the environment variables by modifying ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$HADOOP_HOME/bin:$PATH
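    A minimal sketch of sub-steps (2) and (3), assuming the tarball was downloaded to the hadoop user's home directory (adjust the paths if yours differ):
    $ cd ~
    $ sudo tar -xzf hadoop-0.20.2.tar.gz -C /usr/local/
    $ sudo ln -s /usr/local/hadoop-0.20.2 /usr/local/hadoop
    $ sudo chown -R hadoop:hadoop /usr/local/hadoop-0.20.2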

    8. Restart Ubuntu Linux

    9. Copy this virtual machine to another folder, so that we have at least 2 copies of the Ubuntu Linux image (one for the master and one for the slave).

    10. Modify /etc/hosts on both Linux virtual machines and add the following lines. The IP addresses depend on each machine; we can use ifconfig to find them out.
    # /etc/hosts (for master AND slave)
    192.168.0.1 master
    192.168.0.2 slave
    Also modify the following line, because it might cause HBase to resolve the wrong IP:
    192.168.0.1 ubuntu
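    As a quick sanity check (not in the original write-up), hostname resolution can be verified from each machine:
    $ ping -c 1 master
    $ ping -c 1 slave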

    11. Check hadoop user access on both machines.
    The hadoop user on the master (aka hadoop@master) must be able to connect a) to its own user account on the master – i.e. ssh master in this context and not necessarily ssh localhost – and b) to the hadoop user account on the slave (aka hadoop@slave)  via a password-less SSH login. On both machines, make sure each one can connect to master, slave without typing passwords.
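    One way to set this up (a sketch, not the only way; the original leaves the exact mechanism to the reader) is to distribute the public key generated in step 5 and then test the logins:
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave    # run on the master
    $ ssh master                                       # should log in without a password
    $ exit
    $ ssh slave
    $ exit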

    12. Cluster configuration
        (1) Modify /usr/local/hadoop/conf/masters, only on the master machine, so that it contains
            master
        (2) Modify /usr/local/hadoop/conf/slaves, only on the master machine, so that it contains
            master
            slave
        (3) Change "localhost" to "master" in /usr/local/hadoop/conf/core-site.xml and /usr/local/hadoop/conf/mapred-site.xml, only on the master machine
        (4) Change dfs.replication to "1" in /usr/local/hadoop/conf/hdfs-site.xml, only on the master machine

    13. Format the namenode, only once and only on the master machine
    $ /usr/local/hadoop/bin/hadoop namenode -format

    14. Later on, start the multi-node cluster by typing the following commands, only on the master. For now, please don't start hadoop yet.
    $ /usr/local/hadoop/bin/start-dfs.sh
    $ /usr/local/hadoop/bin/start-mapred.sh

    15. Install zookeeper only on the master node
        (1) Download zookeeper-3.3.2.tar.gz from the Apache ZooKeeper website
        (2) Extract zookeeper-3.3.2.tar.gz
        $ tar -xzf zookeeper-3.3.2.tar.gz
        (3) Move the folder zookeeper-3.3.2 to /home/hadoop/ and create a symbolic link
        $ mv zookeeper-3.3.2 /home/hadoop/ ; ln -s /home/hadoop/zookeeper-3.3.2 /home/hadoop/zookeeper
        (4) Copy conf/zoo_sample.cfg to conf/zoo.cfg
        $ cp conf/zoo_sample.cfg conf/zoo.cfg
        (5) Modify conf/zoo.cfg and set
        dataDir=/home/hadoop/zookeeper/snapshot
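        For reference, a minimal zoo.cfg after this edit would look roughly as follows; the values other than dataDir are the zoo_sample.cfg defaults and are an assumption, not part of the original instructions:
        tickTime=2000
        initLimit=10
        syncLimit=5
        clientPort=2181
        dataDir=/home/hadoop/zookeeper/snapshot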

    16. Install HBase on both master and slave nodes, and configure it as fully-distributed
        (1) Download hbase-0.90.0.tar.gz from the Apache HBase website
        (2) Extract hbase-0.90.0.tar.gz
        $ tar -xzf hbase-0.90.0.tar.gz
        (3) Move the folder hbase-0.90.0 to /home/hadoop/ and create a symbolic link
        $ mv hbase-0.90.0 /home/hadoop/ ; ln -s /home/hadoop/hbase-0.90.0 /home/hadoop/hbase
         (4) Edit /home/hadoop/hbase/conf/hbase-site.xml and put the following in between <configuration> and </configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:54310/hbase</value>
        <description>The directory shared by region servers. Should be fully-qualified to include the filesystem to use. E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed ZooKeeper; true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)</description>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
        <description>Comma separated list of servers in the ZooKeeper Quorum. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.</description>
    </property>
         (5) modify environment variables in /home/hadoop/hbase/conf/hbase-env.sh
        export JAVA_HOME=/usr/lib/jvm/java-6-sun/
    export HBASE_IDENT_STRING=$HOSTNAME
    export HBASE_MANAGES_ZK=false
         (6) Overwrite /home/hadoop/hbase/conf/regionservers on both machines so that it contains
        master
        slave
         (7) Copy /usr/local/hadoop-0.20.2/hadoop-0.20.2-core.jar to /home/hadoop/hbase/lib/ on both machines.
        This is very important to fix the version mismatch between the Hadoop jar bundled with HBase and the installed Hadoop. Pay attention to its ownership and mode (755).
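        A sketch of sub-step (7), assuming the paths used earlier in this guide:
    $ cp /usr/local/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/hadoop/hbase/lib/
    $ chmod 755 /home/hadoop/hbase/lib/hadoop-0.20.2-core.jar
    $ ls -l /home/hadoop/hbase/lib/hadoop-0.20.2-core.jar    # verify owner hadoop:hadoop and mode 755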

    17. Start zookeeper, using the standalone installation from step 15. It seems the zookeeper bundled with HBase is not set up correctly.
    $ /home/hadoop/zookeeper/bin/zkServer.sh start
    (Optional) We can test whether zookeeper is running correctly by typing
    $ /home/hadoop/zookeeper/bin/zkCli.sh -server 127.0.0.1:2181

    18. Start the hadoop cluster
    $ /usr/local/hadoop/bin/start-dfs.sh
    $ /usr/local/hadoop/bin/start-mapred.sh
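    As an optional check (not in the original text), jps lists the running Java daemons. Since the master is also listed in conf/slaves, it should show roughly NameNode, SecondaryNameNode, JobTracker, DataNode and TaskTracker, while the slave should show DataNode and TaskTracker:
    $ jps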

    19. Start Hbase
    $ /home/hadoop/hbase/bin/start-hbase.sh

    20. Use the HBase shell
    $ /home/hadoop/hbase/bin/hbase shell
        To check whether HBase is running smoothly, open your browser and go to
        http://localhost:60010
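        A small example session in the HBase shell (a sketch with a made-up table name, not part of the original steps):
        hbase> create 'test', 'cf'
        hbase> put 'test', 'row1', 'cf:a', 'value1'
        hbase> scan 'test'
        hbase> get 'test', 'row1'
        hbase> disable 'test'
        hbase> drop 'test'
        hbase> exit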


    21. Later on, stop the multi-node cluster by typing the following commands, only on the master
        (1) Stop HBase
    $ /home/hadoop/hbase/bin/stop-hbase.sh
         (2) Stop MapReduce and the hadoop file system (HDFS)
    $ /usr/local/hadoop/bin/stop-mapred.sh
    $ /usr/local/hadoop/bin/stop-dfs.sh
         (3) Stop zookeeper    
    $ /home/hadoop/zookeeper/bin/zkServer.sh stop

    Reference
    http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
    http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
    http://wiki.apache.org/hadoop/Hbase/10Minutes
    http://hbase.apache.org/book/quickstart.html
    http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/

    Author
    Tzu-Cheng Chuang


    Appendix- Install Pig and Hive
    1. Install Pig 0.8.0 on this cluster
        (1) Download pig-0.8.0.tar.gz from the Apache Pig project website, then extract the file and move it to /home/hadoop/
    $ tar -xzf pig-0.8.0.tar.gz ; mv pig-0.8.0 /home/hadoop/
         (2) Make symbolic links under pig-0.8.0/conf/
    $ ln -s /usr/local/hadoop/conf/core-site.xml /home/hadoop/pig-0.8.0/conf/core-site.xml
    $ ln -s /usr/local/hadoop/conf/mapred-site.xml /home/hadoop/pig-0.8.0/conf/mapred-site.xml
    $ ln -s /usr/local/hadoop/conf/hdfs-site.xml /home/hadoop/pig-0.8.0/conf/hdfs-site.xml
         (3) Start pig in map-reduce mode
    $ /home/hadoop/pig-0.8.0/bin/pig
         (4) Exit pig from the grunt> prompt by typing quit
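        A tiny example session (a sketch, not from the original guide; the HDFS path and alias names are made up). In map-reduce mode Pig reads input from HDFS, so the sample file is copied there first:
    $ hadoop fs -put /etc/passwd /tmp/passwd
    $ /home/hadoop/pig-0.8.0/bin/pig
    grunt> A = LOAD '/tmp/passwd' USING PigStorage(':') AS (user:chararray);
    grunt> B = FOREACH A GENERATE user;
    grunt> DUMP B;
    grunt> quit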

    2. Install Hive on this cluster
        (1) Download hive-0.6.0.tar.gz from the Apache Hive project website, then extract the file and move it to /home/hadoop/
    $ tar -xzf hive-0.6.0.tar.gz ; mv hive-0.6.0 ~/
         (2) Modify the java heap size in hive-0.6.0/bin/ext/execHiveCmd.sh: change 4096 to 1024
        (3) Create /tmp and /user/hive/warehouse and set them chmod g+w in HDFS before a table can be created in Hive
    $ hadoop fs -mkdir /tmp
    $ hadoop fs -mkdir /user/hive/warehouse
    $ hadoop fs -chmod g+w /tmp
    $ hadoop fs -chmod g+w /user/hive/warehouse
         (4) Start Hive
    $ /home/hadoop/hive-0.6.0/bin/hive

         3. (Optional) Load data using Hive
        Create a file /home/hadoop/customer.txt with the following contents
        1, Kevin
        2, David
        3, Brian
        4, Jane
        5, Alice
        After the hive shell is started, type in
        > CREATE TABLE IF NOT EXISTS customer(id INT, name STRING)
        > ROW FORMAT delimited fields terminated by ','
        > STORED AS TEXTFILE;
        > LOAD DATA LOCAL INPATH '/home/hadoop/customer.txt' OVERWRITE INTO TABLE customer;
        > SELECT customer.id, customer.name FROM customer;

    http://chuangtc.info/ParallelComputing/SetUpHadoopClusterOnVmwareWorkstation.htm

