1. Download and install Hadoop (a JDK must already be installed).
2. Create the following directories:
~/dfs/name
~/dfs/data
~/temp
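These directories (which back the HDFS and temp settings configured below) can be created in one command:
mkdir -p ~/dfs/name ~/dfs/data ~/temp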
3. Edit the following configuration files:
~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
~/hadoop-2.2.0/etc/hadoop/yarn-env.sh
~/hadoop-2.2.0/etc/hadoop/slaves
~/hadoop-2.2.0/etc/hadoop/core-site.xml
~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
~/hadoop-2.2.0/etc/hadoop/mapred-site.xml
~/hadoop-2.2.0/etc/hadoop/yarn-site.xml
Some of these files do not exist by default; copy the corresponding .template file to create them.
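For example, mapred-site.xml ships only as a template in Hadoop 2.2.0:
cp ~/hadoop-2.2.0/etc/hadoop/mapred-site.xml.template ~/hadoop-2.2.0/etc/hadoop/mapred-site.xml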
--hadoop-env.sh
Set JAVA_HOME:
export JAVA_HOME=/home/hduser/jdk1.6.0_45
--yarn-env.sh
Same as above: set JAVA_HOME.
--slaves
Add the hostname of each worker (slave) node, one per line.
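A sketch of a slaves file for two workers, assuming the hypothetical hostnames clouddn1 and clouddn2:
clouddn1
clouddn2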
--core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloudn:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hduser/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>
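All of the cloudn:port values in these files assume that the hostname cloudn resolves on every node; a hypothetical /etc/hosts entry (the IP addresses and worker names are placeholders, not from the original post):
192.168.1.100 cloudn
192.168.1.101 clouddn1
192.168.1.102 clouddn2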
--hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>cloudn:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hduser/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hduser/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
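Note: with dfs.replication set to 2, the cluster needs at least two DataNodes, otherwise HDFS will report every block as under-replicated.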
--mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>cloudn:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>cloudn:19888</value>
  </property>
</configuration>
--yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- the property key must match the aux-service name mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>cloudn:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>cloudn:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>cloudn:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cloudn:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>cloudn:8088</value>
  </property>
</configuration>
4. Copy the configured Hadoop directory to all other nodes. Mixed 64-bit and 32-bit machines cannot share one copy: the bundled native libraries are architecture-specific, so each architecture needs its own build.
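A minimal sketch of the copy, assuming the hypothetical worker clouddn1 and passwordless SSH for hduser:
scp -r ~/hadoop-2.2.0 hduser@clouddn1:~/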
5. Prepare for startup
Format the NameNode: bin/hdfs namenode -format
6. Start Hadoop: sbin/start-all.sh (deprecated in 2.x but still works; equivalent to running sbin/start-dfs.sh and then sbin/start-yarn.sh)
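To verify that everything is up, run jps on each node: with this layout the master (cloudn) should list NameNode, SecondaryNameNode and ResourceManager, and every worker should list DataNode and NodeManager. The HDFS web UI is at http://cloudn:50070 and the YARN UI at http://cloudn:8088.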
7. WordCount example
First create an input directory on HDFS and verify it:
./bin/hdfs dfs -mkdir /input
./bin/hdfs dfs -ls /
On the NameNode's local filesystem, create a directory named files containing two text files (the post shows only the first; create file2.txt the same way), then check their contents:
mkdir ~/files && cd ~/files
echo "Hello World" > file1.txt
more file1.txt file2.txt
Upload the local files to the cluster's /input directory:
./bin/hdfs dfs -put ~/files/*.txt /input
./bin/hdfs dfs -ls /input
Run the job:
./bin/hadoop jar /home/hduser/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
(The original command pointed at the -sources.jar under share/hadoop/mapreduce/sources/, which holds only .java sources; the compiled examples jar above is the canonical one to run.)
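MapReduce refuses to start if the output directory already exists, so to re-run the job remove it first:
./bin/hdfs dfs -rm -r /output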
View the result:
./bin/hdfs dfs -cat /output/part-r-00000
Each output line is a word, a tab, and the word's count.