This article explains how to set up a distributed Hadoop environment on CentOS. The steps are laid out simply and clearly and are easy to follow; work through them below to build and understand the setup yourself.
Things to know before setting up a Hadoop environment:
1. Hadoop runs on Linux, so you need to install a Linux operating system.
2. You need a cluster to run Hadoop on, for example several Linux machines on a LAN that can reach each other.
3. For the cluster nodes to access each other, you need passwordless SSH login between them.
4. Hadoop runs on the JVM, which means you need to install a Java JDK and configure JAVA_HOME.
5. Hadoop's components are configured through XML files. After downloading Hadoop from the official site and extracting it, modify the corresponding configuration files under the etc/hadoop directory of the installation.
As the saying goes, to do a good job one must first sharpen one's tools. Here are the software and tools used while setting up this Hadoop environment:
1. VirtualBox: several Linux machines need to be simulated and resources are limited, so the virtual machines are created in VirtualBox.
2. CentOS: the CentOS 7 ISO image is downloaded, loaded into VirtualBox, and installed.
3. SecureCRT: software for SSH remote access to Linux.
4. WinSCP: for transferring files between Windows and Linux.
5. JDK for Linux: download from the Oracle website, extract, and configure.
6. Hadoop 2.7.3: can be downloaded from the Apache website.
With that in place, the walkthrough below is organized into three steps: Linux environment preparation, Hadoop cluster installation, and Hadoop environment testing.
Linux environment preparation
Configure IP addresses
To allow communication between the host and the virtual machines, and between the virtual machines themselves, set the CentOS network adapter in VirtualBox to host-only mode and assign static IP addresses manually. Note that the virtual machines' gateway must be the IP address of the host-only network adapter on the host. After configuring the IP, restart the network service for the change to take effect. Three Linux machines are set up here: 192.168.56.101, 192.168.56.102, and 192.168.56.103.
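As a minimal sketch of one possible static-IP setup for hadoop01 (not from the original article; the interface name enp0s3 is an assumption, check the real name with `ip addr`, and 192.168.56.1 assumes VirtualBox's default host-only adapter address):

# /etc/sysconfig/network-scripts/ifcfg-enp0s3  (interface name assumed)
TYPE=Ethernet
BOOTPROTO=static
DEVICE=enp0s3
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
GATEWAY=192.168.56.1

# restart the network service so the new address takes effect
systemctl restart network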
Configure hostnames
For 192.168.56.101, set the hostname to hadoop01, and list the cluster's IPs and hostnames in the hosts file. The other two hosts are configured in the same way.
[root@hadoop01 ~]# cat /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=hadoop01

[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop01
192.168.56.102 hadoop02
192.168.56.103 hadoop03
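Note that on CentOS 7 the hostname normally lives in /etc/hostname rather than /etc/sysconfig/network, so as an alternative (a small sketch, not part of the original steps) it can also be set with:

hostnamectl set-hostname hadoop01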
Permanently disable the firewall
service iptables stop (Note: 1. the firewall would start again the next time the machine boots, so a command that disables it permanently is needed; 2. since CentOS 7 is used here, the commands to disable the firewall are the following.)
systemctl stop firewalld.service      # stop firewalld
systemctl disable firewalld.service   # prevent firewalld from starting at boot
Disable the SELinux protection system
Change the setting to disabled, then reboot the machine for the change to take effect.
[root@hadoop02 ~]# cat /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
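For reference, a possible way to script this change instead of editing the file by hand (a sketch assuming the value is currently enforcing; a reboot is still needed for a full disable):

# switch SELinux to disabled in the config file
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# drop to permissive mode for the running system (full disable requires a reboot)
setenforce 0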
Passwordless SSH login across the cluster
First, generate an SSH key pair:
ssh-keygen -t rsa
Copy the SSH public key to all three machines:
ssh-copy-id 192.168.56.101
ssh-copy-id 192.168.56.102
ssh-copy-id 192.168.56.103
After this, if the hadoop01 machine wants to log in to hadoop02, it can simply run ssh hadoop02:
ssh hadoop02
Configure the JDK
Create three folders under /home:
tools: for storing tool packages
softwares: for installed software
data: for data
Upload the downloaded Linux JDK to /home/tools on hadoop01 using WinSCP.
Extract the JDK into softwares:
tar -zxf jdk-7u76-linux-x64.tar.gz -C /home/softwares
The JDK home directory is now /home/softwares/jdk.x.x.x. Add this directory to the /etc/profile file and set JAVA_HOME there:
export JAVA_HOME=/home/softwares/jdk1.8.0_111
export PATH=$PATH:$JAVA_HOME/bin
Save the changes and run source /etc/profile to make the configuration take effect.
Check whether the Java JDK was installed successfully:
java -version
The files set up on the current node can then be copied to the other nodes:
scp -r /home/* root@192.168.56.10x:/home
Hadoop cluster installation
The cluster is planned as follows:
Node 101 serves as the HDFS NameNode and the others as DataNodes; node 102 serves as the YARN ResourceManager and the others as NodeManagers; node 103 serves as the SecondaryNameNode. The JobHistoryServer and the WebAppProxyServer are started on nodes 101 and 102 respectively.
Download hadoop-2.7.3
Place it in the /home/softwares folder and extract it. Since Hadoop needs the JDK to run, first configure JAVA_HOME in etc/hadoop/hadoop-env.sh under the Hadoop directory.
(PS: the JDK version used here feels a bit newer than necessary.)
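As a minimal sketch of that hadoop-env.sh change (assuming the JDK path configured earlier), the relevant line would look like:

# etc/hadoop/hadoop-env.sh: point Hadoop at the installed JDK
export JAVA_HOME=/home/softwares/jdk1.8.0_111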
Next, modify the XML configuration files of the corresponding Hadoop components one by one.
Modify core-site.xml:
Specify the NameNode address
Change Hadoop's temporary (cache) directory
Enable Hadoop's trash mechanism
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.56.101:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/softwares/hadoop-2.7.3/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
Modify hdfs-site.xml:
Set the number of replicas
Disable permission checking
Set the HTTP access address
Set the SecondaryNameNode address
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.56.101:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.56.103:50090</value>
    </property>
</configuration>
Rename mapred-site.xml.template to mapred-site.xml and modify it:
Specify YARN as the MapReduce framework, so jobs are scheduled through YARN
Specify the JobHistory server address
Specify the JobHistory web port
Enable uber mode, an optimization for small MapReduce jobs
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.56.101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.56.101:19888</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>
Modify yarn-site.xml:
Specify mapreduce_shuffle as the auxiliary service for MapReduce
Specify node 102 as the ResourceManager
Specify node 102 as the web application proxy
Enable YARN log aggregation
Specify how long aggregated YARN logs are retained
Set the NodeManager memory: 8 GB
Set the NodeManager CPUs: 8 cores
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.56.102</value>
    </property>
    <property>
        <name>yarn.web-proxy.address</name>
        <value>192.168.56.102:8888</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
    </property>
</configuration>
Configure slaves
In etc/hadoop/slaves, specify the compute nodes, i.e. the nodes that run a DataNode and a NodeManager:
192.168.56.101
192.168.56.102
192.168.56.103
First format HDFS on the NameNode, i.e. run on node 101:
Enter the Hadoop home directory: cd /home/softwares/hadoop-2.7.3
Run the hadoop script in the bin directory: bin/hadoop namenode -format
The format only counts as successful if the output contains "successfully formatted" (PS: the screenshot here was borrowed from someone else's article, please don't mind).
Once all of the above configuration is done, copy it to the other machines, for example with scp as sketched below.
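A possible way to distribute the configured Hadoop directory (a sketch assuming the same /home/softwares layout on every node):

scp -r /home/softwares/hadoop-2.7.3 root@192.168.56.102:/home/softwares
scp -r /home/softwares/hadoop-2.7.3 root@192.168.56.103:/home/softwares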
Hadoop environment testing
Enter the Hadoop home directory and run the corresponding scripts.
The jps command (Java Virtual Machine Process Status) shows the running Java processes.
Start HDFS on the NameNode machine, node 101:
[root@hadoop01 hadoop-2.7.3]# sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 16:49:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop01.out
192.168.56.102: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop02.out
192.168.56.103: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop03.out
192.168.56.101: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop03.out
Running jps on node 101 now shows that the NameNode and DataNode have started:
[root@hadoop01 hadoop-2.7.3]# jps
7826 Jps
7270 DataNode
7052 NameNode
Running jps on nodes 102 and 103 shows that the DataNode has started on each (the SecondaryNameNode also appears on 103):
[root@hadoop02 bin]# jps
4260 DataNode
4488 Jps

[root@hadoop03 ~]# jps
6436 SecondaryNameNode
6750 Jps
6191 DataNode
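As an optional extra check (not part of the original walkthrough), the NameNode's view of the registered DataNodes can be listed with:

bin/hdfs dfsadmin -report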
Start YARN
Run on node 102:
[root@hadoop02 hadoop-2.7.3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-resourcemanager-hadoop02.out
192.168.56.101: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop01.out
192.168.56.103: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop03.out
192.168.56.102: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop02.out
Check each node with jps:
[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
4765 NodeManager
5165 Jps

[root@hadoop01 hadoop-2.7.3]# jps
7270 DataNode
8375 Jps
7976 NodeManager
7052 NameNode

[root@hadoop03 ~]# jps
6915 NodeManager
6436 SecondaryNameNode
7287 Jps
6191 DataNode
Start the JobHistoryServer and the WebAppProxyServer daemons on their respective nodes:
[root@hadoop01 hadoop-2.7.3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/softwares/hadoop-2.7.3/logs/mapred-root-historyserver-hadoop01.out
[root@hadoop01 hadoop-2.7.3]# jps
8624 Jps
7270 DataNode
7976 NodeManager
8553 JobHistoryServer
7052 NameNode

[root@hadoop02 hadoop-2.7.3]# sbin/yarn-daemon.sh start proxyserver
starting proxyserver, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-proxyserver-hadoop02.out
[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
5367 WebAppProxyServer
5402 Jps
4765 NodeManager
On hadoop01, i.e. node 101, check the node status through a browser.
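Based on the dfs.namenode.http-address value configured above, the HDFS web UI should be reachable at roughly the following address (the URL is inferred from the configuration, not shown in the original article):

http://192.168.56.101:50070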
Upload a file to HDFS
[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -put /etc/profile /profile
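As an optional check (not part of the original run), the upload can be verified by listing the HDFS root directory:

bin/hdfs dfs -ls /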
Run the wordcount example program
[root@hadoop01 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /profile /fll_out
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:17:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/07 17:17:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.56.102:8032
16/11/07 17:17:18 INFO input.FileInputFormat: Total input paths to process : 1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: number of splits:1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478509135878_0001
16/11/07 17:17:20 INFO impl.YarnClientImpl: Submitted application application_1478509135878_0001
16/11/07 17:17:20 INFO mapreduce.Job: The url to track the job: http://192.168.56.102:8888/proxy/application_1478509135878_0001/
16/11/07 17:17:20 INFO mapreduce.Job: Running job: job_1478509135878_0001
16/11/07 17:18:34 INFO mapreduce.Job: Job job_1478509135878_0001 running in uber mode : true
16/11/07 17:18:35 INFO mapreduce.Job:  map 0% reduce 0%
16/11/07 17:18:43 INFO mapreduce.Job:  map 100% reduce 0%
16/11/07 17:18:50 INFO mapreduce.Job:  map 100% reduce 100%
16/11/07 17:18:55 INFO mapreduce.Job: Job job_1478509135878_0001 completed successfully
16/11/07 17:18:59 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=4264
        FILE: Number of bytes written=6412
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=3940
        HDFS: Number of bytes written=261673
        HDFS: Number of read operations=35
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8246
        Total time spent by all reduces in occupied slots (ms)=7538
        TOTAL_LAUNCHED_UBERTASKS=2
        NUM_UBER_SUBMAPS=1
        NUM_UBER_SUBREDUCES=1
        Total time spent by all map tasks (ms)=8246
        Total time spent by all reduce tasks (ms)=7538
        Total vcore-milliseconds taken by all map tasks=8246
        Total vcore-milliseconds taken by all reduce tasks=7538
        Total megabyte-milliseconds taken by all map tasks=8443904
        Total megabyte-milliseconds taken by all reduce tasks=7718912
    Map-Reduce Framework
        Map input records=78
        Map output records=256
        Map output bytes=2605
        Map output materialized bytes=2116
        Input split bytes=99
        Combine input records=256
        Combine output records=156
        Reduce input groups=156
        Reduce shuffle bytes=2116
        Reduce input records=156
        Reduce output records=156
        Spilled Records=312
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=870
        CPU time spent (ms)=1970
        Physical memory (bytes) snapshot=243326976
        Virtual memory (bytes) snapshot=2666557440
        Total committed heap usage (bytes)=256876544
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1829
    File Output Format Counters
        Bytes Written=1487
Check the job's running status through the YARN web UI in a browser.
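Based on the configuration above, the ResourceManager web UI should be served on its default port 8088 of node 102, while running applications are tracked through the web proxy on port 8888 (these URLs are inferred from the configuration, not shown in the original article):

http://192.168.56.102:8088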
View the final word-frequency result.
Browse the HDFS file system in a browser.
[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -cat /fll_out/part-r-00000
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:29:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
!=	1
"$-"	1
"$2"	1
"$euid"	2
"$histcontrol"	1
"$i"	3
"${-#*i}"	1
"0"	1
":${path}:"	1
"`id	2
"after"	1
"ignorespace"	1
#	13
$uid	1
&&	1
()	1
*)	1
*:"$1":*)	1
-f	1
-gn`"	1
-gt	1
-r	1
-ru`	1
-u`	1
-un`"	2
-x	1
-z	1
.	2
/etc/bashrc	1
/etc/profile	1
/etc/profile.d/	1
/etc/profile.d/*.sh	1
/usr/bin/id	1
/usr/local/sbin	2
/usr/sbin	2
/usr/share/doc/setup-*/uidgid	1
002	1
022	1
199	1
200	1
2>/dev/null`	1
;	3
;;	1
=	4
>/dev/null	1
by	1
current	1
euid=`id	1
functions	1
histcontrol	1
histcontrol=ignoreboth	1
histcontrol=ignoredups	1
histsize	1
histsize=1000	1
hostname	1
hostname=`/usr/bin/hostname	1
it's	2
java_home=/home/softwares/jdk1.8.0_111	1
logname	1
logname=$user	1
mail	1
mail="/var/spool/mail/$user"	1
not	1
path	1
path=$1:$path	1
path=$path:$1	1
path=$path:$java_home/bin	1
path	1
system	1
this	1
uid=`id	1
user	1
user="`id	1
you	1
[	9
]	3
];	6
a	2
after	2
aliases	1
and	2
are	1
as	1
better	1
case	1
change	1
changes	1
check	1
could	1
create	1
custom	1
custom.sh	1
default,	1
do	1
doing	1
done	1
else	5
environment	1
environment,	1
esac	1
export	5
fi	8
file	2
for	5
future	1
get	1
go	1
good	1
i	2
idea	1
if	8
in	6
is	1
it	1
know	1
ksh	1
login	2
make	1
manipulation	1
merging	1
much	1
need	1
pathmunge	6
prevent	1
programs,	1
reservation	1
reserved	1
script	1
set	1
sets	1
setup	1
shell	2
startup	1
system	1
the	1
then	8
this	2
threshold	1
to	5
uid/gids	1
uidgid	1
umask	3
unless	1
unset	2
updates	1
validity	1
want	1
we	1
what	1
wide	1
will	1
workaround	1
you	2
your	1
{	1
}	1
Thank you for reading. That is all of "how to set up a Hadoop distributed environment based on CentOS". After working through this article you should have a much better grasp of the question; the details still need to be verified in your own practice. This is Yisu Cloud, and more articles on related topics will follow, so stay tuned!