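This article walks through a fully distributed installation of Spark 1.6.1 and Hadoop 2.6.4 on a five-node cluster, step by step, as a reference for anyone setting up a similar environment.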
Preparation: all of the following packages can be downloaded from their official sites.
    hadoop-2.6.4.tar.gz
    jdk-7u71-linux-x64.tar.gz
    scala-2.10.4.tgz
    spark-1.6.1-bin-hadoop2.6.tgz
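My hardware environment:
    master: 8 virtual cores, 16.0 GB memory
    slave1: 4 virtual cores, 10.0 GB memory
    slave2: 4 virtual cores, 10.0 GB memory
    slave3: 4 virtual cores, 10.0 GB memory
    slave4: 4 virtual cores, 10.0 GB memory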
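Name the five machines master, slave1, slave2, slave3, and slave4. For example, on the master machine, edit /etc/hostname so that it contains only the name master, and do the same with the matching name on each slave:
    sudo vim /etc/hostname
    master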
Then give all five machines the same hosts entries. On the 127.0.1.1 line, each machine uses its own hostname:
    sudo vim /etc/hosts
    127.0.0.1 localhost
    127.0.1.1 master/slave1/...
    192.168.80.70 master
    192.168.80.71 slave1
    192.168.80.72 slave2
    192.168.80.73 slave3
    192.168.80.74 slave4
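Since the same five address lines go onto every machine, it may be easier to keep them in one place and generate them than to type them five times. The following is only a sketch: the function names `cluster_hosts_entries` and `cluster_ip_of` are hypothetical, and the addresses are simply the ones listed above.

```shell
#!/bin/sh
# Print the cluster's /etc/hosts entries, one "IP hostname" pair per line.
# cluster_hosts_entries is a hypothetical helper name; the addresses are
# the ones used in this article.
cluster_hosts_entries() {
    cat <<'EOF'
192.168.80.70 master
192.168.80.71 slave1
192.168.80.72 slave2
192.168.80.73 slave3
192.168.80.74 slave4
EOF
}

# Look up the IP address for one hostname in the list above.
cluster_ip_of() {
    cluster_hosts_entries | awk -v h="$1" '$2 == h { print $1 }'
}

# Printing the entries is harmless; nothing is written to /etc/hosts here.
cluster_hosts_entries
```

To append the entries on a machine, the output could be piped through `sudo tee -a /etc/hosts`.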
After the configuration is done, reboot. You should then be able to ping slave1 from master.
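Configure SSH:
On every node, run ssh-keygen -t rsa and press Enter at each prompt.
① On master, append the public key to authorized_keys (in ~/.ssh):
    sudo cat id_rsa.pub >> authorized_keys
② Copy master's authorized_keys into the ~/.ssh directory of each of the other machines, for example:
    scp authorized_keys root@slave1:~/.ssh
③ Set the permissions on authorized_keys, then try ssh localhost and ssh master:
    chmod 644 authorized_keys
④ Test the setup: run ssh slave1, enter the username and password, and log out. If a second ssh slave1 logs you straight in without asking for a password, the configuration works.
Disable the firewall on all nodes:
    ufw disable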
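Step ② has to be repeated for every slave, which is easy to get wrong by hand. The sketch below loops over the four slaves instead. Note that it swaps in `ssh-copy-id`, which appends the key to the remote authorized_keys rather than overwriting the file as the scp approach does; the helper name `push_key_to_slaves` and the `DRY_RUN` switch are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: push master's public key to every slave.
# With DRY_RUN=1 (the default here) it only prints the commands.
SLAVES="slave1 slave2 slave3 slave4"
DRY_RUN="${DRY_RUN:-1}"

push_key_to_slaves() {
    for host in $SLAVES; do
        cmd="ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$host"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"      # show what would be run
        else
            $cmd             # asks for the password of each host once
        fi
    done
}
```

Running `push_key_to_slaves` prints the four commands; `DRY_RUN=0 push_key_to_slaves` would execute them.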
Edit the configuration files:
    vim /etc/profile

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
    export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
    export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export SCALA_HOME=/opt/scala/scala-2.10.4
    export PATH=/opt/scala/scala-2.10.4/bin:$PATH
    export PATH=$PATH:$JAVA_HOME/bin
    export HADOOP_HOME=/root/hadoop-2.6.4
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export SPARK_HOME=/root/spark-1.6.1-bin-hadoop2.6
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

    source /etc/profile
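A mistyped variable name in /etc/profile tends to surface only much later, when a daemon fails to start. As a quick sanity check after sourcing the file, something along these lines could be used. The helper name `check_home_vars` is hypothetical; it only verifies that each named variable is set and points at an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: for each variable name given as an argument, report
# if it is unset/empty or does not point at an existing directory.
# Returns 0 only when every variable passes.
check_home_vars() {
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"     # read the variable whose name is in $name
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        fi
    done
    return $status
}
```

For this setup the call would be `check_home_vars JAVA_HOME SCALA_HOME HADOOP_HOME SPARK_HOME`; no output means all four look plausible.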
    vim hadoop-env.sh

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
    export HADOOP_CONF_DIR=/root/hadoop-2.6.4/etc/hadoop/

    source hadoop-env.sh

    vim yarn-env.sh

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71

    source yarn-env.sh

    vim spark-env.sh

    export SPARK_MASTER_IP=master
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=4g
    export SPARK_WORKER_INSTANCES=2
    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
    export SCALA_HOME=/opt/scala/scala-2.10.4
    export HADOOP_HOME=/root/hadoop-2.6.4

    source spark-env.sh
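In standalone mode, SPARK_WORKER_CORES and SPARK_WORKER_MEMORY apply to each worker instance, so what a node offers to Spark is the per-worker value multiplied by SPARK_WORKER_INSTANCES. The arithmetic is sketched below with a hypothetical helper, `node_totals`, so the result can be compared against the hardware listed earlier.

```shell
#!/bin/sh
# Hypothetical helper: resources one node offers to Spark standalone,
# given the per-worker cores, per-worker memory in GB, and the number of
# worker instances per node.
node_totals() {
    cores=$1
    mem_gb=$2
    instances=$3
    echo "$((cores * instances)) cores, $((mem_gb * instances))g"
}

node_totals 4 4 2   # the values from spark-env.sh; prints "8 cores, 8g"
```

With these settings each slave advertises 8 cores and 8g of memory, that is, more cores than its 4 virtual cores but less memory than its 10 GB. Whether that oversubscription is desirable depends on the workload.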
The slaves file must be edited for both Spark and Hadoop, listing one worker hostname per line:
    vim slaves

    slave1
    slave2
    slave3
    slave4
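Because the two slaves files have identical content, they can be written in one go. This is a sketch with a hypothetical helper, `write_slaves`; it assumes the usual locations for these versions, etc/hadoop under the Hadoop directory and conf under the Spark directory, and it overwrites any existing slaves file there.

```shell
#!/bin/sh
# Hypothetical helper: write one worker hostname per line to the slaves
# file in each configuration directory given as an argument.
# Any existing slaves file in those directories is overwritten.
write_slaves() {
    for conf_dir in "$@"; do
        printf '%s\n' slave1 slave2 slave3 slave4 > "$conf_dir/slaves"
    done
}
```

With the environment variables from /etc/profile loaded, the call would be `write_slaves "$HADOOP_HOME/etc/hadoop" "$SPARK_HOME/conf"`.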
Hadoop configuration:
    vim core-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-2.6.4/tmp</value>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>
    vim hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
    vim mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
      <property>
        <name>mapred.map.tasks</name>
        <value>20</value>
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
      </property>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
      </property>
    </configuration>
    vim yarn-site.xml

    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
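With this many properties spread over four files, it can be handy to read a single value back out and confirm it is what was intended. The sketch below does that with plain `tr` and `sed`. The helper name `get_prop` is hypothetical, and it assumes the simple name/value layout used in the files above; it is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of one property in a Hadoop
# *-site.xml file.
#   $1 = path to the XML file
#   $2 = property name
# Newlines are removed first so the lookup works whether the file is
# nicely indented or written on a single line. Prints nothing if the
# property is absent.
get_prop() {
    tr -d '\n' < "$1" |
        sed -n "s|.*<name>$2</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}
```

For example, `get_prop "$HADOOP_CONF_DIR/core-site.xml" fs.default.name` should print hdfs://master:9000 for the configuration above.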
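Once everything above is configured, copy the two unpacked directories from the master node to slave1 through slave4. For slave1 the commands are as follows; repeat them for each of the other slaves:
    scp -r spark-1.6.1-bin-hadoop2.6 root@slave1:~/
    scp -r hadoop-2.6.4 root@slave1:~/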
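The two scp commands are needed once per slave, eight copies in total, so a loop is a natural fit. The sketch below only prints the commands, which makes it safe to inspect first; the helper name `distribute_commands` is hypothetical, and it assumes it is run from the directory on master that holds both unpacked directories.

```shell
#!/bin/sh
# Hypothetical helper: print the scp commands that copy both directories
# from master to each slave. Nothing is copied by this function itself.
distribute_commands() {
    for host in slave1 slave2 slave3 slave4; do
        for dir in spark-1.6.1-bin-hadoop2.6 hadoop-2.6.4; do
            echo "scp -r $dir root@$host:~/"
        done
    done
}

distribute_commands
```

After checking the printed list, `distribute_commands | sh` would run the copies one after another.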
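Note that SSH must already be configured at this point. Starting and testing Hadoop is not covered here; use the jps command to check which daemons are running.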
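Reading jps output by eye on five machines gets tedious, so a small filter can list which expected daemons are absent. This is a sketch with a hypothetical helper, `missing_daemons`. It assumes the usual jps output of a process ID followed by a class name, and the daemon names to look for are passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    running="$(awk '{ print $2 }')"      # second column = class name
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

On master, with HDFS and YARN started, `jps | missing_daemons NameNode SecondaryNameNode ResourceManager` should print nothing; on a slave the names to check would be DataNode and NodeManager.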
Start and test Spark. Run the following from the Spark directory on master:
    ./sbin/start-all.sh
Run one of the examples bundled with Spark:
    ./bin/spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /root/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar
Test the Spark shell:
    ./bin/spark-shell --master spark://master:7077