This article shows how to run spark-shell on Hadoop YARN. It also covers installing Scala and Spark, running spark-shell locally, and running it on a Spark Standalone cluster.
1. Spark mode architecture diagram

![Spark mode architecture diagram](https://cache.yisu.com/upload/information/20210522/355/683134.png)

2. Download and install Scala

a. Official archive: http://www.scala-lang.org/files/archive/

b. Pick a version, copy its link, and download it with wget:

    wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz

c. Extract it:

    tar xvf scala-2.11.6.tgz
    sudo mv scala-2.11.6 /usr/local/scala    # move scala into /usr/local

d. Set the environment variables. Open ~/.bashrc and add the two export lines, then reload it:

    sudo gedit ~/.bashrc
    export SCALA_HOME=/usr/local/scala
    export PATH=$PATH:$SCALA_HOME/bin
    source ~/.bashrc    # apply the configuration

e. Start scala:

    hduser@master:~$ scala

3. Install Spark

a. Official site: http://spark.apache.org/downloads.html

b. Select version 1.4 with "Pre-built for Hadoop 2.6 and later", copy the link, and download it with wget:

    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz

c. Extract it and move it to /usr/local/spark/

d. Edit the environment variables. Open ~/.bashrc and add the two export lines, then reload it:

    sudo gedit ~/.bashrc
    export SPARK_HOME=/usr/local/spark
    export PATH=$PATH:$SPARK_HOME/bin
    source ~/.bashrc    # apply the configuration

4. Start the interactive spark-shell

    hduser@master:~$ spark-shell

5. Start hadoop (the command reference in step 10 uses start-all.sh for this)

6. Run spark-shell locally

a. Start it with four local threads:

    spark-shell --master local[4]

b. Read a local file:

    val textFile = sc.textFile("file:/usr/local/spark/README.md")
    textFile.count

7. Run spark-shell on Hadoop YARN

    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell

The parts of this command are:

    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar    # path to the Spark assembly jar
    HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop    # Hadoop configuration directory
    MASTER=yarn-client    # run mode: yarn-client
    /usr/local/spark/bin/spark-shell    # full path of the spark-shell to run

8. Build the Spark Standalone Cluster environment

a. Copy the template file before editing it:

    cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

b. Configure spark-env.sh:

    sudo gedit /usr/local/spark/conf/spark-env.sh
    export SPARK_MASTER_IP=master       # IP (here the hostname) of the master
    export SPARK_WORKER_CORES=1         # CPU cores used by each worker
    export SPARK_WORKER_MEMORY=600m     # memory used by each worker
    export SPARK_WORKER_INSTANCES=1     # number of worker instances

Keep an eye on your own machine's memory. Once hadoop and spark are running across several virtual machines, 8 GB of RAM is not enough; the setup is very memory-hungry, and virtualization adds considerable overhead.

c. Connect to data1 and data2 over ssh and create the spark directory on each of them:

    sudo mkdir /usr/local/spark
    sudo chown hduser:hduser /usr/local/spark

d. Copy spark from master to data1 and data2:

    sudo scp -r /usr/local/spark hduser@data1:/usr/local
    sudo scp -r /usr/local/spark hduser@data2:/usr/local

e. Edit the slaves file so that it lists the two workers:

    sudo gedit /usr/local/spark/conf/slaves
    data1
    data2

9. Run spark-shell on Spark Standalone

a. Start the Spark Standalone Cluster:

    /usr/local/spark/sbin/start-all.sh

b. Run the shell against it:

    spark-shell --master spark://master:7077

c. Open the Spark Standalone Web UI: http://master:8080/

d. Stop the Spark Standalone Cluster:

    /usr/local/spark/sbin/stop-all.sh

10. Command reference (shell history from the session, in order)

    scala
    jps
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
    ping www.baidu.com
    ssh data3
    ssh data2
    ssh data1
    jps
    start-all.sh
    jps
    spark-shell
    spark-shell --master local[4]
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    ssh data2
    ssh data1
    cd /usr/local/hadoop/etc/hadoop/
    ll
    sudo gedit masters
    sudo gedit slaves
    sudo gedit /etc/hosts
    sudo gedit hdfs-site.xml
    sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
    mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    sudo chown -R hduser:hduser /usr/local/hadoop
    hadoop namenode -format
    start-all.sh
    jps
    spark-shell
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    ssh data1
    ssh data2
    ssh data1
    start-all.sh
    jps
    cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
    sudo gedit /usr/local/spark/conf/spark-env.sh
    sudo scp -r /usr/local/spark hduser@data1:/usr/local
    sudo scp -r /usr/local/spark hduser@data2:/usr/local
    sudo gedit /usr/local/spark/conf/slaves
    /usr/local/spark/sbin/start-all.sh
    spark-shell --master spark://master:7077
    /usr/local/spark/sbin/stop-all.sh
    jps
    stop-all.sh
    history
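
The three ways of launching spark-shell in this walkthrough (local, YARN client, Standalone) differ only in how the master is specified. As a rough sketch, they could be collected in one small shell helper. The function name `spark_shell_cmd` is made up for illustration and is not part of Spark. The default paths, the assembly jar name, and the `spark://master:7077` URL are the ones used in this walkthrough and are assumed to match your install. The helper only prints the command, so it can be reviewed before it is run.

```shell
# Hypothetical helper (not part of Spark): print the spark-shell launch
# command for one of the three modes used in this walkthrough.

# Defaults taken from this walkthrough; override them if your paths differ.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}"

spark_shell_cmd() {
  # $1 selects the mode: local, yarn, or standalone
  case "$1" in
    local)
      # four local worker threads
      echo "$SPARK_HOME/bin/spark-shell --master local[4]"
      ;;
    yarn)
      # same environment-variable form as step 7
      echo "SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=$HADOOP_CONF_DIR MASTER=yarn-client $SPARK_HOME/bin/spark-shell"
      ;;
    standalone)
      # assumes the Standalone master runs on host "master", port 7077
      echo "$SPARK_HOME/bin/spark-shell --master spark://master:7077"
      ;;
    *)
      echo "usage: spark_shell_cmd local|yarn|standalone" >&2
      return 1
      ;;
  esac
}
```

Calling `spark_shell_cmd yarn` prints the one-line YARN command from step 7. Any other argument prints a usage message and returns a non-zero status.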
That covers how to run spark-shell on Hadoop YARN. Thanks for reading; hopefully the steps above are useful.