Official documentation: spark.apache.org/docs/latest

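Limitations of MapReduce:
    1) Cumbersome to develop
        Everything is written as map/reduce (a map join has no reduce phase)
        Low-level API
        Constrained programming model
        Each requirement change means modifying the code and testing again
    2) Low technical efficiency
        Hundreds of processes: every MapTask and ReduceTask is a process (see JVM reuse)
        IO: chained jobs go through network + disk
        Sorting: every job sorts. Interview question: which interface must the key type implement? (WritableComparable)
        Memory:
        ...
        Not suited to iterative processing
        Not suited to real-time stream processing

    Many frameworks, each working on its own
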
spark.apache.org
    Speed
        memory
        thread
        sort (configurable)

        DAG: rdd.map.filter....collect

    Ease of use
        high-level operators: join, group, count, ...

    Generality

    Runs Everywhere

Summary:
    fast + general engine
        write code: Java/Scala/Python/R, interactive shell
        run: memory/DAG/thread model/.....
    Version overview and selection criteria, reference:
    How to learn Spark
        mailing list
        user@spark.apache.org
        apache-spark-user-list/
        meetups/summits
        source code examples
        github.com/apache/spark
        source code

Environment:
centos6

    hadoop000(hadoop) hadoop001 hadoop002
    app       directory where software is installed
    software  tarballs of the software packages
    data      test data
    lib       our own jars
    source    source code
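
The directory layout above can be created in one step. This is a minimal sketch: the `make_layout` helper and the `BASE` variable are names made up for illustration, and `BASE` defaults to a scratch location rather than the hadoop user's home directory.

```shell
#!/usr/bin/env bash
# Sketch: create the working directories described above.
# BASE is an assumption for illustration; set it to the hadoop user's home for real use.
BASE="${BASE:-/tmp/spark-notes-demo}"

make_layout() {
  local base="$1" dir
  for dir in app software data lib source; do
    mkdir -p "${base}/${dir}"   # -p: no error if the directory already exists
  done
}

make_layout "$BASE"
ls "$BASE"
```

Running it as `BASE=/home/hadoop bash layout.sh` (the script name is arbitrary) would place the five directories under the hadoop user's home.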

Download the source from the official site and extract it

    Prerequisites for building Spark from source
    Java 8+, Python 2.7+/3.4+   Spark 2.3.0   Scala 2.11.xx
    Install the JDK

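
Before starting the build it can help to check the installed Java against the minimum listed above. The sketch below is one possible way to do that: `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available for version ordering.

```shell
#!/usr/bin/env bash
# Sketch: compare an installed tool version with a stated minimum.
# version_ge is a hypothetical helper; it relies on GNU sort -V.

version_ge() {
  # Succeeds when version $1 is greater than or equal to version $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted string.
  java_version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if version_ge "$java_version" "1.8"; then
    echo "java ${java_version}: ok"
  else
    echo "java ${java_version}: too old, need 8+"
  fi
else
  echo "java not found on PATH"
fi
```

Java 8 reports itself as `1.8.x`, which is why the minimum is written as `1.8`. The same helper could be reused for the Python and Scala minimums.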
Install apache-maven
    Extract it and configure .bash_profile
    export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9
    export PATH=$MAVEN_HOME/bin:$PATH

    Recommended: change the Maven local repository location in $MAVEN_HOME/conf/settings.xml
    <localRepository>/home/hadoop/mave_repo</localRepository>

Install scala-2.11.9.tgz
    Extract it and configure .bash_profile
    export SCALA_HOME=/home/hadoop/app/scala-2.11.9
    export PATH=$SCALA_HOME/bin:$PATH

    source ~/.bash_profile
    Verify: mvn -v

    Install git: yum install git

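
The environment settings for Maven and Scala can be appended to a profile file without creating duplicates on repeated runs. This is a sketch under assumptions: `add_line` is a hypothetical helper, `PROFILE` defaults to a demo file instead of `~/.bash_profile`, and `SCALA_HOME` is used as the conventional variable name for the Scala install.

```shell
#!/usr/bin/env bash
# Sketch: add the Maven and Scala settings to a profile file exactly once.
# PROFILE points at a demo file (assumption); use ~/.bash_profile for real use.
PROFILE="${PROFILE:-/tmp/bash_profile.demo}"

add_line() {
  # Append a line to the profile only if that exact line is not already there.
  local line="$1"
  touch "$PROFILE"
  grep -qxF -- "$line" "$PROFILE" || printf '%s\n' "$line" >> "$PROFILE"
}

add_line 'export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9'
add_line 'export SCALA_HOME=/home/hadoop/app/scala-2.11.9'
add_line 'export PATH=$MAVEN_HOME/bin:$SCALA_HOME/bin:$PATH'

# Load the settings into the current shell.
. "$PROFILE"
echo "MAVEN_HOME=$MAVEN_HOME"
echo "SCALA_HOME=$SCALA_HOME"
```

After sourcing the real profile, `mvn -v` and `scala -version` should both resolve from the new `PATH` entries.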
Build and install
    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
    ./build/mvn -DskipTests clean package

    Change the default Hadoop version used when building the source
    pom.xml
    <hadoop.version>2.6.5</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    profile
    Apache Hadoop 2.7.x and later
    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
    Hive 1.2.1 support
    ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

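
To see which defaults a pom.xml declares before overriding them with `-Dhadoop.version`, the property values can be read straight from the file. The sketch below assumes each property sits on a single line; `pom_property` is a hypothetical helper and the demo file is a cut-down stand-in, not Spark's real pom.xml.

```shell
#!/usr/bin/env bash
# Sketch: read a property such as hadoop.version from a pom.xml.
# pom_property is a hypothetical helper; it assumes the property sits on one line.

pom_property() {
  local name="$1" pom="$2"
  # Print the text between <name> and </name>, first match only.
  sed -n "s|.*<${name}>\(.*\)</${name}>.*|\1|p" "$pom" | head -n 1
}

# Demo input: a cut-down stand-in for Spark's pom.xml (not the real file).
demo_pom="$(mktemp)"
cat > "$demo_pom" <<'EOF'
<project>
  <properties>
    <hadoop.version>2.6.5</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
  </properties>
</project>
EOF

echo "hadoop.version   = $(pom_property hadoop.version "$demo_pom")"
echo "protobuf.version = $(pom_property protobuf.version "$demo_pom")"
```

Against a real checkout the call would be `pom_property hadoop.version pom.xml` from the source root.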
    Development build
    ./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.3 -DskipTests clean package
    Production build (for --name, just fill in the Hadoop version)
    ./dev/make-distribution.sh \
    --name hadoop-2.6.0-cdh6.7.0 \
    --tgz \
    -Dhadoop.version=2.6.3 \
    -Phadoop-2.6 \
    -Phive -Phive-thriftserver \
    -Pyarn

    Configure extra repository sources as the build errors indicate

    Edit the script to speed up the build
    vim dev/make-distribution.sh   comment out the build-time checks for the parameters below and set them by hand
    VERSION=2.2.0
    SCALA_VERSION=2.11
    SPARK_HADOOP_VERSION=2.6.0-cdh6.7.0
    SPARK_HIVE=1

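
With `--tgz`, the distribution script writes an archive into the source root. The sketch below predicts its name from the values used in these notes, on the assumption that the script names the archive `spark-<VERSION>-bin-<NAME>.tgz`; `APP_DIR` is a made-up variable for the unpack target.

```shell
#!/usr/bin/env bash
# Sketch: predict the tarball name produced with --tgz and unpack it if present.
# Assumption: the archive is named spark-<VERSION>-bin-<NAME>.tgz.
VERSION="2.2.0"
NAME="hadoop-2.6.0-cdh6.7.0"        # the value passed to --name
APP_DIR="${APP_DIR:-$HOME/app}"     # hypothetical install directory

tarball="spark-${VERSION}-bin-${NAME}.tgz"
echo "expected archive: ${tarball}"

# Unpack only once the build has actually produced the file.
if [ -f "$tarball" ]; then
  mkdir -p "$APP_DIR"
  tar -xzf "$tarball" -C "$APP_DIR"
fi
```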
Build documentation
http://spark.apache.org/docs/2.3.0/building-spark.html
On the site menu: More ---> Building Spark

Directory layout of the Spark distribution
    bin       client-side scripts
    conf      configuration file templates
    data      test data
    examples  examples bundled with Spark; study the bundled code samples closely
    jars      jar files
    sbin      server-side scripts
    yarn      YARN-related jars

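
A quick way to confirm that an unpacked distribution matches the layout above is to test for each directory. This is a sketch only: `check_layout` is a hypothetical helper, not part of Spark, and it runs only when `SPARK_HOME` is set.

```shell
#!/usr/bin/env bash
# Sketch: confirm an unpacked distribution has the directories listed above.
# check_layout is a hypothetical helper, not part of Spark.

check_layout() {
  local home="$1" missing=0 dir
  for dir in bin conf data examples jars sbin yarn; do
    if [ ! -d "${home}/${dir}" ]; then
      echo "missing: ${dir}"
      missing=1
    fi
  done
  return "$missing"
}

# Run the check only when SPARK_HOME points at a distribution.
if [ -n "${SPARK_HOME:-}" ]; then
  check_layout "$SPARK_HOME" && echo "layout ok"
fi
```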
Source code
github.com/apache/spark