This article explains how to set up a PySpark 2.4.4 + PyCharm development environment on Windows 10. It covers the required components, the environment variables and Spark configuration files, a word count test, and fixes for common errors.
hadoop3.0.0
spark-2.4.4-bin-without-hadoop
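winutils download (take the bin directory built for hadoop 3.0.1 and copy it over the local hadoop bin directory)
jdk1.8 (assumed to be already installed and configured)
conda/anaconda (assumed to be already installed)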
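Note: Spark in cdh7.3.2 is 2.4.0, but running PySpark locally with 2.4.0 hits a bug, so 2.4.4 is used here. If extracting a downloaded file does not produce a directory the first time, change the file extension to zip and extract it again.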
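Before renaming and extracting a second time, it can help to check what kind of archive the intermediate file actually is. The following is a minimal sketch using only the Python standard library; the function name `detect_archive_type` is hypothetical and not part of the original article.

```python
import tarfile
import zipfile


def detect_archive_type(path):
    """Return 'zip', 'tar', or None for the file at `path`.

    'tar' also covers compressed tarballs (.tar.gz, .tgz), because
    tarfile detects the compression transparently.
    """
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```

If the function reports `tar` rather than `zip`, extracting the file with a tar-aware tool is likely the better choice than renaming it to zip.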
Spark 2.4.x does not support Python versions newer than 3.7.
conda create -n pyspark2.4 python=3.7
activate pyspark2.4
pip install py4j
pip install psutil
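Once the environment is activated, a quick check confirms that the interpreter respects the Python 3.7 ceiling mentioned above. This is a sketch only: the function name is hypothetical, and because the article gives no lower bound, the code merely assumes Python 3 and checks the upper bound.

```python
import sys


def python_ok_for_spark24(version_info=sys.version_info):
    """Return True if the interpreter is Python 3 and no newer than 3.7.

    The 3.7 ceiling comes from the note above; requiring major == 3 is
    an assumption of this sketch.
    """
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor <= 7


if __name__ == "__main__":
    print(sys.version.split()[0], python_ok_for_spark24())
```

Run it with the interpreter of the `pyspark2.4` environment; a `False` result suggests the environment was created with a newer Python than intended.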
PySpark installation methods (method 1 is recommended)
Method 1: copy the %SPARK_HOME%\python\pyspark directory into %CONDA_HOME%\pyspark2.4\Lib\site-packages
Method 2: pip install pyspark==2.4.4
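Whichever method is used, it is worth confirming that the environment can import the package and that it reports the expected version. A minimal sketch, assuming only that `pyspark` exposes `__version__` (the helper name is hypothetical):

```python
def installed_pyspark_version():
    """Return the pyspark version string, or None if pyspark cannot be imported."""
    try:
        import pyspark
    except ImportError:
        return None
    return getattr(pyspark, "__version__", None)


if __name__ == "__main__":
    print(installed_pyspark_version())
```

For this setup the printed value should be 2.4.4; `None` means the package is not visible to the interpreter that ran the script.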
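The values below are only examples; adjust them to your own setup. Paths must not contain spaces. If a path does contain a space, use mklink /J to create a junction to the directory and use the junction path instead.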
Add these system variables:
HADOOP_HOME E:\bigdata\ENV\hadoop-3.0.0
SPARK_HOME E:\bigdata\ENV\spark-2.4.4-bin-without-hadoop
PYSPARK_PYTHON C:\Users\zakza\anaconda3\envs\pyspark2.4\python.exe
Add to PATH:
%HADOOP_HOME%\bin
%SPARK_HOME%\bin
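The variables above can be sanity-checked from Python before starting Spark. The sketch below is illustrative rather than part of the original setup: `check_spark_env` is a hypothetical helper that reports variables that are unset, contain a space, or point to a path that does not exist.

```python
import os

# The three variables named in the article.
REQUIRED_VARS = ("HADOOP_HOME", "SPARK_HOME", "PYSPARK_PYTHON")


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(name + " is not set")
        elif " " in value:
            problems.append(name + " contains a space: " + value)
        elif not os.path.exists(value):
            problems.append(name + " points to a missing path: " + value)
    return problems


if __name__ == "__main__":
    for line in check_spark_env() or ["no problems found"]:
        print(line)
```

Running the script from inside PyCharm as well as from a terminal shows whether both see the same values.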
Configuration 1: create a new file named spark-env.cmd in the %SPARK_HOME%\conf directory with the following content
FOR /F %%i IN ('hadoop classpath') DO @set SPARK_DIST_CLASSPATH=%%i
Configuration 2: create a new file named log4j.properties in the %SPARK_HOME%\conf\ directory with the following content
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
Note: restart the computer after configuring the environment variables; otherwise PyCharm may fail to load the system environment variables.
wc.txt
hello hadoop hadoop spark python flink storm spark master slave first second thrid kafka scikit-learn flume hive spark-streaming hbase
Word count test code
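from pyspark import SparkContext

if __name__ == '__main__':
    # Run Spark locally with an application named WordCount
    sc = SparkContext('local', 'WordCount')
    # wc.txt is resolved relative to the working directory
    textFile = sc.textFile("wc.txt")
    # Split lines into words, pair each word with 1, then sum the counts per word
    wordCount = (textFile.flatMap(lambda line: line.split(" "))
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))
    wordCount.foreach(print)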
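As a cross-check for the Spark job, the same counts can be computed in plain Python without starting Spark. This sketch assumes the words of wc.txt are separated by single spaces or line breaks (the original line breaks of the file are not known here), and `count_words` is a hypothetical helper name.

```python
from collections import Counter

# Contents of wc.txt as listed in the article, treated as one
# whitespace-separated stream of words.
WC_TEXT = (
    "hello hadoop hadoop spark python flink storm spark master slave "
    "first second thrid kafka scikit-learn flume hive spark-streaming hbase"
)


def count_words(text):
    """Count whitespace-separated words, mirroring flatMap + map + reduceByKey."""
    return Counter(text.split())


if __name__ == "__main__":
    for word, n in sorted(count_words(WC_TEXT).items()):
        print((word, n))
```

The Spark job should report the same pairs, for example 2 for both `hadoop` and `spark`, although it may print them in a different order.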
Output of a normal run: (screenshot not preserved)
spark-shell error: Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
Fix: see Configuration 1 above (the spark-env.cmd file that sets SPARK_DIST_CLASSPATH).
PySpark error: ModuleNotFoundError: No module named 'resource'
Fix: this is a bug in Spark 2.4.0; use Spark 2.4.4.
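PySpark error: org.apache.spark.SparkException: Python worker failed to connect back
Fix: the environment variables are not configured correctly. Check whether any are missing, and check that they are visible in the environment variables of the PyCharm run configuration.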
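If PyCharm still does not pick up the system variables, one possible workaround is to set the missing variable inside the script before the SparkContext is created. The sketch below is an assumption rather than the article's method: `ensure_pyspark_python` is a hypothetical helper that points PYSPARK_PYTHON at the interpreter running the script, and whether this resolves the error depends on its actual cause.

```python
import os
import sys


def ensure_pyspark_python(env=None, python_exe=sys.executable):
    """Set PYSPARK_PYTHON to `python_exe` unless it is already set.

    Returns the value that is in effect afterwards.
    """
    env = os.environ if env is None else env
    env.setdefault("PYSPARK_PYTHON", python_exe)
    return env["PYSPARK_PYTHON"]


if __name__ == "__main__":
    # Call this before creating the SparkContext.
    print("PYSPARK_PYTHON =", ensure_pyspark_python())
```

An existing value is left untouched, so a correctly configured system variable still takes precedence.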
As for py4j-0.10.7-src.zip and pyspark.zip under %SPARK_HOME%\python\lib: everything ran normally here without configuring them, but you can also try adding them to the project.
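One way to try those archives without changing the project settings is to put them on the import path at the start of a script. This is a minimal sketch under that assumption; `add_spark_python_libs` is a hypothetical helper, and it simply picks up every .zip file under `python\lib` of the given Spark directory.

```python
import glob
import os
import sys


def add_spark_python_libs(spark_home, path=None):
    """Prepend every .zip under <spark_home>/python/lib to the import path.

    Returns the list of archives that were found.
    """
    path = sys.path if path is None else path
    zips = sorted(glob.glob(os.path.join(spark_home, "python", "lib", "*.zip")))
    for z in zips:
        if z not in path:
            path.insert(0, z)
    return zips


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if home:
        print(add_spark_python_libs(home))
```

The call has to happen before `import pyspark` for the archives to be used.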