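Problem:
When a job was submitted with spark-submit in yarn-client mode, tasks on some nodes of the cluster failed with connection timeout errors. After ruling out various other causes, I traced the problem to the firewall configuration.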
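Cause:
My guess is that once the Python program starts, it acts as a server, and the Java side of the job connects to it as a client (in the log below, the failing call comes from Spark's PythonWorkerFactory, which cannot open a socket to the Python daemon).
The server started by Python therefore has to accept connections from a client on localhost.
When a Linux host sends packets to itself, they travel through the virtual lo (loopback) interface, not through a physical NIC such as eth0/eth2.... The firewall must therefore allow packets arriving on the local lo interface. Adding the following rule lets the Python server accept packets from lo, which resolved the problem.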
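iptables -A INPUT -i lo -j ACCEPT
This adds an iptables rule that accepts packets arriving on the lo interface. Note that -A appends the rule to the end of the INPUT chain, so it only takes effect if no earlier rule in the chain already drops or rejects the packet.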
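To check whether a node is affected before or after changing the firewall, the loopback path can be probed directly. The following is a minimal sketch, not Spark's own code: the function name loopback_reachable and the 3-second default timeout are assumptions chosen for illustration. It opens a listening socket on 127.0.0.1 and tries to connect to it, which should succeed when traffic on lo is allowed and time out when the firewall drops it.

```python
import socket


def loopback_reachable(timeout=3.0):
    """Return True if a TCP connection to a listener on 127.0.0.1 succeeds.

    Hypothetical helper for illustration; it is not part of Spark or Hadoop.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Bind to an ephemeral port on the loopback address, as a local
        # daemon would, and start listening.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The kernel completes the handshake from the listen backlog,
            # so no accept() call is needed for this probe. If packets on
            # lo are dropped, the connect attempt times out instead.
            with socket.create_connection(("127.0.0.1", port), timeout=timeout):
                return True
        except OSError:
            # Covers timeouts and refused connections alike.
            return False


if __name__ == "__main__":
    print("loopback reachable:", loopback_reachable())
```

Running this on each worker node is one way to find the nodes where the rule is still missing; a False result only indicates that local connections fail and does not by itself prove that iptables is the cause.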
Partial error log from the task (the localized JVM message 連接超時 means "Connection timed out"):
16/07/25 13:56:44 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev d62701d4d05dfa6115bbaf8d9dff002df142e62d]
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/07/25 13:56:44 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/07/25 13:57:47 WARN python.PythonWorkerFactory: Failed to open socket to Python daemon:
java.net.ConnectException: 連接超時
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:241)
    at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
    at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/07/25 13:57:47 WARN python.PythonWorkerFactory: Assuming that daemon unexpectedly quit, attempting to restart
16/07/25 13:58:51 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
References:
http://stackoverflow.com/questions/15659132/connection-refused-between-a-python-server-and-a-java-client
http://stackoverflow.com/questions/26297551/connecting-python-and-java-via-sockets/38605208#38605208
http://www.zybang.com/question/9ab66451988eb2768194817f25a0b7a9.html