How to Use GPU Acceleration with Spark 3.0

Published: 2021-12-17 10:45:19 Source: 億速云 Author: 柒染 Category: Big Data

This article looks at how to use GPU acceleration with Spark 3.0. Many readers may be unfamiliar with the topic, so the key points are summarized below.

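Overview

RAPIDS Accelerator for Apache Spark uses GPUs to accelerate data processing by means of the RAPIDS libraries.

As data scientists shift from traditional analytics to AI applications to meet complex market demands, traditional CPU-based processing can no longer keep up in speed or cost. The rapid growth of AI-driven analytics calls for new frameworks that process data quickly and cost-efficiently, and GPUs are the way to get there.

RAPIDS Accelerator for Apache Spark combines the RAPIDS cuDF library with the Spark distributed computing framework. The RAPIDS Accelerator library also has a built-in accelerated shuffle based on UCX, which can be configured for GPU-to-GPU communication and RDMA capabilities.

Spark RAPIDS Download v0.4.1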

RAPIDS Spark Package
cuDF 11.0 Package
cuDF 10.2 Package
cuDF 10.1 Package

RAPIDS Notebooks

cuML Notebooks
cuGraph Notebooks
CLX Notebooks
cuSpatial Notebooks
cuxfilter Notebooks
XGBoost Notebooks

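Introduction

These notebooks provide examples of how to use RAPIDS. They are designed to be self-contained with the runtime version of the RAPIDS Docker Container and the RAPIDS Nightly Docker Containers, and can run on air-gapped systems. You can quickly get a container and then install and use it by following the RAPIDS.ai Getting Started page.

Usage

To get the latest notebook repo updates, run ./update.sh or use the following command: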

git submodule update --init --remote --no-single-branch --depth 1

Download the CUDA Installer for Linux Ubuntu 20.04 x86_64

The base installation is as follows:

Base Installer

Installation instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda-repo-ubuntu2004-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

The CUDA Toolkit contains open-source project software, which can be found here.
Checksums for the installer and patches can be found in Installer Checksums.

Performance & Cost Benefits

RAPIDS Accelerator for Apache Spark takes advantage of GPU performance while reducing cost, as the comparison below shows:

[Figure: performance and cost comparison]

*ETL for FannieMae Mortgage Dataset (~200GB) as shown in our demo. Costs based on Cloud T4 GPU instance market price & V100 GPU price on Databricks Standard edition.

Ease of Use

Existing Apache Spark applications run without code changes. Launch Spark with the RAPIDS Accelerator for Apache Spark plugin jar and enable the configuration as follows:

spark.conf.set('spark.rapids.sql.enabled','true')

The physical plan then shows the operators running on the GPU.
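The toggle above can be wrapped in a small helper to compare plans with acceleration off and on. This is a minimal sketch, not the plugin's own tooling: `set_gpu_sql`, `explain_both`, and the `build_df` callable are hypothetical names, and the session is assumed to have been launched with the plugin jar already on the classpath. The DataFrame is rebuilt for each setting because a DataFrame may hold on to a plan it has already produced.

```python
def set_gpu_sql(spark, enabled):
    """Toggle RAPIDS SQL acceleration using the config key shown in the text."""
    value = "true" if enabled else "false"
    spark.conf.set("spark.rapids.sql.enabled", value)


def explain_both(spark, build_df):
    """Print the physical plan with acceleration off, then on.

    build_df is a hypothetical callable that takes the session and returns a
    freshly built DataFrame, so each plan is generated under the new setting.
    """
    for enabled in (False, True):
        set_gpu_sql(spark, enabled)
        print(f"--- spark.rapids.sql.enabled={enabled} ---")
        build_df(spark).explain()
```

With a live PySpark session this could be called as `explain_both(spark, lambda s: s.range(1000).selectExpr("id * 2 AS doubled"))`, assuming the query uses operations the plugin supports.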

A Unified AI Framework for ETL + ML/DL

A single pipeline covers everything from data preparation to model training.

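Getting Started with RAPIDS Accelerator for Apache Spark

Apache Spark 3.0+ offers users a plugin mechanism that can replace SQL and DataFrame operations. No API changes are required: the plugin replaces supported SQL operations with GPU-accelerated versions. If an operation is not supported on the GPU, it falls back to the Spark CPU version.

Note: the plugin cannot accelerate operations that work directly on RDDs.

The accelerator library also provides an implementation of Spark's shuffle that can use UCX to optimize GPU data transfers, keeping as much data on the GPU as possible and bypassing the CPU to do GPU-to-GPU transfers.

The GPU-accelerated processing plugin does not require the accelerated shuffle implementation. However, if accelerated SQL processing is not enabled, the shuffle implementation falls back to the default SortShuffleManager.

To enable GPU processing acceleration, you will need: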

Apache Spark 3.0+
A Spark cluster configured with GPUs that comply with the requirements for the version of cudf.

One GPU per executor.

The following jars:

A cudf jar that corresponds to the version of CUDA available on your cluster.
RAPIDS Spark accelerator plugin jar.

To set the config spark.plugins to com.nvidia.spark.SQLPlugin
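The requirements above can be gathered into a single spark-submit invocation. The following is a rough sketch that only assembles the argument list. The function name and the jar and application file names are hypothetical placeholders, not real artifact names, and the config keys are the ones named in this article.

```python
def rapids_submit_args(cudf_jar, rapids_jar, app):
    """Build a spark-submit argument list for a GPU-accelerated job.

    cudf_jar, rapids_jar and app are caller-supplied paths (placeholders here).
    """
    confs = {
        # Load the RAPIDS SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turn on GPU-accelerated SQL processing.
        "spark.rapids.sql.enabled": "true",
        # One GPU per executor, as listed in the requirements.
        "spark.executor.resource.gpu.amount": "1",
    }
    args = ["spark-submit", "--jars", f"{cudf_jar},{rapids_jar}"]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args + [app]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(rapids_submit_args("cudf.jar", "rapids-plugin.jar", "etl_job.py")))
```

Depending on the cluster manager, further settings such as a GPU discovery script may also be needed.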

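Spark GPU Scheduling Overview

Apache Spark 3.0 now supports GPU scheduling, provided the cluster manager supports it. You can have Spark request GPUs and assign them to tasks. The exact configuration depends on your cluster manager. Here are a few examples: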

Request your executor to have GPUs:

--conf spark.executor.resource.gpu.amount=1

Specify the number of GPUs per task:

--conf spark.task.resource.gpu.amount=1

Specify a GPU discovery script (required on YARN and K8S):

--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh
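A discovery script is expected to print a JSON object with the resource name and an array of addresses to standard output. The sketch below is a Python stand-in for illustration, not the getGpusResources.sh script referenced above. It assumes nvidia-smi is on the PATH, and the function names are hypothetical.

```python
import json
import subprocess


def format_gpu_resource(addresses):
    """Return the JSON string describing the GPU resource and its addresses."""
    return json.dumps({"name": "gpu", "addresses": [str(a) for a in addresses]})


def query_gpu_indices():
    """Ask nvidia-smi for GPU indices; return [] if the tool is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    print(format_gpu_resource(query_gpu_indices()))
```

On a machine with two GPUs, `format_gpu_resource(["0", "1"])` yields `{"name": "gpu", "addresses": ["0", "1"]}`. To be used as a discovery script, the file would also need to be executable on every node.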

See the deployment-specific details for the exact methods and limitations.

Note that spark.task.resource.gpu.amount can be a decimal. If you want multiple tasks to run on an executor at the same time and be assigned to the same GPU, set it to a value less than 1, chosen to match spark.executor.cores. For example, spark.executor.cores=2 allows 2 tasks on each executor; to have those 2 tasks run on the same GPU, set spark.task.resource.gpu.amount=0.5.
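The arithmetic behind that example is simply one GPU divided by the number of concurrent tasks. A minimal sketch, assuming one GPU per executor and one task per core (the helper name is hypothetical):

```python
def gpu_sharing_conf(executor_cores):
    """Configs so that all concurrent tasks on an executor share its single GPU."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.resource.gpu.amount": "1",
        # Each task claims an equal fraction of the one GPU.
        "spark.task.resource.gpu.amount": str(1 / executor_cores),
    }


if __name__ == "__main__":
    for key, value in gpu_sharing_conf(2).items():
        print(f"--conf {key}={value}")
```

For `executor_cores=2` the task amount comes out as 0.5, matching the example in the text.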

