Deploying the Apache Kylin Big Data Analytics Platform (Building and Using Cubes)

Published: 2020-06-08 13:44:24 | Source: Internet | Views: 7281 | Author: 醬醬醬子啊 | Category: Big Data

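Preface

Apache Kylin is an open-source distributed analytics engine, originally developed at eBay and contributed to the open-source community. It provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop for very large datasets. It can handle analysis tasks at TB and even PB scale, query huge Hive tables with sub-second latency, and serve many concurrent queries.

The idea behind Kylin: trade space for time.

Kylin reads source data from Hive, the most commonly used data warehouse, uses MapReduce as the cube build engine, stores the precomputed results in HBase, and exposes REST API, JDBC, and ODBC query interfaces.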
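To make the "space for time" trade-off concrete, here is a minimal sketch that lists every combination of dimensions (every cuboid) that a full cube over N dimensions would precompute, which is 2^N combinations when nothing is pruned. The function name list_cuboids and the dimension names are hypothetical and chosen only for illustration; this is not how Kylin itself enumerates cuboids.

```shell
# List every dimension combination (cuboid) for the given dimensions.
# With N dimensions this prints 2^N lines, which is the storage side
# of the "space for time" trade-off.
list_cuboids() {
  n=$#
  mask=0
  while [ "$mask" -lt $((1 << n)) ]; do
    combo=""
    i=0
    for dim in "$@"; do
      # Bit i of mask decides whether dimension i is part of this cuboid.
      if [ $(( mask & (1 << i) )) -ne 0 ]; then
        combo="${combo:+$combo,}$dim"
      fi
      i=$((i + 1))
    done
    echo "${combo:-(grand total)}"
    mask=$((mask + 1))
  done
}

# Hypothetical dimensions: three dimensions give 8 cuboids.
list_cuboids cate date region
```

Each extra dimension doubles the number of combinations, which is why the choice of dimensions later in the cube design matters so much.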

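Deploying Kylin

(1) Download and install

When this post was written, the latest release was 2.0.0 beta and the latest stable release was 1.6.0, so I used 1.6.0.

You can either download the source package and compile it yourself, or download the binary package that matches the version of your Hadoop environment.

I am running HDP 2.4.2 with HBase 1.1.2, and I installed directly from the binary package.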

$ cd /opt
$ wget http://ftp.tc.edu.tw/pub/Apache/kylin/apache-kylin-1.6.0/apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ tar xf apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ vim /etc/profile
# add the following line to /etc/profile:
export KYLIN_HOME=/opt/apache-kylin-1.6.0-hbase1.x-bin
$ source /etc/profile

(2) Check the environment

$ cd /opt/apache-kylin-1.6.0-hbase1.x-bin
$ ./bin/check-env.sh
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-bin
mkdir: Permission denied: user=root, access=WRITE, inode="/kylin":hdfs:hdfs:drwxr-xr-x
failed to create /kylin, Please make sure the user has right to access /kylin

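# The error shows that root cannot write to HDFS, so run the check as the hdfs user.
# check-env.sh checks the local Hive, HBase, Hadoop, and related environment,
# and also creates a Kylin working directory in HDFS.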

$ su hdfs
$ ./bin/check-env.sh 
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-bin
$ hadoop fs -ls /   # a new /kylin directory now exists
drwxr-xr-x   - hdfs   hdfs            0 2017-01-19 10:08 /kylin
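If you would rather not run Kylin as the HDFS superuser, one possible alternative to the approach above is to create the working directory once as hdfs and hand it to a dedicated account. The sketch below assumes a hypothetical service account named kylin and that sudo to the hdfs user is available; the function name prepare_kylin_hdfs_dir is also made up. Setting DRY_RUN=1 only prints the commands.

```shell
# Create the Kylin working directory in HDFS and give it to another account.
# Assumptions: "hdfs" is the HDFS superuser and sudo is permitted.
# With DRY_RUN=1 the commands are printed instead of executed.
prepare_kylin_hdfs_dir() {
  owner="$1"
  dir="${2:-/kylin}"
  run=${DRY_RUN:+echo}
  $run sudo -u hdfs hadoop fs -mkdir -p "$dir"
  $run sudo -u hdfs hadoop fs -chown "$owner" "$dir"
}
```

For example, DRY_RUN=1 prepare_kylin_hdfs_dir kylin shows the two hadoop fs commands without touching the cluster.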

(3) Start Kylin

$ chown hdfs:hadoop /opt/apache-kylin-1.6.0-hbase1.x-bin
$ ./bin/kylin.sh start
A new Kylin instance is started by hdfs, stop it using "kylin.sh stop"
Please visit
You can check the log at /opt/apache-kylin-1.6.0-hbase1.x-bin/logs/kylin.log
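Kylin can take a little while to become reachable after kylin.sh start returns. The following is a small sketch of a polling helper; wait_until and kylin_is_up are hypothetical names, and the URL is the default one used in this post, overridable through an assumed KYLIN_URL variable.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds once the web UI answers with a non-error HTTP status.
kylin_is_up() {
  curl -sfL -o /dev/null "${KYLIN_URL:-http://localhost:7070/kylin}/"
}

# Example (not run here): wait_until 30 5 kylin_is_up && echo "Kylin is reachable"
```

If the helper gives up, the kylin.log file mentioned in the startup message is the first place to look.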

(4) Open the web UI

http://localhost:7070/kylin

User: ADMIN, password: KYLIN

Using Kylin

(1) Add a new project

Give the project a name and add a description.

Add a data source to the project (load Hive tables).

On the data source page, you can type in Hive table names manually.

The resource table has been loaded successfully.

You can now see the column attributes of the table.

(2) Create a model

Create a new model.

Edit the model name and description.

Select the data table.

Next, choose the dimensions and measures. These are the two most important properties when building a cube, the precomputed model.

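Measure: a measure is the aggregated numeric value being examined, for example sales quantity, sales amount, or average purchases per customer. In SQL terms, measures are the aggregate functions.

For example: select cate,count(1),sum(num) from fact_table where date>'20161112' group by cate;

count(1) and sum(num) are the measures.

Dimension: a dimension is an angle from which the data is viewed, for example sale date or sale location. In SQL terms, dimensions are the columns that appear in where and group by.

For example: select cate,count(1),sum(num) from fact_table where date>'20161112' group by cate;

date and cate are the dimensions.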
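To see the two roles side by side, here is a small sketch that performs the same aggregation as the SQL above with awk on a few made-up CSV rows (columns date,cate,num). The sample data and the function names sample_rows and aggregate_by_cate are invented for illustration.

```shell
# Made-up fact rows: date,cate,num
sample_rows() {
  cat <<'EOF'
20161111,book,5
20161113,book,2
20161113,toy,4
20161114,book,3
EOF
}

# Dimensions: filter on date ($1), group by cate ($2).
# Measures: COUNT(1) and SUM(num), where num is $3.
aggregate_by_cate() {
  awk -F, '$1 > 20161112 { cnt[$2]++; total[$2] += $3 }
           END { for (c in cnt) print c "," cnt[c] "," total[c] }' | sort
}

sample_rows | aggregate_by_cate
```

The dimension columns decide which rows are kept and how they are grouped, while the measure columns are the values that get aggregated within each group.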

Select the dimension columns to analyze.

Select the measure columns to analyze.

Set the time column of the table.

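(3) Create a cube

A cube depends on the model created earlier. Select the model and give the cube a name.

From the dimension columns defined in the model, pick the ones you want to analyze.

Select the measures.

The first, _COUNT_, is computed by default.

The second, COUNT_DISTINCT, counts distinct values. Here it gives the number of distinct IP addresses, which is what is usually called UV (unique visitors).

(COUNT_DISTINCT offers a choice of precision; the more precise the calculation, the longer it takes.)

The third, TOP_N, is used to compute rankings.

The fourth, MAX, is used to compute the maximum value.

Other aggregate expressions such as MIN and SUM are also available.

The remaining steps need little or no configuration. Click Next through them and save the cube at the end.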

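(4) Build the cube

Creating the cube only gives us a computation model. The data still has to be computed according to that model before any results are available.

To start building the cube, choose Build under Action.

Choose the time range to build (if data is continuously written to the Hive table, the cube can be built incrementally on an ongoing basis).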
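For ongoing builds it can be handy to trigger the same action from a script instead of the web UI. The sketch below is based on my understanding of Kylin's REST API for this release line (PUT on /api/cubes/{cubeName}/rebuild with startTime and endTime in epoch milliseconds); treat the endpoint and the payload as assumptions and check them against the documentation for your version. The function names, the KYLIN_URL, KYLIN_USER, and KYLIN_PASS variables, and the cube name in the usage comment are hypothetical, and to_epoch_ms relies on GNU date.

```shell
# Convert a UTC date (YYYY-MM-DD) to epoch milliseconds. Requires GNU date.
to_epoch_ms() {
  secs=$(date -u -d "$1" +%s) || return 1
  echo "$((secs * 1000))"
}

# JSON body for a build over the given start and end dates.
build_request_body() {
  printf '{"startTime":%s,"endTime":%s,"buildType":"BUILD"}' \
    "$(to_epoch_ms "$1")" "$(to_epoch_ms "$2")"
}

# Send the build request (assumed endpoint, see the note above).
kylin_build_cube() {
  cube="$1"
  curl -s -X PUT \
    -u "${KYLIN_USER:-ADMIN}:${KYLIN_PASS:-KYLIN}" \
    -H 'Content-Type: application/json' \
    -d "$(build_request_body "$2" "$3")" \
    "${KYLIN_URL:-http://localhost:7070/kylin}/api/cubes/${cube}/rebuild"
}

# Example (not run here): kylin_build_cube my_cube 2017-01-01 2017-01-02
```

A scheduler such as cron could call a helper like this once per day with the previous day's range.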

Open Monitor to see the cubes currently being built and the history of past builds.

(5) Query

Once the cube has been built successfully, the data has been precomputed and the results stored in HBase. We can now query it with SQL in Kylin.
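Besides the web UI, queries can also be sent through the REST interface. The following is a minimal sketch using curl against the /api/query endpoint with HTTP basic authentication and the default login from this post. The function names, the KYLIN_URL, KYLIN_USER, and KYLIN_PASS variables, and the project name in the usage comment are hypothetical, and the payload builder is deliberately naive: it does not escape double quotes inside the SQL text.

```shell
# Build the JSON body for a query request.
# Naive on purpose: double quotes inside the SQL are not escaped.
build_query_payload() {
  printf '{"sql":"%s","project":"%s"}' "$2" "$1"
}

# POST a SQL statement to Kylin and print the JSON response.
kylin_query() {
  curl -s -X POST \
    -u "${KYLIN_USER:-ADMIN}:${KYLIN_PASS:-KYLIN}" \
    -H 'Content-Type: application/json' \
    -d "$(build_query_payload "$1" "$2")" \
    "${KYLIN_URL:-http://localhost:7070/kylin}/api/query"
}

# Example (not run here):
#   kylin_query my_project 'select ip, max(loadmax) from resource group by ip'
```

The response is JSON, so it can be piped into any JSON tool for further processing.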

Let's compare the speed of a query in Kylin with running it directly in Hive.

Run a query that uses group by and order by.

SQL: select ip, max(loadmax) as loadmax, max(connectmax) as connectmax, max(eth0max) as eth0max, max(eth2max) as eth2max, max(rospace) as rospace, max(team) as team from resource group by ip order by loadmax asc;

After Kylin's precomputation, this query took only 0.11 s.

Run directly in Hive, the same computation took 30.05 s.

That is a difference of roughly 270x (30.05 s / 0.11 s ≈ 273).

(6) Sample data

# Kylin ships with a sample dataset of 10,000 rows

$ ./bin/sample.sh
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin server or reload the metadata from web UI to see the change.
$ ./bin/kylin.sh stop
stopping Kylin:15334
$ ./bin/kylin.sh start

The learn_kylin project now appears in Kylin, along with a ready-made model and cube that can be used for reference and learning.
