This article shows how to write a custom Hive function (UDF), register it in the Hive source code, and recompile Hive so that the function ships as a built-in.
Hive version:
hive-1.1.0-cdh6.7.0
1 Write the UDF
1.1 Create a project with IDEA and Maven. Configure pom.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.ruozedata.bigdata</groupId>
<artifactId>hive</artifactId>
<version>1.0-SNAPSHOT</version>
<name>hive</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<hive.version>1.1.0-cdh6.7.0</hive.version>
<hadoop.version>2.6.0-cdh6.7.0</hadoop.version>
</properties>
<!-- Note: CDH versions require the following repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
<dependencies>
<!-- HDFS -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<!-- Hive -->
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.0.0</version>
</plugin>
<!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.7.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.20.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
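1.2 Download the dependencies: right-click the Maven project and choose Maven -> Reimport. The jars are downloaded automatically, though this can be slow on a poor network connection.
1.3 Create the class: right-click the package name, choose New -> Java Class, and create HelloUDF.java: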
package com.ruozedata.bigdata;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
@Description(
name="sayhelloudf",
value="_FUNC_(input_str) - returns Hello : input_str or _FUNC_(input_str,input_str2) - returns Hello : input_str:input_str2 ",
extended = "Example:\n"+
" > SELECT _FUNC_('xx') FROM dual LIMIT 1;\n" +
" Hello : xx "
)
public class HelloUDF extends UDF {
public String evaluate(String input){
return "Hello:" + input;
}
public String evaluate(String input, String input2){
return "Hello:" + input + " : " + input2;
}
}
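The class above defines two evaluate() overloads, and the one that runs depends on the arguments passed in the query. The following is a rough, Hive-free sketch of that idea, not Hive's actual method resolver: HelloLogic is a hypothetical stand-in that copies the two evaluate() bodies without the UDF superclass, and EvaluateDispatch is a hypothetical helper that picks an overload by argument count alone (real resolution also considers argument types).

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for HelloUDF: same evaluate() bodies, no Hive dependency.
final class HelloLogic {
    public String evaluate(String input) {
        return "Hello:" + input;
    }

    public String evaluate(String input, String input2) {
        return "Hello:" + input + " : " + input2;
    }
}

// Hypothetical helper: selects an evaluate() overload by argument count only.
final class EvaluateDispatch {
    static String call(Object udf, String... args) {
        for (Method m : udf.getClass().getMethods()) {
            boolean sameName = m.getName().equals("evaluate");
            boolean sameArity = m.getParameterTypes().length == args.length;
            if (sameName && sameArity) {
                try {
                    // Spread the String[] so each element becomes one argument.
                    return (String) m.invoke(udf, (Object[]) args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("evaluate() failed", e);
                }
            }
        }
        throw new IllegalArgumentException(
                "no evaluate() taking " + args.length + " argument(s)");
    }

    public static void main(String[] args) {
        HelloLogic udf = new HelloLogic();
        System.out.println(call(udf, "zhangsan"));          // Hello:zhangsan
        System.out.println(call(udf, "zhangsan", "lisi"));  // Hello:zhangsan : lisi
    }
}
```

Calling it with three arguments throws IllegalArgumentException, which roughly corresponds to Hive rejecting a call for which no matching evaluate() exists.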
2 Download the hive-1.1.0-cdh6.7.0-src.tar.gz source archive
http://archive.cloudera.com/cdh6/cdh/5/hive-1.1.0-cdh6.7.0-src.tar.gz
[hadoop@hadoop002 software]$ pwd
/home/hadoop/software
[hadoop@hadoop002 software]$ ll
total 499272
drwxrwxr-x. 17 hadoop hadoop 4096 Mar 24 2016 hadoop-2.6.0-cdh6.7.0
-rw-rw-r--. 1 hadoop hadoop 42610549 Jul 5 15:05 hadoop-2.6.0-cdh6.7.0-src.tar.gz
-rw-rw-r--. 1 hadoop hadoop 311585484 Feb 20 07:16 hadoop-2.6.0-cdh6.7.0.tar.gz
-rw-rw-r--. 1 hadoop hadoop 14652104 Feb 21 05:28 hive-1.1.0-cdh6.7.0-src.tar.gz
-rw-rw-r--. 1 hadoop hadoop 116082695 Feb 21 05:28 hive-1.1.0-cdh6.7.0.tar.gz
-rw-rw-r--. 1 hadoop hadoop 29966286 Feb 22 02:28 sqoop-1.4.6-cdh6.7.0.tar.gz
Extract it into the current directory:
[hadoop@hadoop002 software]$ tar -zxvf hive-1.1.0-cdh6.7.0-src.tar.gz
[hadoop@hadoop002 software]$ ll
total 503368
drwxrwxr-x. 17 hadoop hadoop 4096 Mar 24 2016 hadoop-2.6.0-cdh6.7.0
-rw-rw-r--. 1 hadoop hadoop 42610549 Jul 5 15:05 hadoop-2.6.0-cdh6.7.0-src.tar.gz
-rw-rw-r--. 1 hadoop hadoop 311585484 Feb 20 07:16 hadoop-2.6.0-cdh6.7.0.tar.gz
drwxrwxr-x. 32 hadoop hadoop 4096 Jul 5 15:15 hive-1.1.0-cdh6.7.0
-rw-rw-r--. 1 hadoop hadoop 14652104 Feb 21 05:28 hive-1.1.0-cdh6.7.0-src.tar.gz
-rw-rw-r--. 1 hadoop hadoop 116082695 Feb 21 05:28 hive-1.1.0-cdh6.7.0.tar.gz
-rw-rw-r--. 1 hadoop hadoop 29966286 Feb 22 02:28 sqoop-1.4.6-cdh6.7.0.tar.gz
3 Add HelloUDF.java to the source tree
3.1 Upload HelloUDF.java to the /home/hadoop/software/hive-1.1.0-cdh6.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf directory.
3.2 Change the package declaration in HelloUDF.java from package com.ruozedata.bigdata; to package org.apache.hadoop.hive.ql.udf;
3.3 Edit FunctionRegistry.java to register HelloUDF:
3.3.1 Import the HelloUDF class:
import org.apache.hadoop.hive.ql.udf.HelloUDF;
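3.3.2 In the static block, add: system.registerUDF("sayhelloudf", HelloUDF.class, false); The name passed here is the name queries must use, and the verification in step 6 calls the function as sayhelloudf.
As follows:
static {
system.registerGenericUDF("concat", GenericUDFConcat.class);
system.registerUDF("sayhelloudf", HelloUDF.class, false);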
.
.
.
}
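To make the role of the registration call more concrete, here is a minimal sketch of a name-to-class registry. It is a deliberately simplified illustration and not Hive's FunctionRegistry code: MiniRegistry, its two-argument registerUDF, and lookup are hypothetical names, String.class merely stands in for HelloUDF.class, and the case-insensitive lookup is an assumption about how function names are matched.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified registry; not Hive's actual FunctionRegistry.
final class MiniRegistry {
    private final Map<String, Class<?>> functions = new HashMap<>();

    // Store the implementing class under the lower-cased function name.
    void registerUDF(String name, Class<?> udfClass) {
        functions.put(name.toLowerCase(Locale.ROOT), udfClass);
    }

    // Return the class registered under this name, or null if there is none.
    Class<?> lookup(String name) {
        return functions.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        MiniRegistry system = new MiniRegistry();
        // String.class is only a placeholder for HelloUDF.class.
        system.registerUDF("sayhelloudf", String.class);
        System.out.println(system.lookup("SayHelloUDF") != null); // true
        System.out.println(system.lookup("helloudf") != null);    // false
    }
}
```

The second lookup fails because only the exact registered name resolves, which is why the name given at registration and the name used in queries have to agree.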
4 Build hive-1.1.0-cdh6.7.0
4.1 Change into the hive-1.1.0-cdh6.7.0 directory and build from source:
[hadoop@hadoop002 software]$ cd hive-1.1.0-cdh6.7.0
[hadoop@hadoop002 hive-1.1.0-cdh6.7.0]$ mvn clean package -DskipTests -Phadoop-2 -Pdist
Log output like the following indicates that the build succeeded:
[INFO] --- build-helper-maven-plugin:1.8:attach-artifact (attach-jdbc-driver) @ hive-packaging ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive ............................................... SUCCESS [ 9.558 s]
[INFO] Hive Shims Common .................................. SUCCESS [ 14.763 s]
[INFO] Hive Shims 0.23 .................................... SUCCESS [ 7.488 s]
[INFO] Hive Shims Scheduler ............................... SUCCESS [ 4.449 s]
[INFO] Hive Shims ......................................... SUCCESS [ 2.784 s]
[INFO] Hive Common ........................................ SUCCESS [ 14.109 s]
[INFO] Hive Serde ......................................... SUCCESS [ 12.877 s]
[INFO] Hive Metastore ..................................... SUCCESS [ 39.906 s]
[INFO] Hive Ant Utilities ................................. SUCCESS [ 3.377 s]
[INFO] Spark Remote Client ................................ SUCCESS [ 13.578 s]
[INFO] Hive Query Language ................................ SUCCESS [02:22 min]
[INFO] Hive Service ....................................... SUCCESS [01:14 min]
[INFO] Hive Accumulo Handler .............................. SUCCESS [ 36.593 s]
[INFO] Hive JDBC .......................................... SUCCESS [01:31 min]
[INFO] Hive Beeline ....................................... SUCCESS [ 8.103 s]
[INFO] Hive CLI ........................................... SUCCESS [ 8.419 s]
[INFO] Hive Contrib ....................................... SUCCESS [ 7.041 s]
[INFO] Hive HBase Handler ................................. SUCCESS [01:29 min]
[INFO] Hive HCatalog ...................................... SUCCESS [ 31.376 s]
[INFO] Hive HCatalog Core ................................. SUCCESS [ 13.366 s]
[INFO] Hive HCatalog Pig Adapter .......................... SUCCESS [ 10.937 s]
[INFO] Hive HCatalog Server Extensions .................... SUCCESS [ 32.887 s]
[INFO] Hive HCatalog Webhcat Java Client .................. SUCCESS [ 6.679 s]
[INFO] Hive HCatalog Webhcat .............................. SUCCESS [ 41.266 s]
[INFO] Hive HCatalog Streaming ............................ SUCCESS [ 6.445 s]
[INFO] Hive HWI ........................................... SUCCESS [ 5.169 s]
[INFO] Hive ODBC .......................................... SUCCESS [ 3.305 s]
[INFO] Hive Shims Aggregator .............................. SUCCESS [ 1.139 s]
[INFO] Hive TestUtils ..................................... SUCCESS [ 1.869 s]
[INFO] Hive Packaging ..................................... SUCCESS [01:19 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:37 min
[INFO] Finished at: 2018-07-05T16:02:01+08:00
[INFO] Final Memory: 158M/494M
[INFO] ------------------------------------------------------------------------
4.2 A successful build produces a tarball: /home/hadoop/software/hive-1.1.0-cdh6.7.0/packaging/target/apache-hive-1.1.0-cdh6.7.0-bin.tar.gz
5 Deploy hive-1.1.0-cdh6.7.0
5.1 Install the freshly built apache-hive-1.1.0-cdh6.7.0-bin.tar.gz directly. See my earlier blog post, "hive-1.1.0-cdh6.7.0 installation": http://blog.itpub.net/29609890/viewspace-2155488/
5.2 Alternatively, replace only the hive-exec-1.1.0-cdh6.7.0.jar file: locate the rebuilt hive-exec-1.1.0-cdh6.7.0.jar in the build output and use it to replace the hive-exec-1.1.0-cdh6.7.0.jar of the Hive deployment currently in use.
Path of the rebuilt hive-exec-1.1.0-cdh6.7.0.jar: /home/hadoop/software/hive-1.1.0-cdh6.7.0/packaging/target/apache-hive-1.1.0-cdh6.7.0-bin/apache-hive-1.1.0-cdh6.7.0-bin/lib/
Path of the hive-exec-1.1.0-cdh6.7.0.jar currently in use: /home/hadoop/app/hive-1.1.0-cdh6.7.0/lib/hive-exec-1.1.0-cdh6.7.0.jar
Change to the Hive directory currently in use:
[hadoop@hadoop002 ~]$ cd ~/app/hive-1.1.0-cdh6.7.0/
[hadoop@hadoop002 hive-1.1.0-cdh6.7.0]$ pwd
/home/hadoop/app/hive-1.1.0-cdh6.7.0
[hadoop@hadoop002 hive-1.1.0-cdh6.7.0]$
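Back up the jar currently in use. It is located in the directory below, so run the following commands from there:
/home/hadoop/app/hive-1.1.0-cdh6.7.0/lib/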
[hadoop@hadoop002 hive-1.1.0-cdh6.7.0]# mv hive-exec-1.1.0-cdh6.7.0.jar hive-exec-1.1.0-cdh6.7.0.jar.bak
Copy the rebuilt jar into the existing Hive deployment:
[hadoop@hadoop002 hive-1.1.0-cdh6.7.0]# cp /home/hadoop/software/hive-1.1.0-cdh6.7.0/packaging/target/apache-hive-1.1.0-cdh6.7.0-bin/apache-hive-1.1.0-cdh6.7.0-bin/lib/hive-exec-1.1.0-cdh6.7.0.jar ./
Check the hive-exec-1.1.0-cdh6.7.0.jar files:
[hadoop@hadoop002 lib]$ ll hive-exec-1.1.0-cdh6.7.0.*
-rw-rw-r--. 1 hadoop hadoop 19274963 Jul 5 17:44 hive-exec-1.1.0-cdh6.7.0.jar
-rw-r--r--. 1 hadoop hadoop 19272159 Mar 24 2016 hive-exec-1.1.0-cdh6.7.0.jar.bak
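The size difference in the listing suggests that the jar changed, but it does not show that HelloUDF is actually inside it. One way to check, sketched here with only the JDK's java.util.jar API, is to look for the class entry in the jar. JarClassCheck is a hypothetical helper name, and the default class name assumes the package chosen in step 3.2.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a given class.
final class JarClassCheck {
    // Convert a fully qualified class name to its entry path inside a jar.
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True when the jar at jarPath has an entry for className.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassCheck <path-to-jar> [class-name]");
            return;
        }
        // Assumed default: the package HelloUDF was moved to in step 3.2.
        String className = args.length > 1
                ? args[1]
                : "org.apache.hadoop.hive.ql.udf.HelloUDF";
        System.out.println(className + " present: " + containsClass(args[0], className));
    }
}
```

Run against the new jar and the .bak copy in turn; if the build picked up the class, only the new jar should report it as present.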
6 Restart Hive and verify:
hive> desc function sayhelloudf;
OK
sayhelloudf(input_str) - returns Hello : input_str or sayhelloudf(input_str,input_str2) - returns Hello : input_str:input_str2
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> select sayhelloudf('zhangsan') from ruozedata.dual;
OK
Hello:zhangsan
Time taken: 2.531 seconds, Fetched: 1 row(s)
hive> select sayhelloudf('zhangsan','lisi') from ruozedata.dual;
OK
Hello:zhangsan : lisi
Time taken: 0.397 seconds, Fetched: 1 row(s)
hive>