hive與hbase數據交互的方法是什么

發布時間：2021-12-09 13:54:42 來源：億速云閱讀：148 作者：iii 欄目：云計算

本篇內容介紹了“hive與hbase數據交互的方法是什么”的有關知識，在實際案例的操作過程中，不少人都會遇到這樣的困境，接下來就讓小編帶領大家學習一下如何處理這些情況吧！希望大家仔細閱讀，能夠學有所成！

HBase和Hive的集成原理

ApacheCN | apache中文網?

Hive和Hbase有各自不同的特征：hive是高延遲、結構化和面向分析的，hbase是低延遲、非結構化和面向編程的。Hive數據倉庫在hadoop上是高延遲的。Hive集成Hbase就是為了使用hbase的一些特性。如下是hive和hbase的集成架構：

hive與hbase數據交互的方法是什么

圖1 hive和hbase架構圖

Hive集成HBase可以有效利用HBase數據庫的存儲特性，如行更新和列索引等。在集成的過程中注意維持HBase jar包的一致性。Hive集成HBase需要在Hive表和HBase表之間建立映射關系，也就是Hive表的列(columns)和列類型(column types)與HBase表的列族(column families)及列限定詞(column qualifiers)建立關聯。每一個在Hive表中的域都存在于HBase中，而在Hive表中不需要包含所有HBase中的列。HBase中的RowKey對應到Hive中為選擇一個域使用:key來對應，列族(cf:)映射到Hive中的其它所有域，列為(cf:cq)。例如下圖2為Hive表映射到HBase表：

hive與hbase數據交互的方法是什么

圖2 Hive表映射HBase表

基本介紹

Hive是基于Hadoop的一個數據倉庫工具，可以將結構化的數據文件映射為一張數據庫表，并提供完整的sql查詢功能，可以將sql語句轉換為MapReduce任務進行運行。其優點是學習成本低，可以通過類SQL語句快速實現簡單的MapReduce統計，不必開發專門的MapReduce應用，十分適合數據倉庫的統計分析。
Hive與HBase的整合功能的實現是利用兩者本身對外的API接口互相進行通信，相互通信主要是依靠hive_hbase-handler.jar工具類，大致意思如圖所示：

hive與hbase數據交互的方法是什么

軟件版本

使用的軟件版本：（沒有下載地址的百度中找就行）

jdk-6u24-linux-i586.bin

hive-0.9.0.tar.gz http://yun.baidu.com/share/link?shareid=2138315647&uk=1614030671

hbase-0.94.14.tar.gz http://mirror.bit.edu.cn/apache/hbase/hbase-0.94.14/

hadoop-1.1.2.tar.gz http://pan.baidu.com/s/1mgmKfsG

安裝位置

安裝目錄：/usr/local/ （記得解壓后重命名一下哦）
Hbase的安裝路徑為：/usr/local/hbase

Hive的安裝路徑為：/usr/local/hive

整合步驟

整合hive與hbase的過程如下：

1.在 /usr/local/hbase-0.90.4下：

hbase-0.94.14.jar，hbase-0.94.14-tests.jar 與lib/zookeeper-3.4.5.jar拷貝到/usr/local /hive/lib文件夾下面
注意：

如果hive/lib下已經存在這兩個文件的其他版本（例如zookeeper-3.3.1.jar）

建議刪除后使用hbase下的相關版本

還需要

protobuf-java-2.4.0a.jar拷貝到/usr/local/hive/lib和/usr/local/hadoop/lib下

2.修改hive-site.xml文件

在/usr/local/hive/conf下目錄下，在hive-site.xml最底部添加如下內容：

(跳轉到最下面的linux命令：按住Esc鍵 + 冒號 + $  然后回車) <property> <name>hive.querylog.location</name> <value>/usr/local/hive/logs</value> </property> <property> <name>hive.aux.jars.path</name>

注意：如果不存在則自行創建，或者把文件改名后使用。拷貝到所有節點包括的下。拷貝下的文件到所有節點包括的下。

注意，如果3,4兩步跳過的話，運行hive時很可能出現如下錯誤：
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and
then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information. at org.apache.hadoop.
hbase.zookeeper.ZooKeeperWatcher.

5 啟動hive （測試成功）
單節點啟動
bin/hive -hiveconf hbase.master=master:60000
集群啟動 (這個我沒測試)
bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 (所有的zookeeper節點)
如果hive-site.xml文件中沒有配置hive.aux.jars.path，則可以按照如下方式啟動。
hive --auxpath /opt/mapr/hive/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,/opt/mapr/hive/hive-0.7.1/lib/hbase-0.90.4.jar,/opt/mapr/hive/hive-0.7.1/lib/zookeeper-3.3.2.jar -hiveconf hbase.master=localhost:60000

經測試修改hive的配置文件hive-site.xml

<property>
<name>hive.zookeeper.quorum</name>
<value>node1,node2,node3</value>
<description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>

不用增加參數啟動hive就可以聯合hbase

測試hive到hbase中

啟動后進行測試（重啟一下集群）

1. 用hive創建hbase能識別的表

語句如下：

create table hbase_table_1(key int, value string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping">:key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

此刻你進入hbase shell中發現多了一張表 ‘xyz’
（可以先跳過這句話：hbase.table.name 定義在hbase的table名稱，

多列時：data:1，data:2；多列族時：data1:1,data2:1；）
hbase.columns.mapping 定義在hbase的列族，里面的:key 是固定值而且要保證在表pokes中的foo字段是唯一值

創建有分區的表

create table hbase_table_1(key int, value string)

partitioned by (day string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

不支持表的修改
會提示不能修改非本地表。
hive> ALTER TABLE hbase_table_1 ADD PARTITION (day = '2012-09-22');
FAILED: Error in metadata: Cannot use ALTER TABLE on a non-native table FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

2. 導入數據到關聯hbase的表中去

1.在hive中新建一張中間表

create table pokes(foo int,bar string)

row format delimited fields terminated by ',';
批量導入數據
load data local inpath '/home/1.txt' overwrite into table pokes;

1.txt文件的內容為
1,hello
2,pear
3,world

使用sql導入hbase_table_1

set hive.hbase.bulk=true;

2.插入數據到hbase表中去

insert overwrite table hbase_table_1

select * from pokes;

導入有分區的表

insert overwrite table hbase_table_1 partition (day='2012-01-01')

select * from pokes;

3.查看關聯hbase的那張表

hive> select * from hbase_table_1;
OK
1 hello
2 pear
3 world

(注：與hbase整合的有分區的表存在個問題 select * from table查詢不到數據，select key,value from table可以查到數據)

4.登錄hbase查看那張表的數據

hbase shell

hbase(main):002:0> describe 'xyz'
DESCRIPTION ENABLED {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_S true
COPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSI
ZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0830 seconds
hbase(main):003:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1331002501432, value=hello
2 column=cf1:val, timestamp=1331002501432, value=pear
3 column=cf1:val, timestamp=1331002501432, value=world

這時在Hbase中可以看到剛才在hive中插入的數據了。

測試hbase到hive中

1.在hbase中創建表

create 'test1','a','b','c'

put 'test1','1','a','qqq'

put 'test1','1','b','aaa'

put 'test1','1','c','bbb'

put 'test1','2','a','qqq'

put 'test1','2','c','bbb'

2.把hbase中的表關聯到hive中

對于在hbase已經存在的表，在hive中使用CREATE EXTERNAL TABLE來建立
例如hbase中的表名稱為test1,字段為 a: , b: ,c: 在hive中建表語句為

create external table hive_test

(key int,gid map<string,string>,sid map<string,string>,uid map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" ="a:,b:,c:")

tblproperties ("hbase.table.name" = "test1");

2.檢查test1中的數據

在hive中建立好表后，查詢hbase中test1表內容
select * from hive_test;

OK
1 {"":"qqq"} {"":"aaa"} {"":"bbb"}
2 {"":"qqq"} {} {"":"bbb"}

查詢gid字段中value值的方法為
select gid[''] from hive_test;
得到查詢結果
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201203052222_0017, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203052222_0017
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201203052222_0017
2012-03-06 14:38:29,141 Stage-1 map = 0%, reduce = 0%
2012-03-06 14:38:33,171 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203052222_0017
OK
qqq
qqq

如果hbase表test1中的字段為user:gid,user:sid,info:uid,info:level，在hive中建表語句為
create external table hive_test

(key int,user map<string,string>,info map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.hbasestoragehandler'

with serdeproperties ("hbase.columns.mapping" ="user:,info:")

tblproperties ("hbase.table.name" = "test1");

查詢hbase表的方法為
select user['gid'] from hive_test;

注：hive連接hbase優化，將HADOOP_HOME/conf中的hbase-site.xml文件中增加配置

<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>

或者在執行hive語句之前執行hive>set hbase.client.scanner.caching=10000;

報錯：

Hive報錯

1.NoClassDefFoundError
Could not initialize class java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable
將protobuf-***.jar添加到jars路徑

//$HIVE_HOME/conf/hive-site.xml

<value>file:///data/hadoop/hive-0.10.0/lib/hive-hbase-handler-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/hbase-0.94.8.jar,file:///data/hadoop/hive-0.10.0/lib/zookeeper-3.4.5.jar,file:///data/hadoop/hive-0.10.0/lib/guava-r09.jar,file:///data/hadoop/hive-0.10.0/lib/hive-contrib-0.10.0.jar,file:///data/hadoop/hive-0.10.0/lib/protobuf-java-2.4.0a.jar</value>

</property>

Hbase 報錯

:java.lang.NoClassDefFoundError: com/google/protobuf/Message

編個Hbase程序，系統提示錯誤，java.lang.NoClassDefFoundError: com/google/protobuf/Message

找了半天，從這個地方發現了些東西：http://abloz.com/2012/06/15/hive-execution-hbase-create-the-table-can-not-find-protobuf.html

內容如下：

hadoop:1.0.3

hive:0.9.0

hbase:0.94.0

protobuf:$HBASE_HOME/lib/protobuf-java-2.4.0a.jar

可以看到，0.9.0的hive里面自帶的hbase的jar是0.92版本的。

[zhouhh@Hadoop48 ~]$ hive –auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-0.92.0.jar,$HIVE_HOME/lib/zookeeper-3.3.4.jar,$HIVE_HOME/lib/guava-r09.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar

hive> CREATE TABLE hbase_table_1(key int, value string)

> STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’

> WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key,cf1:val”)

> TBLPROPERTIES (“hbase.table.name” = “xyz”);

java.lang.NoClassDefFoundError: com/google/protobuf/Message

at org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java

…

Caused by: java.lang.ClassNotFoundException: com.google.protobuf.Message

解決辦法：

將$HBASE_HOME/lib/protobuf-java-2.4.0a.jar 拷貝到 $HIVE_HOME/lib/.

[zhouhh@Hadoop48 ~]$ cp /home/zhouhh/hbase-0.94.0/lib/protobuf-java-2.4.0a.jar $HIVE_HOME/lib/.

hive> CREATE TABLE hbase_table_1(key int, value string)

> STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’

> WITH SERDEPROPERTIES (“hbase.columns.mapping” = “:key,cf1:val”)

> TBLPROPERTIES (“hbase.table.name” = “xyz”);

Time taken: 10.492 seconds

hbase(main):002:0> list ‘xyz’

TABLE

xyz

1 row(s) in 0.0640 seconds

在引用的jar包中包含protobuf-java-2.4.0a.jar即可。

測試腳本

bin/hive -hiveconf hbase.master=master:60000

hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,/usr/local/hive/lib/hbase-0.94.7-security.jar,/usr/local/hive/lib/zookeeper-3.4.5.jar -hiveconf hbase.master=localhost:60000

<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>

</property>

1 nana

2 hehe

3 xixi

hadoop dfsadmin -safemode leave

/home/hadoop

create table hbase_table_1(key int, value string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" = ":key,cf1:val")

tblproperties ("hbase.table.name" = "xyz");

drop table pokes;

create table pokes

(id int,name string)

row format delimited fields terminated by ' '

stored as textfile;

load data local inpath '/home/hadoop/kv1.txt' overwrite into table pokes;

insert into table hbase_table_1

select * from pokes;

create external table hive_test

(key int,gid map<string,string>,sid map<string,string>,uid map<string,string>)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping" ="a:,b:,c:")

tblproperties ("hbase.table.name" = "test1");

“hive與hbase數據交互的方法是什么”的內容就介紹到這里了，感謝大家的閱讀。如果想了解更多行業相關的知識可以關注億速云網站，小編將為大家輸出更多高質量的實用文章！

向AI問一下細節

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

hive與hbase數據交互的方法是什么

HBase和Hive的集成原理

軟件版本

安裝位置

整合步驟

1.在 /usr/local/hbase-0.90.4下：

2.修改hive-site.xml文件

測試hive到hbase中

1. 用hive創建hbase能識別的表

2. 導入數據到關聯hbase的表中去

1.在hive中新建一張中間表

2.插入數據到hbase表中去

3.查看關聯hbase的那張表

4.登錄hbase查看那張表的數據

測試hbase到hive中

1.在hbase中創建表

2.把hbase中的表關聯到hive中

2.檢查test1中的數據

報錯：

Hive報錯

Hbase 報錯

測試腳本

猜你喜歡

91超碰碰碰碰久久久久久综合_超碰av人澡人澡人澡人澡人掠_国产黄大片在线观看画质优化_txt小说免费全本

hive與hbase數據交互的方法是什么

HBase和Hive的集成原理

軟件版本

安裝位置

整合步驟

1.在 /usr/local/hbase-0.90.4下：

2.修改hive-site.xml文件

測試hive到hbase中

1. 用hive創建hbase能識別的表

2. 導入數據到關聯hbase的表中去

1.在hive中新建一張中間表

2.插入數據到hbase表中去

3.查看關聯hbase的那張表

4.登錄hbase查看那張表的數據

測試hbase到hive中

1.在hbase中創建表

2.把hbase中的表關聯到hive中

2.檢查test1中的數據

報錯：

Hive報錯

Hbase 報錯

測試腳本

猜你喜歡

最新資訊

相關推薦

相關標簽