How to Access Impala from a Python Client

Published: 2021-04-14 10:02:46  Source: 億速云  Views: 447  Author: 小新  Category: Development

This article describes how to access Impala from a Python client. It is meant as a practical reference for readers who need to do the same.

Here Impala serves purely as a data source, and Python has good data-analysis libraries, so a Python client is used to fetch table data from Impala. The test environment is:

Operating system: Windows 7 (Linux also works)

Python 2.7

Big data environment: CentOS 6.6

CDH version: CDH 5.4.1

Impala 2.1.2, port: 21050

1. Install the Python package

pip install impyla

2. Interacting with Impala from the Python client

2.1 Connecting to Impala

>>> from impala.dbapi import connect
>>> conn = connect(host='my.impala.host', port=21050)
>>> cur = conn.cursor()

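Note: make sure the port is the one for the HS2 (HiveServer2) service, not the Beeswax service. On a Cloudera-managed cluster, the default HS2 port is 21050 (the default Beeswax port is 21000).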
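The port rule in the note can be captured in a small guard. This is only an illustrative sketch: `impala_connect_kwargs` and the two constants are hypothetical names, not part of impyla, and the port numbers are the Cloudera defaults quoted in the note.

```python
# Default ports quoted in the note (Cloudera-managed cluster).
HS2_DEFAULT_PORT = 21050
BEESWAX_DEFAULT_PORT = 21000


def impala_connect_kwargs(host, port=HS2_DEFAULT_PORT):
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper).

    Rejects the default Beeswax port, because the connection described in
    this article must go to the HS2 service.
    """
    if port == BEESWAX_DEFAULT_PORT:
        raise ValueError(
            "port %d is the default Beeswax port; use the HS2 port (%d)"
            % (port, HS2_DEFAULT_PORT)
        )
    return {"host": host, "port": port}


# Usage, assuming impyla is installed and the host is reachable:
#   from impala.dbapi import connect
#   conn = connect(**impala_connect_kwargs("my.impala.host"))
print(impala_connect_kwargs("my.impala.host"))
```

A cluster may of course be configured with non-default ports, so the check only catches the most common mistake.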

2.2 Running SQL queries against Impala

>>> cur.execute('SHOW TABLES')
>>> cur.fetchall()
[('defect_code_dim',), ('gxzl_ca_materialinfo',), ('gxzl_cg_materialinfo',), ('gxzl_defect2',), ('gxzl_defects',), ('gxzl_defects_hd',), ('gxzl_fx_class',), ('gxzl_fx_leftmidright',), ('gxzl_fx_topandbot',), ('gxzl_jiejing_2cc_slab',), ('gxzl_kgx_drw',), ('gxzl_kgx_drw_tmp',), ('gxzl_rz_materialinfo',), ('gxzl_sdbase_defects',), ('gxzl_test',), ('new_table',), ('ouye_transactionlog',), ('ouye_userinfo',), ('simple_test',), ('t0',), ('t_100m_hdfs',), ('t_100m_test',), ('t_10m_hdfs',), ('target1',), ('target2',), ('target3',), ('test',), ('tianchi_mobile_recommend_train_full',), ('tianchi_mobile_recommend_train_item',), ('tianchi_mobile_recommend_train_user',), ('tianchi_mobile_recommend_train_useritem',)]
>>> cur.execute('SELECT * FROM test')
>>> cur.description
[('id', 'DOUBLE', None, None, None, None, None), ('name', 'STRING', None, None, None, None, None), ('value', 'STRING', None, None, None, None, None)]
>>> cur.fetchall()
[(1.0, 'tom', 'f'), (2.0, 'jerry', 't')]
>>>
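Since the first element of each cur.description entry is the column name, rows can be paired with their column names. The sketch below assumes only DB-API 2.0 behaviour; `rows_as_dicts` is a hypothetical helper, and sqlite3 is used purely as a local stand-in so the example runs without an Impala cluster.

```python
import sqlite3


def rows_as_dicts(cur):
    """Pair each fetched row with the column names from cur.description."""
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


# sqlite3 is only a stand-in here; with impyla, `cur` would come from
# impala.dbapi.connect(...).cursor() as shown in section 2.1.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id REAL, name TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1.0, "tom", "f"), (2.0, "jerry", "t")],
)
cur.execute("SELECT * FROM test")
records = rows_as_dicts(cur)
print(records)
conn.close()
```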

Note: fetching rows from the server consumes the cursor's buffered result set, so a second .fetchall() returns an empty list.

>>> cur.execute('SELECT * FROM test')
>>> cur.fetchall()
[(1.0, 'tom', 'f'), (2.0, 'jerry', 't')]
>>> cur.fetchall()
[]
>>>
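A practical consequence is that the result of the first fetchall() should be kept in a variable if it is needed more than once. The following sketch reproduces the behaviour with sqlite3 as a local stand-in for the Impala cursor (an assumption made only so the example runs without a cluster).

```python
import sqlite3

# sqlite3 stands in for the impyla cursor; both follow DB-API 2.0.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id REAL, name TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1.0, "tom", "f"), (2.0, "jerry", "t")],
)

cur.execute("SELECT * FROM test")
rows = cur.fetchall()   # consumes the whole result set; keep this list
again = cur.fetchall()  # nothing is left to fetch

print(rows)
print(again)
conn.close()
```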

2.3 Iterating over query results

>>> cur.execute('SELECT * FROM test')
>>> for row in cur:
...     print(row[1] == 1.0)
...
False
False

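Note: Python indexes start at 0, so row[1] is the second column (name), which is why both comparisons print False. Iterating this way still reads the data through the cursor's buffer.

This approach works well for small result sets. If you need to store a large result set, you can write it to HDFS with a CREATE TABLE AS SELECT statement instead.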
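For result sets that are too large to hold comfortably in one list but still need to be processed on the client, one option is to read them in batches with the DB-API fetchmany() method. This is a different technique from the CREATE TABLE AS SELECT approach mentioned above; it is a minimal sketch that assumes the impyla cursor's fetchmany() behaves as DB-API 2.0 describes. `iter_batches` is a hypothetical helper, and sqlite3 is only a local stand-in.

```python
import sqlite3


def iter_batches(cur, batch_size=1000):
    """Yield lists of rows, at most batch_size rows at a time."""
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# sqlite3 stands in for the Impala connection so the sketch is runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

cur.execute("SELECT n FROM t ORDER BY n")
batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=2)]
print(batch_sizes)
conn.close()
```

Each batch can be processed and discarded before the next one is fetched, which keeps client memory use bounded by the batch size.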

2.4 Converting query results to pandas DataFrames

Besides iterating over the results, you can convert them into a pandas DataFrame for data analysis:

>>> from impala.dbapi import connect
>>> conn = connect(host='my.impala.host', port=21050)
>>> cur = conn.cursor()
>>> from impala.util import as_pandas
>>> cur.execute('SELECT * FROM test')
>>> df = as_pandas(cur)
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
>>> df
  id  name value
0  1  tom   f
1  2 jerry   t
>>>

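Note: this requires pandas to be installed in Python. Install it with pip install pandas. During installation you may see the message: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

If so, download and install the Visual C++ package that the message points to, and the pandas installation will then succeed.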

