How MapReduce Can Use HCatalog to Read and Write Hive Tables

Published: 2021-11-09 18:33:40 | Source: Yisu Cloud (億速云) | Author: 柒染 | Category: Big Data

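This article shows how a MapReduce job can use HCatalog to read and write Hive tables. I find it quite practical, so I am sharing it in the hope that you will get something useful out of it. Without further ado, let's take a look.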

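In practice, the storage format of Hive tables often changes at management's request, for example switching from TSV or CSV to Parquet or ORCFile for performance. If an MR job reads such a table, we then have to rewrite and redeploy it, which is a real pain. HCatalog solves this problem: with it, the job no longer needs to care how the data in Hive is stored. Read on for the details.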

This article focuses on using HCatalog from MapReduce to read and write Hive tables.

HCatalog makes Hive's metadata easily usable by other Hadoop tools such as Pig, MapReduce, and Hive itself.

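HCatalog tables give users a relational view of data in HDFS, so users do not have to worry about where their data is stored or in which format: they do not need to know whether it is stored as RCFile, text files, or sequence files.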
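A toy model can make this idea concrete. The sketch below is plain Java, not HCatalog's API, and every name in it (RowDecoder, CsvDecoder, TsvDecoder, FormatAgnosticReader) is hypothetical: two storage formats are decoded by different readers into the same list of column values, so the consumer logic stays the same when the format changes. This is roughly the guarantee HCatalog gives a MapReduce job.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction: each storage format knows how to decode one line into column values.
interface RowDecoder {
    List<String> decode(String line);
}

// Hypothetical decoder for comma-separated text.
final class CsvDecoder implements RowDecoder {
    @Override
    public List<String> decode(String line) {
        return Arrays.asList(line.split(",", -1));
    }
}

// Hypothetical decoder for tab-separated text.
final class TsvDecoder implements RowDecoder {
    @Override
    public List<String> decode(String line) {
        return Arrays.asList(line.split("\t", -1));
    }
}

final class FormatAgnosticReader {
    // Consumer logic: sums the third column, whichever decoder produced the rows.
    static int sumThirdColumn(List<String> lines, RowDecoder decoder) {
        int total = 0;
        for (String line : lines) {
            total += Integer.parseInt(decoder.decode(line).get(2));
        }
        return total;
    }
}
```

Switching the data from CSV to TSV here means swapping only the decoder; sumThirdColumn is untouched. In HCatalog, the equivalent decoding is done by the table's SerDe recorded in the metastore, which the job never references directly.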

It also provides a notification service that tells workflow tools (such as Oozie) when new data becomes available in the warehouse.

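HCatalog provides HCatInputFormat / HCatOutputFormat, which let MapReduce users read and write data in Hive's data warehouse. Users can read only the table partitions and columns they need. Records are returned in a convenient list format, so users do not have to parse them.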
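The "list format" together with schema-based column lookup can be modelled in a few lines. This is a simplified stand-in, not the real HCatRecord/HCatSchema classes, and ToySchema/ToyRecord are hypothetical names: a schema maps each column name to a position, a record is just a List<Object>, and reading a column by name asks the schema for that position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a table schema: column name -> position in the record.
final class ToySchema {
    private final Map<String, Integer> positions = new HashMap<>();

    ToySchema(List<String> names) {
        for (int i = 0; i < names.size(); i++) {
            // Lowercased because Hive column names are case-insensitive.
            positions.put(names.get(i).toLowerCase(Locale.ROOT), i);
        }
    }

    int position(String name) {
        Integer pos = positions.get(name.toLowerCase(Locale.ROOT));
        if (pos == null) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return pos;
    }
}

// Hypothetical stand-in for a record: values are stored positionally in a list.
final class ToyRecord {
    private final List<Object> values;

    ToyRecord(List<Object> values) {
        this.values = values;
    }

    // Read a column by name by resolving its position through the schema.
    Object get(String name, ToySchema schema) {
        return values.get(schema.position(name));
    }

    // Keep only the requested columns, in the requested order (a simple projection).
    List<Object> project(List<String> columns, ToySchema schema) {
        List<Object> out = new ArrayList<>();
        for (String column : columns) {
            out.add(get(column, schema));
        }
        return out;
    }
}
```

The mapper below follows the same pattern with the real classes: it fetches the table schema, then reads each column by name instead of splitting raw lines.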

Let's walk through a simple example.

In the mapper class, we get the table schema and use it to fetch the columns we need and their values.

Here is the map class:

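import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat;

// Emits ((year, month), dayofmonth) for every row of the input table.
public class onTimeMapper extends Mapper<WritableComparable, HCatRecord, IntPair, IntWritable> {
    private HCatSchema schema;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Get the input table schema once per task rather than once per record
        schema = HCatBaseInputFormat.getTableSchema(context.getConfiguration());
    }

    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        // Columns are looked up by name; year and month are read as strings and parsed
        int year = Integer.parseInt(value.getString("year", schema));
        int month = Integer.parseInt(value.getString("month", schema));
        Integer dayOfMonth = value.getInteger("dayofmonth", schema);
        context.write(new IntPair(year, month), new IntWritable(dayOfMonth));
    }
}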
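The mapper uses an IntPair key class that the article never defines. In a real job it would implement Hadoop's WritableComparable<IntPair>; the sketch below is a hypothetical version that keeps the method names such a key needs (write, readFields, compareTo, plus the getFirstInt/getSecondInt accessors the reducer calls) but depends only on java.io, so it does not actually implement the Hadoop interface. It shows the required pieces: a no-arg constructor for deserialization, binary serialization, ordering, and equals/hashCode so the default HashPartitioner sends equal keys to the same reducer.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical IntPair; a real version would declare "implements WritableComparable<IntPair>".
final class IntPair implements Comparable<IntPair> {
    private int first;
    private int second;

    // Hadoop creates keys reflectively and then calls readFields, so a no-arg constructor is needed.
    public IntPair() {}

    public IntPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    public int getFirstInt() { return first; }
    public int getSecondInt() { return second; }

    // Serialize both ints in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    // ...and read them back in the same order.
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    // Sort by year first, then by month.
    @Override
    public int compareTo(IntPair o) {
        int c = Integer.compare(first, o.first);
        return c != 0 ? c : Integer.compare(second, o.second);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IntPair && compareTo((IntPair) o) == 0;
    }

    // The default HashPartitioner uses hashCode() to route equal keys to the same reducer.
    @Override
    public int hashCode() {
        return 31 * first + second;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }
}
```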

In the reduce class, we create a schema for the data that will be written to the Hive table.

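import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;

// Counts the records for each (year, month) key and writes one output row per key.
public class onTimeReducer extends Reducer<IntPair, IntWritable, WritableComparable, HCatRecord> {
    private HCatSchema schema;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Output record schema; field order must match the columns of airline.flight_count
        List<HCatFieldSchema> columns = new ArrayList<>(3);
        columns.add(new HCatFieldSchema("year", HCatFieldSchema.Type.INT, ""));
        columns.add(new HCatFieldSchema("month", HCatFieldSchema.Type.INT, ""));
        columns.add(new HCatFieldSchema("flightCount", HCatFieldSchema.Type.INT, ""));
        schema = new HCatSchema(columns);
    }

    @Override
    protected void reduce(IntPair key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0; // number of records for this year-month
        for (IntWritable ignored : values) {
            count++;
        }
        HCatRecord record = new DefaultHCatRecord(3);
        record.setInteger("year", schema, key.getFirstInt());
        record.setInteger("month", schema, key.getSecondInt());
        record.setInteger("flightCount", schema, count);
        // Only the record is written to the table; the key is not used, so null is passed
        context.write(null, record);
    }
}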
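Without the Hadoop plumbing, the job simply counts source rows per (year, month). The sketch below reproduces that computation in memory, which can help when checking the contents of flight_count by hand. LocalFlightCount and the {year, month, dayofmonth} row layout are hypothetical, and the TreeMap with a year-then-month comparator stands in for the shuffle's sort-and-group step.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalFlightCount {
    // Orders (year, month) keys year first, then month, like IntPair-style composite keys.
    private static final Comparator<List<Integer>> BY_YEAR_THEN_MONTH =
            Comparator.<List<Integer>>comparingInt(k -> k.get(0)).thenComparingInt(k -> k.get(1));

    // Each row is {year, month, dayofmonth}, mirroring the columns the mapper reads.
    // Returns (year, month) -> row count, i.e. the rows the reducer would write to flight_count.
    static Map<List<Integer>, Integer> countByYearMonth(List<int[]> rows) {
        Map<List<Integer>, Integer> counts = new TreeMap<>(BY_YEAR_THEN_MONTH);
        for (int[] row : rows) {
            List<Integer> key = List.of(row[0], row[1]); // "map": emit the (year, month) key
            counts.merge(key, 1, Integer::sum);           // "reduce": count the values per key
        }
        return counts;
    }
}
```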

Finally, create the driver class, which specifies the input and output tables and the output schema.

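import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class onTimeDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration prepared by ToolRunner instead of a fresh one
        Job job = Job.getInstance(getConf(), "OnTimeCount");
        job.setJarByClass(onTimeDriver.class);
        job.setMapperClass(onTimeMapper.class);
        job.setReducerClass(onTimeReducer.class);

        // Input: table ontimeperf in database airline, read through HCatalog
        HCatInputFormat.setInput(job, "airline", "ontimeperf");
        job.setInputFormatClass(HCatInputFormat.class);
        job.setMapOutputKeyClass(IntPair.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Output: table flight_count in database airline (no partition values)
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setOutputFormatClass(HCatOutputFormat.class);
        HCatOutputFormat.setOutput(job, OutputJobInfo.create("airline", "flight_count", null));
        HCatSchema s = HCatOutputFormat.getTableSchema(job.getConfiguration());
        HCatOutputFormat.setSchema(job, s);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new onTimeDriver(), args);
        System.exit(exitCode);
    }
}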

Before running the code above, create the output table in Hive:

create table airline.flight_count (
  Year INT,
  Month INT,
  flightCount INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

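A common source of errors is forgetting to set $HIVE_HOME, since the job needs Hive's libraries and configuration to reach the metastore.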
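One way to surface this early is a fail-fast check in the driver before the job is submitted. The helper below is hypothetical and not part of the original code; it only checks that the variable is set and names an existing directory, and it does not verify that the Hive or HCatalog jars are actually on the classpath.

```java
import java.io.File;

final class HiveHomeCheck {
    // Returns a description of the problem, or null if the value looks usable.
    static String problemWith(String hiveHome) {
        if (hiveHome == null || hiveHome.trim().isEmpty()) {
            return "HIVE_HOME is not set";
        }
        if (!new File(hiveHome).isDirectory()) {
            return "HIVE_HOME does not point to a directory: " + hiveHome;
        }
        return null;
    }
}
```

In onTimeDriver.run() it could be called as HiveHomeCheck.problemWith(System.getenv("HIVE_HOME")), printing the message and returning a non-zero exit code when the result is not null.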

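That covers how MapReduce can use HCatalog to read and write Hive tables. Some of these points are likely to come up in day-to-day work, and I hope this article helps you learn something new. For more, follow the Yisu Cloud industry news channel.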
