Importing HDFS Data into HBase

[Category: Big Data Technology | Publisher: 店小二05 | Date: 2017 | Author: 红领巾]

Time: 2017.9.14

Targets: data on user activity

Process the 2016 log data;
Import the data from HDFS into HBase;

Processing the logs

Data format

hadoop fs -ls /warehouse/orc_elapsed_log
/warehouse/orc_elapsed_log/dt=20160101
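
The logs are partitioned by day, one dt=yyyyMMdd directory per day. As a minimal sketch, assuming the job reads exactly one dt= directory per day (inferred from the listing, not from the author's code), a start/end date pair like the ones passed to Main below could be expanded into partition paths as follows. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PartitionPaths {
    // yyyyMMdd: the format of the dt= partition names and of the Main date arguments
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Returns one root/dt=yyyyMMdd path per day from start to end, both inclusive. */
    public static List<String> partitionPaths(String root, String start, String end) {
        LocalDate from = LocalDate.parse(start, DAY);
        LocalDate to = LocalDate.parse(end, DAY);
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        List<String> paths = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            paths.add(root + "/dt=" + d.format(DAY));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String p : partitionPaths("/warehouse/orc_elapsed_log", "20160101", "20160103")) {
            System.out.println(p);
        }
    }
}
```

Working with LocalDate rather than string arithmetic keeps month ends and leap days (such as 20160229) correct.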

Run the job (a Java program that processes the Hive data):

cd wangchenlong/workspace/user-profile/processor/profile
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160101 20160131
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160201 20160229
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160301 20160331
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160401 20160430
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160501 20160731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160801 20161031
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20161101 20161231
hadoop fs -ls /tmp/wangchenlong/log_event
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160601 20160731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160912 20161031
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20161214 20161231
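
The 2016 backfill above is split into date ranges, one hadoop jar call per range, starting with calendar months. A small sketch that generates one command per month of a year, with correct month lengths and leap years (note 20160229). The class name is hypothetical; the command template is copied from the lines above.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class MonthlyBatches {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    // Jar path and main class copied from the commands in the post
    private static final String TEMPLATE =
            "hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main %s %s";

    /** Returns one {firstDay, lastDay} pair per month of the given year. */
    public static List<String[]> monthRanges(int year) {
        List<String[]> ranges = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            YearMonth ym = YearMonth.of(year, month);
            LocalDate first = ym.atDay(1);
            LocalDate last = ym.atEndOfMonth(); // Feb 29 in leap years such as 2016
            ranges.add(new String[] {first.format(DAY), last.format(DAY)});
        }
        return ranges;
    }

    /** Builds one hadoop jar command line per month. */
    public static List<String> commands(int year) {
        List<String> cmds = new ArrayList<>();
        for (String[] r : monthRanges(year)) {
            cmds.add(String.format(TEMPLATE, r[0], r[1]));
        }
        return cmds;
    }

    public static void main(String[] args) {
        commands(2016).forEach(System.out::println);
    }
}
```

Monthly batches keep each job small enough to rerun on its own if it fails, which is what the -h reruns on partial ranges above appear to do.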

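The Hive Maven jar conflicts with the ORC jar: they bring different versions of the same classes, so some methods cannot be found at runtime (see reference).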

java.lang.Exception: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1044)

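Cause: hive-exec and orc-mapreduce pull in different versions of hive-storage-api, so the VectorizedRowBatch class found on the classpath lacks the getDataColumnCount() method that ORC's RecordReaderImpl calls.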

Test:

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_process.HiveMRDemo /tmp/wangchenlong/orc

Solution: declare hive-storage-api explicitly in the pom so that the newer classes are used:

<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-storage-api</artifactId>
<version>2.4.0</version>
</dependency>
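
To confirm which jar actually supplies a class when a NoSuchMethodError like the one above appears, a small reflection-based check can help. This diagnostic is not from the original post; the class name ClassOrigin is hypothetical, and it only uses standard JDK reflection.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;

public class ClassOrigin {
    /** Returns the jar or directory a class was loaded from, or "bootstrap" for JDK core classes. */
    public static String whereLoaded(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        URL location = (src == null) ? null : src.getLocation();
        return (location == null) ? "bootstrap" : location.toString();
    }

    /** True if the class declares or inherits a public method with this name. */
    public static boolean hasPublicMethod(String className, String methodName)
            throws ClassNotFoundException {
        for (Method m : Class.forName(className).getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Defaults match the class and method from the stack trace above
        String cls = args.length > 0 ? args[0]
                : "org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch";
        String method = args.length > 1 ? args[1] : "getDataColumnCount";
        System.out.println(cls + " loaded from " + whereLoaded(cls));
        System.out.println(method + " present: " + hasPublicMethod(cls, method));
    }
}
```

Run it with the same classpath as the failing job. If the second line prints false, an older copy of the class (from an older hive-storage-api, or bundled inside another jar) is probably being loaded first.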

Importing into HBase

To import from HDFS into HBase, first inspect the target table:

hbase shell
list
desc 'cy_event'
scan 'cy_event', {LIMIT=>5} # show the first 5 rows

Sample table data:

user_time|1488384000000|29768601 column=info:assess_num, timestamp=1505384441438, value=3
user_time|1488384000000|29768601 column=info:duration, timestamp=1505384441438, value=42654
user_time|1488384000000|29768601 column=info:event_name, timestamp=1505384441438, value=user_time
user_time|1488384000000|29768601 column=info:event_time, timestamp=1505384441438, value=20170302_000000
user_time|1488384000000|29768601 column=info:login_zone, timestamp=1505384441438, value=0
user_time|1488384000000|29768601 column=info:uid, timestamp=1505384441438, value=29768601
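
Each row key has the form event_name|epoch-millis|uid. Converting the middle field back gives 1488384000000 = 2017-03-01 16:00:00 UTC, which is 20170302_000000 in UTC+8 and matches the event_time column. The original times therefore appear to be interpreted as China Standard Time. A sketch that splits a row key and rebuilds the event_time string. The Asia/Shanghai zone and the yyyyMMdd_HHmmss pattern are inferred from this sample row, not taken from LaDateUtils, and RowKeyDecoder is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class RowKeyDecoder {
    // Assumption inferred from the sample rows: event_time is local time in UTC+8
    private static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");
    private static final DateTimeFormatter EVENT_TIME = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    /** Parsed parts of an "event|millis|uid" row key. */
    public static final class Parts {
        public final String eventName;
        public final long millis;
        public final String uid;

        Parts(String eventName, long millis, String uid) {
            this.eventName = eventName;
            this.millis = millis;
            this.uid = uid;
        }

        /** Formats the timestamp the way the event_time column stores it. */
        public String eventTime() {
            return Instant.ofEpochMilli(millis).atZone(ZONE).format(EVENT_TIME);
        }
    }

    public static Parts decode(String rowKey) {
        String[] f = rowKey.split("\\|");
        if (f.length != 3) {
            throw new IllegalArgumentException("unexpected row key: " + rowKey);
        }
        return new Parts(f[0], Long.parseLong(f[1]), f[2]);
    }

    public static void main(String[] args) {
        Parts p = decode("user_time|1488384000000|29768601");
        // prints: user_time 20170302_000000 29768601
        System.out.println(p.eventName + " " + p.eventTime() + " " + p.uid);
    }
}
```

All millisecond timestamps between 2001 and 2286 have 13 digits, so within one event_name the string order of the row keys matches time order. A time window can therefore be read as a contiguous row-key range.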

Run the import from HDFS into HBase:

cd wangchenlong/workspace/user-profile/processor/profile
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170101 20170331
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170401 20170731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170731 20170909

The processor (business logic) class:

import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// BaseSavedProcessor, LogEntity and LaDateUtils come from the Log_Analysis framework.
public class UserTimeHBaseProcessor extends BaseSavedProcessor<LogEntity> {
    public static final String DEF_NAME = UserTimeHBaseProcessor.class.getSimpleName();

    @Override
    protected void onProcess(LogEntity entity) {
        super.onProcess(entity);
        // Input line: uid|event_name|time|login_zone|duration|assess_num
        String line = entity.original_line;
        String[] items = line.split("\\|");
        if (items.length != 6) {
            return; // skip malformed lines
        }
        String uid = items[0];
        String event_name = items[1];
        String time = items[2];
        Date date = LaDateUtils.parseWriteDate(time);
        if (date == null) {
            return; // skip lines whose time cannot be parsed
        }
        String login_zone = items[3];
        String duration = items[4];
        String assess_num = items[5];
        // Row key: event_name|epoch millis|uid, e.g. user_time|1488384000000|29768601
        String rowKey = event_name + "|" + date.getTime() + "|" + uid;
        // Column qualifiers; the sample rows show them under the info family
        Map<String, String> map = new HashMap<>();
        map.put("uid", uid);
        map.put("event_name", event_name);
        map.put("event_time", time);
        map.put("login_zone", login_zone);
        map.put("duration", duration);
        map.put("assess_num", assess_num);
        saveHBase(rowKey, map);
    }
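
The processor expects six |-separated fields per line: uid|event_name|time|login_zone|duration|assess_num. BaseSavedProcessor, LogEntity and LaDateUtils are not shown, so here is a framework-free sketch of the same line-to-row mapping that can be unit-tested on its own. It assumes time is yyyyMMdd_HHmmss in UTC+8 (inferred from the sample table rows, not from LaDateUtils.parseWriteDate). The class name UserTimeRow is hypothetical, and the example input line is reconstructed from the sample row rather than copied from the real log.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;

public class UserTimeRow {
    // Assumed input time format and zone, inferred from the sample HBase rows
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");
    private static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");

    public final String rowKey;
    public final Map<String, String> columns; // column qualifiers and values

    private UserTimeRow(String rowKey, Map<String, String> columns) {
        this.rowKey = rowKey;
        this.columns = columns;
    }

    /** Returns null for malformed lines, mirroring the early returns in onProcess. */
    public static UserTimeRow fromLine(String line) {
        String[] items = line.split("\\|");
        if (items.length != 6) {
            return null;
        }
        long millis;
        try {
            millis = LocalDateTime.parse(items[2], TIME).atZone(ZONE).toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return null;
        }
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("uid", items[0]);
        cols.put("event_name", items[1]);
        cols.put("event_time", items[2]);
        cols.put("login_zone", items[3]);
        cols.put("duration", items[4]);
        cols.put("assess_num", items[5]);
        // Row key: event_name|epoch millis|uid
        return new UserTimeRow(items[1] + "|" + millis + "|" + items[0], cols);
    }

    public static void main(String[] args) {
        UserTimeRow row = fromLine("29768601|user_time|20170302_000000|0|42654|3");
        System.out.println(row.rowKey); // user_time|1488384000000|29768601
        System.out.println(row.columns);
    }
}
```

Keeping the parsing in a pure function like this separates it from the HBase write, so the row-key format can be checked without a cluster.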
}

Registering the processor

public class ProcessorRegister extends BaseMainManager {
    // Initialization-on-demand holder: the instance is created on first use
    private static class Holder {
        private static final ProcessorRegister sInstance = new ProcessorRegister();
    }

    public static ProcessorRegister getInstance() {
        return Holder.sInstance;
    }

    private ProcessorRegister() {
        super();
        //++++++++++++++++++++ Register processors here ++++++++++++++++++++/
        // registerProcessor(UserTimeProcessor.DEF_NAME, new UserTimeProcessor());
        registerProcessor(UserTimeHBaseProcessor.DEF_NAME, new UserTimeHBaseProcessor());
        //++++++++++++++++++++ Register processors here ++++++++++++++++++++/
    }
}
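
ProcessorRegister uses the initialization-on-demand holder idiom: Holder.sInstance is created only when getInstance() first touches Holder, and the JVM's class-initialization rules make that creation thread-safe without explicit locking. BaseMainManager is not shown, so this sketch assumes it keeps a name-to-processor map and feeds each input line to every registered processor; that dispatch behavior and every name below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class ProcessorRegistrySketch {
    // Holder is initialized, and the instance built, on the first getInstance() call
    private static class Holder {
        private static final ProcessorRegistrySketch INSTANCE = new ProcessorRegistrySketch();
    }

    public static ProcessorRegistrySketch getInstance() {
        return Holder.INSTANCE;
    }

    // Hypothetical stand-in for BaseMainManager's storage: name -> processor
    private final Map<String, Consumer<String>> processors = new LinkedHashMap<>();

    private ProcessorRegistrySketch() {
        // Processors are registered in the constructor, as in ProcessorRegister
        registerProcessor("EchoProcessor", line -> System.out.println("got " + line));
    }

    public void registerProcessor(String name, Consumer<String> processor) {
        processors.put(name, processor);
    }

    /** Feeds one input line to every registered processor, in registration order. */
    public void dispatch(String line) {
        for (Consumer<String> p : processors.values()) {
            p.accept(line);
        }
    }

    public int size() {
        return processors.size();
    }

    public static void main(String[] args) {
        getInstance().dispatch("29768601|user_time|20170302_000000|0|42654|3");
    }
}
```

Switching processors then means editing one line in the constructor, which matches how the post comments out UserTimeProcessor and registers UserTimeHBaseProcessor instead.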

Execution (the -p option):

case "-p":
main = new ProcessMain(args[1], args[2], LaValues.PathFormat.USER_TIME_PATH_FORMAT); // process mode
break;
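
The -p branch above is one case of a switch on the first command-line argument. The commands earlier pass either "start end" or "flag start end". A minimal sketch of that argument handling that also validates both dates. Only -p is given a named mode because the post does not say what -h does. ArgsSketch and Job are hypothetical names, not the framework's Main.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ArgsSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Parsed command line: an optional flag such as "-p" plus a date range. */
    public static final class Job {
        public final String flag; // null when no flag was given
        public final LocalDate start;
        public final LocalDate end;

        Job(String flag, LocalDate start, LocalDate end) {
            this.flag = flag;
            this.start = start;
            this.end = end;
        }
    }

    public static Job parse(String[] args) {
        boolean hasFlag = args.length == 3 && args[0].startsWith("-");
        if (!hasFlag && args.length != 2) {
            throw new IllegalArgumentException("usage: [-flag] yyyyMMdd yyyyMMdd");
        }
        int i = hasFlag ? 1 : 0;
        try {
            LocalDate start = LocalDate.parse(args[i], DAY);
            LocalDate end = LocalDate.parse(args[i + 1], DAY);
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("end date is before start date");
            }
            return new Job(hasFlag ? args[0] : null, start, end);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("bad date: " + e.getParsedString(), e);
        }
    }

    public static void main(String[] args) {
        Job job = parse(new String[] {"-p", "20170101", "20170331"});
        switch (job.flag == null ? "" : job.flag) {
            case "-p":
                System.out.println("process mode: " + job.start + " .. " + job.end);
                break;
            default:
                System.out.println("flag " + job.flag + ": " + job.start + " .. " + job.end);
        }
    }
}
```

Validating the dates before a job is submitted catches typos such as a swapped start and end without spending cluster time.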

This uses the Log_Analysis analysis framework.

OK, that's all!

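Author: SpikeKing
Link: http://www.jianshu.com/p/1b9db1ba6afe
Source: Jianshu
Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please cite the source.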
