
HBase Troubleshooting

[Category: Databases (general) | Publisher: 店小二05 | Date: 2016 | Author: 红领巾]

Nasty little problem. Connecting to HBase kept timing out after lots of errors that looked like:

client.RpcRetryingCaller: Call exception, tries=17, retries=35, retryTime=208879ms, msg row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=HOST,60020,1475135661112, seqNum=0

while logging into the HBase shell seemed to be OK.
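The retry message above packs several useful facts into one line: how many of the allowed retries have been used, which table the client was after (hbase:meta), and which region server it was trying to reach. A minimal Python sketch that pulls those out, assuming the exact message layout shown above (other HBase versions may word it differently); the function and pattern names are hypothetical. A server name is written as host,port,startcode, where the startcode is the region server's start time in milliseconds since the epoch.

```python
import re
from datetime import datetime, timezone

# Patterns assume the message layout shown above; other HBase versions may differ.
RETRY_RE = re.compile(r"tries=(\d+), retries=(\d+), retryTime=(\d+)ms")
TABLE_RE = re.compile(r"on table '([^']+)'")
# A server name is "host,port,startcode"; the startcode is epoch milliseconds.
SERVER_RE = re.compile(r"hostname=([^,\s]+),(\d+),(\d+)")


def summarize_retry_line(line: str) -> dict:
    """Extract retry counters, target table and target server from one log line."""
    retry = RETRY_RE.search(line)
    table = TABLE_RE.search(line)
    server = SERVER_RE.search(line)
    if not (retry and table and server):
        raise ValueError("line does not look like an RpcRetryingCaller message")
    tries, retries, retry_ms = map(int, retry.groups())
    host, port, startcode = server.groups()
    return {
        "tries": tries,
        "retries": retries,
        "retry_ms": retry_ms,
        "table": table.group(1),
        "host": host,
        "port": int(port),
        # Start time of the region server the client was trying to reach.
        "started": datetime.fromtimestamp(int(startcode) / 1000, tz=timezone.utc),
    }


if __name__ == "__main__":
    line = ("client.RpcRetryingCaller: Call exception, tries=17, retries=35, "
            "retryTime=208879ms, msg row '' on table 'hbase:meta' at "
            "region=hbase:meta,,1.1588230740, hostname=HOST,60020,1475135661112, seqNum=0")
    info = summarize_retry_line(line)
    print(f"{info['tries']}/{info['retries']} tries against {info['table']} "
          f"on {info['host']}:{info['port']} "
          f"(server started {info['started']:%Y-%m-%d %H:%M} UTC)")
```

For the line above, the startcode 1475135661112 decodes to 2016-09-29 07:54 UTC, and the client had already burned 17 of its 35 retries (over three minutes) just trying to reach hbase:meta.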

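HBase uses ZooKeeper to coordinate the cluster, including recording which region server hosts the hbase:meta table, which in turn tracks where every region lives. So the first thing I checked was that ZooKeeper itself was healthy: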

> echo ruok | nc ZOOKEEPER_HOST 2181

imok
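The same health check can be done without nc. A minimal Python sketch, assuming plain TCP access to the ZooKeeper client port (2181 here); the function name zk_ruok is hypothetical. It sends the four bytes "ruok" and treats anything other than "imok" (including a refused or timed-out connection) as unhealthy. On ZooKeeper 3.5 and later, ruok may also need to be listed in the 4lw.commands.whitelist setting before the server will answer it.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command; True only if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server writes its answer and closes the connection, so read until EOF.
            while chunk := sock.recv(1024):
                reply += chunk
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
        return False
    return reply.strip() == b"imok"
```

Usage: zk_ruok("ZOOKEEPER_HOST") returns True for a server that answers like the one above.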

So ZooKeeper was healthy. Running a repair didn't seem to help either.

Time to look at the HBase master logs. Every second or so, they repeated something like:

master.SplitLogManager: total tasks =3 unassigned = 0 tasks={ /hbase-secure/splitWAL/WALs/HOST,60020,1474532814834-splitting ... status = in_progress ...

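WALs are write-ahead logs. "HBase data updates are stored in a place in memory called memstore for fast write. In the event of a region server failure, the contents of the memstore are lost because they have not been saved to disk yet. To prevent data loss in such a scenario, the updates are persisted in a WAL file before they are stored in the memstore". By default each region server writes a single WAL shared by all of its regions, so when a region server dies the master has to split that WAL into per-region edit files, which each region replays when it is reopened. While that is in progress, the dead server's directory under WALs carries a -splitting suffix.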
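A toy Python model of the write-ahead rule and of what "splitting" a WAL means. This is not HBase's implementation: the class and function names are hypothetical, the WAL is a plain list standing in for a file on HDFS, and a region is just a string. It shows the two properties that matter here: edits reach the WAL before the memstore, and one server's WAL mixes edits for several regions, so it must be grouped by region before those regions can be recovered independently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyRegionServer:
    """Toy model: a durable WAL plus a volatile memstore (region -> {row: value})."""
    wal: list = field(default_factory=list)       # survives a crash
    memstore: dict = field(default_factory=dict)  # lost on a crash

    def put(self, region: str, row: str, value: str) -> None:
        # Write-ahead rule: record the edit in the WAL before touching the memstore.
        self.wal.append((region, row, value))
        self.memstore.setdefault(region, {})[row] = value

    def crash(self) -> list:
        # The memstore vanishes; only the WAL is left behind.
        self.memstore = {}
        return self.wal


def split_wal(wal: list) -> dict:
    """Group a dead server's WAL entries by region, keeping their original order."""
    per_region = defaultdict(list)
    for region, row, value in wal:
        per_region[region].append((row, value))
    return dict(per_region)


def replay(edits: list) -> dict:
    """Rebuild one region's memstore by re-applying its edits in order."""
    memstore = {}
    for row, value in edits:
        memstore[row] = value
    return memstore


if __name__ == "__main__":
    rs = ToyRegionServer()
    rs.put("region-a", "r1", "v1")
    rs.put("region-b", "r9", "x")
    rs.put("region-a", "r1", "v2")
    pieces = split_wal(rs.crash())
    print(pieces)                      # edits grouped per region
    print(replay(pieces["region-a"]))  # latest value per row wins
```

If splitting never finishes, no region from the dead server gets its edits back, which is consistent with clients timing out while trying to reach those regions.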

These WALs appeared to be constantly splitting. So, a quick look at the WALs directory was in order. It looked something like this:

drwxr-xr-x - hbase app-hdfs 0 2016-07-29 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532814834-splitting
drwxr-xr-x - hbase app-hdfs 0 2016-09-29 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532435621
drwxr-xr-x - hbase app-hdfs 0 2016-09-30 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532365837
drwxr-xr-x - hbase app-hdfs 0 2016-07-28 /apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532463823-splitting
...

Splitting is normal after a region server goes down, but it shouldn't take long: a -splitting directory dated months ago is definitely a bad sign.
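Spotting stale splits in a long listing can be scripted. A minimal Python sketch, assuming listing lines shaped like the ones above (permissions, replication, owner, group, size, date, path); real hdfs dfs -ls output usually also has an HH:MM column, which the pattern tolerates. The function name and the 7-day threshold are hypothetical choices, not HBase defaults.

```python
import re
from datetime import date, timedelta

# permissions, replication, owner, group, size, YYYY-MM-DD, optional HH:MM, path
LS_RE = re.compile(
    r"^\S+\s+\S+\s+\S+\s+\S+\s+\d+\s+(\d{4}-\d{2}-\d{2})(?:\s+\d{2}:\d{2})?\s+(\S+)$"
)


def stale_splitting_dirs(lines, today: date, max_age: timedelta = timedelta(days=7)):
    """Return (path, age_in_days) for '-splitting' WAL dirs older than max_age."""
    stale = []
    for line in lines:
        m = LS_RE.match(line.strip())
        if not m:
            continue  # skip headers such as "Found N items" and blank lines
        day, path = m.groups()
        if not path.endswith("-splitting"):
            continue  # live region servers' WAL dirs are fine
        age = today - date.fromisoformat(day)
        if age > max_age:
            stale.append((path, age.days))
    return stale
```

Against the listing above with today set to 2016-09-30, both -splitting directories are flagged, at 63 and 64 days old.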

We were lucky that this was a test environment, so we could happily delete these -splitting directories and restart HBase. (YMMV: think long and hard before doing that in a production environment, because any edits in those WALs that were never flushed to HFiles are lost with them.) But the problem did go away for us.
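A more cautious variant than outright deletion (this is a swapped-in alternative, not what was done here) is to move the -splitting directories aside, so their edits can still be inspected later. A dry-run Python sketch that only builds the commands and refuses anything without the -splitting suffix; the quarantine path is a hypothetical choice, and HBase would need to be stopped before any of these commands are actually run.

```python
import shlex


def quarantine_commands(splitting_dirs, quarantine_dir: str) -> list:
    """Build (but do not run) hdfs commands that move '-splitting' WAL dirs aside."""
    cmds = [["hdfs", "dfs", "-mkdir", "-p", quarantine_dir]]
    for path in splitting_dirs:
        # Guard against accidentally moving a live region server's WAL directory.
        if not path.endswith("-splitting"):
            raise ValueError(f"refusing to touch non-splitting dir: {path}")
        cmds.append(["hdfs", "dfs", "-mv", path, quarantine_dir])
    # shlex.join quotes each argument only if the shell would need it.
    return [shlex.join(cmd) for cmd in cmds]


if __name__ == "__main__":
    for cmd in quarantine_commands(
        ["/apps/hbase/xxx/WALs/MACHINE_NAME,60020,1474532814834-splitting"],
        "/tmp/wal-quarantine",  # hypothetical destination
    ):
        print(cmd)
```

Printing rather than executing keeps a human in the loop, which seems prudent for an operation this destructive.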

Link: http://www.codesec.net/view/480639.html