
The How and Why of Spark and Couchbase

[Category: Databases (General) | Publisher: 店小二03 | Date: 2017 | Author: 红领巾]

I could spend a lot of time gushing about Couchbase and the details of its architecture and implementation. I’ve grown to really love Couchbase as a NoSQL store, but my love for it isn’t a good reason to write a blog post. I do think a lot of people using Couchbase for analytical purposes can benefit from combining it with Spark. This post is a quick rundown of some of the features I use most often when working with the two. It’s more my own notes than any broader statement.

Housekeeping

I won’t go into much detail about Couchbase, but it’s a JSON document store that is easy to distribute and has some other great features. I recommend reading the docs for more details.

Type Safe Serialization

A big annoyance with JSON is serialization. If you have data that looks like this:

Using it for analytical purposes requires some kind of query language or hand-written code to loop through the objects, and neither gives you many guarantees about types. It’s safer to at least know what you’re dealing with, and that requires a guaranteed, implicit conversion. In Scala we can get some help from case classes and Spray JSON to accomplish this:
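The type-safety idea can be sketched without any dependencies. This is not Spray JSON code: it assumes the JSON has already been parsed into a Map[String, Any], and the Person case class and its fields are hypothetical. Each field is checked against its expected type, so a malformed document produces a Left with a reason instead of a cast error somewhere downstream:

```scala
// Dependency-free sketch of type-safe decoding. Assumes the JSON was already
// parsed into a Map[String, Any]; Person and its fields are hypothetical.
final case class Person(name: String, age: Int, tags: List[String])

def decodePerson(doc: Map[String, Any]): Either[String, Person] = {
  // Look up one field and accept it only if `pf` recognizes its runtime type.
  def field[A](key: String)(pf: PartialFunction[Any, A]): Either[String, A] =
    doc.get(key) match {
      case Some(v) if pf.isDefinedAt(v) => Right(pf(v))
      case Some(v)                      => Left(s"field '$key' has unexpected value: $v")
      case None                         => Left(s"missing field '$key'")
    }

  // The for-comprehension stops at the first field that fails to decode.
  for {
    name <- field[String]("name") { case s: String => s }
    age  <- field[Int]("age") { case i: Int => i }
    tags <- field[List[String]]("tags") {
              case xs: List[_] if xs.forall(_.isInstanceOf[String]) => xs.map(_.toString)
            }
  } yield Person(name, age, tags)
}

val ok  = decodePerson(Map("name" -> "Ada", "age" -> 36, "tags" -> List("math")))
val bad = decodePerson(Map("name" -> "Bob", "age" -> "forty", "tags" -> Nil))
// bad == Left("field 'age' has unexpected value: forty")
```

With Spray JSON, the same case class would typically get a derived format such as jsonFormat3(Person) instead of a hand-written decoder.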

From here, converting JSON to a Spark dataset is fairly trivial:

We can go safely from a JSON string to a dataset of rows and columns that have properly defined types with minimal effort, allowing for a natural pipeline from Couchbase to Spark.

You might say, “Well, you still have to hand-write code for these implicit conversions,” and that’s true. Alternatively, you can do a loose conversion to a list of JSON documents and convert the schema afterward, or use some of Spark’s built-in facilities:
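Spark’s JSON reader can infer a schema by scanning the documents. The toy below mimics that “convert loosely, decide the types afterward” idea in plain Scala; it is an illustration rather than Spark’s actual inference rules, and the type names and widening choices are assumptions:

```scala
// Toy schema inference over loosely typed documents (not Spark's algorithm).
def typeName(v: Any): String = v match {
  case _: Int | _: Long => "long"
  case _: Double        => "double"
  case _: Boolean       => "boolean"
  case _: String        => "string"
  case null             => "null"
  case _                => "string" // fallback for anything else in this toy
}

// Merge two observed types for one field: null defers to the other type,
// long and double widen to double, and any other conflict becomes string.
def merge(a: String, b: String): String = (a, b) match {
  case (x, y) if x == y                        => x
  case ("null", y)                             => y
  case (x, "null")                             => x
  case ("long", "double") | ("double", "long") => "double"
  case _                                       => "string"
}

// Fold every field of every document into a single field -> type map.
def inferSchema(docs: Seq[Map[String, Any]]): Map[String, String] =
  docs.foldLeft(Map.empty[String, String]) { (schema, doc) =>
    doc.foldLeft(schema) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).fold(typeName(v))(merge(_, typeName(v))))
    }
  }
```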

N1QL

N1QL is Couchbase’s query language, which lets you write SQL-like queries over JSON documents. There have been other attempts at this, but in my opinion none is implemented as well as N1QL. If I have data that looks like this:

I could query it with N1QL in the following way:
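For readers new to N1QL, here is a minimal sketch of a basic query next to an in-memory Scala equivalent. The bucket people and its name and age fields are hypothetical; the collection code only illustrates what the WHERE, ORDER BY, and SELECT clauses do:

```scala
// Hypothetical bucket `people` with documents like {"name": "Ada", "age": 36}.
final case class Person(name: String, age: Int)

// The N1QL text you would submit to Couchbase.
val n1ql: String =
  """SELECT p.name, p.age
    |FROM `people` AS p
    |WHERE p.age > 30
    |ORDER BY p.age""".stripMargin

val people = List(Person("Ada", 36), Person("Bob", 25), Person("Cy", 41))

// WHERE p.age > 30 -> filter; ORDER BY p.age -> sortBy; SELECT -> map
val rows: List[(String, Int)] =
  people.filter(_.age > 30).sortBy(_.age).map(p => (p.name, p.age))
```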

That may not be very impressive, because there are no nested structures to traverse. N1QL also supports searching inside arrays, which opens up some interesting queries like the following:
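N1QL’s ANY ... SATISFIES ... END predicate is true when at least one element of an array matches, much like exists on a Scala collection. The sketch below pairs such a query with its in-memory equivalent; the bucket, the languages array, and its fields are hypothetical:

```scala
// Hypothetical documents with an array field, e.g.
// {"name": "Ada", "languages": [{"lang": "Scala", "years": 5}, ...]}
final case class Skill(lang: String, years: Int)
final case class Dev(name: String, languages: List[Skill])

val arrayQuery: String =
  """SELECT d.name
    |FROM `people` AS d
    |WHERE ANY s IN d.languages SATISFIES s.lang = "Scala" AND s.years >= 3 END""".stripMargin

val devs = List(
  Dev("Ada", List(Skill("Scala", 5), Skill("SQL", 10))),
  Dev("Bob", List(Skill("Scala", 1), Skill("Java", 7))),
  Dev("Cy",  List(Skill("Python", 4)))
)

// In-memory equivalent of the WHERE clause: keep a dev if any element matches.
val matching: List[String] =
  devs.filter(_.languages.exists(s => s.lang == "Scala" && s.years >= 3)).map(_.name)
```

N1QL also has EVERY ... SATISFIES ... END, which corresponds to forall.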

I find N1QL intuitive, especially in the Spark SQL context, where you’re already writing SQL-like syntax. I used N1QL in the example in the previous section without explaining it. If you come from a SQL background, you can see how it is a bit more intuitive than a traditional Couchbase get.

Streaming

Instead of querying Couchbase, you can also stream data from it. This only makes sense if your analytical needs depend on updates to the database.

Setting up this code is straightforward:

You can use this as an analytics layer to watch for abnormalities in the data or to trigger other events or pipelines. You can also use Spark Streaming to write data from a stream into Couchbase, as outlined in the docs.
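A toy version of such an analytics layer is sketched below. It assumes hypothetical mutation events carrying a numeric value and flags any value more than k standard deviations from the running mean; in a real pipeline the events would arrive through Spark Streaming rather than from an in-memory Iterator, and the 3-sigma rule is just one illustrative choice:

```scala
// Hypothetical change event: which document changed and a numeric value.
final case class Mutation(docId: String, value: Double)

// Running mean/variance (Welford's algorithm), O(1) work per event.
final class RunningStats {
  private var n = 0L
  private var mean = 0.0
  private var m2 = 0.0
  def count: Long = n
  def currentMean: Double = mean
  def stddev: Double = if (n < 2) 0.0 else math.sqrt(m2 / (n - 1))
  def add(x: Double): Unit = {
    n += 1
    val delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
  }
}

// Lazily yields the mutations that lie more than `k` standard deviations from
// the mean of everything seen before them (after `warmUp` events); every event,
// flagged or not, is then folded into the running statistics.
def watch(events: Iterator[Mutation], k: Double = 3.0, warmUp: Int = 5): Iterator[Mutation] = {
  val stats = new RunningStats
  events.filter { m =>
    val anomalous =
      stats.count >= warmUp && stats.stddev > 0 &&
        math.abs(m.value - stats.currentMean) > k * stats.stddev
    stats.add(m.value)
    anomalous
  }
}
```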

Full-Text Search

Couchbase also offers full-text search, similar to Elasticsearch. Full-text search is much less precise than a SQL query, but it’s appropriate for many use cases, and you can use it without much effort:
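To show why full-text search is less precise than a SQL predicate, here is a toy inverted index in plain Scala. It is not how Couchbase implements full-text search; it only shows the general behavior of tokenizing and lower-casing text, then ranking documents by how many query terms they contain instead of requiring an exact match. The document ids and text are hypothetical:

```scala
// Split text into lower-cased alphanumeric terms.
def tokenize(text: String): Seq[String] =
  text.toLowerCase.split("[^a-z0-9]+").toSeq.filter(_.nonEmpty)

// term -> ids of the documents containing that term
def buildIndex(docs: Map[String, String]): Map[String, Set[String]] =
  docs.toSeq
    .flatMap { case (id, text) => tokenize(text).map(term => term -> id) }
    .groupBy(_._1)
    .map { case (term, pairs) => term -> pairs.map(_._2).toSet }

// Ids ordered by number of matching query terms (ties broken by id).
def search(index: Map[String, Set[String]], query: String): Seq[String] =
  tokenize(query)
    .flatMap(term => index.getOrElse(term, Set.empty[String]).toSeq)
    .groupBy(identity)
    .toSeq
    .sortBy { case (id, hits) => (-hits.size, id) }
    .map(_._1)
```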

From here you can use pattern matching to ensure correct serialization, and so on. Beyond a simple search like this, Couchbase supports many more complex searches as well.
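A minimal sketch of that pattern-matching step, assuming a hypothetical Hit type whose stored fields arrive loosely typed (Hit, Article, and the title/views fields are illustrative, not the connector’s API). Each hit becomes either a typed Article or an explicit rejection:

```scala
// Hypothetical search hit: id, relevance score, and loosely typed stored fields.
final case class Hit(id: String, score: Double, fields: Map[String, Any])
final case class Article(id: String, title: String, views: Long)

// Accept the numeric encodings a JSON view count might plausibly arrive as.
def toArticle(hit: Hit): Either[String, Article] =
  (hit.fields.get("title"), hit.fields.get("views")) match {
    case (Some(t: String), Some(v: Long)) => Right(Article(hit.id, t, v))
    case (Some(t: String), Some(v: Int))  => Right(Article(hit.id, t, v.toLong))
    case (Some(t: String), Some(v: Double)) if math.floor(v) == v =>
      Right(Article(hit.id, t, v.toLong))
    case other => Left(s"hit ${hit.id} has unexpected fields: $other")
  }
```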

That’s All

Couchbase isn’t a good fit for every application, but it’s being adopted pretty rapidly, and it’s good to know it works nicely with Spark. I’ve been using it for a few months now and have grown to like it quite a bit.
