
Neo4j With Scala: Awesome Experience with Spark


Let's continue our journey in this series. In the last blog we discussed data migration from other databases to Neo4j. Now we will discuss how we can combine Neo4j with Spark.

Before we start, here is a recap of the series so far:

Getting Started Neo4j with Scala: An Introduction
Neo4j with Scala: Defining User Defined Procedures and APOC
Neo4j with Scala: Migrate Data From Other Database to Neo4j

Now we start our journey. Apache Spark is a general-purpose framework for distributed data processing that provides a functional API for manipulating data at large scale, with in-memory data caching and reuse across computations. When we use Neo4j as our graph database and need to perform heavy data processing, Spark is the tool we reach for. We have to follow some basic steps before we start playing with Spark.

We can download Apache Spark 2.0.0 (Download Here). Note that we have to use Spark 2.0.0 here, because the connector we are going to use is built against Spark 2.0.0. Set the SPARK_HOME path in the .bashrc file (for Linux users).

Configuration with Neo4j: When we are running Neo4j with the default host and port, we only have to configure our password in spark-defaults.conf:

spark.neo4j.bolt.password=your_neo4j_password

If we are not running Neo4j with the default host and port, we have to configure those in spark-defaults.conf as well:

spark.neo4j.bolt.url=bolt://$host_name:$port_number
spark.neo4j.bolt.user=user_name
spark.neo4j.bolt.password=your_neo4j_password

We can also provide the user and password as part of the URL:

bolt://neo4j:<password>@localhost

Integration of Neo4j-Spark-Connector: Download the Neo4j-Spark-Connector (Download Here) and build it. After building, we can provide the jar path when we start spark-shell:

$ $SPARK_HOME/bin/spark-shell --jars neo4j-spark-connector_2.11-full-2.0.0-M2.jar

We can also provide the Neo4j-Spark-Connector as a package when we start spark-shell:

$ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2

We can integrate the Neo4j-Spark-Connector directly into an SBT project:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"

We can also integrate the Neo4j-Spark-Connector with a POM (See Here).
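For SBT users, the two lines above fit into an ordinary build definition. Here is a minimal build.sbt sketch; the project name is hypothetical, and the Scala and Spark versions are assumptions that follow from the Spark 2.0.0 requirement above:

// Minimal build.sbt sketch; name and versions are assumptions, adjust to your setup.
name := "neo4j-spark-example"   // hypothetical project name
scalaVersion := "2.11.8"        // Spark 2.0.0 artifacts are published for Scala 2.11

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies ++= Seq(
  // Spark itself is marked provided, since spark-shell/spark-submit supplies it at runtime
  "org.apache.spark" %% "spark-core"   % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"    % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-graphx" % "2.0.0" % "provided",
  // the connector used throughout this post
  "neo4j-contrib" % "neo4j-spark-connector" % "2.0.0-M2"
)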

Now we are ready to use Neo4j with Spark. We start the Spark shell from the command line:

$ $SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2

Now we are ready to start a new journey with Spark and Neo4j.

I assume that you have already created some data in Neo4j for running the basic commands. If you have not, here are Cypher statements that create 1K records in your Neo4j database:

UNWIND range(1, 1000) AS row CREATE (:Person {id: row, name: 'name' + row, age: row % 100})

UNWIND range(1, 1000) AS row MATCH (n) WHERE id(n) = row MATCH (m) WHERE id(m) = toInt(rand() * 1000) CREATE (n)-[:KNOWS]->(m)

RDD Operations:

We start with some RDD operations on the data stored in Neo4j.

import org.neo4j.spark._

val neo = Neo4j(sc)
val rowRDD = neo.cypher("MATCH (n:Person) RETURN n.name as name limit 10").loadRowRdd
rowRDD.map(t => "Name: " + t(0)).collect.foreach(println)
import org.neo4j.spark._

val neo = Neo4j(sc)

// calculate the mean of the age data
val rowRDD = neo.cypher("MATCH (n:Person) RETURN n.age").loadRdd[Long].mean
// rowRDD: Double = 49.5

// load relationships via pattern
neo.pattern("Person", Seq("KNOWS"), "Person").partitions(12).batch(100).loadNodeRdds.count
// res30: Long = 1000
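Anything in Spark's RDD API can be chained onto these loads. As a small sketch of my own (not from the connector docs), the following buckets the loaded ages by decade and counts each bucket, using only positional access on the rows:

// A minimal sketch (my addition): a plain RDD pipeline over rows loaded from Neo4j,
// bucketing ages by decade and counting each bucket.
val ages = neo.cypher("MATCH (n:Person) RETURN n.age as age").loadRowRdd
ages.map(r => (r.getLong(0) / 10) * 10)   // decade bucket, e.g. 42 -> 40
  .countByValue()                         // Map[Long, Long], fetched to the driver
  .toSeq.sortBy(_._1)
  .foreach { case (bucket, count) => println(s"$bucket-${bucket + 9}: $count") }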

DataFrame Operations:

Now we will perform some operations using the DataFrame API.

import org.neo4j.spark._

val neo = Neo4j(sc)
val df = neo.cypher("MATCH (n:Person) RETURN n.name as name, n.age as age limit 5").loadDataFrame
df.collect.foreach(println)
import org.neo4j.spark._
import org.apache.spark.sql.functions.sum

val neo = Neo4j(sc)

// calculate the sum of the age data
val df = neo.cypher("MATCH (n:Person) RETURN n.age as age").loadDataFrame
df.agg(sum(df.col("age"))).collect
// res10: Array[org.apache.spark.sql.Row] = Array([49500])

// load relationships via pattern and count the loaded rows
val knowsCount = neo.pattern("Person", Seq("KNOWS"), "Person").partitions(2).batch(100).loadDataFrame.count
// knowsCount: Long = 200
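Since loadDataFrame returns an ordinary Spark DataFrame, we can also register it as a temporary view and query it with Spark SQL. Here is a minimal sketch of my own (the view name "people" is hypothetical), assuming the same spark-shell session:

// A minimal sketch (my addition): querying the loaded DataFrame with Spark SQL.
// "people" is a hypothetical view name; spark is the SparkSession provided by spark-shell.
val people = neo.cypher("MATCH (n:Person) RETURN n.name as name, n.age as age").loadDataFrame
people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 90 ORDER BY age DESC LIMIT 5").show()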

GraphX Graph Operations:

Now we will perform some operations on a GraphX graph.

import org.neo4j.spark._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._

val neo = Neo4j(sc)

// load the graph
val graphQuery = "MATCH (n:Person)-[r:KNOWS]->(m:Person) RETURN id(n) as source, id(m) as target, type(r) as value"
val graph: Graph[Long, String] = neo.rels(graphQuery).partitions(10).batch(100).loadGraph

graph.vertices.count
// res0: Long = 1000
graph.edges.count
// res1: Long = 999

// load the graph with a pattern
val graph = neo.pattern(("Person", "id"), ("KNOWS", "since"), ("Person", "id")).partitions(10).batch(100).loadGraph[Long, Long]

// count in-degrees
graph.inDegrees.count
// res0: Long = 505

val graph2 = PageRank.run(graph, 5)
graph2.vertices.sortBy(_._2).take(3)
// res1: Array[(org.apache.spark.graphx.VertexId, Double)] = Array((602,0.15), (630,0.15), (476,0.15))
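Any other GraphX algorithm can be applied to the loaded graph in the same way. As a small sketch of my own (not from the original post), here is connected components run on the pattern-loaded graph, assuming the same spark-shell session:

// A minimal sketch (my addition): running GraphX connected components on the graph above.
val cc = graph.connectedComponents()
// each vertex is labelled with the lowest vertex id in its component;
// counting the distinct labels gives the number of components
cc.vertices.map(_._2).distinct.count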

These are basic examples of using Spark with Neo4j.

I hope this helps you start working with Spark and Neo4j.

Thanks.

Reference: Neo4j Spark Connector
