Life after 1 year of using Neo4J

Category: Databases (general) | Posted by: 店小二04 | Date: 2017 | Author: 红领巾

A year ago, on one of my projects, we got the idea that migrating to Neo4j would be great, because our data seemed ideal for a graph. After that, our life changed forever.

I think Neo4j is like heroin: at first you think it’s the most awesome thing you can imagine, but after a few months the euphoria fades and you start to understand that maybe it was not the best choice of your life.

On the Neo4j site you can see many big clients like eBay and LinkedIn, but I really don’t know how and where they use this DB, so if some of their developers can share some info in the comments, that would be great. For now I will only tell you about my personal experience of using Neo4j.

Query language

Neo4j’s query language is called Cypher. It’s very simple, and after a few minutes of reading the docs you can already write some non-trivial queries. Like most DBs it also has Explain and Profile commands, which let you see what is happening under the hood of a query.

But when you start writing more and more complex queries, you begin to see that you can’t predict how a query will behave, and after each change you need to run Profile again. For example, in SQL, when you add a Join you already know it will make the query heavier. In Neo4j you can swap the order of a few lines in a way that logically changes nothing, yet at runtime the query time can jump from 0.05ms to 30sec. So writing hard queries is some sort of magic. I think that is one of the reasons why the Neo4j guys recommend splitting every big query into small ones.

Query execution

Query watcher

So you write your first query and run it on the server, but unfortunately you made a little mistake, and it returns not 5 nodes but 5M nodes. Most DBs have a watcher that looks for long-running queries, or queries that use a large amount of memory, and can kill them to prevent the DB from going down. According to the docs Neo4j has one too, but I never saw it actually work. In the best scenario you will get an error like “undefined - undefined”; in the worst, your DB will go down, and maybe your server too.

Read all data first

Remember: in Neo4j, whatever you are doing, it performs a read first. For example, when you delete all records in a relational DB, the DB goes through the records and deletes them one by one, and it doesn’t matter how many there are. Neo4j first tries to load all the info from those records and only then runs the delete. In real life this means, for example, that you can’t delete all 1M records from the DB in one query, because you simply don’t have enough RAM for that. To delete all of them you need to write a script that runs queries with Limit until every record is deleted. The problem is that you can’t know how much data each node holds, so you can’t know how to set the limit so that it doesn’t overflow RAM.
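
The Limit loop described above could look roughly like this in Python. It is only a sketch under assumptions: `run_query` is a hypothetical callable (not part of any Neo4j driver) that executes one statement in its own transaction and returns the single integer it produces, the `Test` label is a placeholder, and the `$batch` parameter syntax and `DETACH DELETE` are assumed to be supported by the Neo4j version in use.

```python
from typing import Callable

# Hypothetical helper type: run_query(cypher, params) runs one statement in its
# own transaction and returns the single integer that the statement RETURNs.
RunQuery = Callable[[str, dict], int]

# Assumption: the label `Test` is a placeholder; `$batch` and DETACH DELETE
# must be supported by the server version you run.
DELETE_BATCH = (
    "MATCH (n:Test) "
    "WITH n LIMIT $batch "
    "DETACH DELETE n "
    "RETURN count(*) AS deleted"
)


def delete_in_batches(run_query: RunQuery, batch: int = 10_000) -> int:
    """Delete nodes batch by batch until a pass deletes nothing; return the total."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    total = 0
    while True:
        # Each call is a separate, small transaction, so only `batch` nodes
        # have to be read into memory at a time.
        deleted = run_query(DELETE_BATCH, {"batch": batch})
        if deleted == 0:
            return total
        total += deleted
```

The batch size is still a guess, for exactly the reason given above: you don’t know how heavy each node is, so you may have to lower it by trial and error.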

Locking

That is another fun part. Locking works differently here than in many relational DBs. For example, in a relational DB, when you run an update query, the query executor understands that and sets a write lock on the field, record, etc., depending on the locking policy. In Neo4j there is only a write lock, and it is set not before the query starts executing but when some part of the query tries to update something. So, for example, the query MATCH (n:Test {id:1}) SET n.param=2 takes the write lock on the node only after the MATCH part has run. That means you will get problems with concurrent updates. There is a big post on the Neo4j blog about how to handle such problems, but to me it looks like a collection of hot fixes. Here it is: https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/

Connections

Another problem when you go live: there is no connection proxy or balancer like in Pg, and there is also no way to limit the number of connections the DB will handle and close the rest. This leads to using HAProxy and strange hand-written scripts to achieve this and make the DB more stable for live use.
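
To give an idea of what such a hand-written limiter can look like, here is a minimal Python sketch that caps concurrent connections on the application side. It is not what HAProxy does and not the script from the project; `ConnectionGate` and its `slot` method are hypothetical names made up for this illustration, and opening the actual DB connection is left to the caller.

```python
import threading
from contextlib import contextmanager


class ConnectionGate:
    """Allow at most `limit` callers to hold a connection slot at once."""

    def __init__(self, limit: int) -> None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def slot(self, timeout: float = 0.0):
        # Wait up to `timeout` seconds for a free slot; with the default 0.0
        # the caller is rejected right away when the gate is full.
        if not self._slots.acquire(timeout=timeout):
            raise RuntimeError("connection limit reached")
        try:
            yield
        finally:
            # Always give the slot back, even if the query raised.
            self._slots.release()
```

You would wrap every DB call in `with gate.slot(): ...`, so a burst of requests gets rejected in the application instead of piling up on the DB.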

High Availability

Neo4j has only one way to do this, and basically it’s Master-Slave replication; it doesn’t even have master-master replication. There is also no way to set master priority for instances (which is a good thing to have when you have plugins installed; I will talk about them below).

So, for example, you can’t do DC-DC replication, you can’t use the Blue-Green Deployment technique, and you can’t run two clusters. All of these things you have to build on your own by writing scripts, services and kernel plugins for Neo4j.

The Neo4j developers also wrote in their blog that you should always check the sync between master and slave, because sometimes it can fail.

Extensions

There are 3 types of extensions for Neo4j: Unmanaged Extensions, Server Plugins and Kernel Extensions. I worked only with the third one.

Kernel extensions are used when you need to add functionality to how Neo4j works internally. In my case I worked with TransactionEventHandler, which handles transaction events like beforeCommit, afterCommit and afterRollback.

It seemed to me a good way of adding features that Neo4j doesn’t have out of the box.

The first thing I learned is that there is almost no documentation for this, and I had to look through a few existing plugins, StackOverflow and other sites to piece things together and make a first attempt (maybe if I were a Java developer it would have been much faster, but I mostly work with Python, and sometimes Android).

The second thing I found is that not all events work as they should. For example, in beforeCommit (which should run while the DB is not yet changed) you can’t access the properties, labels or relationships of deleted nodes, because they are already deleted. Yeah... strange. Then afterCommit (which should run after the transaction is committed and closed) is executed while the transaction is still open, which leads to a deadlock (without any info or exception) if you try to update your local DB (in some cases).

The third thing is that every extension runs in a global environment, which leads to dependency collisions. If you have more than one plugin, you will end up fixing plugins for your case by hand.

I wrote an example Kernel Extension that can help you get started on a cool new plugin: https://github.com/creotiv/neo4j-kernel-plugin-example

My conclusion

I’m not saying that Neo4j doesn’t work. As far as I know, it’s one of the best graph DBs you can use for free, but it is still very raw. You should understand that if your task is not very trivial, you will get some overhead making it work with Neo4j. It also doesn’t fit HL+HA requirements very well.

I would be glad if you guys shared your experience of working with Neo4j here in the comments.

