MongoRocks and WiredTiger versus linkbench on a small server
I spent a lot of time evaluating open-source database engines over the past few years and WiredTiger has been one of my favorites. The engine and the team are excellent. I describe it as a copy-on-write-random (CoW-R) B-Tree as defined in a previous post. WiredTiger also has a log-structured merge tree, but it isn't officially supported in MongoDB. Fortunately we have MongoRocks if you want an LSM.
This one is long. Future reports will be shorter and will reference this one. My tl;dr for Linkbench with low concurrency on a small server:
* I think there is something wrong in how WiredTiger uses zlib, at least in MongoDB 3.2.4.
* mmapV1 did better than I expected.
* We can improve MongoRocks write efficiency and write throughput.
* The write-efficiency gap between MongoRocks and the other MongoDB engines isn't as large as the gap between MyRocks and the other MySQL engines.
All about the algorithm
Until recently I had not been sharing my performance evaluations that compare MongoRocks with WiredTiger. In some cases MongoRocks performance is much better than WiredTiger's and I want to explain those cases. There are two common reasons. First, WiredTiger is a new engine and more work remains to improve its performance. I see progress and I know more is coming. This takes time.
The second reason for differences is the database algorithm. An LSM and a B-Tree make different tradeoffs between read, write and space efficiency. See the RUM Conjecture for more details. In most cases an LSM should have better space and write efficiency while a B-Tree should have better read efficiency. But better write and space efficiency can enable better read efficiency. First, when less IO capacity is consumed writing back database changes, more IO capacity is available for the storage reads done on behalf of user queries. Second, when less space is wasted caching database blocks, the cache hit ratio is higher. I expect the second reason is more of an issue for InnoDB than for WiredTiger because WT does prefix encoding for indexes and should have little or no fragmentation for database pages in cache.
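To make the cache-efficiency argument concrete, here is a back-of-envelope sketch with made-up numbers (the 2/3 page-fill factor and 10G working set are illustrative assumptions, not measurements from these tests):

    # B-Tree pages in cache are often ~2/3 full after random updates, so a
    # 2G cache holds ~1.33G of useful data, while an LSM caching compact
    # blocks can use the full 2G. Against a 10G working set that is roughly
    # 13% vs 20% of the data in cache.
    echo "B-Tree useful data in a 2G cache, GB: $(echo "scale=2; 2 * 2 / 3" | bc)"
    echo "LSM useful data in a 2G cache, GB: 2.00"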
Page write-back is a hard feature to get right for a B-Tree. There will be dirty pages at the end of the buffer pool LRU and these pages must be written back as they approach the LRU tail. A thread that needs to read a page into the buffer pool takes a page from the LRU tail, and if that page is still dirty the thread stalls until the page has been written back. It took a long time to make this performant for InnoDB and work still remains. It will take more time to get this right for WiredTiger. Checkpoint and eviction are the steps by which dirty pages are written back for WT. While I am far from an expert on this I have filed several performance bugs and feature requests (and many of them have been fixed). One open problem is that checkpoint is still single threaded. This one thread must find dirty pages, compress them and then do buffered writes. When zlib is used that is too much work for one thread. Even with a faster compression algorithm I think more threads are needed, and the cost of a faster algorithm is more space used for the database. SERVER-16736 is open as a feature request for this.
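One way to watch write-back pressure from the outside is to poll the WiredTiger section of serverStatus. A minimal sketch, assuming a local mongod on the default port; the stat names are as they appear in MongoDB 3.2:

    mongo --quiet --eval '
      var wt = db.serverStatus().wiredTiger;
      // dirty bytes waiting for eviction or checkpoint to write them back
      print("dirty bytes:     " + wt.cache["tracked dirty bytes in the cache"]);
      print("cache bytes:     " + wt.cache["bytes currently in the cache"]);
      // non-zero while the single checkpoint thread is busy
      print("checkpoint busy: " + wt.transaction["transaction checkpoint currently running"]);
    '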
I have three small servers at home. They used Ubuntu 14.04 at the time, but have since been upgraded to 16.04. Each is a Core i3 with 2 CPUs, 4 HW threads and 8G of RAM. The storage is a 120G Samsung 850 EVO m.2 SSD for the database and a 7200 RPM disk for the OS. I like the NUC servers but my next cluster will use a better CPU (Core i5) with more RAM.
The benchmark is Linkbench, using LinkbenchX from Percona which has support for MongoDB. For the WiredTiger and MongoRocks engines it doesn't use engine transactions to protect Linkbench's multi-operation transactions. I look forward to multi-document transactions in a future MongoDB release. I use the main branch from my Linkbench fork rather than from LinkbenchX to avoid the feature that sustains a constant request rate, because that feature added too much CPU overhead in some tests.
I ran two tests. First, I used an in-memory database workload with maxid1=2M. Second, I used an IO-bound database with maxid1=40M. By IO-bound I mean that the database is larger than 8G but smaller than 120G and the SSD is very busy during the test. Both tests were run with 2 connections for loading and 1 connection (client) for the query tests. The query tests were run for 24 1-hour loops and the result from the 24th hour is shared. I provide results for performance, quality of service (QoS) and efficiency. Note that for the mmapV1 IO-bound test I had to use maxid1=20M rather than 40M to avoid filling the storage device.
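For reference, the load and query phases are invoked roughly like this. The flags follow upstream LinkBench (-c for the config file, -l for the load phase, -r for the request phase, -D for property overrides), but the properties file name and the exact property values here are illustrative assumptions:

    # load phase for the in-memory test, 2 loader connections
    ./bin/linkbench -c config/LinkConfigMongoDBv2.properties -l \
        -Dmaxid1=2000001 -Dloaders=2
    # one 1-hour query loop with a single client, repeated 24 times
    ./bin/linkbench -c config/LinkConfigMongoDBv2.properties -r \
        -Drequesters=1 -Dmaxtime=3600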
The oplog is enabled, sync-on-commit is disabled and WiredTiger/MongoRocks get 2G of RAM for cache. Tests were run with zlib and snappy compression. I reduced the filesystem readahead from 128 to 16 for the mmapV1 engine tests. For MongoRocks I disabled compression for the smaller levels of the LSM, which means that for the cached database much more of the data is not compressed. I limited the oplog to 2000MB. The full mongo.conf is at the end of this post.
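The storage-related parts of mongo.conf look roughly like this for the WiredTiger zlib runs. This is a sketch using standard MongoDB 3.2 option names; the full file at the end of the post is authoritative, and the MongoRocks runs swap in the rocksdb engine:

    cat > /etc/mongod.conf << 'EOF'
    storage:
      engine: wiredTiger            # rocksdb for the MongoRocks runs
      wiredTiger:
        engineConfig:
          cacheSizeGB: 2
        collectionConfig:
          blockCompressor: zlib     # snappy for the snappy runs
    replication:
      oplogSizeMB: 2000             # the 2000MB oplog limit
    EOF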
I used MongoDB 3.2.4 to compare the MongoRocks, WiredTiger and mmapV1 engines. I will share more results soon for MongoDB 3.3.5 and I think the results are similar. When MongoDB 3.4 is published I will repeat my tests and hope to include zstandard.
If you measure storage write rates with iostat then be careful because iostat counts bytes trimmed as bytes written. If the filesystem is mounted with discard enabled and the database engine frequently deletes files (RocksDB does) then iostat can overstate bytes written. The results I share here have been corrected for that.
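A quick way to check whether the correction applies, assuming an ext4 filesystem mounted at /data (a hypothetical mount point): either remount without online discard so iostat only counts real writes, or leave discard enabled and subtract the trimmed bytes yourself.

    mount | grep -w discard                  # is online TRIM enabled?
    sudo mount -o remount,nodiscard /data    # stop counting trims as writes
    iostat -kx 10                            # wkB/s now reflects real writes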
Cached database load
These are the results for maxid1=2M. The database is cached for all engines except mmapV1.
Legend:
* ips - average inserts/second
* wKB/i - average KB written to storage per insert, measured by iostat
* Mcpu/i - CPU usecs per insert, measured by vmstat
* Size - database size in GB at the end of the load
* rss - mongod process size (RSS) in GB from ps at the end of the load
* engine - rx.snap/rx.zlib is MongoRocks with snappy or zlib; wt.snap/wt.zlib is WiredTiger with snappy or zlib
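The efficiency columns are simple ratios. A sketch of the arithmetic with made-up totals; INSERTS, KB_WRITTEN and CPU_SECS stand in for values collected from the load, iostat and vmstat logs:

    INSERTS=2000000        # rows inserted during the load (example value)
    KB_WRITTEN=7000000     # KB written to the SSD per iostat (example value)
    CPU_SECS=900           # CPU seconds from vmstat us+sy (example value)
    echo "wKB/i  = $(echo "scale=2; $KB_WRITTEN / $INSERTS" | bc)"
    echo "Mcpu/i = $(echo "scale=0; $CPU_SECS * 1000000 / $INSERTS" | bc)"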
Summary: MongoRocks has the worst insert rate. Some of this is because more efficient writes can