Yes I had to migrate 6 million records from DynamoDB to MongoDB

(Image ref: lynda.com)

Why Migrate?

Before going into details, some of you might wonder why on earth we would need to migrate from DynamoDB to MongoDB in the first place, since both are well-performing, stable and reliable NoSQL database engines. However, both have their pros and cons depending on the use case. In my case, the following MongoDB features led me to make the decision to migrate.

- Powerful query engine (especially the aggregation pipeline)
- No throughput limits (DynamoDB's provisioned throughput was a real pain)
- Friendly pricing scheme offered by MongoDB Atlas

You can read a more detailed comparison of DynamoDB vs MongoDB here.

Migration Options

So now the decision has been made. Woooh! Wait!! The fun part is yet to come :) The main challenge is not making the decision but migrating the data we had been collecting in DynamoDB for about 3 years. It's not that big, but I would say big enough to make the migration task painful. The largest table had a storage size of around 10 GB.

There isn’t a natively supported way in either database engine (at least at the time of writing) to migrate data directly, but using the export and import options each engine provides, we can use the following methods to migrate the data.

1. Use the Export to .csv option from the AWS Console and import into MongoDB using the mongoimport command (see the sample invocation after this list; you can get more details from this blog post). If you are using a client like MongoDB Compass, you can do this from the GUI itself.
2. Use AWS Data Pipeline to export the DynamoDB table to S3 using AWS EMR; you can use the predefined template named Export DynamoDB table to S3 for this (for detailed steps follow this tutorial). Then import the data into MongoDB using the mongoimport command.
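For the first option, a CSV export can be loaded with a command along these lines; the connection string, collection name, and file name below are placeholders.

```
mongoimport --uri "mongodb+srv://<user>:<password>@<cluster>/<database>" \
  --collection items --type csv --headerline --file dynamodb-export.csv
```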

Apart from the above methods, the other straightforward approach is to:

Write a program that reads the table using the DynamoDB Scan operation and writes the data to MongoDB using its write operations.
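As a rough illustration of that approach, here is a minimal sketch using the AWS SDK for JavaScript and the official MongoDB Node.js driver. The table name, connection string, database and collection names are placeholders, and it leaves out the rate limiting and transformation discussed below.

```javascript
// Minimal sketch: copy every item from a DynamoDB table into a MongoDB collection.
// Assumes AWS credentials are configured and the MongoDB URI is reachable.
const AWS = require('aws-sdk');
const { MongoClient } = require('mongodb');

const dynamo = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

async function migrate() {
  const client = new MongoClient('mongodb+srv://<user>:<password>@<cluster>/mydb');
  await client.connect();
  const collection = client.db('mydb').collection('items');

  let lastEvaluatedKey;
  do {
    // Scan one page of the source table (up to 1 MB of data per call).
    const page = await dynamo.scan({
      TableName: 'my-dynamodb-table',
      ExclusiveStartKey: lastEvaluatedKey,
    }).promise();

    if (page.Items.length > 0) {
      // Write the whole page in one bulk insert.
      await collection.insertMany(page.Items);
    }
    lastEvaluatedKey = page.LastEvaluatedKey;
  } while (lastEvaluatedKey); // keep paginating until the scan is exhausted

  await client.close();
}

migrate().catch(console.error);
```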

In my case I had to go for the last method, since the first two didn’t work out for me. My table had more than 6 million records, and Export to .csv only exports the items currently loaded in the console, which can show at most 100 records at a time. I tried exporting to S3 using EMR as well, but couldn’t get it to work on the first attempt; I didn’t try any further, since I wanted to transform the data before writing it to MongoDB and felt it was more hassle to read the data again from S3 and write it back.

How I did it?

Now starts the fun part. I wanted to implement this as a Node module so that others can use it too. It’s now published to the NPM registry, and you can find it under the name dynamodb-mongodb-migrate. After several test runs of the migration on my local machine, I thought, why not run it on the cloud, closer to the database servers, so that latency would be reduced.

There are many ways to run a batch process like this on the cloud, but since I wanted to try out AWS Batch, I decided to run this as a batch job in AWS Batch. You can get more details on getting started with AWS Batch here.

The main concern when running a migration process like this is to make sure it does not interfere with the regular services that use the database table, since the production system is still using it (at least in my case it was). The main challenge here is handling the DynamoDB throughput limits properly. We have to add a rate limiter to the migration service to maintain a predefined throughput during the migration. I found an article on the AWS Developer Blog about rate-limited scans in DynamoDB, which explains a similar implementation in Java. I have used the same methodology in the Node.js module as well. You can access the source code in this GitHub repository in case you want to see how it’s implemented.
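To give an idea of the technique in Node.js terms, here is a rough sketch of a rate-limited scan. It follows the same idea as the AWS blog post (ask DynamoDB to return the consumed capacity of each page and pause long enough to stay under a target rate), but the constants and function names are illustrative, not the module's actual code.

```javascript
// Sketch of a rate-limited DynamoDB scan: keep average consumption
// at or below TARGET_RCU_PER_SECOND so production traffic is not starved.
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

const TARGET_RCU_PER_SECOND = 100; // illustrative budget for the migration

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function rateLimitedScan(tableName, handlePage) {
  let lastEvaluatedKey;
  do {
    const page = await dynamo.scan({
      TableName: tableName,
      ExclusiveStartKey: lastEvaluatedKey,
      Limit: 500,                      // cap the page size
      ReturnConsumedCapacity: 'TOTAL', // ask DynamoDB how much this page cost
    }).promise();

    await handlePage(page.Items);

    // If this page consumed N read capacity units, pausing N / target seconds
    // keeps the average consumption at the target rate.
    const consumed = page.ConsumedCapacity ? page.ConsumedCapacity.CapacityUnits : 0;
    await sleep((consumed / TARGET_RCU_PER_SECOND) * 1000);

    lastEvaluatedKey = page.LastEvaluatedKey;
  } while (lastEvaluatedKey);
}
```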

Running the AWS Batch Job

Okay, now we have to use this module in an AWS Batch job. For that we just need a Docker image that runs a simple Node.js process that will do the following (a rough sketch follows the list):

- Reference the module
- Run the migration
- Gracefully exit the process
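Such an entry point could look something like this; runMigration and the ./migration path are placeholders, not the module's actual exports.

```javascript
// index.js — the entry point the Docker image runs.
// runMigration() is a placeholder for whatever the migration module exposes;
// the real module's API may differ.
const { runMigration } = require('./migration'); // hypothetical local wrapper

async function main() {
  await runMigration(); // perform the rate-limited copy
}

main()
  .then(() => {
    console.log('Migration completed');
    process.exit(0);    // exit cleanly so the Batch job is marked SUCCEEDED
  })
  .catch((err) => {
    console.error('Migration failed', err);
    process.exit(1);    // non-zero exit marks the Batch job as FAILED
  });
```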

To make your life easy, I have implemented this process, and in addition to the above steps I added one more step that downloads a metadata file from Amazon S3 and reads it before running the migration. This file contains the following (a hypothetical sketch follows the list):

- Filter function ― filters the source records returned by the scan operation and selects only a subset
- DynamoDB filter expression ― filters records while applying the DynamoDB scan operation
- Transformation function ― changes the shape of the data if needed
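Purely as an illustration of the kind of metadata described above (the actual format the module expects may differ), such a file could look something like this:

```javascript
// metadata.js — hypothetical shape of the metadata downloaded from S3.
module.exports = {
  // DynamoDB filter expression: applied server-side during the scan.
  filterExpression: 'attribute_exists(createdAt) AND #type = :type',
  expressionAttributeNames: { '#type': 'type' },
  expressionAttributeValues: { ':type': 'ORDER' },

  // Filter function: applied client-side to the items the scan returns.
  filter: (item) => item.status !== 'DELETED',

  // Transformation function: reshape each record before writing to MongoDB.
  transform: (item) => ({
    _id: item.id,
    amount: Number(item.amount),
    createdAt: new Date(item.createdAt),
  }),
};
```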

I have published the above process as a Docker image on Docker Hub. You can reference it directly in your AWS Batch job definition. Refer to the AWS Batch user guide for more details on how to write the job definition.
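For reference, a job definition can be registered through the console, the AWS CLI, or the SDK. Below is a minimal sketch using the AWS SDK for JavaScript; the image name, resource sizes, and environment variable are placeholders.

```javascript
// Sketch: register an AWS Batch job definition that runs the migration image.
const AWS = require('aws-sdk');
const batch = new AWS.Batch({ region: 'us-east-1' });

const params = {
  jobDefinitionName: 'dynamodb-mongodb-migration',
  type: 'container',
  containerProperties: {
    image: '<dockerhub-user>/<migration-image>:latest', // placeholder image name
    vcpus: 1,
    memory: 1024, // MiB
    environment: [
      // Hypothetical variable telling the process where the metadata file lives.
      { name: 'METADATA_S3_URI', value: 's3://<bucket>/<metadata-file>' },
    ],
  },
};

batch.registerJobDefinition(params, (err, data) => {
  if (err) console.error(err);
  else console.log('Registered job definition:', data.jobDefinitionArn);
});
```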

Let’s do some math

There is one more thing: how much time will it take to complete the migration? For this I did a simple calculation as described in the blog post on rate-limited scans in DynamoDB. Using this calculation, we can decide how much throughput we need to allocate to the table and the process, based on the time it takes.

Total storage size of the DynamoDB table = 10.71 GB

Total read capacity units required = 10.71 * 1024 (MB / GB) * 1024 (KB / MB) / 2 (Scan performs eventually consistent reads, which are half the cost.) / 4 (Each 4 KB of data consumes 1 read capacity unit.) = 1,403,781.12

If we allocate 1,000 read capacity units to the migration, the scan would take roughly 1,403,781.12 / 1,000 ≈ 1,404 seconds, which is about 23 minutes.
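The same estimate expressed as a small calculation:

```javascript
// Rough estimate of scan duration given table size and an RCU budget.
const tableSizeGB = 10.71;
const allocatedRCUPerSecond = 1000;

const sizeKB = tableSizeGB * 1024 * 1024;
// Eventually consistent reads cost half, and each read unit covers 4 KB.
const totalReadCapacityUnits = sizeKB / 2 / 4;            // ≈ 1,403,781
const seconds = totalReadCapacityUnits / allocatedRCUPerSecond;

console.log(`~${Math.round(seconds)} s (~${(seconds / 60).toFixed(1)} minutes)`);
// => ~1404 s (~23.4 minutes)
```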
