
Importing HIBP's pwned password list in to DynamoDB


Troy Hunt recently introduced HIBP Passwords, a freely downloadable list of over 300 million passwords that have been pwned in the various breaches the site records. There is an API to access the list for auditing and checking passwords, but it's rate limited, and I thought it would be more friendly to import the passwords into a database we control. It looks like HIBP uses Azure Table Storage to make the data quickly accessible. I do most of my work on AWS, so I thought I'd take a look at importing the hashes into DynamoDB, which is relatively cheap and easy to use.

The first thing I tried was to follow a tutorial using AWS Data Pipeline, which seemed to be exactly what I needed. In the end, though, I found the tutorial made a few assumptions that it either failed to mention or I missed, mostly around the expected data format for the CSV files. Thankfully, this led me towards using AWS Elastic MapReduce (EMR) directly, and that turned out to be the winning formula.

To start with, I needed to get the data from HIBP and upload it to S3. I think this method would have worked if the source files had been gzipped, since Hive can read gzipped input from S3 directly, but I wasn't too sure about 7z archives, so I decompressed them before uploading them to S3.

mkdir hibp-passwords
cd hibp-passwords
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-1.0.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-1.txt.7z
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-update-2.txt.7z
7zr e pwned-passwords-1.0.txt.7z
7zr e pwned-passwords-update-1.txt.7z
7zr e pwned-passwords-update-2.txt.7z
rm *.7z
aws s3 mb s3://hibp-passwords-123
aws s3 sync ./ s3://hibp-passwords-123

I created a DynamoDB table to hold the data. It only needs the one attribute, the SHA-1 hash, which will also be the partition key.


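If you'd rather script the table creation than click through the console, something along these lines with the AWS CLI should produce an equivalent table. The table and attribute names match the Hive column mapping used further down; the initial throughput values barely matter because we bump the write capacity in the next step.

aws dynamodb create-table \
    --table-name pwned_passwords \
    --attribute-definitions AttributeName=sha1,AttributeType=S \
    --key-schema AttributeName=sha1,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5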

Now adjust the write capacity to allow our import to go reasonably quickly. I set the capacity to 10,000 units, which I think was the maximum without having to ask AWS to lift the limits.


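The equivalent CLI call is a one-liner; note that update-table wants both values, and the read capacity here is just a placeholder since only the write side matters for the import.

aws dynamodb update-table \
    --table-name pwned_passwords \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=10000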

We then need to create our EMR cluster. I experimented with different sizes to begin with, but settled on a cluster of 16 c4.8xlarge instances and left the rest of the settings as the defaults. Between the cluster size and the write capacity we set on the DynamoDB table, the import took around 10 hours. There's probably a better combination of cluster size and write capacity, but it was good enough for me.


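For reference, roughly the same cluster can be requested from the CLI. This is a sketch rather than exactly what I ran: the name is just a label, the release label is an assumption (any current EMR 5.x release with Hive should do), and the key name should match the key you'll use to SSH in below.

aws emr create-cluster \
    --name hibp-import \
    --release-label emr-5.7.0 \
    --applications Name=Hive \
    --instance-type c4.8xlarge \
    --instance-count 16 \
    --use-default-roles \
    --ec2-attributes KeyName=your-key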

Now we're all good to go. Log in to the EMR master node, install tmux and fire up Hive.

ssh -i ~/.ssh/your-key.pem hadoop@your-master-node-public-url
sudo yum install tmux
tmux
hive

We need to tell Hive to use as much DynamoDB write capacity as it sees fit, since nothing else is trying to access the table.

> SET dynamodb.throughput.write.percent=1.5;

We then create our first Hive table by telling it where to find our data in S3.

> CREATE EXTERNAL TABLE s3_hibp_passwords(a_col string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://hibp-passwords-123/';
OK
Time taken: 3.371 seconds

The first time I tried running the import, I ran into trouble with it trying to insert duplicate items into the DynamoDB table. A bit of searching led me to a Stack Overflow question and a way to create another table containing only unique hashes.

> CREATE TABLE s3_hibp_passwords_dedup AS
    SELECT a_col
    FROM (SELECT a_col, rank() OVER (PARTITION BY a_col) AS col_rank FROM s3_hibp_passwords) t
    WHERE t.col_rank = 1
    GROUP BY a_col;
Query ID = hadoop_20170810144127_818de1ea-0c47-4a16-a5bf-5b77e3336d69
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1502375367082_0202)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........   container     SUCCEEDED    257        257        0        0       0       0
Reducer 2 ......   container     SUCCEEDED     53         53        0        0       0       0
Reducer 3 ......   container     SUCCEEDED     27         27        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 105.63 s
----------------------------------------------------------------------------------------------
Moving data to directory hdfs://ip-172-31-43-51.eu-west-1.compute.internal:8020/user/hive/warehouse/s3_hibp_passwords_dedup
OK
Time taken: 112.395 seconds

Now we need to create another Hive table; this one will be backed by our DynamoDB table.

> CREATE EXTERNAL TABLE ddb_hibp_passwords (col1 string)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES ("dynamodb.table.name" = "pwned_passwords",
                   "dynamodb.column.mapping" = "col1:sha1");
OK
Time taken: 0.925 seconds

That's it, we're all ready to go. This is going to take some time, so set it running and go to bed.

> INSERT OVERWRITE TABLE ddb_hibp_passwords SELECT * FROM s3_hibp_passwords_dedup;
Query ID = hadoop_20170810144407_872afa4b-ca88-4b57-a284-5496fcb0d30f
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1502375367082_0002)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........   container     SUCCEEDED    254        254        0        0      18       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 59056.94 s
----------------------------------------------------------------------------------------------
OK
Time taken: 59058.426 seconds
hive>

I don't know what happened with the failed tasks; I didn't bother checking. I know I ended up with 319,169,449 hashes in the database, which was good enough for me.


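If you want to confirm the count from the CLI rather than the console, describe-table reports DynamoDB's approximate item count (it's only refreshed every six hours or so). This uses the table name from the Hive mapping above.

aws dynamodb describe-table \
    --table-name pwned_passwords \
    --query 'Table.ItemCount'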

Back in a local terminal, we can quickly check how this is going to work. Note that I shifted the hash to uppercase, since the HIBP source data hashes are all uppercase.

davem@wes:~$ SHA1DAVE=$(echo -n dave | sha1sum | tr "[a-z]" "[A-Z]" | awk '{ print $1 }')
davem@wes:~$ echo $SHA1DAVE
BFCDF3E6CA6CEF45543BFBB57509C92AEC9A39FB
davem@wes:~$ aws dynamodb get-item --table-name hibp_passwords --key "{\"sha1\": {\"S\": \"$SHA1DAVE\"}}"
{
    "Item": {
        "sha1": {
            "S": "BFCDF3E6CA6CEF45543BFBB57509C92AEC9A39FB"
        }
    }
}

real 0.46
user 0.30
sys 0.04
davem@wes:~$
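If you end up doing this a lot, a small shell helper along these lines saves some typing. The function name is just for illustration, and it assumes the same table name as the check above.

check_pwned() {
    # SHA-1 the password, uppercase it to match the HIBP data, then look it up.
    local hash
    hash=$(printf '%s' "$1" | sha1sum | awk '{ print toupper($1) }')
    aws dynamodb get-item --table-name hibp_passwords \
        --key "{\"sha1\": {\"S\": \"$hash\"}}" \
        --query 'Item.sha1.S' --output text
}

check_pwned dave   # prints the hash if it's in the table, None otherwise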

Have fun auditing your passwords.
