Import CSV data into DynamoDB.
- Comma-separated (CSV) files
- Tab-separated (TSV) files
- Large file sizes
- Local files
- Files on S3
- Parallel imports using AWS Step Functions to import > 4M rows per minute
- No dependencies (no need for .NET, Python, Node.js, Docker, the AWS CLI etc.)
This program will use up all available DynamoDB capacity. It is not designed for use against production tables. Use at your own risk.
Download binaries for macOS, Linux and Windows at https://github.com/a-h/ddbimport/releases
A Docker image is available:
docker pull adrianhesketh/ddbimport
ddbimport -inputFile ../data.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -remote -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -install -stepFnRegion=eu-west-2
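In remote mode, ddbimport uses Step Functions to process sections of the source file concurrently. You never need to split the file yourself, but the idea behind the parallelism can be illustrated with a plain shell sketch (the file names below are made up for the example):

```shell
# Illustration only: divide a TSV into fixed-size chunks, giving each chunk
# its own copy of the header row, so independent workers can import in parallel.
printf 'ngram\tyear\n' > input.tsv
for i in 1 2 3 4 5 6 7; do printf 'word%d\t%d\n' "$i" $((1900 + i)) >> input.tsv; done
head -n 1 input.tsv > header.tsv            # keep the header row
tail -n +2 input.tsv | split -l 3 - chunk_  # 3 data rows per chunk
for f in chunk_*; do cat header.tsv "$f" > "shard_$f"; rm "$f"; done
ls shard_chunk_*
```

Each shard is a valid standalone TSV, so any number of workers can consume them independently, which is what lets the Step Functions mode exceed the throughput of a single writer.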
Inserts per second of the Google ngram 1 dataset (English).
To reproduce my results:
aws dynamodb create-table \
--table-name ddbimport \
--attribute-definitions AttributeName=ngram,AttributeType=S AttributeName=year,AttributeType=N \
--key-schema AttributeName=ngram,KeyType=HASH AttributeName=year,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
curl http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-1M-1gram-20090715-0.csv.zip -o 0.csv.zip
# Add the headers.
printf 'ngram\tyear\tmatch_count\tpage_count\tvolume_count\n' > data.csv
# Prepare the data.
unzip 0.csv.zip
cat googlebooks-eng-1M-1gram-20090715-0.csv >> data.csv
rm googlebooks-eng-1M-1gram-20090715-0.csv
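After these steps, data.csv should be a tab-separated file whose first line is the header, followed by the ngram rows. A minimal stand-in (with fabricated rows, just to show the expected shape):

```shell
# Build a tiny stand-in for data.csv to show the expected layout:
# a tab-separated header line followed by tab-separated data rows.
printf 'ngram\tyear\tmatch_count\tpage_count\tvolume_count\n' > sample.csv
printf 'hello\t1901\t12\t10\t8\n' >> sample.csv
printf 'world\t1902\t34\t20\t15\n' >> sample.csv
cat sample.csv
```

If your real data.csv does not look like this (header first, tabs between fields), the import with -delimiter tab will not parse it correctly.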
Learn about the project here:
- https://www.youtube.com/watch?v=UOuJWFBggEY
- https://infinityworks.com/insights/importing-data-into-dynamodb/
Ensure you have $GOPATH/bin in $PATH (by default that is ~/go/bin). This is needed for statik (https://github.com/rakyll/statik) to package the Serverless application into the ddbimport binary.
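For example, assuming the default GOPATH location:

```shell
# Add the Go binary directory to PATH so that statik can be found.
export PATH="$PATH:${GOPATH:-$HOME/go}/bin"
echo "$PATH"
```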
Install a supported version of Node.js (v12 seems to work fine) and npm (https://www.npmjs.com/get-npm) or Yarn (https://classic.yarnpkg.com/en/docs/install/#mac-stable).
1. git clone git@github.com:a-h/ddbimport; cd ddbimport
2. Edit version/version.go and set the Version const to a non-empty value. Without this, installation in steps 7-8 will fail.
3. yarn global add serverless or npm -g install serverless, whichever you prefer.
4. sls plugin install -n serverless-step-functions
5. make -C sls package
6. go build -o ddbimport cmd/main.go. This is your main binary.
7. Run ./ddbimport -install -stepFnRegion your-region and wait a minute or so. You can check the CloudFormation console; a stack named ddbimport should now be created.
8. Run the same command again. This uploads the binary that contains the two Lambda function handlers and sets up the actual step function. If this fails, complaining that the S3 key was not found, you probably skipped step 2.

