Import CSV data into DynamoDB.
- Comma-separated (CSV) files
- Tab-separated (TSV) files
- Large file sizes
- Local files
- Files on S3
- Parallel imports using AWS Step Functions to import > 4M rows per minute
- No dependencies (no need for .NET, Python, Node.js, Docker, the AWS CLI etc.)
This program will use up all available DynamoDB capacity. It is not designed for use against production tables. Use at your own risk.
Download binaries for macOS, Linux and Windows at https://github.com/a-h/ddbimport/releases
A Docker image is available:
docker pull adrianhesketh/ddbimport
ddbimport -inputFile ../data.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -remote -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
ddbimport -install -stepFnRegion=eu-west-2
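In remote mode, ddbimport uses Step Functions to process sections of the source file concurrently. You never need to split the file yourself, but the idea behind the parallelism can be illustrated with a plain shell sketch (the file names below are made up for the example):

```shell
# Illustration only: divide a TSV into fixed-size chunks, giving each chunk
# its own copy of the header row, so independent workers can import in parallel.
printf 'ngram\tyear\n' > input.tsv
for i in 1 2 3 4 5 6 7; do printf 'word%d\t%d\n' "$i" $((1900 + i)) >> input.tsv; done
head -n 1 input.tsv > header.tsv            # keep the header row
tail -n +2 input.tsv | split -l 3 - chunk_  # 3 data rows per chunk
for f in chunk_*; do cat header.tsv "$f" > "shard_$f"; rm "$f"; done
ls shard_chunk_*
```

Each shard is a valid standalone TSV, so any number of workers can consume them independently, which is what lets the Step Functions mode exceed the throughput of a single writer.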
Inserts per second of the Google ngram 1 dataset (English).
To reproduce my results:
aws dynamodb create-table \
--table-name ddbimport \
--attribute-definitions AttributeName=ngram,AttributeType=S AttributeName=year,AttributeType=N \
--key-schema AttributeName=ngram,KeyType=HASH AttributeName=year,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
curl http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-1M-1gram-20090715-0.csv.zip -o 0.csv.zip
# Add the headers.
printf 'ngram\tyear\tmatch_count\tpage_count\tvolume_count\n' > data.csv
# Prepare the data.
unzip 0.csv.zip
cat googlebooks-eng-1M-1gram-20090715-0.csv >> data.csv
rm googlebooks-eng-1M-1gram-20090715-0.csv
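After these steps, data.csv should be a tab-separated file whose first line is the header, followed by the ngram rows. A minimal stand-in (with fabricated rows, just to show the expected shape):

```shell
# Build a tiny stand-in for data.csv to show the expected layout:
# a tab-separated header line followed by tab-separated data rows.
printf 'ngram\tyear\tmatch_count\tpage_count\tvolume_count\n' > sample.csv
printf 'hello\t1901\t12\t10\t8\n' >> sample.csv
printf 'world\t1902\t34\t20\t15\n' >> sample.csv
cat sample.csv
```

If your real data.csv does not look like this (header first, tabs between fields), the import with -delimiter tab will not parse it correctly.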
Learn about the project here:
- https://www.youtube.com/watch?v=UOuJWFBggEY
- https://infinityworks.com/insights/importing-data-into-dynamodb/
Ensure you have $GOPATH/bin in $PATH (by default that is ~/go/bin). This is needed for statik (https://github.com/rakyll/statik) to package the Serverless application into the ddbimport binary.
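For example, assuming the default GOPATH location:

```shell
# Add the Go binary directory to PATH so that statik can be found.
export PATH="$PATH:${GOPATH:-$HOME/go}/bin"
echo "$PATH"
```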
Install a supported version of Node.js (v12 seems to work fine) and npm (https://www.npmjs.com/get-npm) or Yarn (https://classic.yarnpkg.com/en/docs/install/#mac-stable).
1. git clone git@github.com:a-h/ddbimport; cd ddbimport
2. Edit version/version.go and set the Version const to a non-empty value. Without this, installation in steps 7-8 will fail.
3. yarn global add serverless or npm -g install serverless, whichever you prefer.
4. sls plugin install -n serverless-step-functions
5. make -C sls package
6. go build -o ddbimport cmd/main.go. This is your main binary.
7. Run ./ddbimport -install -stepFnRegion your-region and wait a minute or so. You can check the CloudFormation console; a stack named ddbimport should now be created.
8. Run the same command again. This uploads the binary that contains the two Lambda function handlers and sets up the actual step function. If this fails, complaining that the S3 key was not found, you probably skipped step 2.

