glynnbird/cloudantimport

cloudantimport

Introduction

When populating Cloudant databases, the source data often starts life as a file of JSON documents.

cloudantimport is designed to help import such data into Cloudant efficiently. Pipe a file of JSON documents into cloudantimport, tell it which database to write to, and it will group the documents into batches and send them using Cloudant's bulk write API.

Installation

You will need to download and install the Go toolchain. Clone this repo, then:

go build ./cmd/cloudantimport

Then copy the resulting cloudantimport binary (or cloudantimport.exe on Windows) into a directory in your PATH.

Configuration

cloudantimport authenticates with your chosen Cloudant service using environment variables, e.g.

CLOUDANT_URL=https://xxxyyy.cloudantnosqldb.appdomain.cloud
CLOUDANT_APIKEY="my_api_key"

Usage

Pipe a JSON file (one document per line) into cloudantimport and supply the database you want to write to using the --dbname/--db parameter:

cat myfile.json | cloudantimport --db mydb

By default, only one bulk write API call is in flight at any one time. This can be increased with the --concurrency/--c option:

# import data with a maximum of 5 bulk write API calls in flight at once
cat myfile.json | cloudantimport --db mydb --concurrency 5
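One common way to cap the number of in-flight bulk writes in Go is a buffered channel used as a semaphore. The sketch below is an illustration of that pattern under assumed names (`upload` stands in for the real bulk write call); it is not cloudantimport's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// upload runs all batches concurrently, but the buffered channel sem
// ensures at most `limit` uploads are in flight at once. It returns
// the total number of docs "written" so the effect is observable.
func upload(batchSizes []int, limit int) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for _, n := range batchSizes {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` uploads are in flight
		go func(n int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			// a real implementation would POST {"docs":[...]} here
			mu.Lock()
			total += n
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(upload([]int{500, 500, 165}, 5)) // prints 1165
}
```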

Generating random data

cloudantimport can be paired with datamaker to generate any amount of sample data:

# template ---> datamaker ---> 100 JSON docs ---> cloudantimport ---> Cloudant
echo '{"_id":"{{uuid}}","name":"{{name}}","email":"{{email true}}","dob":"{{date 1950-01-01}}"}' | datamaker -f json -i 100 | cloudantimport --db people
written {"docCount":100,"successCount":1,"failCount":0,"statusCodes":{"201":1}}
written {"batch":1,"batchSize":100,"docSuccessCount":100,"docFailCount":0,"statusCodes":{"201":1},"errors":{}}
Import complete

or with the template as a file:

cat template.json | datamaker -f json -i 10000 | cloudantimport --db people

Understanding the output

The output comes in two parts. First, one line per bulk write request is written to stderr:

2025/11/20 09:51:49 201 176 500 0
2025/11/20 09:51:49 201 165 500 0
2025/11/20 09:51:50 201 165 500 0

This shows the date/time, HTTP status code, latency (ms), the number of documents successfully written, and the number that failed.

Then at the end comes a summary to stdout:

{"statusCodes":{"201":20},"errors":{"conflict":10},"docs":9990,"batches":20}

which lists counts of each HTTP status code, counts of document write errors, the total number of docs written, and the total number of bulk write API calls made.

How does it work?

To remind myself of what's going on, this diagram helps:

[architecture diagram]
