Major League Hacking

The world’s largest community of early career developers, helping to bridge the gap between classroom learning and real world technical skills through hackathons, conferences, and our flagship Fellowship program.

Load your Data in Minutes with DataStax Bulk Loader

4 min read · Dec 3, 2020

So you’ve read my previous piece on Getting Started with DataStax Astra and now you’re looking to load some bulk data into your new database. Well, look no further! By popular demand, I’ve got the exact walkthrough you need for using an existing DataStax solution: DataStax Bulk Loader, or DSBulk for short.

What is DataStax Bulk Loader?

Well, it’s essentially a command-line tool written in Java. It lets you load, unload and count CSV or JSON data to and from the various database solutions currently available through DataStax, as well as standalone open source Apache Cassandra (versions 2.1 and later). For our purposes, we’ll be using DSBulk to load a CSV or JSON file into our existing DataStax Astra database (if you don’t have a DataStax Astra cluster set up, feel free to check out my getting started guide linked up above).

Prerequisites

In addition to an existing DataStax Astra cluster and a copy of DSBulk downloaded from the DataStax website, you’ll need Java 8 or later installed on your machine. The Java JDK is available as a free download from the Oracle website; just be sure to sign up for an account.
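
Before going further, it can help to confirm that Java is actually reachable from your terminal. The following is a minimal sketch, not part of the DSBulk distribution; the check_java function name is my own, chosen for illustration.

```shell
# Report whether a Java runtime is available on the PATH.
# check_java is a hypothetical helper name used only in this sketch.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java prints its version banner to stderr, so redirect it to stdout
    java -version 2>&1 | head -n 1
  else
    echo "java not found: install Java 8 or later before running DSBulk"
  fi
}

check_java
```

If the first line of the banner reports a version older than 8, update Java before continuing.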

Getting Started

With those prerequisites out of the way, let’s take a look at that DSBulk zip file you should already have sitting in your download folder. Double-click the zip file to unpack it. From there, open the dsbulk-1.7.0 folder and then the bin folder. Right-click the dsbulk file, copy its path and paste it into your text editor of choice. I use VSCode, but any text editor will do.

Grab the path for dsbulk. Tip: You can place all of your files into a single folder for easier access.

Next, you’ll want to navigate to your CSV or JSON file, copy the file path and paste that into your text editor as well.

The last file path you’ll need is that of your Secure Connect Bundle zip file. If you did not download the Secure Connect Bundle from my previous article, no worries: simply head back to your DataStax Astra UI and click the green ‘Connect to Database’ button. You’ll see several options on the left-hand side beneath the ‘Connect using a driver’ header. You can select any of the provided languages and click the ‘Secure Connect Bundle’ link shown.

Once that bundle is downloaded, go ahead and navigate to your download folder, right click the zip file, copy the file path and paste it into your text editor.
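
If you’d like to double check the three paths you’ve collected so far, a quick existence check can catch typos early. This is only a sketch: the check_path helper and the file locations below are hypothetical examples, so substitute the paths you pasted into your text editor.

```shell
# Report whether a path points at an existing file.
# check_path is a hypothetical helper name used only in this sketch.
check_path() {
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Hypothetical locations; replace them with your own paths.
check_path "$HOME/Downloads/dsbulk-1.7.0/bin/dsbulk"
check_path "$HOME/Downloads/users.csv"
check_path "$HOME/Downloads/secure-connect-mydb.zip"
```

Any line that starts with "missing" points at a path worth copying again.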

With the paths for your DSBulk file, CSV or JSON file, and the secure connect bundle zip file in place, the last remaining bits of information you’ll need to gather are your keyspace name, table name, database username and password. You can find everything you need in the DataStax Astra UI. Once you’ve got all your information in order, use it to fill in the following command in your text editor:

[DSBulk file path] \
load --connector.csv.url [CSV or JSON file path] \
--schema.keyspace [name of keyspace] --schema.table [name of table] \
--driver.basic.cloud.secure-connect-bundle [secure connect bundle zip file path] \
-u [database username] -p [database password]

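Please note that the final command should not contain any of the brackets from the example provided. Also note that the command as written uses the CSV connector; to load a JSON file, add -c json and swap --connector.csv.url for --connector.json.url.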
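
To make the template more concrete, here is a minimal sketch of a filled-in version wrapped in a small shell script. Every value (paths, keyspace, table, username, password) is a made-up placeholder, and the load_data function and DRY_RUN switch are my own additions for illustration rather than anything DSBulk provides. By default the script only prints the command so you can review it before running anything.

```shell
# All values below are hypothetical placeholders; substitute your own.
DSBULK="$HOME/Downloads/dsbulk-1.7.0/bin/dsbulk"
DATA_FILE="$HOME/Downloads/users.csv"
BUNDLE="$HOME/Downloads/secure-connect-mydb.zip"
KEYSPACE="my_keyspace"
TABLE="users"
DB_USER="my_user"
DB_PASS="my_password"
CONNECTOR="csv"          # switch to "json" when loading a JSON file
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the command, 0 = actually run it

# load_data is a hypothetical helper name used only in this sketch.
load_data() {
  # "set --" stores the command as an argument list, so paths
  # containing spaces stay intact when the command is executed
  set -- "$DSBULK" load \
    -c "$CONNECTOR" "--connector.$CONNECTOR.url" "$DATA_FILE" \
    --schema.keyspace "$KEYSPACE" --schema.table "$TABLE" \
    --driver.basic.cloud.secure-connect-bundle "$BUNDLE" \
    -u "$DB_USER" -p "$DB_PASS"
  if [ "$DRY_RUN" = "1" ]; then
    # Preview only; note that this prints the password as well
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

load_data
```

Once the printed command looks right, setting DRY_RUN=0 runs the load for real. Keeping the values in variables also makes it easy to reuse the script for a different file or table.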

Once you’ve input all of your information, you can copy and paste that command directly into your Bash terminal and have the contents of your CSV or JSON file uploaded in moments.

You should see similar output, with a brief summary of the load operation.

Wrapping up

If you want to double check that the upload worked, head back to the DataStax Astra UI and click on the CQL Console tab. After verifying your identity, use the SELECT * FROM [your keyspace].[your table]; command to view all of your recently uploaded data.
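
Since DSBulk can also count rows, another option is to check the upload from the terminal. The sketch below only assembles and prints a count command so you can inspect it first; the count_cmd helper and all of the argument values are hypothetical, and I am assuming the count operation accepts the same connection and schema options as the load command above.

```shell
# Print a DSBulk count command for review.
# count_cmd is a hypothetical helper name used only in this sketch.
# Arguments: dsbulk path, keyspace, table, bundle path, username, password
count_cmd() {
  set -- "$1" count \
    --schema.keyspace "$2" --schema.table "$3" \
    --driver.basic.cloud.secure-connect-bundle "$4" \
    -u "$5" -p "$6"
  printf '%s\n' "$*"
}

# Hypothetical values; replace them with your own.
count_cmd "$HOME/Downloads/dsbulk-1.7.0/bin/dsbulk" \
  my_keyspace users \
  "$HOME/Downloads/secure-connect-mydb.zip" \
  my_user my_password
```

Paste the printed line into your terminal to run it, then compare the reported row count with the number of records in your source file.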

Congratulations! You have successfully used DataStax Bulk Loader to seed your database. I’m excited to see what you build next and look forward to seeing your work at our next MLH event. Don’t forget to submit your project to the Best Use of DataStax Astra prize category!

If you want to learn more, check out the DataStax Bulk Loader for Apache Cassandra documentation along with the open-source GitHub repository.

Published in Major League Hacking

Written by Rosendo Pili

Customer Success Manager at Major League Hacking