Big Data Engineering

Let us take a look at two important evolving ecosystems: big data engineering, for storing and analyzing data in non-relational databases, and data analysis, for extracting insights from that data.

Big Data Engineering: To store large amounts of data in the first place, we do not need large relational databases, which are expensive. A newer paradigm called NoSQL is being developed for non-relational databases of very large size. The goal is not strict consistency but failover and speed. The current options include Cassandra, MongoDB, CouchDB, Redis, Riak, HBase, Couchbase, Neo4j, Hypertable, Elasticsearch, Accumulo, VoltDB and Scalaris.

HBase and Cassandra are column-oriented (wide-column) databases. Because HBase runs on top of HDFS, it is probably the best integrated with Hadoop. Cassandra is a good choice for a highly distributed data store (e.g. Netflix). Shopping carts and other table-oriented applications are better suited to columnar databases. MongoDB provides document-oriented storage. A hierarchical, text-oriented use case (blogs, comments, even a product catalogue) is better suited to document-oriented databases. MongoDB also supports complex queries. Couchbase combines key-value and document-oriented architectures.
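To make the difference between the two data models concrete, here is a minimal sketch in plain Python. It does not use any database client; the record shapes, the names blog_post and shopping_carts, and the helper functions are all hypothetical and only illustrate how a document store nests related data in one record while a wide-column store keeps sparse columns under a row key.

```python
# Document-oriented model: one self-contained, nested record per blog post.
# (Hypothetical data, shaped like a document a store such as MongoDB might hold.)
blog_post = {
    "_id": "post-1",
    "title": "Big Data Engineering",
    "tags": ["nosql", "hadoop"],
    "comments": [
        {"author": "alice", "text": "Nice overview."},
        {"author": "bob", "text": "What about Riak?"},
    ],
}

# Wide-column model: row key -> {column name -> value}.
# Each row may carry a different set of columns.
shopping_carts = {
    "user-42": {"item:book-17": 1, "item:pen-3": 4},
    "user-77": {"item:lamp-9": 1},
}


def comments_by(post, author):
    """Return the text of every comment a given author left on a post."""
    return [c["text"] for c in post["comments"] if c["author"] == author]


def cart_quantity(carts, user, item):
    """Look up one column of one row; missing rows or columns count as 0."""
    return carts.get(user, {}).get(f"item:{item}", 0)
```

The blog post travels as a single hierarchical unit, which is why comments and tags live inside it. The shopping cart is addressed by a row key and a column name, which is the kind of lookup a columnar store is built around.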

For analytics on the data, we need the Hadoop framework, comprising HDFS for storage and the MapReduce engine for processing. Pig was developed to provide a scripting-like environment for Hadoop applications, which makes programming Hadoop much easier. Hive is an SQL-like query engine that likewise makes querying data in Hadoop a lot easier.
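The MapReduce idea itself is simple enough to sketch in a few lines. The following is a single-process illustration in plain Python, not Hadoop code: the function names map_phase, shuffle, reduce_phase and word_count are made up for this example, and a real Hadoop job would run the same three steps in parallel across many machines reading from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count(["big data", "Big engines"]) counts "big" twice and the other two words once each. Pig and Hive exist so that this kind of job can be expressed as a short script or query instead of hand-written map and reduce functions.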

