SQL Databases vs. Hadoop

Comparing SQL databases and Hadoop

Hadoop is a framework for processing data, but what makes it a better fit than standard relational databases for some workloads? One reason is that SQL (Structured Query Language) is by design targeted at structured data, while Hadoop can also work with semi-structured and unstructured data such as plain text, XML, JSON, images, PDFs, and Word documents. With that in mind, let's look at a more detailed comparison of Hadoop with typical SQL databases along specific dimensions.

SCALE-OUT INSTEAD OF SCALE-UP

Scaling relational databases is costly. Their design favors scaling up: to run a bigger database, you need to buy a bigger machine. Unfortunately, at some point there might not be a machine big enough for a large amount of data. Moreover, high-end machines are not cost-effective. For example, a machine with four times the power of a standard PC typically costs far more than putting four such PCs in a cluster. Hadoop is designed as a scale-out architecture that runs on a cluster of commodity hardware, so capacity grows by adding more machines rather than by replacing one machine with a larger one.

KEY/VALUE PAIRS INSTEAD OF RELATIONAL TABLES

A fundamental trait of relational databases is that data resides in tables whose structure is defined by a schema. However, many modern applications deal with data types that are not structured, e.g. text documents, images, and XML files. Hadoop uses key/value pairs as its basic data unit, which is flexible enough to work with these less-structured data types.
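As a rough illustration of the key/value idea, the sketch below turns a blob of plain text into (byte offset, line) pairs, which is similar in spirit to how a text file is commonly presented to a Hadoop job. The function name `to_key_value_pairs` and the sample text are hypothetical, and this is ordinary Python rather than any Hadoop API.

```python
from typing import Iterator, Tuple


def to_key_value_pairs(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs for a block of plain text.

    No schema is needed: the key is just the position of the line
    and the value is the raw line itself.
    """
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        # Advance by the encoded length so the key is a byte offset.
        offset += len(line.encode("utf-8"))


# Hypothetical sample input: free-form text with no fixed columns.
document = "first line\nsecond line\n"
pairs = list(to_key_value_pairs(document))

for key, value in pairs:
    print(key, value)
```

The same shape works for other inputs: the key could be a file name and the value the bytes of an image, for instance. The interpretation of the value is left to the processing code instead of being fixed in advance by a table schema.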

FUNCTIONAL PROGRAMMING (MAPREDUCE) INSTEAD OF DECLARATIVE QUERIES (SQL)

SQL is fundamentally a high-level declarative language. You query data by stating the result you want and let the database engine figure out how to derive it. Under MapReduce you specify the actual steps in processing the data, which is more analogous to an execution plan for a SQL engine. Under SQL you have query statements; under MapReduce you have scripts and code. MapReduce allows you to process data in a more general fashion than SQL queries. For example, you can build complex statistical models from your data or reformat your image data. SQL is not well suited to such tasks.
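To make the contrast concrete, here is a minimal in-memory sketch of the MapReduce style applied to a word count. It is plain Python, not the Hadoop API; the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run these phases in parallel across machines. The declarative counterpart, assuming a hypothetical `words` table with a `word` column, is shown in a comment for comparison.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Declarative counterpart (assumed table and column names):
#   SELECT word, COUNT(*) FROM words GROUP BY word;


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop scales out", "SQL scales up"])
print(counts)
```

In the SQL version the engine decides how to group and count; in the MapReduce version each step is written out by hand. That is more work for a simple count, but the map and reduce functions can contain arbitrary logic, which is what makes the model more general.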

On the other hand, when working with data that does fit well into relational structures, some people may find MapReduce less natural to use. Those who are accustomed to the SQL paradigm may find it challenging to think in the MapReduce way. Note, however, that many extensions, such as Pig and Hive, let you take advantage of Hadoop's scalability while programming in more familiar paradigms.

