MapReduce vs. RDBMS

An RDBMS (Relational Database Management System) is good at updating a small portion of a big database. It typically uses a traditional B-Tree, whose performance is limited by the time required to perform seek operations.

By comparison, MapReduce is good at updating all, or the majority, of a big database. It uses sort and merge to rebuild the database, which depends more on transfer rate than on seek time.
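
To make the seek versus transfer trade-off concrete, here is a rough back-of-the-envelope sketch. Every number in it (seek time, transfer rate, database size, record size) is an assumed, illustrative value, and the function names are made up for this note. The model is also deliberately crude: it charges one seek per updated record and ignores caching and batching.

```python
# Back-of-the-envelope cost model. Every constant below is an assumed,
# illustrative value, not a measurement of any real system.
SEEK_TIME_S = 0.010             # assumed average disk seek: 10 ms
TRANSFER_BYTES_PER_S = 100e6    # assumed sequential transfer rate: 100 MB/s
DATABASE_BYTES = 1e12           # assumed database size: 1 TB
RECORD_BYTES = 100              # assumed record size: 100 bytes


def seek_update_seconds(fraction_updated: float) -> float:
    """Time to update records in place, paying one seek per updated record."""
    total_records = DATABASE_BYTES / RECORD_BYTES
    return total_records * fraction_updated * SEEK_TIME_S


def rewrite_seconds() -> float:
    """Time to stream the whole database once to read it and once to write it."""
    return 2 * DATABASE_BYTES / TRANSFER_BYTES_PER_S


def break_even_fraction() -> float:
    """Fraction of updated records above which a full rewrite is cheaper."""
    total_records = DATABASE_BYTES / RECORD_BYTES
    return rewrite_seconds() / (total_records * SEEK_TIME_S)


if __name__ == "__main__":
    print("full rewrite (s):", rewrite_seconds())
    print("update 1% by seeking (s):", seek_update_seconds(0.01))
    print("break-even fraction:", break_even_fraction())
```

Under these assumptions a full rewrite takes about 20,000 seconds, while updating just 1% of the records by seeking takes about 1,000,000 seconds. The break-even point sits at roughly 0.02% of the records, which is why seek-based updates only pay off for very small changes.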

An RDBMS suits applications that require the data sets in the database to be updated very frequently, as with point queries or small data set updates. MapReduce is better for WORM (write once, read many times) data applications.

MapReduce is best seen as a complement to the traditional RDBMS.

Here is how some of their characteristics compare.

Data types

To understand the data types, we first need to look at what kinds of data exist and what their characteristics are.

  • Structured data is data that has a formally defined structure, such as XML documents or database tables.

  • Semi-structured data is data that has a looser format, where the data structure is used as a guide and may be ignored.

  • Unstructured data is data that does not have any formal structure, such as plain text or image data. These are the things we share on social networks, such as images and text messages, or the things we search for on Google (or Baidu).

MapReduce is very effective on unstructured and semi-structured data because it interprets the data at processing time. It does not use intrinsic properties of the data as input keys or input values; instead, the person analyzing the data chooses them. MapReduce also has a programming model that is linearly scalable.
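
As a small illustration of "the analyst chooses the keys", the sketch below reads the same raw text lines in two different ways. The log format, the sample lines, and the function names are all hypothetical and exist only for this note; nothing here is a Hadoop API.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical raw records: plain text lines with no schema attached.
RAW_LINES: List[str] = [
    "2014-01-05 alice GET /index.html 200",
    "2014-01-05 bob GET /missing 404",
    "2014-01-06 alice POST /login 200",
]

Pair = Tuple[str, int]


def by_user(line: str) -> Pair:
    """One analyst's choice: use the second field (the user) as the key."""
    fields = line.split()
    return fields[1], 1


def by_status(line: str) -> Pair:
    """Another analyst's choice: use the fifth field (the status) as the key."""
    fields = line.split()
    return fields[4], 1


def interpret(lines: Iterable[str], choose_pair: Callable[[str], Pair]) -> Iterator[Pair]:
    """Turn raw lines into key-value pairs only at processing time."""
    for line in lines:
        yield choose_pair(line)


if __name__ == "__main__":
    print(list(interpret(RAW_LINES, by_user)))
    print(list(interpret(RAW_LINES, by_status)))
```

The data itself never changes. Only the function that interprets it does, so the same input can answer different questions without being reloaded into a fixed schema first.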

MapReduce is built on two very important functions: the map function and the reduce function. Each of them defines a mapping from one set of key-value pairs to another, such as key-value_1 -> key-value_2.
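
The classic way to picture the two functions is a word count. The sketch below runs everything in memory in a single Python process, so it only shows the shape of the key-value mappings, not how Hadoop distributes the work. The names map_fn, reduce_fn, and run_job are made up for this note and are not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line_no: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: (line number, line text) -> one (word, 1) pair per word."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    return word, sum(counts)


def run_job(lines: List[str]) -> Dict[str, int]:
    """Run map, group the values by key, then reduce, all in memory."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line_no, line in enumerate(lines):
        for key, value in map_fn(line_no, line):
            grouped[key].append(value)
    # Sorting the keys stands in for the sort-and-merge step between map and reduce.
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog the end"]))
```

Note that the map input key (the line number) is simply ignored here, and the map output key (the word) is something the programmer picked. That is the key-value_1 -> key-value_2 mapping in action.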

Here are some of the characteristics of the Hadoop release series.

  • 1.X includes secure authentication, the old configuration names, the old MapReduce APIs, and the MapReduce 1 runtime (the classic version).

  • 0.22 includes the new configuration names, both the old and the new MapReduce APIs, and the MapReduce 1 runtime (the classic version).

  • 2.X, the most recent series, includes secure authentication, the new configuration names, and both the old and the new MapReduce APIs. It also includes the MapReduce 2 runtime (YARN), HDFS federation, and HDFS high-availability.

These last three are the major new features of the Hadoop 2.X release series:

MapReduce 2 is the new MapReduce runtime, implemented on a new system called YARN (Yet Another Resource Negotiator). YARN is a general resource management system for running distributed applications.

HDFS federation partitions the HDFS namespace across multiple namenodes, which improves support for clusters with a very large number of files. The HDFS high-availability feature uses standby namenodes as backups, so the namenode is no longer a potential single point of failure (SPOF).
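
To picture what "partitioning the namespace" means, here is a toy sketch that routes a file path to the namenode responsible for it. This is only an illustration of the idea, under the assumption that the namespace is split by path prefix. The mount table, the namenode names, and the namenode_for function are all hypothetical, and this is not how HDFS is actually configured or implemented.

```python
from typing import Dict

# Hypothetical mount table: namespace prefix -> namenode that manages it.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "namenode-1",
    "/share": "namenode-2",
    "/tmp": "namenode-3",
}


def namenode_for(path: str) -> str:
    """Return the namenode that manages the longest matching prefix of path."""
    matches = [
        prefix
        for prefix in MOUNT_TABLE
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no namenode manages {path!r}")
    return MOUNT_TABLE[max(matches, key=len)]


if __name__ == "__main__":
    print(namenode_for("/user/alice/data.txt"))
    print(namenode_for("/share/datasets"))
```

Because each namenode only has to keep track of its own part of the tree, adding namenodes lets the cluster hold more files than a single namenode could manage alone.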

Epilogue

The next note covers HDFS in detail. I will turn it into a post and publish it on the blog if I have enough time :).

The last note is a short introduction to Hadoop. I’m too busy right now to turn it into a post because I have to finish the FGO event.