Google’s Bigtable
Traditional relational databases present a view that’s composed of multiple tables, each with rows and named columns. For example, a table of students might include a student’s name, ID number, and contact info. A table of grades might include a student’s ID number, course number, and grade. We can construct a query that extracts grades by name by looking up the ID number in the student table and then matching that ID number in the grade table.
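To make the two-table lookup concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for illustration; they are not taken from any real schema.

```python
import sqlite3

# In-memory database. Table names, column names, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, contact TEXT);
    CREATE TABLE grades (student_id INTEGER, course TEXT, grade TEXT);
    INSERT INTO students VALUES (1, 'Alice', 'alice@example.com');
    INSERT INTO students VALUES (2, 'Bob', 'bob@example.com');
    INSERT INTO grades VALUES (1, 'CS101', 'A');
    INSERT INTO grades VALUES (1, 'MA201', 'B');
    INSERT INTO grades VALUES (2, 'CS101', 'C');
""")


def grades_for(name):
    """Find the student's ID by name, then match that ID in the grades table."""
    rows = conn.execute(
        """
        SELECT g.course, g.grade
        FROM students AS s
        JOIN grades AS g ON g.student_id = s.id
        WHERE s.name = ?
        ORDER BY g.course
        """,
        (name,),
    )
    return rows.fetchall()


print(grades_for("Alice"))
```

The join on the ID number is what ties the two tables together; neither table alone can answer "which grades does Alice have?".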
Bigtable takes a different approach to storing data in a table. It is a distributed storage system that’s structured as a large table, one that may be petabytes in size and distributed among tens of thousands of machines. It’s designed for storing items like billions of URLs, with several versions per page; over 100 TB of satellite image data; and data for hundreds of millions of users; all while serving thousands of queries a second.
Bigtable is designed with semi-structured data storage in mind. It’s a large map that’s indexed by a row key, a column key, and a timestamp. Each value inside the map is an array of bytes that’s interpreted by the application. Every read or write of data to a row is atomic, regardless of how many different columns are read or written within that row.
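The data model above can be sketched as a toy in-memory map. This is only an illustration under stated assumptions: the class name TinyTable, its methods, and the sample row key are hypothetical and do not mirror Bigtable's actual API, and a per-row lock merely stands in for whatever mechanism the real system uses to make row operations atomic.

```python
import threading
from collections import defaultdict


class TinyTable:
    """Toy model of a (row key, column key, timestamp) -> bytes map (hypothetical API)."""

    def __init__(self):
        # row key -> {(column key, timestamp): value bytes}
        self._rows = defaultdict(dict)
        self._row_locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, row):
        # Guard lock creation so two threads never get different locks for one row.
        with self._locks_guard:
            return self._row_locks[row]

    def write_row(self, row, timestamp, columns):
        """Write any number of columns of one row as a single atomic step."""
        # Validate first so a bad value cannot leave the row half-written.
        for value in columns.values():
            if not isinstance(value, bytes):
                raise TypeError("values must be uninterpreted bytes")
        with self._lock_for(row):
            for column, value in columns.items():
                self._rows[row][(column, timestamp)] = value

    def read_row(self, row):
        """Return a consistent snapshot of every cell stored for one row."""
        with self._lock_for(row):
            return dict(self._rows.get(row, {}))


table = TinyTable()
table.write_row(
    "com.example/index.html",  # hypothetical row key
    1,
    {"contents": b"<html>v1</html>", "language": b"en"},
)
cells = table.read_row("com.example/index.html")
# The table stores raw bytes; the application decides these bytes are UTF-8 text.
print(cells[("language", 1)].decode("utf-8"))
```

Because both columns are written while the row's lock is held, a concurrent reader of that row sees either both new cells or neither.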
Bigtable is the database behind applications such as Google Personalized Search, the Google App Engine Datastore, Google Analytics, and Google Earth. Google has maintained the software as a proprietary, in-house technology. However, Bigtable has had a large impact on NoSQL database design. Its developers publicly disclosed Bigtable’s details in a technical paper presented at the USENIX Symposium on Operating Systems Design and Implementation (OSDI) in 2006.
Bigtable Cluster
A Bigtable cluster usually operates in a shared pool of machines that run a wide variety of other distributed applications, and Bigtable processes often share the same machines with processes from other applications. Bigtable depends on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status.
Bigtable depends on a highly available and persistent distributed lock service known as Chubby. A Chubby service consists of five active replicas, one of which is elected to be the master and actively serves requests. The service is live when a majority of the replicas are running and can communicate with each other.
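The majority rule can be written down in a few lines. The function below is a hypothetical helper for illustration only, not part of Chubby; it assumes the caller already knows how many replicas are running and able to reach one another.

```python
def chubby_is_live(reachable_replicas, total_replicas=5):
    """Return True when a strict majority of replicas are up and connected.

    Hypothetical helper: `reachable_replicas` counts replicas that are running
    and can communicate with each other.
    """
    if not 0 <= reachable_replicas <= total_replicas:
        raise ValueError("reachable_replicas must be between 0 and total_replicas")
    return reachable_replicas > total_replicas // 2


for up in range(6):
    print(up, chubby_is_live(up))
```

With five replicas, the service stays live through the loss of any two of them, since three replicas still form a majority.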
Bigtable has several defining characteristics:
It is a map. A map is an associative array: a data structure that lets one quickly look up the value that corresponds to a key. Bigtable is a collection of (key, value) pairs where the key identifies a row and the value is the set of columns.
It is persistent. The data is stored persistently on disk.
It is distributed. The data stored in Bigtable is spread among many independent machines.
It is sparse. Different rows in a table may use different columns, with many of the columns empty for any given row.
It is sorted. Most associative arrays aren’t sorted; a key is hashed to a position in a table. Bigtable, in contrast, sorts its data by key. This helps keep related data close together, typically on the same machine, assuming that one structures keys in such a way that sorting brings the data together.
It is versioned by time. Each column family may keep multiple versions of its data. If an application doesn’t specify a timestamp, it retrieves the latest version. Otherwise, it can specify a timestamp and obtain the most recent version that’s earlier than or equal to that timestamp.
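The sparse, sorted, and time-versioned properties can be sketched together in a small in-memory structure. This is a toy under stated assumptions, not Bigtable's implementation: the class VersionedSortedMap, its method names, and the reversed-hostname row keys are all hypothetical choices made for illustration.

```python
import bisect


class VersionedSortedMap:
    """Toy sorted, sparse, versioned map (hypothetical; not Bigtable's API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._cells = {}  # row key -> {column: sorted list of (timestamp, value)}

    def put(self, row, column, timestamp, value):
        if row not in self._cells:
            bisect.insort(self._keys, row)
            self._cells[row] = {}
        # Only columns that are actually written take up space (sparse rows).
        versions = self._cells[row].setdefault(column, [])
        bisect.insort(versions, (timestamp, value))

    def get(self, row, column, timestamp=None):
        """Latest version, or the newest version at or before `timestamp`."""
        versions = self._cells.get(row, {}).get(column, [])
        if not versions:
            return None
        if timestamp is None:
            return versions[-1][1]
        stamps = [ts for ts, _ in versions]
        i = bisect.bisect_right(stamps, timestamp)  # versions[:i] have ts <= timestamp
        return versions[i - 1][1] if i > 0 else None

    def scan(self, start, end):
        """Row keys in [start, end), in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._keys[lo:hi]


t = VersionedSortedMap()
# Reversed hostnames (an illustrative key design) make pages of one site sort together.
t.put("com.example.www/about", "contents", 10, b"v1")
t.put("com.example.www/about", "contents", 20, b"v2")
t.put("com.example.www/index", "contents", 15, b"home")
t.put("org.example/index", "language", 5, b"en")  # this row has no "contents" column

print(t.get("com.example.www/about", "contents"))                # latest version
print(t.get("com.example.www/about", "contents", timestamp=15))  # newest at or before 15
print(t.scan("com", "con"))                                      # adjacent keys in one range
```

Because the keys are kept sorted, the range scan touches only a contiguous slice of the key list, which is the property that lets related rows live near each other.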