We are all familiar with file systems. A file system is used to store data and information reliably, and a file usually holds information of the same type. But have you heard of the Google File System (GFS)? It is a scalable distributed file system (DFS) created by Google and developed to meet Google’s growing data processing needs. GFS provides fault tolerance, scalability, availability, reliability, and performance across large networks of connected nodes.
GFS is built from many storage machines assembled from low-cost commodity hardware components. It is optimized for Google’s particular data usage and storage needs, such as its search engine, which generates huge amounts of data that must be stored. A GFS cluster consists of a single master and multiple chunk servers that are continuously accessed by many client systems. Chunk servers store chunks as Linux files on their local disks. Each file is split into large fixed-size chunks (64 MB), and each chunk is replicated across the cluster, three times by default. The large chunk size reduces network overhead, because clients need to contact the master less often.
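To make the chunking arithmetic concrete, here is a minimal sketch in Python. It only assumes the 64 MB chunk size and the replica count of three mentioned above; the function names (`chunk_index`, `chunk_count`, `approx_raw_storage`) are hypothetical and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated in the text
REPLICAS = 3                   # replica count, as stated in the text


def chunk_index(offset: int) -> int:
    """Return which chunk of a file contains the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Return how many chunks a file of the given size occupies."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Ceiling division: a partially filled last chunk still counts as one.
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def approx_raw_storage(file_size: int) -> int:
    """Rough estimate of bytes stored across all replicas (illustrative only)."""
    return file_size * REPLICAS


if __name__ == "__main__":
    one_gb = 1024 * 1024 * 1024
    print(chunk_count(one_gb))         # number of chunks in a 1 GB file
    print(chunk_index(200 * 1024**2))  # chunk holding byte offset 200 MB
```

With these numbers, a 1 GB file needs only 16 chunks, so a client has very few chunk locations to look up. That is one way to see why a large chunk size keeps interaction with the master low.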
Now let us see some of the features of Google File System (GFS):
- Fault tolerance
- Critical information replication
- Automatic and efficient information recovery
- High aggregate throughput
- Reduced client and master interaction because of the large chunk size
- Namespace management and protection
- High availability
As the number of applications run by Google grew massively, Google’s goal became to build a large storage network out of cheap commodity hardware. Google engineers created the Google File System and readied it for production in about a year, in 2003, which supported Google’s growth thereafter. At the time, GFS was among the largest file systems in operation.
In designing a file system for Google’s needs, the engineers were guided by assumptions that offer both opportunities and challenges.
- The system is built from many inexpensive commodity components that often fail. It must therefore constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.
- The system stores a modest number of large files: typically a few million files, each generally 100 MB or larger. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but the system does not need to be optimized for them.
- The workloads also include many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not need to be efficient.
- High sustained bandwidth is more important than low latency. Most of the target applications place a premium on processing data in bulk at a high rate, while few have strict response time requirements for an individual read or write.
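The last assumption can be illustrated with some simple arithmetic. The sketch below models the time for one read as a fixed latency plus the size divided by the bandwidth. This is a simplified model, and the bandwidth and latency figures are made-up illustrative values, not measurements of GFS; the function names are hypothetical.

```python
def transfer_time(size_bytes: float, bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Simplified model: total time = fixed latency + size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes: float, bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Fraction of the total time that is spent on the fixed latency."""
    return latency_s / transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s)


if __name__ == "__main__":
    bandwidth = 100e6  # assumed: 100 MB/s sustained bandwidth (illustrative)
    latency = 0.010    # assumed: 10 ms per-request latency (illustrative)

    bulk_read = 1e9    # a 1 GB bulk read
    small_read = 4096  # a 4 KB small read

    print(f"bulk read latency share:  {latency_share(bulk_read, bandwidth, latency):.4%}")
    print(f"small read latency share: {latency_share(small_read, bandwidth, latency):.4%}")
```

Under these assumed numbers, latency is a tiny fraction of the time for a bulk read but almost all of the time for a small read. For workloads dominated by large sequential reads and appends, improving sustained bandwidth therefore pays off far more than shaving latency.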