Tuesday, October 31, 2006

Paper: "Bigtable: A Distributed Storage System for Structured Data"

Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Google, Inc.

Abstract

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

2 Comments:

Navendu Jain said...

I understand that Bigtable doesn't aim to provide all the functionalities of a database system. However, for applications that do want to enforce data integrity constraints such as referential integrity -- how are these handled in Bigtable?

Further, since the tablets store only a subset of columns for any given row, does enforcing these constraints across columns that span multiple tablets lead to large overheads?

1:13 PM  
Wilson said...

Among your existing apps, how many of them leverage the ordering of rows? The design could be simpler if you don't have to organize rows in order.

I don't have any numbers, but a fair number of them do. Any data set where spatial locality is desired (for example, the Webtable described in the paper) will choose row keys appropriately.
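
To illustrate what choosing row keys "appropriately" can look like: the paper's Webtable example reverses the hostname components of each URL, so that pages from the same domain sort next to each other in the row order. A minimal Python sketch of that idea follows; the function name `webtable_row_key` and the sample URLs are hypothetical, and nothing here is part of an actual Bigtable API.

```python
from urllib.parse import urlsplit


def webtable_row_key(url: str) -> str:
    """Build a row key by reversing the hostname components of a URL.

    Hypothetical helper for illustration only: it mirrors the
    maps.google.com/index.html -> com.google.maps/index.html example
    from the paper, not any real client library call.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # "maps.google.com" -> "com.google.maps"
    reversed_host = ".".join(reversed(host.split(".")))
    # Use "/" when the URL has no explicit path.
    path = parts.path or "/"
    return reversed_host + path


# Sample URLs (made up for the sketch).
urls = [
    "http://maps.google.com/index.html",
    "http://www.cnn.com/",
    "http://mail.google.com/inbox",
]

# Rows are kept in lexicographic order by key, so sorting the keys
# shows which pages would end up adjacent to one another.
keys = sorted(webtable_row_key(u) for u in urls)
for key in keys:
    print(key)
```

With keys built this way, the two google.com pages land on neighboring rows, which is the kind of locality that ordered rows make possible.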

10:13 AM  
