google三驾马车之一：Bigtable解读（英文版）

本文重点关注了系统设计相关的内容，paper后半部分的具体应用此处没有过多涉及。从个人笔记修改而来，因此为英文版本。

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. —— part2

How the map indexed?

(row:string, column:string, time:int64) → string

just like json format, eg:

table{  
  // ...  
  "aaaaa" : { //row
    "A:foo" : { //col
        15 : "y", //timestamp 
        4 : "m"  
      },  
    "A:bar" : { //col
        15 : "d",  
      },  
    "B:" : { //col
        6 : "w"  
        3 : "o"  
        1 : "w"  
      }  
  },  
  // ...  
}

a particular table: webtable

row(also called tablet): reversed URL

concurrent: single row key is atomic
lexicographic order
col: column families, contents

family:qualifier
Access control and both disk and memory accounting
timestamp

avoid collisions: unique timestamp, decreasing order
garbage-collection mechanism(eg.)

API

C++ read/write

MapReduce + Bigtable

Building Block

Google File System: store log and data files

distributed Google File System

Google SSTable file format: store Bigtable data

K-V map: iterate key/value pairs in a specified key range

a sequence of blocks
a block index

disk seek or memory seek?

Optionally, SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.

Chubby: distributed lock service

5 active replicas: 1 master, 4 slave

Paxos algorithm: to keep its replicas consistent in the face of failure

namespace: including directory and small file, op r/w is atomic

session: when expires, lose locks and open handles

Implementation

consist:

library(?) linked to every client
1 master server(schedule, garbage-collect......)
many tablet server(10-1000 tablets)

As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.

hierarchy (B⁺-tree)

Chubby file -> Root tablet -> other METADATA tablets -> UserTables

METADATA: many other things stored in it

Master: schedule & manage

Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.

The essential point for distributed database: lock

The Bigtable is only a series of ops, real data is stored in GFS.(SSTable)

Tablet Representation

memtable: the recently committed updates are stored in memory in a sorted buffer

reconstruct: redo points in commit logs

Compactions

As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.

minor(memtable) -> major(SSTable) compaction

locality group

Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.

This section describes portions of the implementation in more detail in order to highlight these refinements.

in-memory locality groups are loaded lazily

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability

Performance Evaluation

Lesson

large distributed systems are vulnerable to many types of failures
it is important to delay adding new features until it is clear how the new features will be used
the importance of proper system-level monitoring
the value of simple designs

posted @ 2023-10-16 19:54 真昼小天使daisuki 阅读(25) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

google三驾马车之一：Bigtable解读（英文版）

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

How the map indexed?

API

Building Block

Google File System: store log and data files

Google SSTable file format: store Bigtable data

disk seek or memory seek?

Chubby: distributed lock service

Implementation

hierarchy (B⁺-tree)

Master: schedule & manage

Tablet Representation

Compactions

Refinement

locality group

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability

Performance Evaluation

Lesson

公告

google三驾马车之一：Bigtable解读（英文版）

Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

How the map indexed?

API

Building Block

Google File System: store log and data files

Google SSTable file format: store Bigtable data

disk seek or memory seek?

Chubby: distributed lock service

Implementation

hierarchy (B+-tree)

Master: schedule & manage

Tablet Representation

Compactions

Refinement

locality group

storage: compression

read performance: caching

Bloom filters

commit-log

Speeding up tablet recovery

Exploiting immutability

Performance Evaluation

Lesson

公告

hierarchy (B⁺-tree)