数据库 学习笔记

Entity Relationship Model

课程官网:https://www.pdbmbook.com/

Elements of a Database System

  • database model: consists of various data models
  • database state: current set of instance
  • data model: provides a clear and unambiguous description of the data items, their relationships and various data constraints from a particular perspective
    • external data model: contains various subsets of the data items in the logical model, also called views, tailored towards the needs of specific applications or groups of users
    • conceptual data model: provides a high-level description of the data items with their characteristics and relationships; logical data model: a translation or mapping of the conceptual data model towards a specific implementation environment
    • internal data model: represents the data’s physical storage details
  • catalog: contains the data definitions, or metadata (data that describes other data), of your database application; guarantees consistency
  • database users
    • architect: design conceptual data model
    • designer: translate conceptual data model into a logical internal data model
    • administrator: responsible for the implementation and monitoring of the database
    • application developer: develops database applications in a programming language such as Java or Python
    • business user: run applications
  • database languages
    • DDL: data defining language
    • DML: data manipulation language
    • SQL: structured query language

Architecture of DBMS

  • connection manager: provides facilities to setup a database connection
  • security manager verifies whether a user has the right privileges
  • DDL compiler: ideally 3 compilers, one for each layer; upon successful compilation, DDL compilers registers the data definitions in the catalog
  • query processor
    • DML compiler
    • query parser and query rewriter
    • query optimizer
    • query executor
    • storage manager
      • transaction manager
      • buffer manager
      • lock manager
      • recovery manager

Conceptual Data Modeling using Entity Relationship Model

Phases of design:

  • requirement collection and analysis -> database requirements
  • -> conceptual design -> conceptual data model
  • -> logical design -> logical data model
  • -> physical design -> internal data model

ER (entity relationship) model:

  • entity types
  • attribute types
    • domain
    • key attribute types
    • composite attribute types
  • relationship types
  • weak entity types
  • ternary relationship types

Relational Database

Superkey: a subset of attribute types of a relation R with the property that no two tuples in any relation state should have the same combination of values for these attribute types; may have redundant attributes; may not be unique
Candidate key (key): minimal superkey without redundancy; may not be unique
Primary key: selected candidate key used to identify tuples in the relation; unique
Foreign key: primary key from another tuple

Functional Dependency:
A functional dependency X -> Y between two sets of attribute types X and Y, implies that a value of X uniquely determines a value of Y

Normalization Forms

目的:防止不一致现象;防止有效信息被删除;减少冗余

1NF:
Every attribute type of a relation must be atomic and single valued

2NF:
Every non-prime attribute must depend on each whole candidate key, not just part of it. A non-prime attribute is an attribute that is not part of any candidate key of the relation.

3NF:
消除一个关系中的传递依赖(transitive dependency)

如下表格不符合3NF:

原因:Title->Genre ID->Genre name; Title->Author->AuthorNationality

Boyce-Codd Normal Form (BCNF):
A relational schema R is in Boyce-Codd normal form if and only if for every one of its dependencies X->Y, at least one of the following conditions hold:

  • X->Y is a trivial functional dependency
  • X is a superkey for schema R

Trivial dependency: X->Y when y is a subset of X. 用于消除不同的candidate key中的属性互相依赖的情况。

如下表格不符合BCNF:

其中4个属性都属于某个candidate key;court 1的rate type为SAVER和STANDARD,court 2的rate type为PREMIUM-A和PREMIUM-B。
易知Rate type -> Court。

4NF:
Multivalued dependencies: If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependency X->->Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xc,y,z combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z
A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X->->Y, X is a superkey.

用于消除对于有着X, Y, Z三组属性的关系R中,对于某个X,对任意Y都有相同Z的取值的情况。

如下表格不符合4NF:

原因:对于同一家餐厅而言,它的每个菜品配送范围都是一样的

Mapping a conceptual ER Model to a Relational Model

Mapping entity types

Mapping relationship types

  • binary 1:1 relationship type
  • binary 1:n relationship type
  • binary n:m relationship type
  • unary and n-ary relationship type

基本SQL语句

基本语法:

SELECT ... FROM ... WHERE ...

别名:SELECT column_name AS column_alias FROM table_name table_alias;

SELECT id AS FROM demo d;

Aggregation Query: COUNT, MAX, MIN

COUNT后可加DISTINCT

GROUP BY:按某一列分组;每一组只返回一个结果。常用于聚合查询;当用于普通查询时,作用类似于DISTINCT。

HAVING: added to SQL because the WHERE keyword cannot be used with aggregate functions. 常与GROUP BY连用。

IN/ALL/ANY:用于嵌套查询

两种嵌套方式:

  • WHERE后
  • FROM后+别名

增加约束示例:

ALTER TABLE property 
ADD CONSTRAINT property_pro_c_id_fkey FOREIGN KEY (pro_c_id) REFERENCES client(c_id) ON DELETE CASCADE;

删除约束示例:

ALTER TABLE property DROP CONSTRAINT property_pro_c_id_fkey;

插入一行:

INSERT INTO table_name VALUES (...)

更新表中数据:

UPDATE table_name SET column_name=expression WHERE ...

删除表中数据:

DELETE FROM table_name WHERE ...

Extended Relational Databases

Triggers:

CREATE TRIGGER SALARYTOTAL AFTER INSERT ON EMPLOYEE 
FOR EACH ROW 
WHEN (NEW.DNR IS NOT NULL) 
UPDATE DEPARTMENT 
SET TOTAL-SALARY = TOTAL-SALARY + NEW.SALARY 
WHERE DNR = NEW.DNR 
CREATE TRIGGER WAGEDEFAULT BEFORE INSERT ON EMPLOYEE 
REFERENCING NEW AS NEWROW 
FOR EACH ROW 
SET (SALARY, BONUS) = (SELECT BASE_SALARY, BASE_BONUS FROM WAGE WHERE JOBCODE = NEWROW.JOBCODE) 

Stored procedures

CREATE PROCEDURE REMOVE-EMPLOYEES (DNR-VAR IN CHAR(4), JOBCODE-VAR IN CHAR(6)) AS 
BEGIN 
DELETE FROM EMPLOYEE 
WHERE DNR = DNR-VAR AND JOBCODE = JOBCODE-VAR; 
END

递归查询

Physical File Organization and Indexing

rba: random block access
sba: sequential block access

Record organization

  • relative location
  • embedded identification
  • pointers and lists: ideal for dealing with variable length records

Blocking factor: determines how many records are retrieved with a single read operation

  • fixed length records: \(BF=\lfloor BS/RS \rfloor\),where BS stands for block space, RS stands for record space
  • variable length records: average number of records in a block

Primary file organization versus secondary file organization

Primary file organization

  • heap
  • sequential
  • random
  • indexed sequential file organization
    • primary index file organization: sparse, one for each block
    • clustered index file organization: 在相同策略下,由于数据的不同,既有可能是密集索引(每个值都有大量重复),也有可能是稀疏索引(每个值都只出现一次)
    • multilevel indexes
  • list

Secondary indexes

  • sequential index: unique
  • non-unique
    • dense index
    • inverted file
  • hash index
  • bitmap index: types with limited set of values
  • join index: combines types from 2 or more tables

Multilevel indexes -> B tree & B+ tree

Physical Database Organization

Database access methods

  • functioning of query optimizer
    • query cardinality = table cardinality * product(filter factors)
  • index search with atomic search key
  • multiple index and multicolumn index search
    • to cater for all possible search keys, \(\binom n{\lceil n/2\rceil}\) indexes are required.
  • index only access
  • full table scan

Create indexes:

CREATE INDEX index_name
ON table_name (column1_name, column2_name);

Transaction Management

ACID properties:

  • atomicity: multiple database operations that alter the database state can be treated as one indivisible unit of work; success or fail as a whole
  • consistency: a transaction, if executed in isolation, renders the database from one consistent state into another consistent state; do not violate constraints
  • isolation: in situations where multiple transactions are executed concurrently, the outcome should be the same as if every transaction were executed in isolation
  • durability: effects of a committed transaction should always be persisted into the database

Serializable schedules: precedence graph

  • 创建有向边Ti -> Tj ,如果Tj在由Ti写入后读取值
  • 创建有向边Ti -> Tj ,如果Tj在被Ti读取后写入值
  • 创建有向边Ti -> Tj ,如果Tj在由Ti写入后写入值

若此有向图中不存在环,则可以序列化

Lock

  • shared
  • exclusive

Two-phase Locking Protocol

  • growth phrase
  • shrink phrase

Cascading rollback: best way to avoid this, is for all transactions to hold their locks until they have reached the ‘committed’ state.

Deadlock: detected by wait-for graph

Isolation levels:

  • read uncommitted: typically only allow for read-only transactions
  • read committed: uses short-term read locks
  • repeatable read: uses both long-term read locks and write locks
  • serializable: avoid phantom reads

这里需要强调可重复读与可序列化的区别。可重复读不允许并发修改已有内容,但允许加入新内容,因此对原有内容修改时可能会发现新的未被修改的内容出现;可序列化则通过更严格的限制避免这种情况。

Lock granularity

错题

Physical/logical data independence指当前层的改变不影响上层

posted @ 2021-06-03 21:23  Zeonfai  阅读(103)  评论(0编辑  收藏  举报