All notes
DbComm

# Concepts

## CAP theorem

Eric Brewer, CAP theorem: consistency, availability, partition. It's good to achieve two of them. It's hard to achieve them all (Google Spanner has achieved with atom clock?).

• Consistency means that each client always has the same view of the data.
• Availability means that all clients can always read and write.
• Partition tolerance means that the system works well across physical network partitions.

## BASE, ACID

BASE vs ACID. A BASE system gives up on consistency. The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem. stackOverflow: BASE.

• Basically available indicates that the system does guarantee availability, in terms of the CAP theorem.
• Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.

## OLTP, OLAP

On-line Transaction Processing (OLTP).

IT systems can be divided into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses (INSERT, UPDATE, DELETE), whereas OLAP systems help to analyze it (Query, Table joining). [http://datawarehouse4u.info/OLTP-vs-OLAP.html](OLTP vs OLAP).

# Design pattern

## Normal Forms (NFs)

Normal forms were proposed by Edgar Codd, in a 1971 conference paper.

### First normal form, 1NF

Domain of each attribute contains only atomic values, and each attribute contains only a single value from that domain. Reference in Wikipedia.

The table is not 1NF (Wright's phone number has 2 values):

Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
555-776-4100
789 Maria Fernandez 555-808-9633

But this one is 1NF:

Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
456 Jane Wright 555-776-4100
789 Maria Fernandez 555-808-9633

### Second normal form, 2NF

The table is in 1NF, and no non-prime attribute is dependent on any proper subset of any candidate key of the table. Or, every non-prime attribute of the table is dependent on the whole of a candidate key.

A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.

A candidate key is the key/key combination which is unique per row.

The following table is not NF2, because "Current Work Location" only depends on "Employee". By the way, only the composite key {Employee, Skill} qualifies as a candidate key for the table.

Employees' Skills
Employee Skill Current Work Location
Brown Light Cleaning 73 Industrial Way
Brown Typing 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
Jones Shorthand 114 Main Street
Jones Typing 114 Main Street
Jones Whittling 114 Main Street

The only way to be NF2 is to split the table:

Employees
Employee Current Work Location
Brown 73 Industrial Way
Harrison 73 Industrial Way
Jones 114 Main Street
Employees' Skills
Employee Skill
Brown Light Cleaning
Brown Typing
Harrison Light Cleaning
Jones Shorthand
Jones Typing
Jones Whittling

### Thrid normal form, 3NF

The tables are in second normal form and all the attributes in a table are dependent on the primary key and only the primary key.

Most 3NF tables are free of update, insertion, and deletion anomalies.
The normal forms stringent up to level 3 guanrantee the update consistency.
Reference in Wikipedia.

The following table is NF2 but not 3NF, as "Winner Date of Birth" depends on "Winner",

Tournament Winners
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

Now, the following is 3NF:

Tournament Winners
Tournament Year Winner
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Winner Dates of Birth
Winner Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968

### Boyce–Codd normal form (BCNF), 3.5NF

A relational schema R is in Boyce-Codd normal form if and only if for every one of its dependencies $X \rightarrow Y$, at least one of the following conditions hold:

• $X \rightarrow Y$ is a trivial functional dependency ($Y \subseteq X$)
• $X$ is a superkey for schema $R$

If a relational schema is in BCNF then all redundancy based on functional dependency has been removed.

The following is not 3.5NF, because "Rate Type" is dependent on "Court". (The story for the table: court 1 is hard field, court 2 is grass, and thus the owner charges distinctively on the two courts.)

Today's Court Bookings
Court Start Time End Time Rate Type
1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A

Now 3.5NF:

Rate Types
Rate Type Court Member Flag
SAVER 1 Yes
STANDARD 1 No
PREMIUM-A 2 Yes
PREMIUM-B 2 No
Today's Bookings
Member Flag Court Start Time End Time
Yes 1 09:30 10:30
Yes 1 11:00 12:00
No 1 14:00 15:30
No 2 10:00 11:30
No 2 11:30 13:30
Yes 2 15:00 16:30

### 4NF

Forth normal form, 4NF: for every one of its non-trivial multivalued dependencies $X \twoheadrightarrow Y$, $X$ is a superkey, that is, X is either a candidate key or a superset thereof. 4NF aims to eliminate redundancy caused by multivalued dependency.

Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency.

Not 4NF:

The following table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are: $${Restaurant} \twoheadrightarrow {Pizza Variety}$$ $${Restaurant} \twoheadrightarrow {Delivery Area}$$

These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas.

Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville

Yes 4NF:

Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Thick Crust
A1 Pizza Stuffed Crust
Elite Pizza Thin Crust
Elite Pizza Stuffed Crust
Vincenzo's Pizza Thick Crust
Vincenzo's Pizza Thin Crust
Delivery Areas By Restaurant
Restaurant Delivery Area
A1 Pizza Springfield
A1 Pizza Shelbyville
A1 Pizza Capital City
Elite Pizza Capital City
Vincenzo's Pizza Springfield
Vincenzo's Pizza Shelbyville

### 5NF

Fifth normal form, 5NF: aims to eliminate the join dependency.

Not 5NF:

Traveling Salesman Product Availability By Brand
Traveling Salesman Brand Product Type
Jack Schneider Acme Vacuum Cleaner
Jack Schneider Acme Breadbox
Mary Jones Robusto Pruning Shears
Mary Jones Robusto Vacuum Cleaner
Mary Jones Robusto Breadbox
Mary Jones Robusto Umbrella Stand
Louis Ferguson Robusto Vacuum Cleaner
Louis Ferguson Robusto Telescope
Louis Ferguson Acme Vacuum Cleaner
Louis Ferguson Acme Lava Lamp
Louis Ferguson Nimbus Tie Rack

Yes, 5NF:

Product Types By Traveling Salesman
Traveling Salesman Product Type
Jack Schneider Vacuum Cleaner
Jack Schneider Breadbox
Mary Jones Pruning Shears
Mary Jones Vacuum Cleaner
Mary Jones Breadbox
Mary Jones Umbrella Stand
Louis Ferguson Telescope
Louis Ferguson Vacuum Cleaner
Louis Ferguson Lava Lamp
Louis Ferguson Tie Rack
Brands By Traveling Salesman
Traveling Salesman Brand
Jack Schneider Acme
Mary Jones Robusto
Louis Ferguson Robusto
Louis Ferguson Acme
Louis Ferguson Nimbus
Product Types By Brand
Brand Product Type
Acme Vacuum Cleaner
Acme Breadbox
Acme Lava Lamp
Robusto Pruning Shears
Robusto Vacuum Cleaner
Robusto Breadbox
Robusto Umbrella Stand
Robusto Telescope
Nimbus Tie Rack

### Functional dependency

A relation $R$ is said to satisfy the functional dependency $X\rightarrow Y$ (where $X$ and $Y$ are two attributes/columns), if and only if, $Y$ is a function of $X$. Wikipedia page for reference.

A functional dependency FD: $X$ → $Y$ is called trivial if $Y$ is a subset of $X$.

### Multivalued dependency

In simple words the multivalued dependency condition can be expressed as follows: if we denote by $(x,y,z)$ the tuple having values for $\alpha, \beta, R - \alpha - \beta$ collectively equal to $x, y, z$, correspondingly, then whenever the tuples $(a,b,c)$ and $(a,d,e)$ exist in $r$, the tuples $(a,b,e)$ and $(a,d,c)$ should also exist in $r$. Thus there is redundancy.

For example, the following table, when we need to add a Book to a Course, we have to repeat insertion for every "Lecturer".

Course Book Lecturer
AHA Nederpelt William M
AHA Silberschatz William M
AHA Nederpelt John D
AHA Silberschatz John D
AHA Silberschatz Christian G
AHA Nederpelt Christian G
OSO Silberschatz John D
OSO Silberschatz William M

A functional dependency is a special case of multivalued dependency. In a functional dependency $X \rightarrow Y$, every $x$ determines exactly one $y$, never more than one.

A trivial multivalued dependency $X \twoheadrightarrow Y$ is one where either $Y$ is a subset of $X$, or $X$ and $Y$ together form the whole set of attributes of the relation.

A multivalued dependency is a special case of a join dependency, with only two sets of values involved, i.e. it is a 2-ary join dependency.

### Join dependency

Let $R$ be a relation schema and let $R_1, R_2, \ldots, R_n$ be a decomposition of $R$.
The relation $r(R)$ satisfies the join dependency
$*(R_1,R_2,\ldots,R_n)$ if $\bowtie_{i = 1}^n \Pi_{R_i}(r) = r$.
A join dependency is trivial if one of the $R_i$ is $R$ itself.
-- Silberschatz, Korth. Database System Concepts, 1st Edition

The bow tie symbol ($\bowtie$) denotes the Natural Join.

## Database normalization

See wikipedia: database normalization for update / insertion / deletion anomaly.

# Relational algebra

## Set union, difference

For set union and set difference, the two relations involved must be union-compatible, that is, the two relations must have the same set of attributes.

## Cartesian product

For the Cartesian product to be defined, the two relations involved must have disjoint headers, that is, they must not have a common attribute name.

Definition: $R \times S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) \in R, (s1, s2, ..., sm) \in S}$
The cardinality of the Cartesian product is the product of the cardinalities of its factors, i.e., $|R \times S| = |R| \times |S|$.

## Projection

Example: To obtain the names and phone numbers from an address book, the projection might be written $\pi_{\text{contactName, contactPhoneNumber}}( \text{addressBook} )$.

## Selection

To obtain a listing of all friends or business associates in an address book, the selection might be written as $\sigma_{\text{isFriend = true} \, \bigvee \, \text{isBusinessContact = true}}( \text{addressBook} )$.

## Rename

To rename the 'isFriend' attribute to 'isBusinessContact' in a relation, $\rho_{\text{isBusinessContact / isFriend} } ( \text{addressBook} )$.

## Natural Join

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Dept
DeptName Manager
Finance George
Sales Harriet
Production Charles
Employee$\, \bowtie \,$Dept
Name EmpId DeptName Manager
Harry 3415 Finance George
Sally 2241 Sales Harriet
George 3401 Finance George
Harriet 2202 Sales Harriet

This can also be used to define composition of relations. For example, the composition of Employee and Dept is their join as shown above, projected on all but the common attribute DeptName. In category theory, the join is precisely the fiber product.

## $\theta$-join

Join with condition.

Car
CarModel CarPrice
CarA 20,000
CarB 30,000
CarC 50,000
Boat
BoatModel BoatPrice
Boat1 10,000
Boat2 40,000
Boat3 60,000
$Car \bowtie Boat, (CarPrice\ge BoatPrice)$
CarModel CarPrice BoatModel BoatPrice
CarA 20,000 Boat1 10,000
CarB 30,000 Boat1 10,000
CarC 50,000 Boat1 10,000
CarC 50,000 Boat2 40,000

### Inner join, cross join

In MySQL, CROSS JOIN is syntactically equivalent to INNER JOIN; they can replace each other.

In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause; CROSS JOIN is used otherwise.

## Semijoin (⋉)(⋊)

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Production
Dept
DeptName Manager
Sales Bob
Sales Thomas
Production Katie
Production Mark
EmployeeDept
Name EmpId DeptName
Sally 2241 Sales
Harriet 2202 Production

## Antijoin (▷)

Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Production
Dept
DeptName Manager
Sales Sally
Production Harriet
EmployeeDept
Name EmpId DeptName
Harry 3415 Finance
George 3401 Finance

## Division

Completed
Student Task
Fred Database1
Fred Database2
Fred Compiler1
Eugene Database1
Eugene Compiler1
Sarah Database1
Sarah Database2
DBProject
Task
Database1
Database2
Completed ÷ DBProject
Student
Fred
Sarah

# Stored procedures, database triggers, and UDFs

## Stored procedure

A stored procedure is a subroutine available to applications that access a relational database management system (RDMS). Such procedures are stored in the database data dictionary.

Typical uses for stored procedures include data validation (integrated into the database) or access control mechanisms. Furthermore, stored procedures can consolidate and centralize logic that was originally implemented in applications.

Stored procedures are similar to user-defined functions (UDFs). The major difference is that UDFs can be used like any other expression within SQL statements, whereas stored procedures must be invoked using the CALL/EXECUTE statement. A procedure is not an expression and, thus, cannot be used like user-defined functions.

Stored procedures may return result sets, i.e., the results of a SELECT statement. Such result sets can be processed using cursors, by other stored procedures, by associating a result set locator, or by applications.

SO: stored procedure VS ORM. Stored Procedure are faster than SQL statements because they are pre-compiled in the Database Engine, with execution plans cached. You can't do that in ORM, but you have other alternatives, like using Cache Level 1 or 2. Also, try to do bulk operations with ORM. Stored Procedures works very well in those cases.

# Concepts

## Cursor

A database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. The database cursor characteristic of traversal makes cursors akin to the programming language concept of iterator.

Cursors allocate resources on the server, such as locks, packages, processes, and temporary storage. For example, Microsoft SQL Server implements cursors by creating a temporary table and populating it with the query's result set. If a cursor is not properly closed (deallocated), the resources will not be freed until the SQL session (connection) itself is closed.