Unit 5: Relational Database Design

Relational model
The relational model represents how data is stored in relational databases. A relational database stores data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE, shown in the table below.
STUDENT

ROLL_NO	NAME	ADDRESS	PHONE	AGE
1	Ram	Dharan	9455123451	18
2	Ramesh	Gorkha	9652431543	18
3	Sujit	Pokhara	9156253131	20
4	Suresh	Kathmandu		18

The relational model consists of three major components:
1. The set of relations and set of domains that define the way data can be represented (data structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. The operations that can be performed on data (data manipulation).

Important terminologies

Attribute: Attributes are the properties that define a relation, e.g., ROLL_NO, NAME.
Relation Schema: A relation schema represents the name of the relation together with its attributes, e.g., STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation schema for STUDENT. When a database has more than one relation, the set of their relation schemas is called the relational schema.
Tuple: Each row in the relation is known as a tuple.
Relation Instance: The set of tuples of a relation at a particular instant of time is called a relation instance. It changes whenever there is an insertion, deletion or update in the database.
Degree: The number of attributes in the relation is known as degree of the relation.
Cardinality: The number of tuples in a relation is known as cardinality.
Column: Column represents the set of values for a particular attribute.
NULL Values: A value that is unknown or unavailable is represented by NULL. NULL is a marker distinct from zero or an empty string; in the STUDENT table above it appears as the blank PHONE entry for ROLL_NO 4.
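To make degree, cardinality, columns and NULL concrete, here is a minimal Python sketch that models the STUDENT instance as a tuple of attribute names plus a list of rows. Representing a relation this way, and using Python's `None` for NULL, are assumptions for illustration only; a real DBMS stores relations very differently.

```python
# Assumption: a relation instance is modelled as a tuple of attribute names
# plus a list of tuples (rows); Python's None stands in for SQL NULL.
STUDENT_ATTRS = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")
STUDENT = [
    (1, "Ram", "Dharan", "9455123451", 18),
    (2, "Ramesh", "Gorkha", "9652431543", 18),
    (3, "Sujit", "Pokhara", "9156253131", 20),
    (4, "Suresh", "Kathmandu", None, 18),  # PHONE unknown: NULL
]

degree = len(STUDENT_ATTRS)  # number of attributes
cardinality = len(STUDENT)   # number of tuples

# A column is the list of values of one attribute across all tuples.
age_idx = STUDENT_ATTRS.index("AGE")
age_column = [row[age_idx] for row in STUDENT]

# Tuples whose PHONE value is NULL (reported by ROLL_NO).
phone_idx = STUDENT_ATTRS.index("PHONE")
missing_phone = [row[0] for row in STUDENT if row[phone_idx] is None]

print(degree, cardinality)  # 5 4
print(age_column)           # [18, 18, 20, 18]
print(missing_phone)        # [4]
```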

Pitfalls in Relational DB Design

A bad design may have several properties, including:

Repetition of information
Inability to represent certain information
Loss of information
Hardware overhead

Anomalies: Anomalies are unexpected integrity errors that occur because of flaws or limitations in the design of a database. There are three types of anomalies:

Insertion anomaly: It occurs when inserting one fact into the database requires unnecessary knowledge of other facts.
Deletion anomaly: It occurs when deleting one fact from the database causes the loss of other, unrelated data.
Update anomaly: It occurs when updating the value of one fact requires multiple changes to the database.
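The following minimal Python sketch illustrates the update and deletion anomalies on a small, hypothetical unnormalized table in which a teacher's age is repeated for every subject (the rows mirror the teacher example used for 2NF later in this unit). The helper names `update_first_match` and `ages_of` are hypothetical.

```python
# Hypothetical unnormalized table: teacher_age is repeated in every row
# for the same teacher, once per subject taught.
rows = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
]

def update_first_match(rows, teacher_id, new_age):
    """A careless update that changes only the first matching row."""
    for row in rows:
        if row["teacher_id"] == teacher_id:
            row["teacher_age"] = new_age
            return

def ages_of(rows, teacher_id):
    """All distinct ages currently recorded for one teacher."""
    return {r["teacher_age"] for r in rows if r["teacher_id"] == teacher_id}

# Update anomaly: one fact (teacher 111's age) is stored twice, so a
# single-row update leaves two conflicting values.
update_first_match(rows, 111, 39)
print(sorted(ages_of(rows, 111)))  # [38, 39]

# Deletion anomaly: dropping the Biology row also erases the only record
# of teacher 222's age.
rows = [r for r in rows if r["subject"] != "Biology"]
print(sorted(ages_of(rows, 222)))  # []
```

Storing teacher_age once per teacher, as the 2NF decomposition later does, removes both problems.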

Functional dependencies
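A functional dependency is a relationship between attributes (columns) X and Y such that a given value of X determines exactly one value of Y: any two tuples that agree on X must also agree on Y.
It is written X → Y, where X is the determinant and Y is functionally dependent on X.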
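An FD is a constraint on every legal instance of a relation, so a single instance can only show that an FD is violated, never prove that it holds. With that caveat, the sketch below (a hypothetical helper `fd_holds`) checks whether the STUDENT rows are consistent with a candidate FD.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs.

    A True result only means this instance does not violate lhs -> rhs.
    """
    seen = {}  # lhs values -> rhs values first observed with them
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True

STUDENT = [
    {"ROLL_NO": 1, "NAME": "Ram", "ADDRESS": "Dharan", "AGE": 18},
    {"ROLL_NO": 2, "NAME": "Ramesh", "ADDRESS": "Gorkha", "AGE": 18},
    {"ROLL_NO": 3, "NAME": "Sujit", "ADDRESS": "Pokhara", "AGE": 20},
    {"ROLL_NO": 4, "NAME": "Suresh", "ADDRESS": "Kathmandu", "AGE": 18},
]

print(fd_holds(STUDENT, ["ROLL_NO"], ["NAME", "ADDRESS"]))  # True
print(fd_holds(STUDENT, ["AGE"], ["NAME"]))                 # False
```

Here ROLL_NO → NAME, ADDRESS is consistent with the data, while AGE → NAME is refuted because Ram and Ramesh share age 18.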

Partial dependency: It occurs when a non-prime attribute depends on only a part of a composite (concatenated) key.
Transitive dependency: It occurs when a non-key attribute is functionally dependent on one or more other non-key attributes (for example, key → X and X → Y, so Y depends on the key only through X).

Closure of functional dependency
The closure of a set of attributes x, written x⁺, is the set of all attributes functionally determined by x.
Let S be the set of functional dependencies on a relation R, and let x be a set of attributes that appears on the left-hand side of some functional dependency in S. We want to determine the set of all attributes that are functionally dependent on x. The set x⁺ of attributes functionally determined by x under S is called the closure of x under S.
The algorithm to find the closure x⁺ is:
x⁺ := x
repeat
    old_x⁺ := x⁺
    for each functional dependency Y → Z in S do
        if Y ⊆ x⁺ then
            x⁺ := x⁺ ∪ Z
until (x⁺ = old_x⁺)   /* x⁺ did not change, so leave the loop */
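The closure algorithm above can be sketched in Python as follows. FDs are represented as `(lhs, rhs)` pairs of attribute-name sets, which is an assumption of this sketch rather than a standard format. The example FDs are the ones used in the attribute-closure example of this unit.

```python
def closure(attrs, fds):
    """Return the closure attrs+ of a set of attributes under fds.

    fds is a list of (lhs, rhs) pairs of attribute sets,
    e.g. ({"A"}, {"B", "C"}) stands for A -> BC.
    """
    result = set(attrs)              # x+ := x
    while True:
        old = set(result)            # old_x+ := x+
        for lhs, rhs in fds:         # for each FD Y -> Z
            if set(lhs) <= result:   # if Y is a subset of x+
                result |= set(rhs)   # x+ := x+ U Z
        if result == old:            # x+ did not change: stop
            return result

FDS = [({"A"}, {"B", "C"}), ({"B"}, {"C", "D"})]
print(sorted(closure({"A"}, FDS)))  # ['A', 'B', 'C', 'D']
print(sorted(closure({"B"}, FDS)))  # ['B', 'C', 'D']
```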

Closure of attribute set
Suppose we are given a relation R with attributes (A, B, C, D) and FDs A → BC and B → CD. Find the closure of attribute A.
We have:
A⁺ = A
A⁺ = ABC, using A → BC (A ⊆ A⁺)
A⁺ = ABCD, using B → CD (B ⊆ A⁺; C is already present)
So, A⁺ = ABCD.

Applications of the closure of an attribute set

It is used to identify additional functional dependencies.
It is used to identify keys (candidate keys and super keys).
It is used to identify prime and non-prime attributes.
It is used to identify the equivalence of sets of functional dependencies.
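Building on the attribute closure, a set K is a super key when K⁺ contains every attribute of the relation, and a candidate key when it is also minimal. The sketch below brute-forces all attribute subsets of the example relation R(A, B, C, D), which is fine for small textbook schemas but exponential in general; `candidate_keys` and `is_superkey` are hypothetical helper names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """K is a super key if K+ contains every attribute of the relation."""
    return closure(attrs, fds) >= set(schema)

def candidate_keys(schema, fds):
    """Minimal super keys, found by trying subsets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(cand, schema, fds):
                keys.append(cand)
    return keys

R = {"A", "B", "C", "D"}
FDS = [({"A"}, {"B", "C"}), ({"B"}, {"C", "D"})]

keys = candidate_keys(R, FDS)
prime = set().union(*keys)   # attributes in some candidate key
non_prime = R - prime

print(keys)                        # [{'A'}]
print(sorted(non_prime))           # ['B', 'C', 'D']
print("D" in closure({"A"}, FDS))  # True: A -> D is an implied FD
```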

Decomposition
Decomposition in a database means breaking a table down into multiple tables. If a relation is not decomposed properly, it may lead to problems such as loss of information. Decomposition is used to eliminate some of the problems of bad design, such as anomalies, inconsistencies and redundancy.

Types of Decomposition

Lossless join decomposition: If no information is lost when a relation is decomposed, the decomposition is lossless. A lossless decomposition guarantees that joining the decomposed relations gives back exactly the relation that was decomposed; that is, the natural join of all the decomposed relations yields the original relation.
Dependency preserving: In a dependency-preserving decomposition, every dependency must be enforceable within at least one decomposed table. If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be a part of R1 or R2, or be derivable from the combination of the functional dependencies of R1 and R2.
For example, suppose there is a relation R (A, B, C, D) with functional dependency set {A -> BC}. R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A -> BC is a part of relation R1(ABC).
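The sketch below checks both properties for this example. For lossless join it uses the standard test for a decomposition into two relations (R1 ∩ R2 must functionally determine all of R1 or all of R2); for dependency preservation it grows each FD's left-hand side using closures restricted to each decomposed relation. Both helpers are illustrative and assume FDs given as `(lhs, rhs)` set pairs.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """R1, R2 is lossless if R1 ∩ R2 determines all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

def preserves_dependencies(parts, fds):
    """Test each FD X -> Y using only closures restricted to each part."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # Y is not reachable from X inside the parts
    return True

FDS = [({"A"}, {"B", "C"})]
R1, R2 = {"A", "B", "C"}, {"A", "D"}

print(is_lossless_binary(R1, R2, FDS))        # True
print(preserves_dependencies([R1, R2], FDS))  # True
```

A decomposition of the same R into R1(A, B) and R2(C, D) fails both tests: the two relations share no attributes, and A -> BC cannot be enforced within either one.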

Normalization
It is the process of decomposing relations with anomalies to produce smaller, well-structured relations that:

save typing of repetitive data
reduce disk space
ease data manipulation
avoid frequent restructuring of tables

Benefits

less storage space
quicker updates
clearer data relationships
easier addition of new data
flexible structure
less data inconsistency

First Normal Form (1NF)

In 1NF, every column has a unique name and every row is unique.
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic (indivisible) values.

Example 1:
Sample Employee table; it shows an employee (Milan) working in multiple departments.

Employee	Age	Department
Milan	32	Marketing, Sales
Ram	45	Quality Assurance
Krishna	36	Human Resource

Employee table following 1NF:

Employee	Age	Department
Milan	32	Marketing
Milan	32	Sales
Ram	45	Quality Assurance
Krishna	36	Human Resource

Example 2: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this:

emp_id	emp_name	emp_address	emp_mobile
101	Hari	Dharan	8912312390
102	John	Kathmandu	8812121212 9900012222
103	Radha	Pokhara	7778881212
104	Seema	Bandipur	9990000123 8123450987

This table is not in 1NF: the rule says “each attribute of a table must have atomic (single) values”, and the emp_mobile values for employees John and Seema violate that rule.
To make the table comply with 1NF, we should store the data like this:

emp_id	emp_name	emp_address	emp_mobile
101	Hari	Dharan	8912312390
102	John	Kathmandu	8812121212
102	John	Kathmandu	9900012222
103	Radha	Pokhara	7778881212
104	Seema	Bandipur	9990000123
104	Seema	Bandipur	8123450987

Example 3:

Student table:

Student	Age	Subject
Adam	15	Biology, Maths
Alex	14	Maths
Stuart	17	Maths

In First Normal Form, no row may have a column that stores more than one value (for example, values separated by commas). Instead, such data must be split into multiple rows.
Student table following 1NF:

Student	Age	Subject
Adam	15	Biology
Adam	15	Maths
Alex	14	Maths
Stuart	17	Maths

Applying 1NF this way increases data redundancy, because the same values are repeated in several rows, but each row as a whole remains unique.
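The 1NF conversion used in these examples (one row per value of a multi-valued attribute) can be sketched as below. It assumes the multiple values are stored as one comma-separated string, as in the Student table; `to_1nf` is a hypothetical helper.

```python
def to_1nf(rows, multi_attr, sep=","):
    """Split a multi-valued attribute into one row per value."""
    flat = []
    for row in rows:
        for value in row[multi_attr].split(sep):
            new_row = dict(row)                 # copy the other attributes
            new_row[multi_attr] = value.strip()  # keep one atomic value
            flat.append(new_row)
    return flat

students = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
    {"Student": "Stuart", "Age": 17, "Subject": "Maths"},
]

flat = to_1nf(students, "Subject")
for row in flat:
    print(row["Student"], row["Age"], row["Subject"])
# Adam 15 Biology
# Adam 15 Maths
# Alex 14 Maths
# Stuart 17 Maths
```

For the emp_mobile example, where numbers are separated by spaces, calling `to_1nf(rows, "emp_mobile", sep=" ")` would split them the same way.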
Second Normal Form (2NF)
A table is in 2NF if it is already in 1NF and every non-prime attribute depends on the whole of each candidate key, not on just a part of it (that is, there is no partial dependency).
Example 1: Sample Products table:

productID	product	Brand
1	Monitor	Apple
2	Monitor	Samsung
3	Scanner	HP
4	Head phone	JBL

Product table following 2NF:
Products Category table:

productID	product
1	Monitor
2	Scanner
3	Head phone

Brand table:

brandID	Brand
1	Apple
2	Samsung
3	HP
4	JBL

Products Brand table:

pbID	productID	brandID
1	1	1
2	1	2
3	2	3
4	3	4

Example 2: Suppose a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table that looks like this:

teacher_id	Subject	teacher_age
111	Maths	38
111	Physics	38
222	Biology	38
333	Physics	40
333	Chemistry	40

Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute may be dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:

teacher_id	teacher_age
111	38
222	38
333	40

teacher_subject table:

teacher_id	Subject
111	Maths
111	Physics
222	Biology
333	Physics
333	Chemistry

Now the tables comply with Second normal form (2NF).
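The 2NF test in Example 2 (a non-prime attribute depending on a proper subset of a candidate key) can be checked mechanically with attribute closures. This sketch assumes a single candidate key and FDs given as `(lhs, rhs)` set pairs; `partial_dependencies` is a hypothetical helper.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """(subset, attribute) pairs where a proper subset of the key
    determines a non-prime attribute."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = closure(set(subset), fds) & set(non_prime)
            for attr in sorted(determined):
                found.append((set(subset), attr))
    return found

FDS = [({"teacher_id"}, {"teacher_age"})]

# Original table: key {teacher_id, subject}, non-prime teacher_age.
print(partial_dependencies({"teacher_id", "subject"}, {"teacher_age"}, FDS))
# [({'teacher_id'}, 'teacher_age')]

# teacher_details after decomposition: key {teacher_id}, nothing partial.
print(partial_dependencies({"teacher_id"}, {"teacher_age"}, FDS))  # []
```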
Third Normal Form (3NF)
A table design is said to be in 3NF if both of the following conditions hold:

The table must be in 2NF.
There must be no transitive functional dependency of a non-prime attribute on any super key.

An attribute that is not part of any candidate key is known as non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X -> Y, at least one of the following conditions holds:

X is a super key of table
Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee, so they create a table named employee_details that looks like this:

emp_id	emp_name	emp_zip	emp_state	emp_city	emp_district
1001	John	282005	UP	Agra	Dayal Bagh
1002	Ajeet	222008	TN	Chennai	M-City
1006	Lora	282007	TN	Chennai	Urrapakkam
1101	Lilly	292008	UK	Pauri	Bhagwan
1201	Steve	222999	MP	Gwalior	Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate keys.
Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id. This makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id), which violates the rule of 3NF.
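The 3NF condition stated above (for each non-trivial FD X -> Y, X is a super key or Y consists of prime attributes) can be sketched as follows. It checks only the FDs passed in and assumes the candidate keys are already known; `violates_3nf` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_3nf(schema, fds, keys):
    """FDs X -> Y where X is not a super key and Y has a non-prime attribute."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # drop the trivial part of the FD
        if not extra:
            continue                 # trivial FD, always allowed
        superkey = closure(lhs, fds) >= set(schema)
        if not superkey and not extra <= prime:
            bad.append((set(lhs), extra))
    return bad

EMPLOYEE = {"emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"}
FDS = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]

for lhs, rhs in violates_3nf(EMPLOYEE, FDS, [{"emp_id"}]):
    print(sorted(lhs), "->", sorted(rhs))
# ['emp_zip'] -> ['emp_city', 'emp_district', 'emp_state']
```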

To make this table comply with 3NF, we break it into two tables to remove the transitive dependency:
employee table:

emp_id	emp_name	emp_zip
1001	John	282005
1002	Ajeet	222008
1006	Lora	282007
1101	Lilly	292008
1201	Steve	222999

employee_zip table:

emp_zip	emp_state	emp_city	emp_district
282005	UP	Agra	Dayal Bagh
222008	TN	Chennai	M-City
282007	TN	Chennai	Urrapakkam
292008	UK	Pauri	Bhagwan
222999	MP	Gwalior	Ratan

Boyce-Codd Normal Form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X -> Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the data like this:

emp_id	emp_nationality	emp_dept	dept_type	dept_no_of_emp
1001	Austrian	Production and planning	D001	200
1001	Austrian	stores	D001	250
1002	American	design and technical support	D134	100
1002	American	Purchasing department	D134	600

Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}

The table is not in BCNF because the left-hand side of each dependency (emp_id in the first, emp_dept in the second) is not a super key on its own.
To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table:

emp_id	emp_nationality
1001	Austrian
1002	American

emp_dept table:

emp_dept	dept_type	dept_no_of_emp
Production and planning	D001	200
Stores	D001	250
design and technical support	D134	100
Purchasing department	D134	600

emp_dept_mapping table:

emp_id	emp_dept
1001	Production and planning
1001	Stores
1002	design and technical support
1002	Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because, in both functional dependencies, the left-hand side is a key of its table.
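A minimal sketch of the BCNF test: every non-trivial FD X -> Y must have X⁺ equal to the whole table. For the decomposed tables it uses the FDs listed above for each table, which is an assumption of this sketch; `bcnf_violations` is a hypothetical helper.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Left-hand sides of non-trivial FDs that are not super keys of schema."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= set(schema):
            bad.append(set(lhs))
    return bad

FD_NAT = ({"emp_id"}, {"emp_nationality"})
FD_DEPT = ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
print(bcnf_violations(original, [FD_NAT, FD_DEPT]))
# [{'emp_id'}, {'emp_dept'}]

# Decomposed tables, each paired with the FDs that apply to it.
tables = [
    ({"emp_id", "emp_nationality"}, [FD_NAT]),
    ({"emp_dept", "dept_type", "dept_no_of_emp"}, [FD_DEPT]),
    ({"emp_id", "emp_dept"}, []),
]
print([bcnf_violations(s, f) for s, f in tables])  # [[], [], []]
```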
