Normalization
Normalization is the process of structuring relational data to:
- Reduce redundancy
- Prevent insert/update/delete anomalies
- Improve integrity and consistency
Why Normalize?
Without normalization:
- The same data appears in multiple rows
- Updates become inconsistent
- Deletes remove unrelated information
- Inserts require unrelated attributes or NULLs
Normalization ensures each fact is stored once and referenced elsewhere.
Data Anomalies (example)
Unnormalized (Single Table)
| student_id | student_name | branch_id | branch_name | hod_name |
|---|
| 1 | Alice | 101 | CSE | Mr. X |
| 2 | Bob | 101 | CSE | Mr. X |
Common issues:
- Insertion: need branch data to insert a student → NULLs or repeats
- Update: change HOD requires many row updates → inconsistency risk
- Deletion: removing last student of a branch also removes branch info
Normalized (Separate Entities)
- Student(student_id, student_name, branch_id)
- Branch(branch_id, branch_name, hod_name)
Benefits:
- One source of truth for branch attributes
- Fewer anomalies and cleaner updates
Functional Dependencies (FD)
- Trivial: X → Y where Y ⊆ X
Example: (RollNo, Name) → Name
- Non-trivial: X → Y where Y ⊄ X
Example: RollNo → Name
- Completely non-trivial: X → Y where X ∩ Y = ∅
FDs describe how attributes determine other attributes and guide normalization.
Types of Keys
- Primary Key (PK): uniquely identifies a row; non-NULL
Example: Student(RollNo, Name, Dept) → PK: RollNo
- Candidate Key: minimal unique attribute sets; multiple may exist
Example: Student(RollNo, Aadhaar, Email) → all are candidates
- Alternate Key: candidate keys not chosen as PK
- Composite Key: PK with multiple columns
Example: Score(StudentID, SubjectID, Marks) → PK: (StudentID, SubjectID)
- Foreign Key (FK): references a PK of another table
Example: Score.StudentID → Student.StudentID
- Super Key: any superset that uniquely identifies rows
Example: {RollNo}, {RollNo, Name}
Prime vs Non-Prime Attributes
- Prime attribute: part of any candidate key
- Non-prime attribute: not part of any candidate key
Dependencies to Watch
Partial Dependency (violates 2NF)
Occurs when PK is composite and a non-key attribute depends on part of it.
Example:
Score(StudentID, SubjectID, TeacherName)
PK → (StudentID, SubjectID)
SubjectID → TeacherName ❌ (depends on part of PK)
Fix: move TeacherName to Subject, reference by SubjectID.
Transitive Dependency (violates 3NF)
A → B, B → C ⇒ A → C (indirectly), where B and C are non-key attributes.
Example:
Student(RollNo, DeptID, DeptName)
RollNo → DeptID
DeptID → DeptName
DeptName depends transitively on RollNo ❌
Fix: separate Dept into its own table; Student stores DeptID only.
Normal Forms
1NF (First Normal Form)
- Atomic values (no arrays/repeating groups)
- Each column holds a single value per row
2NF (Second Normal Form)
- In 1NF and no partial dependency on a composite PK
Example violation:
Score(StudentID, SubjectID, Marks, TeacherName)
TeacherName depends only on SubjectID ❌
Fix:
- Subject(SubjectID, SubjectName, TeacherName)
- Score(StudentID, SubjectID, Marks)
3NF (Third Normal Form)
- In 2NF and no transitive dependency (non-keys depend only on the key)
Example violation:
Student(RollNo, State, City)
RollNo → State, State → City ⇒ City depends transitively on RollNo ❌
Fix:
- Student(RollNo, State)
- State(State, City)
BCNF (Boyce–Codd Normal Form)
- For every FD X → Y, X must be a super key
Example:
R(A,B,C) with AB → C and C → B
C is not a super key ⇒ violates BCNF ❌
Decompose: R1(A,C), R2(C,B)
4NF (Fourth Normal Form)
- In BCNF and no non-trivial multivalued dependencies (MVDs)
Example:
Person →→ Phone, Person →→ Hobby (independent lists)
Not 4NF: Person | Phone | Hobby ❌
Decompose:
Person_Phone(Person, Phone)
Person_Hobby(Person, Hobby)
5NF (Fifth Normal Form / PJNF)
- In 4NF and no non-trivial join dependencies
- If a table can be losslessly reconstructed from multiple projections, consider decomposing
Example:
Agent | Company | Product
May require decomposition into three pairwise relations
Cheat Sheet
Dependency Type Removed In
Multi-valued attributes 1NF
Partial dependency 2NF
Transitive dependency 3NF
Dependency without super key BCNF
Multivalued dependency 4NF
Join dependency 5NF