Normalization

Normalization is the process of structuring relational data to:

  • Reduce redundancy
  • Prevent insert/update/delete anomalies
  • Improve integrity and consistency

Why Normalize?

Without normalization:

  • The same data appears in multiple rows
  • Updates become inconsistent
  • Deletes remove unrelated information
  • Inserts require unrelated attributes or NULLs

Normalization ensures each fact is stored once and referenced elsewhere.


Data Anomalies (example)

Unnormalized (Single Table)

student_idstudent_namebranch_idbranch_namehod_name
1Alice101CSEMr. X
2Bob101CSEMr. X

Common issues:

  • Insertion: need branch data to insert a student → NULLs or repeats
  • Update: change HOD requires many row updates → inconsistency risk
  • Deletion: removing last student of a branch also removes branch info

Normalized (Separate Entities)

  • Student(student_id, student_name, branch_id)
  • Branch(branch_id, branch_name, hod_name)

Benefits:

  • One source of truth for branch attributes
  • Fewer anomalies and cleaner updates

Functional Dependencies (FD)

  • Trivial: X → Y where Y ⊆ X
    Example: (RollNo, Name) → Name
  • Non-trivial: X → Y where Y ⊄ X
    Example: RollNo → Name
  • Completely non-trivial: X → Y where X ∩ Y = ∅

FDs describe how attributes determine other attributes and guide normalization.


Types of Keys

  • Primary Key (PK): uniquely identifies a row; non-NULL
    Example: Student(RollNo, Name, Dept) → PK: RollNo
  • Candidate Key: minimal unique attribute sets; multiple may exist
    Example: Student(RollNo, Aadhaar, Email) → all are candidates
  • Alternate Key: candidate keys not chosen as PK
  • Composite Key: PK with multiple columns
    Example: Score(StudentID, SubjectID, Marks) → PK: (StudentID, SubjectID)
  • Foreign Key (FK): references a PK of another table
    Example: Score.StudentID → Student.StudentID
  • Super Key: any superset that uniquely identifies rows
    Example: {RollNo}, {RollNo, Name}


Prime vs Non-Prime Attributes

  • Prime attribute: part of any candidate key
  • Non-prime attribute: not part of any candidate key

Dependencies to Watch

Partial Dependency (violates 2NF)

Occurs when PK is composite and a non-key attribute depends on part of it.

Example:

Score(StudentID, SubjectID, TeacherName)
PK → (StudentID, SubjectID)
SubjectID → TeacherName  ❌  (depends on part of PK)

Fix: move TeacherName to Subject, reference by SubjectID.

Transitive Dependency (violates 3NF)

A → B, B → C ⇒ A → C (indirectly), where B and C are non-key attributes.

Example:

Student(RollNo, DeptID, DeptName)
RollNo → DeptID
DeptID → DeptName
DeptName depends transitively on RollNo  ❌

Fix: separate Dept into its own table; Student stores DeptID only.


Normal Forms

1NF (First Normal Form)

  • Atomic values (no arrays/repeating groups)
  • Each column holds a single value per row

2NF (Second Normal Form)

  • In 1NF and no partial dependency on a composite PK

Example violation:

Score(StudentID, SubjectID, Marks, TeacherName)
TeacherName depends only on SubjectID  ❌

Fix:

  • Subject(SubjectID, SubjectName, TeacherName)
  • Score(StudentID, SubjectID, Marks)

3NF (Third Normal Form)

  • In 2NF and no transitive dependency (non-keys depend only on the key)

Example violation:

Student(RollNo, State, City)
RollNo → State, State → City ⇒ City depends transitively on RollNo  ❌

Fix:

  • Student(RollNo, State)
  • State(State, City)

BCNF (Boyce–Codd Normal Form)

  • For every FD X → Y, X must be a super key

Example:

R(A,B,C) with AB → C and C → B
C is not a super key ⇒ violates BCNF  ❌
Decompose: R1(A,C), R2(C,B)

4NF (Fourth Normal Form)

  • In BCNF and no non-trivial multivalued dependencies (MVDs)

Example:

Person →→ Phone, Person →→ Hobby (independent lists)
Not 4NF: Person | Phone | Hobby  ❌
Decompose:
  Person_Phone(Person, Phone)
  Person_Hobby(Person, Hobby)

5NF (Fifth Normal Form / PJNF)

  • In 4NF and no non-trivial join dependencies
  • If a table can be losslessly reconstructed from multiple projections, consider decomposing

Example:

Agent | Company | Product
May require decomposition into three pairwise relations

Cheat Sheet

Dependency Type              Removed In
Multi-valued attributes      1NF
Partial dependency           2NF
Transitive dependency        3NF
Dependency without super key BCNF
Multivalued dependency       4NF
Join dependency              5NF