Normal Forms & Limitations of Relational Data Model

Database Systems - Lecture 9

Course: Database Systems (MTech First Semester)
Unit: I - Introduction & Relational Databases
Instructor: Dr. Mohsin Dar
Department: Cloud & Software Operations Cluster | SOCS | UPES

Learning Objectives

🎯 Understand Normalization

Learn the concept and importance of database normalization in relational databases.

📊 Master Normal Forms

Explore 1NF, 2NF, 3NF, BCNF, and higher normal forms with practical examples.

⚙️ Apply Normalization

Develop skills to normalize database schemas and eliminate redundancy.

⚠️ Identify Limitations

Recognize the limitations of relational data model in modern applications.

Database Normalization

Definition

Database Normalization is a systematic approach to organizing data in a relational database to minimize redundancy and dependency by organizing fields and table relationships.

Why Normalize?

1

Eliminate Redundancy

Reduce duplicate data storage to save space and improve consistency.

2

Data Integrity

Maintain data accuracy and consistency across the database.

3

Update Anomalies

Prevent insertion, update, and deletion anomalies.

4

Logical Structure

Create a logical and efficient database structure.

Functional Dependencies

Functional Dependency (FD)

A functional dependency X → Y exists if for every valid instance of a relation, whenever two tuples have the same value for attribute X, they must have the same value for attribute Y.

Types of Functional Dependencies

Type	Description	Example
Trivial FD	Y is a subset of X	{StudentID, Name} → {StudentID}
Non-trivial FD	Y is not a subset of X	{StudentID} → {Name}
Partial Dependency	Non-key attribute depends on part of composite key	{StudentID, CourseID} → {StudentName}
Transitive Dependency	A → B and B → C, then A → C	StudentID → DeptID → DeptName

First Normal Form (1NF)

1NF Rules

Each table cell should contain a single value (atomic values)
Each record needs to be unique
No repeating groups or arrays

Before 1NF (Unnormalized)

StudentID	Name	Courses
101	Alice	Math, Physics, Chemistry
102	Bob	English, History

After 1NF

StudentID	Name	Course
101	Alice	Math
101	Alice	Physics
101	Alice	Chemistry
102	Bob	English
102	Bob	History

Second Normal Form (2NF)

2NF Rules

Must be in 1NF
No partial dependencies on composite primary key
All non-key attributes must be fully functionally dependent on the entire primary key

Before 2NF (1NF but not 2NF)

StudentID	CourseID	StudentName	CourseName	Grade
101	CS101	Alice	Programming	A
101	CS102	Alice	Data Structures	B+

Problem: StudentName depends only on StudentID, not on the full composite key (StudentID, CourseID)

After 2NF

Students Table

StudentID	StudentName
101	Alice
102	Bob

Courses Table

CourseID	CourseName
CS101	Programming
CS102	Data Structures

Enrollments Table

StudentID	CourseID	Grade
101	CS101	A
101	CS102	B+

Third Normal Form (3NF)

3NF Rules

Must be in 2NF
No transitive dependencies
Non-key attributes should not depend on other non-key attributes

Before 3NF (2NF but not 3NF)

StudentID	StudentName	DepartmentID	DepartmentName	DepartmentHead
101	Alice	CS	Computer Science	Dr. Smith
102	Bob	CS	Computer Science	Dr. Smith

Problem: DepartmentName and DepartmentHead depend on DepartmentID (transitive dependency)

After 3NF

Students Table

StudentID	StudentName	DepartmentID
101	Alice	CS
102	Bob	CS

Departments Table

DepartmentID	DepartmentName	DepartmentHead
CS	Computer Science	Dr. Smith
EE	Electrical Engineering	Dr. Jones

Boyce-Codd Normal Form (BCNF)

BCNF Rules

Must be in 3NF
For every functional dependency X → Y, X should be a super key
Eliminates anomalies that can still exist in 3NF

Example: Course-Instructor-Time

Functional Dependencies:

CourseID, Time → Instructor
Instructor, Time → CourseID

Before BCNF (3NF but not BCNF)

CourseID	Time	Instructor
CS101	9:00 AM	Dr. Smith
CS102	10:00 AM	Dr. Smith

After BCNF

Course-Time Table

CourseID	Time
CS101	9:00 AM
CS102	10:00 AM

Time-Instructor Table

Time	Instructor
9:00 AM	Dr. Smith
10:00 AM	Dr. Smith

Higher Normal Forms

4

Fourth Normal Form (4NF)

Must be in BCNF and have no multi-valued dependencies. Deals with situations where a table has two or more independent multi-valued facts about an entity.

5

Fifth Normal Form (5NF)

Must be in 4NF and have no join dependencies. Every join dependency in the table is implied by the candidate keys.

6

Domain-Key Normal Form

Every constraint on the relation is a logical consequence of the definition of keys and domains. This is the ultimate normal form.

4NF Example: Multi-valued Dependencies

Problem: A professor can teach multiple courses and have multiple research interests independently.

Before 4NF

Professor	Course	Research Interest
Dr. Smith	Database Systems	Data Mining
Dr. Smith	Database Systems	Machine Learning
Dr. Smith	Data Structures	Data Mining
Dr. Smith	Data Structures	Machine Learning

After 4NF

Professor-Course Table

Professor	Course
Dr. Smith	Database Systems
Dr. Smith	Data Structures

Professor-Research Table

Professor	Research Interest
Dr. Smith	Data Mining
Dr. Smith	Machine Learning

Normalization Process

Step-by-Step Normalization Algorithm

Identify Functional Dependencies: List all FDs in the relation
Find Candidate Keys: Determine all possible candidate keys
Check 1NF: Ensure atomic values and unique records
Check 2NF: Eliminate partial dependencies
Check 3NF: Eliminate transitive dependencies
Check BCNF: Ensure all determinants are super keys
Check Higher Forms: If needed, check for 4NF and 5NF

Comprehensive Example: Student Management System

Original Unnormalized Table

StudentID	StudentName	Courses	Grades	Instructors	DeptID	DeptName	DeptHead
S001	Alice Johnson	CS101, CS102	A, B+	Dr. Smith, Dr. Jones	CS	Computer Science	Dr. Brown

-- Final Normalized Schema (3NF/BCNF)

-- Students Table
CREATE TABLE Students (
    StudentID VARCHAR(10) PRIMARY KEY,
    StudentName VARCHAR(50) NOT NULL,
    DeptID VARCHAR(10),
    FOREIGN KEY (DeptID) REFERENCES Departments(DeptID)
);

-- Departments Table
CREATE TABLE Departments (
    DeptID VARCHAR(10) PRIMARY KEY,
    DeptName VARCHAR(50) NOT NULL,
    DeptHead VARCHAR(50)
);

-- Courses Table
CREATE TABLE Courses (
    CourseID VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(50) NOT NULL,
    InstructorID VARCHAR(10),
    FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);

-- Enrollments Table
CREATE TABLE Enrollments (
    StudentID VARCHAR(10),
    CourseID VARCHAR(10),
    Grade VARCHAR(5),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

-- Instructors Table
CREATE TABLE Instructors (
    InstructorID VARCHAR(10) PRIMARY KEY,
    InstructorName VARCHAR(50) NOT NULL,
    DeptID VARCHAR(10),
    FOREIGN KEY (DeptID) REFERENCES Departments(DeptID)
);
                

Denormalization

What is Denormalization?

Denormalization is the process of intentionally introducing redundancy into a database design to improve query performance, typically by combining normalized tables.

When to Denormalize?

📊

Performance Requirements

When query performance is more critical than storage space or update efficiency.

🔍

Read-Heavy Workloads

Systems with frequent reads but infrequent updates benefit from denormalization.

📈

Data Warehousing

OLAP systems often use denormalized structures for faster analytics.

⚡

Real-time Systems

Applications requiring sub-second response times may need denormalized data.

Denormalization Example

Instead of joining Students and Departments tables repeatedly:

Normalized (Multiple Tables)

SELECT s.StudentName, d.DeptName 
FROM Students s 
JOIN Departments d ON s.DeptID = d.DeptID;
                

Denormalized (Single Table)

StudentID	StudentName	DeptID	DeptName
S001	Alice Johnson	CS	Computer Science
S002	Bob Smith	CS	Computer Science

SELECT StudentName, DeptName FROM Student_Dept_View;

Limitations of Relational Data Model

Structural Limitations

1. Complex Data Types

Problem: Traditional relational databases struggle with complex data types like arrays, nested objects, multimedia content, and hierarchical data structures.

Example: Storing JSON documents, XML data, or multimedia files requires workarounds or separate storage systems.

2. Object-Relational Impedance Mismatch

Problem: Difficulty in mapping object-oriented programming concepts to relational database structures.

Impact: Complex inheritance hierarchies and polymorphic relationships are hard to represent efficiently.

3. Fixed Schema Requirements

Problem: Relational databases require predefined schemas, making it difficult to handle dynamic or evolving data structures.

Example: Adding new attributes requires schema modifications and potential data migration.

Performance & Scalability Limitations

1. Horizontal Scalability Challenges

Problem: Traditional RDBMS are designed for vertical scaling (scaling up) rather than horizontal scaling (scaling out).

Impact: Difficult to distribute data across multiple servers while maintaining ACID properties and referential integrity.

2. Join Operation Overhead

Problem: Complex queries involving multiple joins can be computationally expensive.

Example: Normalized designs may require numerous joins for simple data retrieval operations.

3. Big Data Processing

Problem: Traditional relational databases are not optimized for processing massive volumes of unstructured or semi-structured data.

Limitations:

Limited support for parallel processing
Inefficient for analytical workloads on large datasets
Poor performance with real-time data streaming

Scalability Comparison

Aspect	Relational Database	NoSQL Solutions
Scaling Type	Vertical (Scale Up)	Horizontal (Scale Out)
Data Volume	GB to TB range	TB to PB range
Schema Flexibility	Fixed Schema	Dynamic Schema
Complex Queries	SQL - Very Good	Limited - Good for Simple Queries

Modern Data Requirements vs Relational Model

1. Semi-Structured and Unstructured Data

Challenge: Modern applications generate diverse data types including social media posts, sensor data, log files, and multimedia content.

Relational Limitation: Rigid tabular structure cannot efficiently handle variable schemas and nested data structures.

2. Real-Time Analytics

Requirement: Need for real-time data processing and analytics for immediate decision-making.

Limitation: Traditional RDBMS are optimized for transactional processing (OLTP) rather than analytical processing (OLAP).

3. Geographic Distribution

Challenge: Global applications require data to be distributed across multiple geographic locations.

Issues:

Network latency affects performance
Maintaining consistency across distributed nodes
Compliance with regional data protection laws

Use Case Analysis

Use Case	Data Characteristics	Relational Fit	Better Alternative
Social Media Platform	Unstructured posts, variable metadata	Poor	Document Store (MongoDB)
IoT Data Processing	High-volume time-series data	Poor	Time-series DB (InfluxDB)
Recommendation Engine	Graph relationships, user behavior	Limited	Graph Database (Neo4j)
Financial Transactions	Structured, ACID requirements	Excellent	Relational Database

Alternative Data Models

📄

Document Stores

Examples: MongoDB, CouchDB

Best For: Semi-structured data, content management, catalogs

Advantages: Flexible schema, natural object mapping

🗂️

Column-Family

Examples: Cassandra, HBase

Best For: Time-series data, IoT applications, wide tables

Advantages: High write throughput, compression

🔗

Graph Databases

Examples: Neo4j, Amazon Neptune

Best For: Social networks, recommendation systems

Advantages: Natural relationship representation

🗝️

Key-Value Stores

Examples: Redis, DynamoDB

Best For: Caching, session management

Advantages: High performance, simplicity

CAP Theorem Impact

The CAP theorem states that distributed systems can only guarantee two of three properties simultaneously:

Consistency: All nodes see the same data simultaneously
Availability: System remains operational
Partition Tolerance: System continues despite network failures

Impact: Traditional RDBMS prioritize Consistency and Availability but struggle with Partition Tolerance in distributed environments.

Solutions and Workarounds

Extending Relational Capabilities

Modern RDBMS Extensions

JSON/XML Support: PostgreSQL, MySQL, SQL Server now support JSON data types
Array Types: PostgreSQL supports array columns and operations
Full-Text Search: Built-in text search capabilities
Spatial Data: PostGIS for geographic information systems
User-Defined Types: Custom data types and functions

Hybrid Approaches

Polyglot Persistence

Using different database technologies for different parts of an application based on data requirements.

Data Type	Database Choice	Example Use
Transactional Data	PostgreSQL/MySQL	User accounts, orders, payments
Product Catalog	MongoDB	Variable product attributes
User Sessions	Redis	Fast session storage and caching
Recommendation Data	Neo4j	User behavior and product relationships
Analytics Data	ClickHouse/BigQuery	OLAP queries and reporting

Best Practices & Guidelines

When to Use Relational Databases

Ideal Scenarios for RDBMS

ACID Compliance Required: Financial systems, inventory management
Complex Queries: Applications requiring sophisticated joins and aggregations
Data Integrity Critical: Systems where referential integrity is paramount
Structured Data: Well-defined, stable data schemas
Reporting & Analytics: Business intelligence with SQL-based tools

Decision Framework

1

Analyze Data Structure

Is your data highly structured with fixed relationships? → RDBMS
Is your data semi-structured or schema-less? → NoSQL

2

Consider Scale Requirements

Single server sufficient? → RDBMS
Need horizontal scaling? → NoSQL

3

Evaluate Query Complexity

Complex joins and aggregations needed? → RDBMS
Simple key-value lookups? → NoSQL

4

Assess Consistency Needs

Strong consistency required? → RDBMS
Eventual consistency acceptable? → NoSQL

Summary & Key Takeaways

Normal Forms Recap

Normal Form	Key Requirement	Eliminates
1NF	Atomic values, unique records	Repeating groups
2NF	No partial dependencies	Partial dependencies
3NF	No transitive dependencies	Transitive dependencies
BCNF	All determinants are super keys	Additional anomalies in 3NF

Key Insights

🎯 Normalization Benefits

Reduces redundancy, improves data integrity, eliminates update anomalies

⚡ Performance Trade-offs

Higher normalization can increase join operations; denormalization may improve query performance

🔄 Modern Challenges

Relational model faces limitations with big data, real-time processing, and complex data types

🛠️ Hybrid Solutions

Modern applications often use polyglot persistence combining multiple database technologies

Final Recommendation

Choose the right tool for the job. Relational databases excel at structured data with complex relationships and strong consistency requirements. For other scenarios, consider NoSQL alternatives or hybrid approaches. The key is understanding your specific requirements and making informed architectural decisions.

Questions & Next Steps

Discussion Questions

When might you deliberately choose not to normalize to 3NF in a production system?
How would you approach database design for a social media platform with millions of users?
What are the trade-offs between consistency and performance in distributed systems?
How can modern RDBMS extensions address some traditional limitations?

Practical Assignment

Database Design Project

Design a database for an e-commerce platform considering:

Product catalog with variable attributes
User reviews and ratings
Order processing and inventory
Recommendation system requirements

Deliverables:

Normalized relational schema (up to 3NF/BCNF)
Identification of potential limitations
Proposed hybrid approach with justification

Next Lecture Preview

Lecture 10: Unit II: Transaction Management

We'll explore The ACID Properties,Transactions and Schedules.

Thank You!

Dr. Mohsin Dar
Assistant Professor, Cloud & Software Operations Cluster
School of Computer Science, UPES
Email: mohsin.dar@upes.ac.in