Normal Forms & Limitations of Relational Data Model
Database Systems - Lecture 9
Course: Database Systems (MTech First Semester)
Unit: I - Introduction & Relational Databases
Instructor: Dr. Mohsin Dar
Department: Cloud & Software Operations Cluster | SOCS | UPES

Learning Objectives

🎯 Understand Normalization

Learn the concept and importance of database normalization in relational databases.

📊 Master Normal Forms

Explore 1NF, 2NF, 3NF, BCNF, and higher normal forms with practical examples.

⚙️ Apply Normalization

Develop skills to normalize database schemas and eliminate redundancy.

⚠️ Identify Limitations

Recognize the limitations of relational data model in modern applications.

Database Normalization

Definition

Database Normalization is a systematic approach to organizing data in a relational database to minimize redundancy and dependency by organizing fields and table relationships.

Why Normalize?

1

Eliminate Redundancy

Reduce duplicate data storage to save space and improve consistency.

2

Data Integrity

Maintain data accuracy and consistency across the database.

3

Update Anomalies

Prevent insertion, update, and deletion anomalies.

4

Logical Structure

Create a logical and efficient database structure.

Functional Dependencies

Functional Dependency (FD)

A functional dependency X → Y exists if for every valid instance of a relation, whenever two tuples have the same value for attribute X, they must have the same value for attribute Y.

Types of Functional Dependencies

Type Description Example
Trivial FD Y is a subset of X {StudentID, Name} → {StudentID}
Non-trivial FD Y is not a subset of X {StudentID} → {Name}
Partial Dependency Non-key attribute depends on part of composite key {StudentID, CourseID} → {StudentName}
Transitive Dependency A → B and B → C, then A → C StudentID → DeptID → DeptName

First Normal Form (1NF)

1NF Rules
  • Each table cell should contain a single value (atomic values)
  • Each record needs to be unique
  • No repeating groups or arrays

Before 1NF (Unnormalized)

StudentID Name Courses
101 Alice Math, Physics, Chemistry
102 Bob English, History

After 1NF

StudentID Name Course
101 Alice Math
101 Alice Physics
101 Alice Chemistry
102 Bob English
102 Bob History

Second Normal Form (2NF)

2NF Rules
  • Must be in 1NF
  • No partial dependencies on composite primary key
  • All non-key attributes must be fully functionally dependent on the entire primary key

Before 2NF (1NF but not 2NF)

StudentID CourseID StudentName CourseName Grade
101 CS101 Alice Programming A
101 CS102 Alice Data Structures B+

Problem: StudentName depends only on StudentID, not on the full composite key (StudentID, CourseID)

After 2NF

Students Table

StudentIDStudentName
101Alice
102Bob

Courses Table

CourseIDCourseName
CS101Programming
CS102Data Structures

Enrollments Table

StudentIDCourseIDGrade
101CS101A
101CS102B+

Third Normal Form (3NF)

3NF Rules
  • Must be in 2NF
  • No transitive dependencies
  • Non-key attributes should not depend on other non-key attributes

Before 3NF (2NF but not 3NF)

StudentID StudentName DepartmentID DepartmentName DepartmentHead
101 Alice CS Computer Science Dr. Smith
102 Bob CS Computer Science Dr. Smith

Problem: DepartmentName and DepartmentHead depend on DepartmentID (transitive dependency)

After 3NF

Students Table

StudentIDStudentNameDepartmentID
101AliceCS
102BobCS

Departments Table

DepartmentIDDepartmentNameDepartmentHead
CSComputer ScienceDr. Smith
EEElectrical EngineeringDr. Jones

Boyce-Codd Normal Form (BCNF)

BCNF Rules
  • Must be in 3NF
  • For every functional dependency X → Y, X should be a super key
  • Eliminates anomalies that can still exist in 3NF

Example: Course-Instructor-Time

Functional Dependencies:

  • CourseID, Time → Instructor
  • Instructor, Time → CourseID

Before BCNF (3NF but not BCNF)

CourseID Time Instructor
CS101 9:00 AM Dr. Smith
CS102 10:00 AM Dr. Smith

After BCNF

Course-Time Table

CourseIDTime
CS1019:00 AM
CS10210:00 AM

Time-Instructor Table

TimeInstructor
9:00 AMDr. Smith
10:00 AMDr. Smith

Higher Normal Forms

4

Fourth Normal Form (4NF)

Must be in BCNF and have no multi-valued dependencies. Deals with situations where a table has two or more independent multi-valued facts about an entity.

5

Fifth Normal Form (5NF)

Must be in 4NF and have no join dependencies. Every join dependency in the table is implied by the candidate keys.

6

Domain-Key Normal Form

Every constraint on the relation is a logical consequence of the definition of keys and domains. This is the ultimate normal form.

4NF Example: Multi-valued Dependencies

Problem: A professor can teach multiple courses and have multiple research interests independently.

Before 4NF

Professor Course Research Interest
Dr. Smith Database Systems Data Mining
Dr. Smith Database Systems Machine Learning
Dr. Smith Data Structures Data Mining
Dr. Smith Data Structures Machine Learning

After 4NF

Professor-Course Table

ProfessorCourse
Dr. SmithDatabase Systems
Dr. SmithData Structures

Professor-Research Table

ProfessorResearch Interest
Dr. SmithData Mining
Dr. SmithMachine Learning

Normalization Process

Step-by-Step Normalization Algorithm

  1. Identify Functional Dependencies: List all FDs in the relation
  2. Find Candidate Keys: Determine all possible candidate keys
  3. Check 1NF: Ensure atomic values and unique records
  4. Check 2NF: Eliminate partial dependencies
  5. Check 3NF: Eliminate transitive dependencies
  6. Check BCNF: Ensure all determinants are super keys
  7. Check Higher Forms: If needed, check for 4NF and 5NF

Comprehensive Example: Student Management System

Original Unnormalized Table

StudentID StudentName Courses Grades Instructors DeptID DeptName DeptHead
S001 Alice Johnson CS101, CS102 A, B+ Dr. Smith, Dr. Jones CS Computer Science Dr. Brown
-- Final Normalized Schema (3NF/BCNF) -- Students Table CREATE TABLE Students ( StudentID VARCHAR(10) PRIMARY KEY, StudentName VARCHAR(50) NOT NULL, DeptID VARCHAR(10), FOREIGN KEY (DeptID) REFERENCES Departments(DeptID) ); -- Departments Table CREATE TABLE Departments ( DeptID VARCHAR(10) PRIMARY KEY, DeptName VARCHAR(50) NOT NULL, DeptHead VARCHAR(50) ); -- Courses Table CREATE TABLE Courses ( CourseID VARCHAR(10) PRIMARY KEY, CourseName VARCHAR(50) NOT NULL, InstructorID VARCHAR(10), FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID) ); -- Enrollments Table CREATE TABLE Enrollments ( StudentID VARCHAR(10), CourseID VARCHAR(10), Grade VARCHAR(5), PRIMARY KEY (StudentID, CourseID), FOREIGN KEY (StudentID) REFERENCES Students(StudentID), FOREIGN KEY (CourseID) REFERENCES Courses(CourseID) ); -- Instructors Table CREATE TABLE Instructors ( InstructorID VARCHAR(10) PRIMARY KEY, InstructorName VARCHAR(50) NOT NULL, DeptID VARCHAR(10), FOREIGN KEY (DeptID) REFERENCES Departments(DeptID) );

Denormalization

What is Denormalization?

Denormalization is the process of intentionally introducing redundancy into a database design to improve query performance, typically by combining normalized tables.

When to Denormalize?

📊

Performance Requirements

When query performance is more critical than storage space or update efficiency.

🔍

Read-Heavy Workloads

Systems with frequent reads but infrequent updates benefit from denormalization.

📈

Data Warehousing

OLAP systems often use denormalized structures for faster analytics.

Real-time Systems

Applications requiring sub-second response times may need denormalized data.

Denormalization Example

Instead of joining Students and Departments tables repeatedly:

Normalized (Multiple Tables)

SELECT s.StudentName, d.DeptName FROM Students s JOIN Departments d ON s.DeptID = d.DeptID;

Denormalized (Single Table)

StudentID StudentName DeptID DeptName
S001 Alice Johnson CS Computer Science
S002 Bob Smith CS Computer Science
SELECT StudentName, DeptName FROM Student_Dept_View;

Limitations of Relational Data Model

Structural Limitations

1. Complex Data Types

Problem: Traditional relational databases struggle with complex data types like arrays, nested objects, multimedia content, and hierarchical data structures.

Example: Storing JSON documents, XML data, or multimedia files requires workarounds or separate storage systems.

2. Object-Relational Impedance Mismatch

Problem: Difficulty in mapping object-oriented programming concepts to relational database structures.

Impact: Complex inheritance hierarchies and polymorphic relationships are hard to represent efficiently.

3. Fixed Schema Requirements

Problem: Relational databases require predefined schemas, making it difficult to handle dynamic or evolving data structures.

Example: Adding new attributes requires schema modifications and potential data migration.

Performance & Scalability Limitations

1. Horizontal Scalability Challenges

Problem: Traditional RDBMS are designed for vertical scaling (scaling up) rather than horizontal scaling (scaling out).

Impact: Difficult to distribute data across multiple servers while maintaining ACID properties and referential integrity.

2. Join Operation Overhead

Problem: Complex queries involving multiple joins can be computationally expensive.

Example: Normalized designs may require numerous joins for simple data retrieval operations.

3. Big Data Processing

Problem: Traditional relational databases are not optimized for processing massive volumes of unstructured or semi-structured data.

Limitations:

  • Limited support for parallel processing
  • Inefficient for analytical workloads on large datasets
  • Poor performance with real-time data streaming

Scalability Comparison

Aspect Relational Database NoSQL Solutions
Scaling Type Vertical (Scale Up) Horizontal (Scale Out)
Data Volume GB to TB range TB to PB range
Schema Flexibility Fixed Schema Dynamic Schema
Complex Queries SQL - Very Good Limited - Good for Simple Queries

Modern Data Requirements vs Relational Model

1. Semi-Structured and Unstructured Data

Challenge: Modern applications generate diverse data types including social media posts, sensor data, log files, and multimedia content.

Relational Limitation: Rigid tabular structure cannot efficiently handle variable schemas and nested data structures.

2. Real-Time Analytics

Requirement: Need for real-time data processing and analytics for immediate decision-making.

Limitation: Traditional RDBMS are optimized for transactional processing (OLTP) rather than analytical processing (OLAP).

3. Geographic Distribution

Challenge: Global applications require data to be distributed across multiple geographic locations.

Issues:

  • Network latency affects performance
  • Maintaining consistency across distributed nodes
  • Compliance with regional data protection laws

Use Case Analysis

Use Case Data Characteristics Relational Fit Better Alternative
Social Media Platform Unstructured posts, variable metadata Poor Document Store (MongoDB)
IoT Data Processing High-volume time-series data Poor Time-series DB (InfluxDB)
Recommendation Engine Graph relationships, user behavior Limited Graph Database (Neo4j)
Financial Transactions Structured, ACID requirements Excellent Relational Database

Alternative Data Models

📄

Document Stores

Examples: MongoDB, CouchDB

Best For: Semi-structured data, content management, catalogs

Advantages: Flexible schema, natural object mapping

🗂️

Column-Family

Examples: Cassandra, HBase

Best For: Time-series data, IoT applications, wide tables

Advantages: High write throughput, compression

🔗

Graph Databases

Examples: Neo4j, Amazon Neptune

Best For: Social networks, recommendation systems

Advantages: Natural relationship representation

🗝️

Key-Value Stores

Examples: Redis, DynamoDB

Best For: Caching, session management

Advantages: High performance, simplicity

CAP Theorem Impact

The CAP theorem states that distributed systems can only guarantee two of three properties simultaneously:

  • Consistency: All nodes see the same data simultaneously
  • Availability: System remains operational
  • Partition Tolerance: System continues despite network failures

Impact: Traditional RDBMS prioritize Consistency and Availability but struggle with Partition Tolerance in distributed environments.

Solutions and Workarounds

Extending Relational Capabilities

Modern RDBMS Extensions

  • JSON/XML Support: PostgreSQL, MySQL, SQL Server now support JSON data types
  • Array Types: PostgreSQL supports array columns and operations
  • Full-Text Search: Built-in text search capabilities
  • Spatial Data: PostGIS for geographic information systems
  • User-Defined Types: Custom data types and functions

Hybrid Approaches

Polyglot Persistence

Using different database technologies for different parts of an application based on data requirements.

Data Type Database Choice Example Use
Transactional Data PostgreSQL/MySQL User accounts, orders, payments
Product Catalog MongoDB Variable product attributes
User Sessions Redis Fast session storage and caching
Recommendation Data Neo4j User behavior and product relationships
Analytics Data ClickHouse/BigQuery OLAP queries and reporting

Best Practices & Guidelines

When to Use Relational Databases

Ideal Scenarios for RDBMS

  • ACID Compliance Required: Financial systems, inventory management
  • Complex Queries: Applications requiring sophisticated joins and aggregations
  • Data Integrity Critical: Systems where referential integrity is paramount
  • Structured Data: Well-defined, stable data schemas
  • Reporting & Analytics: Business intelligence with SQL-based tools

Decision Framework

1

Analyze Data Structure

Is your data highly structured with fixed relationships? → RDBMS
Is your data semi-structured or schema-less? → NoSQL

2

Consider Scale Requirements

Single server sufficient? → RDBMS
Need horizontal scaling? → NoSQL

3

Evaluate Query Complexity

Complex joins and aggregations needed? → RDBMS
Simple key-value lookups? → NoSQL

4

Assess Consistency Needs

Strong consistency required? → RDBMS
Eventual consistency acceptable? → NoSQL

Summary & Key Takeaways

Normal Forms Recap

Normal Form Key Requirement Eliminates
1NF Atomic values, unique records Repeating groups
2NF No partial dependencies Partial dependencies
3NF No transitive dependencies Transitive dependencies
BCNF All determinants are super keys Additional anomalies in 3NF

Key Insights

🎯 Normalization Benefits

Reduces redundancy, improves data integrity, eliminates update anomalies

⚡ Performance Trade-offs

Higher normalization can increase join operations; denormalization may improve query performance

🔄 Modern Challenges

Relational model faces limitations with big data, real-time processing, and complex data types

🛠️ Hybrid Solutions

Modern applications often use polyglot persistence combining multiple database technologies

Final Recommendation

Choose the right tool for the job. Relational databases excel at structured data with complex relationships and strong consistency requirements. For other scenarios, consider NoSQL alternatives or hybrid approaches. The key is understanding your specific requirements and making informed architectural decisions.

Questions & Next Steps

Discussion Questions

  1. When might you deliberately choose not to normalize to 3NF in a production system?
  2. How would you approach database design for a social media platform with millions of users?
  3. What are the trade-offs between consistency and performance in distributed systems?
  4. How can modern RDBMS extensions address some traditional limitations?

Practical Assignment

Database Design Project

Design a database for an e-commerce platform considering:

  • Product catalog with variable attributes
  • User reviews and ratings
  • Order processing and inventory
  • Recommendation system requirements

Deliverables:

  • Normalized relational schema (up to 3NF/BCNF)
  • Identification of potential limitations
  • Proposed hybrid approach with justification

Next Lecture Preview

Lecture 10: Unit II: Transaction Management

We'll explore The ACID Properties,Transactions and Schedules.

Thank You!

Dr. Mohsin Dar
Assistant Professor, Cloud & Software Operations Cluster
School of Computer Science, UPES
Email: mohsin.dar@upes.ac.in