Learn the concept and importance of database normalization in relational databases.
Explore 1NF, 2NF, 3NF, BCNF, and higher normal forms with practical examples.
Develop skills to normalize database schemas and eliminate redundancy.
Recognize the limitations of relational data model in modern applications.
Database Normalization is a systematic approach to organizing data in a relational database to minimize redundancy and dependency by organizing fields and table relationships.
Reduce duplicate data storage to save space and improve consistency.
Maintain data accuracy and consistency across the database.
Prevent insertion, update, and deletion anomalies.
Create a logical and efficient database structure.
A functional dependency X → Y exists if for every valid instance of a relation, whenever two tuples have the same value for attribute X, they must have the same value for attribute Y.
| Type | Description | Example |
|---|---|---|
| Trivial FD | Y is a subset of X | {StudentID, Name} → {StudentID} |
| Non-trivial FD | Y is not a subset of X | {StudentID} → {Name} |
| Partial Dependency | Non-key attribute depends on part of composite key | {StudentID, CourseID} → {StudentName} |
| Transitive Dependency | A → B and B → C, then A → C | StudentID → DeptID → DeptName |
| StudentID | Name | Courses |
|---|---|---|
| 101 | Alice | Math, Physics, Chemistry |
| 102 | Bob | English, History |
| StudentID | Name | Course |
|---|---|---|
| 101 | Alice | Math |
| 101 | Alice | Physics |
| 101 | Alice | Chemistry |
| 102 | Bob | English |
| 102 | Bob | History |
| StudentID | CourseID | StudentName | CourseName | Grade |
|---|---|---|---|---|
| 101 | CS101 | Alice | Programming | A |
| 101 | CS102 | Alice | Data Structures | B+ |
Problem: StudentName depends only on StudentID, not on the full composite key (StudentID, CourseID)
| StudentID | StudentName |
|---|---|
| 101 | Alice |
| 102 | Bob |
| CourseID | CourseName |
|---|---|
| CS101 | Programming |
| CS102 | Data Structures |
| StudentID | CourseID | Grade |
|---|---|---|
| 101 | CS101 | A |
| 101 | CS102 | B+ |
| StudentID | StudentName | DepartmentID | DepartmentName | DepartmentHead |
|---|---|---|---|---|
| 101 | Alice | CS | Computer Science | Dr. Smith |
| 102 | Bob | CS | Computer Science | Dr. Smith |
Problem: DepartmentName and DepartmentHead depend on DepartmentID (transitive dependency)
| StudentID | StudentName | DepartmentID |
|---|---|---|
| 101 | Alice | CS |
| 102 | Bob | CS |
| DepartmentID | DepartmentName | DepartmentHead |
|---|---|---|
| CS | Computer Science | Dr. Smith |
| EE | Electrical Engineering | Dr. Jones |
Functional Dependencies:
| CourseID | Time | Instructor |
|---|---|---|
| CS101 | 9:00 AM | Dr. Smith |
| CS102 | 10:00 AM | Dr. Smith |
| CourseID | Time |
|---|---|
| CS101 | 9:00 AM |
| CS102 | 10:00 AM |
| Time | Instructor |
|---|---|
| 9:00 AM | Dr. Smith |
| 10:00 AM | Dr. Smith |
Must be in BCNF and have no multi-valued dependencies. Deals with situations where a table has two or more independent multi-valued facts about an entity.
Must be in 4NF and have no join dependencies. Every join dependency in the table is implied by the candidate keys.
Every constraint on the relation is a logical consequence of the definition of keys and domains. This is the ultimate normal form.
Problem: A professor can teach multiple courses and have multiple research interests independently.
| Professor | Course | Research Interest |
|---|---|---|
| Dr. Smith | Database Systems | Data Mining |
| Dr. Smith | Database Systems | Machine Learning |
| Dr. Smith | Data Structures | Data Mining |
| Dr. Smith | Data Structures | Machine Learning |
| Professor | Course |
|---|---|
| Dr. Smith | Database Systems |
| Dr. Smith | Data Structures |
| Professor | Research Interest |
|---|---|
| Dr. Smith | Data Mining |
| Dr. Smith | Machine Learning |
| StudentID | StudentName | Courses | Grades | Instructors | DeptID | DeptName | DeptHead |
|---|---|---|---|---|---|---|---|
| S001 | Alice Johnson | CS101, CS102 | A, B+ | Dr. Smith, Dr. Jones | CS | Computer Science | Dr. Brown |
Denormalization is the process of intentionally introducing redundancy into a database design to improve query performance, typically by combining normalized tables.
When query performance is more critical than storage space or update efficiency.
Systems with frequent reads but infrequent updates benefit from denormalization.
OLAP systems often use denormalized structures for faster analytics.
Applications requiring sub-second response times may need denormalized data.
Instead of joining Students and Departments tables repeatedly:
| StudentID | StudentName | DeptID | DeptName |
|---|---|---|---|
| S001 | Alice Johnson | CS | Computer Science |
| S002 | Bob Smith | CS | Computer Science |
Problem: Traditional relational databases struggle with complex data types like arrays, nested objects, multimedia content, and hierarchical data structures.
Example: Storing JSON documents, XML data, or multimedia files requires workarounds or separate storage systems.
Problem: Difficulty in mapping object-oriented programming concepts to relational database structures.
Impact: Complex inheritance hierarchies and polymorphic relationships are hard to represent efficiently.
Problem: Relational databases require predefined schemas, making it difficult to handle dynamic or evolving data structures.
Example: Adding new attributes requires schema modifications and potential data migration.
Problem: Traditional RDBMS are designed for vertical scaling (scaling up) rather than horizontal scaling (scaling out).
Impact: Difficult to distribute data across multiple servers while maintaining ACID properties and referential integrity.
Problem: Complex queries involving multiple joins can be computationally expensive.
Example: Normalized designs may require numerous joins for simple data retrieval operations.
Problem: Traditional relational databases are not optimized for processing massive volumes of unstructured or semi-structured data.
Limitations:
| Aspect | Relational Database | NoSQL Solutions |
|---|---|---|
| Scaling Type | Vertical (Scale Up) | Horizontal (Scale Out) |
| Data Volume | GB to TB range | TB to PB range |
| Schema Flexibility | Fixed Schema | Dynamic Schema |
| Complex Queries | SQL - Very Good | Limited - Good for Simple Queries |
Challenge: Modern applications generate diverse data types including social media posts, sensor data, log files, and multimedia content.
Relational Limitation: Rigid tabular structure cannot efficiently handle variable schemas and nested data structures.
Requirement: Need for real-time data processing and analytics for immediate decision-making.
Limitation: Traditional RDBMS are optimized for transactional processing (OLTP) rather than analytical processing (OLAP).
Challenge: Global applications require data to be distributed across multiple geographic locations.
Issues:
| Use Case | Data Characteristics | Relational Fit | Better Alternative |
|---|---|---|---|
| Social Media Platform | Unstructured posts, variable metadata | Poor | Document Store (MongoDB) |
| IoT Data Processing | High-volume time-series data | Poor | Time-series DB (InfluxDB) |
| Recommendation Engine | Graph relationships, user behavior | Limited | Graph Database (Neo4j) |
| Financial Transactions | Structured, ACID requirements | Excellent | Relational Database |
Examples: MongoDB, CouchDB
Best For: Semi-structured data, content management, catalogs
Advantages: Flexible schema, natural object mapping
Examples: Cassandra, HBase
Best For: Time-series data, IoT applications, wide tables
Advantages: High write throughput, compression
Examples: Neo4j, Amazon Neptune
Best For: Social networks, recommendation systems
Advantages: Natural relationship representation
Examples: Redis, DynamoDB
Best For: Caching, session management
Advantages: High performance, simplicity
The CAP theorem states that distributed systems can only guarantee two of three properties simultaneously:
Impact: Traditional RDBMS prioritize Consistency and Availability but struggle with Partition Tolerance in distributed environments.
Using different database technologies for different parts of an application based on data requirements.
| Data Type | Database Choice | Example Use |
|---|---|---|
| Transactional Data | PostgreSQL/MySQL | User accounts, orders, payments |
| Product Catalog | MongoDB | Variable product attributes |
| User Sessions | Redis | Fast session storage and caching |
| Recommendation Data | Neo4j | User behavior and product relationships |
| Analytics Data | ClickHouse/BigQuery | OLAP queries and reporting |
Is your data highly structured with fixed relationships? → RDBMS
Is your data semi-structured or schema-less? → NoSQL
Single server sufficient? → RDBMS
Need horizontal scaling? → NoSQL
Complex joins and aggregations needed? → RDBMS
Simple key-value lookups? → NoSQL
Strong consistency required? → RDBMS
Eventual consistency acceptable? → NoSQL
| Normal Form | Key Requirement | Eliminates |
|---|---|---|
| 1NF | Atomic values, unique records | Repeating groups |
| 2NF | No partial dependencies | Partial dependencies |
| 3NF | No transitive dependencies | Transitive dependencies |
| BCNF | All determinants are super keys | Additional anomalies in 3NF |
Reduces redundancy, improves data integrity, eliminates update anomalies
Higher normalization can increase join operations; denormalization may improve query performance
Relational model faces limitations with big data, real-time processing, and complex data types
Modern applications often use polyglot persistence combining multiple database technologies
Choose the right tool for the job. Relational databases excel at structured data with complex relationships and strong consistency requirements. For other scenarios, consider NoSQL alternatives or hybrid approaches. The key is understanding your specific requirements and making informed architectural decisions.
Design a database for an e-commerce platform considering:
Deliverables:
We'll explore The ACID Properties,Transactions and Schedules.
Dr. Mohsin Dar
Assistant Professor, Cloud & Software Operations Cluster
School of Computer Science, UPES
Email: mohsin.dar@upes.ac.in