Dr. Mohsin Dar
Assistant Professor
Cloud & Software Operations Cluster
UPES

Unit V

Dimensional Database Design

Database Systems - MTech First Semester

Introduction to Dimensional Modeling

What is Dimensional Modeling?

Dimensional modeling is a data modeling technique optimized for data warehousing and business intelligence applications.

Benefits of Fact Constellation:

Integrated Analysis: Analyze multiple business processes together
Dimension Reusability: Conformed dimensions ensure consistency
Drill-Across: Query across multiple fact tables using shared dimensions
Enterprise-Wide View: Complete picture of business operations

OLAP Cubes

Multidimensional Data Structures

What is an OLAP Cube?

An OLAP (Online Analytical Processing) Cube is a multidimensional data structure that enables fast analysis of data across multiple dimensions simultaneously.

Products

Time

Customers

Locations

Sales $

Metrics

Cube Characteristics

3D or multidimensional structure
Pre-aggregated data for speed
Supports complex calculations
Enables interactive analysis

Cube Components

Dimensions: Axes of analysis
Measures: Numeric values
Cells: Intersection points
Hierarchies: Drill paths

Example: Sales Cube

Dimensions: Product, Time, Geography

Measures: Sales Amount, Quantity, Profit

Sample Query: "What were the total sales of Electronics in Q3 2024 in the Western region?"

OLAP Cube Operations

Key Operations on OLAP Cubes

1. Slicing

Selecting a single dimension value, creating a sub-cube.

Example: View sales for "2024" only (fixing Time dimension)

2. Dicing

Selecting specific ranges on multiple dimensions, creating a smaller cube.

Example: Sales in "Q1-Q2", "Electronics", "North Region"

3. Drill-Down

Navigate from higher to lower levels of detail in a dimension hierarchy.

Example: Year → Quarter → Month → Day

4. Roll-Up

Aggregate data by moving up the hierarchy (opposite of drill-down).

Example: Day → Month → Quarter → Year

5. Pivot (Rotate)

Rotate the cube to view data from different perspectives.

Example: Swap rows and columns in a report

6. Drill-Across

Query multiple fact tables that share conformed dimensions.

Example: Compare Sales and Inventory data

Types of OLAP Systems

Type	Storage	Performance	Advantages	Disadvantages
MOLAP (Multidimensional OLAP)	Proprietary multidimensional database	Fastest query performance	• Excellent performance • Pre-calculated aggregations • Complex calculations	• Storage limitations • Data redundancy • Long load times
ROLAP (Relational OLAP)	Relational database (RDBMS)	Moderate performance	• Scalable • No data duplication • Leverages SQL • Large data volumes	• Slower than MOLAP • Complex SQL queries • Performance depends on RDBMS
HOLAP (Hybrid OLAP)	Combination of both	Balanced	• Best of both worlds • Flexible storage • Optimized for different scenarios	• Complex architecture • Synchronization needed • Higher maintenance

Choosing the Right OLAP Type:

MOLAP: Small to medium datasets, need maximum speed
ROLAP: Very large datasets, existing RDBMS infrastructure
HOLAP: Mixed requirements, balance between speed and scalability

Best Practices in Dimensional Modeling

Dimension Design

Use surrogate keys (integer)
Denormalize for performance
Include descriptive attributes
Handle slowly changing dimensions (SCD)
Create meaningful hierarchies

Fact Table Design

Store only numeric measures
Use foreign keys to dimensions
Determine appropriate grain
Include all relevant facts
Consider factless fact tables

Performance Optimization

Create appropriate indexes
Partition large fact tables
Use aggregate tables
Implement incremental loads
Monitor query patterns

Data Quality

Implement data validation
Ensure referential integrity
Handle NULL values properly
Document dimension values
Maintain conformed dimensions

The Grain Principle:

Most Important Decision: Determine the grain (level of detail) of your fact table early. All dimensions and facts must be consistent with this grain.

Example: If grain is "one row per product per day per store", you cannot have weekly aggregates in the same fact table.

Slowly Changing Dimensions (SCD)

Handling Changes in Dimension Attributes

Dimension attributes change over time. SCD techniques manage these changes while maintaining historical accuracy.

Type 1: Overwrite (No History)

Method: Simply update the dimension record, losing historical data.

Use Case: Corrections, insignificant changes (e.g., fixing typos)

customer_key	name	city
101	John Smith	Mumbai → Delhi

Impact: Historical reports show new value for all time periods

Type 2: Add New Row (Full History)

Method: Create a new dimension record with new surrogate key, preserve old record.

Use Case: Important changes requiring historical tracking

customer_key	name	city	start_date	end_date	current_flag
101	John Smith	Mumbai	2020-01-01	2024-06-30	N
102	John Smith	Delhi	2024-07-01	9999-12-31	Y

Impact: Complete history maintained, accurate reporting for all periods

Type 3: Add New Column (Limited History)

Method: Add columns to track previous value (e.g., current_city, previous_city).

Use Case: Track only the most recent change

Real-World Applications

Industry Examples of Dimensional Modeling

Retail & E-Commerce

Sales analysis by product, store, time
Customer purchasing patterns
Inventory optimization
Promotion effectiveness

Banking & Finance

Transaction analysis
Customer profitability
Risk assessment
Regulatory reporting

Healthcare

Patient treatment outcomes
Resource utilization
Disease patterns
Cost analysis

Telecommunications

Call detail records analysis
Network performance
Customer churn prediction
Revenue assurance

Common Success Factors:

Business Understanding: Work closely with stakeholders to understand requirements
Conformed Dimensions: Ensure consistency across the enterprise
Incremental Development: Build and deploy in phases
Performance Monitoring: Continuously optimize based on usage patterns

Tools & Technologies

Popular Tools for Dimensional Modeling

Data Warehouse Platforms

Snowflake: Cloud data warehouse
Amazon Redshift: AWS data warehouse
Google BigQuery: Serverless warehouse
Azure Synapse: Microsoft analytics
Oracle Exadata: Enterprise DW

OLAP Engines

Microsoft Analysis Services: SSAS (MOLAP/ROLAP)
Oracle OLAP: Multidimensional engine
IBM Cognos TM1: In-memory OLAP
Apache Kylin: Open-source OLAP

ETL Tools

Informatica PowerCenter: Enterprise ETL
Talend: Open-source integration
Apache NiFi: Data flow automation
Microsoft SSIS: SQL Server Integration

BI & Visualization

Tableau: Interactive dashboards
Power BI: Microsoft analytics
QlikView: Associative analysis
Looker: Data exploration

Summary

Key Takeaways

Multidimensional Model Basics:

Facts: Numeric measurements representing business events
Dimensions: Descriptive context for facts (who, what, where, when, why)
Star Schema: Simple design with denormalized dimensions around a central fact table

Advanced Concepts:

Snowflake Schema: Normalized variant of star schema, reduces redundancy but adds complexity
Fact Constellation: Multiple fact tables sharing conformed dimensions for enterprise-wide analysis
OLAP Cubes: Multidimensional structures enabling fast analytical queries with operations like slice, dice, drill-down, and roll-up

Remember:

Dimensional modeling is about optimizing for analytical queries and business user understanding, not transactional efficiency.

The goal is to make data accessible, understandable, and fast for decision-making!

Questions?

Dr. Mohsin Dar

Assistant Professor

Cloud & Software Operations Cluster

UPES - University of Petroleum and Energy Studies

Database Systems - MTech First Semester

1 / 19

Key Characteristics:

Query Performance: Optimized for fast query execution and data retrieval
User Understanding: Intuitive structure that business users can easily comprehend
Flexibility: Accommodates changing business requirements
Analytical Focus: Designed specifically for analytical processing (OLAP)

Why Dimensional Modeling?

Traditional normalized databases are optimized for OLTP (Online Transaction Processing)
Analytical queries on normalized databases can be slow and complex
Dimensional models provide denormalized structures for faster analytics

Facts - The Measurements

What are Facts?

Facts are the numeric measurements or metrics that represent business activities or events.

Characteristics of Facts

Quantitative and numeric
Continuously valued
Additive, semi-additive, or non-additive
Stored in fact tables

Types of Facts

Additive: Can be summed across all dimensions (e.g., sales amount)
Semi-Additive: Can be summed across some dimensions (e.g., account balance)
Non-Additive: Cannot be summed (e.g., ratios, percentages)

Example: Sales Fact Table

Facts (Measures):

Sales Amount (Additive)
Quantity Sold (Additive)
Discount Amount (Additive)
Profit Margin % (Non-Additive)

Dimensions - The Context

What are Dimensions?

Dimensions provide the context for facts. They describe the "who, what, where, when, why, and how" of business events.

Dimension Characteristics:

Descriptive: Contain textual or categorical attributes
Hierarchical: Often organized in hierarchies (e.g., Product → Category → Department)
Slowly Changing: Values change infrequently over time
Denormalized: Typically stored in denormalized form for query performance

Common Dimension Examples

Dimension	Attributes	Hierarchy
Time	Date, Day, Month, Quarter, Year	Day → Month → Quarter → Year
Product	Product Name, SKU, Category, Brand	Product → Category → Department
Customer	Name, Age, Gender, Location	Customer → City → State → Country
Store	Store Name, Address, Region, Manager	Store → District → Region

Star Schema

The Foundation of Dimensional Modeling

What is a Star Schema?

A star schema is the simplest dimensional model where a central fact table is surrounded by denormalized dimension tables.

Star Schema Structure

Time
Dimension

→

Product
Dimension

SALES
FACT TABLE

Store
Dimension

←

Customer
Dimension

Advantages

Simple and intuitive structure that mirrors business processes
Exceptional query performance for analytical queries
Minimal joins required (typically just one join per dimension)
Ideal for data warehouses and business intelligence
Optimized for read-heavy analytical workloads

Disadvantages

Data redundancy in denormalized dimensions
Higher storage requirements due to denormalization
Potential for update anomalies in dimension tables
Less flexible for complex hierarchical relationships
Not suitable for OLTP systems

When to Use Star Schema:

When query performance is the highest priority
For business intelligence and reporting applications
When dimensions are relatively stable and don't change frequently
When dealing with large fact tables and relatively small dimension tables
For data marts and data warehouses

Star Schema - Detailed Example

Retail Sales Star Schema

Fact Table: SALES_FACT

Column	Type	Description
time_key (FK)	Integer	Foreign key to Time dimension
product_key (FK)	Integer	Foreign key to Product dimension
store_key (FK)	Integer	Foreign key to Store dimension
customer_key (FK)	Integer	Foreign key to Customer dimension
sales_amount	Decimal	Fact: Total sales value
quantity	Integer	Fact: Units sold
discount	Decimal	Fact: Discount amount

Dimension Table: PRODUCT_DIM

Attributes: product_key (PK), product_name, SKU, category, subcategory, brand, supplier, unit_price

Snowflake Schema

Normalized Dimensional Model

What is a Snowflake Schema?

A snowflake schema is a variant of star schema where dimension tables are normalized into multiple related tables, forming a snowflake-like structure.

Snowflake Schema Structure

Advantages

Reduced data redundancy and storage requirements
Easier maintenance and updates to dimension attributes
Better representation of complex hierarchical relationships
More flexible for handling changes in dimension hierarchies
Improved data integrity through referential integrity

Disadvantages

More complex queries with multiple joins
Potentially slower query performance
Harder to understand for business users
Requires more complex ETL processes
May require more storage for additional keys

When to Use Snowflake Schema:

When storage space is a critical concern
For dimensions with complex, multi-level hierarchies
When dimension attributes change frequently
When you need to enforce referential integrity
In environments with many shared dimensions across fact tables

Star Schema vs Snowflake Schema

Choosing the Right Approach

Criteria	Star Schema	Snowflake Schema
Structure	Single fact table with denormalized dimensions	Single fact table with normalized dimensions
Query Performance	Faster (fewer joins)	Slower (more joins)
Storage Efficiency	Less efficient (data redundancy)	More efficient (normalized)
Ease of Use	Easier to understand and query	More complex queries required
Maintenance	Harder to maintain (update anomalies)	Easier to maintain (normalized)
Best For	Simple hierarchies, read-heavy workloads	Complex hierarchies, write-heavy workloads

Example: When to Choose Each

Choose Star Schema When:
- Building a sales data mart with simple product and customer dimensions
- Need maximum query performance for dashboards and reports
- Dimensions are relatively stable and don't change often

Choose Snowflake Schema When:
- Working with complex product hierarchies (category → subcategory → product)
- Need to maintain historical changes in dimension attributes
- Have limited storage capacity

Implementation Considerations

Best Practices for Schema Design

Designing Effective Schemas

Star Schema Best Practices

Keep dimension tables denormalized for performance
Use surrogate keys for dimension tables
Include descriptive attributes in dimensions
Use conformed dimensions across fact tables
Consider aggregate fact tables for performance

Snowflake Schema Best Practices

Normalize only when necessary
Be cautious with deep hierarchies (limit to 3-4 levels)
Consider materialized views for performance
Document the relationships thoroughly
Consider starflake approach (partially normalized)

Performance Optimization

Indexing: Create indexes on all foreign keys and frequently filtered columns
Partitioning: Partition large fact tables by date ranges
Materialized Views: Pre-compute and store aggregations
Denormalization: Consider denormalizing some snowflaked dimensions for performance
Query Optimization: Use star transformation where supported by the database

Star vs Snowflake Schema

Aspect	Star Schema	Snowflake Schema
Structure	Denormalized dimensions	Normalized dimensions
Query Performance	Faster (fewer joins)	Slower (more joins required)
Storage Space	More space (redundancy)	Less space (normalized)
Complexity	Simple to understand	More complex structure
Maintenance	Easier updates to facts	Easier dimension maintenance
Data Integrity	Update anomalies possible	Better data integrity
Use Case	Most data warehouses	When storage is critical

When to Use Each?

Star Schema: Preferred for most data warehousing scenarios due to query performance
Snowflake Schema: Use when storage optimization is critical or when dimension hierarchies need strict normalization

Fact Constellation (Galaxy Schema)

Multiple Fact Tables Sharing Dimensions

What is a Fact Constellation?

A fact constellation, also called a galaxy schema, is a sophisticated dimensional model with multiple fact tables that share common dimension tables.

Fact Constellation Example

SALES
FACT

↓ ↓ ↓

Time

Product

Store

INVENTORY
FACT

↓ ↓ ↓

Time

Product

Warehouse

Note: Time and Product dimensions are shared between both fact tables

Key Characteristics:

Multiple Fact Tables: Each representing different business processes
Shared Dimensions: Common dimensions (e.g., Time, Product) used by multiple facts
Conformed Dimensions: Dimensions have consistent structure and content across facts
Complex Analysis: Enables cross-process analysis and drill-across queries

Fact Constellation - Advanced Concepts

Real-World Example: Retail Analytics

Fact Table 1: SALES_FACT

Measures: sales_amount, quantity_sold, discount

Dimensions: Time, Product, Store, Customer, Promotion

Fact Table 2: INVENTORY_FACT

Measures: stock_level, reorder_quantity, stock_value

Dimensions: Time, Product, Warehouse, Supplier

Fact Table 3: SHIPMENT_FACT

Measures: shipment_cost, delivery_time, items_shipped

Dimensions: Time, Product, Store, Carrier

Shared Dimensions: Time and Product are conformed dimensions used across all three fact tables.