Dr. Mohsin Dar
Assistant Professor
Cloud & Software Operations Cluster
UPES

Unit V

Dimensional Database Design

Database Systems - MTech First Semester

Introduction to Dimensional Modeling

What is Dimensional Modeling?

Dimensional modeling is a data modeling technique optimized for data warehousing and business intelligence applications.

Benefits of Fact Constellation:

  • Integrated Analysis: Analyze multiple business processes together
  • Dimension Reusability: Conformed dimensions ensure consistency
  • Drill-Across: Query across multiple fact tables using shared dimensions
  • Enterprise-Wide View: Complete picture of business operations

OLAP Cubes

Multidimensional Data Structures

What is an OLAP Cube?

An OLAP (Online Analytical Processing) Cube is a multidimensional data structure that enables fast analysis of data across multiple dimensions simultaneously.

Products
Time
Customers
Locations
Sales $
Metrics

Cube Characteristics

  • 3D or multidimensional structure
  • Pre-aggregated data for speed
  • Supports complex calculations
  • Enables interactive analysis

Cube Components

  • Dimensions: Axes of analysis
  • Measures: Numeric values
  • Cells: Intersection points
  • Hierarchies: Drill paths

Example: Sales Cube

Dimensions: Product, Time, Geography

Measures: Sales Amount, Quantity, Profit

Sample Query: "What were the total sales of Electronics in Q3 2024 in the Western region?"

OLAP Cube Operations

Key Operations on OLAP Cubes

1. Slicing

Selecting a single dimension value, creating a sub-cube.

Example: View sales for "2024" only (fixing Time dimension)

2. Dicing

Selecting specific ranges on multiple dimensions, creating a smaller cube.

Example: Sales in "Q1-Q2", "Electronics", "North Region"

3. Drill-Down

Navigate from higher to lower levels of detail in a dimension hierarchy.

Example: Year → Quarter → Month → Day

4. Roll-Up

Aggregate data by moving up the hierarchy (opposite of drill-down).

Example: Day → Month → Quarter → Year

5. Pivot (Rotate)

Rotate the cube to view data from different perspectives.

Example: Swap rows and columns in a report

6. Drill-Across

Query multiple fact tables that share conformed dimensions.

Example: Compare Sales and Inventory data

Types of OLAP Systems

Type Storage Performance Advantages Disadvantages
MOLAP
(Multidimensional OLAP)
Proprietary multidimensional database Fastest query performance • Excellent performance
• Pre-calculated aggregations
• Complex calculations
• Storage limitations
• Data redundancy
• Long load times
ROLAP
(Relational OLAP)
Relational database (RDBMS) Moderate performance • Scalable
• No data duplication
• Leverages SQL
• Large data volumes
• Slower than MOLAP
• Complex SQL queries
• Performance depends on RDBMS
HOLAP
(Hybrid OLAP)
Combination of both Balanced • Best of both worlds
• Flexible storage
• Optimized for different scenarios
• Complex architecture
• Synchronization needed
• Higher maintenance

Choosing the Right OLAP Type:

  • MOLAP: Small to medium datasets, need maximum speed
  • ROLAP: Very large datasets, existing RDBMS infrastructure
  • HOLAP: Mixed requirements, balance between speed and scalability

Best Practices in Dimensional Modeling

Dimension Design

  • Use surrogate keys (integer)
  • Denormalize for performance
  • Include descriptive attributes
  • Handle slowly changing dimensions (SCD)
  • Create meaningful hierarchies

Fact Table Design

  • Store only numeric measures
  • Use foreign keys to dimensions
  • Determine appropriate grain
  • Include all relevant facts
  • Consider factless fact tables

Performance Optimization

  • Create appropriate indexes
  • Partition large fact tables
  • Use aggregate tables
  • Implement incremental loads
  • Monitor query patterns

Data Quality

  • Implement data validation
  • Ensure referential integrity
  • Handle NULL values properly
  • Document dimension values
  • Maintain conformed dimensions

The Grain Principle:

Most Important Decision: Determine the grain (level of detail) of your fact table early. All dimensions and facts must be consistent with this grain.

Example: If grain is "one row per product per day per store", you cannot have weekly aggregates in the same fact table.

Slowly Changing Dimensions (SCD)

Handling Changes in Dimension Attributes

Dimension attributes change over time. SCD techniques manage these changes while maintaining historical accuracy.

Type 1: Overwrite (No History)

Method: Simply update the dimension record, losing historical data.

Use Case: Corrections, insignificant changes (e.g., fixing typos)

customer_keynamecity
101John SmithMumbai → Delhi

Impact: Historical reports show new value for all time periods

Type 2: Add New Row (Full History)

Method: Create a new dimension record with new surrogate key, preserve old record.

Use Case: Important changes requiring historical tracking

customer_keynamecitystart_dateend_datecurrent_flag
101John SmithMumbai2020-01-012024-06-30N
102John SmithDelhi2024-07-019999-12-31Y

Impact: Complete history maintained, accurate reporting for all periods

Type 3: Add New Column (Limited History)

Method: Add columns to track previous value (e.g., current_city, previous_city).

Use Case: Track only the most recent change

Real-World Applications

Industry Examples of Dimensional Modeling

Retail & E-Commerce

  • Sales analysis by product, store, time
  • Customer purchasing patterns
  • Inventory optimization
  • Promotion effectiveness

Banking & Finance

  • Transaction analysis
  • Customer profitability
  • Risk assessment
  • Regulatory reporting

Healthcare

  • Patient treatment outcomes
  • Resource utilization
  • Disease patterns
  • Cost analysis

Telecommunications

  • Call detail records analysis
  • Network performance
  • Customer churn prediction
  • Revenue assurance

Common Success Factors:

  • Business Understanding: Work closely with stakeholders to understand requirements
  • Conformed Dimensions: Ensure consistency across the enterprise
  • Incremental Development: Build and deploy in phases
  • Performance Monitoring: Continuously optimize based on usage patterns

Tools & Technologies

Popular Tools for Dimensional Modeling

Data Warehouse Platforms

  • Snowflake: Cloud data warehouse
  • Amazon Redshift: AWS data warehouse
  • Google BigQuery: Serverless warehouse
  • Azure Synapse: Microsoft analytics
  • Oracle Exadata: Enterprise DW

OLAP Engines

  • Microsoft Analysis Services: SSAS (MOLAP/ROLAP)
  • Oracle OLAP: Multidimensional engine
  • IBM Cognos TM1: In-memory OLAP
  • Apache Kylin: Open-source OLAP

ETL Tools

  • Informatica PowerCenter: Enterprise ETL
  • Talend: Open-source integration
  • Apache NiFi: Data flow automation
  • Microsoft SSIS: SQL Server Integration

BI & Visualization

  • Tableau: Interactive dashboards
  • Power BI: Microsoft analytics
  • QlikView: Associative analysis
  • Looker: Data exploration

Summary

Key Takeaways

Multidimensional Model Basics:

  • Facts: Numeric measurements representing business events
  • Dimensions: Descriptive context for facts (who, what, where, when, why)
  • Star Schema: Simple design with denormalized dimensions around a central fact table

Advanced Concepts:

  • Snowflake Schema: Normalized variant of star schema, reduces redundancy but adds complexity
  • Fact Constellation: Multiple fact tables sharing conformed dimensions for enterprise-wide analysis
  • OLAP Cubes: Multidimensional structures enabling fast analytical queries with operations like slice, dice, drill-down, and roll-up

Remember:

Dimensional modeling is about optimizing for analytical queries and business user understanding, not transactional efficiency.

The goal is to make data accessible, understandable, and fast for decision-making!

Questions?

Dr. Mohsin Dar

Assistant Professor

Cloud & Software Operations Cluster

UPES - University of Petroleum and Energy Studies

Database Systems - MTech First Semester

1 / 19
">

Key Characteristics:

Why Dimensional Modeling?

Facts - The Measurements

What are Facts?

Facts are the numeric measurements or metrics that represent business activities or events.

Characteristics of Facts

  • Quantitative and numeric
  • Continuously valued
  • Additive, semi-additive, or non-additive
  • Stored in fact tables

Types of Facts

  • Additive: Can be summed across all dimensions (e.g., sales amount)
  • Semi-Additive: Can be summed across some dimensions (e.g., account balance)
  • Non-Additive: Cannot be summed (e.g., ratios, percentages)

Example: Sales Fact Table

Facts (Measures):

  • Sales Amount (Additive)
  • Quantity Sold (Additive)
  • Discount Amount (Additive)
  • Profit Margin % (Non-Additive)

Dimensions - The Context

What are Dimensions?

Dimensions provide the context for facts. They describe the "who, what, where, when, why, and how" of business events.

Dimension Characteristics:

  • Descriptive: Contain textual or categorical attributes
  • Hierarchical: Often organized in hierarchies (e.g., Product → Category → Department)
  • Slowly Changing: Values change infrequently over time
  • Denormalized: Typically stored in denormalized form for query performance

Common Dimension Examples

Dimension Attributes Hierarchy
Time Date, Day, Month, Quarter, Year Day → Month → Quarter → Year
Product Product Name, SKU, Category, Brand Product → Category → Department
Customer Name, Age, Gender, Location Customer → City → State → Country
Store Store Name, Address, Region, Manager Store → District → Region

Star Schema

The Foundation of Dimensional Modeling

What is a Star Schema?

A star schema is the simplest dimensional model where a central fact table is surrounded by denormalized dimension tables.

Star Schema Structure

Time
Dimension
Product
Dimension
SALES
FACT TABLE
Store
Dimension
Customer
Dimension

Advantages

  • Simple and intuitive structure that mirrors business processes
  • Exceptional query performance for analytical queries
  • Minimal joins required (typically just one join per dimension)
  • Ideal for data warehouses and business intelligence
  • Optimized for read-heavy analytical workloads

Disadvantages

  • Data redundancy in denormalized dimensions
  • Higher storage requirements due to denormalization
  • Potential for update anomalies in dimension tables
  • Less flexible for complex hierarchical relationships
  • Not suitable for OLTP systems

When to Use Star Schema:

Star Schema - Detailed Example

Retail Sales Star Schema

Fact Table: SALES_FACT

Column Type Description
time_key (FK) Integer Foreign key to Time dimension
product_key (FK) Integer Foreign key to Product dimension
store_key (FK) Integer Foreign key to Store dimension
customer_key (FK) Integer Foreign key to Customer dimension
sales_amount Decimal Fact: Total sales value
quantity Integer Fact: Units sold
discount Decimal Fact: Discount amount

Dimension Table: PRODUCT_DIM

Attributes: product_key (PK), product_name, SKU, category, subcategory, brand, supplier, unit_price

Snowflake Schema

Normalized Dimensional Model

What is a Snowflake Schema?

A snowflake schema is a variant of star schema where dimension tables are normalized into multiple related tables, forming a snowflake-like structure.

Snowflake Schema Structure

Category
Subcategory
Product
SALES
FACT
Country
State
City

Advantages

  • Reduced data redundancy and storage requirements
  • Easier maintenance and updates to dimension attributes
  • Better representation of complex hierarchical relationships
  • More flexible for handling changes in dimension hierarchies
  • Improved data integrity through referential integrity

Disadvantages

  • More complex queries with multiple joins
  • Potentially slower query performance
  • Harder to understand for business users
  • Requires more complex ETL processes
  • May require more storage for additional keys

When to Use Snowflake Schema:

Star Schema vs Snowflake Schema

Choosing the Right Approach

Criteria Star Schema Snowflake Schema
Structure Single fact table with denormalized dimensions Single fact table with normalized dimensions
Query Performance Faster (fewer joins) Slower (more joins)
Storage Efficiency Less efficient (data redundancy) More efficient (normalized)
Ease of Use Easier to understand and query More complex queries required
Maintenance Harder to maintain (update anomalies) Easier to maintain (normalized)
Best For Simple hierarchies, read-heavy workloads Complex hierarchies, write-heavy workloads

Example: When to Choose Each

Choose Star Schema When:
- Building a sales data mart with simple product and customer dimensions
- Need maximum query performance for dashboards and reports
- Dimensions are relatively stable and don't change often

Choose Snowflake Schema When:
- Working with complex product hierarchies (category → subcategory → product)
- Need to maintain historical changes in dimension attributes
- Have limited storage capacity

Implementation Considerations

Best Practices for Schema Design

Designing Effective Schemas

Star Schema Best Practices

  • Keep dimension tables denormalized for performance
  • Use surrogate keys for dimension tables
  • Include descriptive attributes in dimensions
  • Use conformed dimensions across fact tables
  • Consider aggregate fact tables for performance

Snowflake Schema Best Practices

  • Normalize only when necessary
  • Be cautious with deep hierarchies (limit to 3-4 levels)
  • Consider materialized views for performance
  • Document the relationships thoroughly
  • Consider starflake approach (partially normalized)

Performance Optimization

Star vs Snowflake Schema

Aspect Star Schema Snowflake Schema
Structure Denormalized dimensions Normalized dimensions
Query Performance Faster (fewer joins) Slower (more joins required)
Storage Space More space (redundancy) Less space (normalized)
Complexity Simple to understand More complex structure
Maintenance Easier updates to facts Easier dimension maintenance
Data Integrity Update anomalies possible Better data integrity
Use Case Most data warehouses When storage is critical

When to Use Each?

  • Star Schema: Preferred for most data warehousing scenarios due to query performance
  • Snowflake Schema: Use when storage optimization is critical or when dimension hierarchies need strict normalization

Fact Constellation (Galaxy Schema)

Multiple Fact Tables Sharing Dimensions

What is a Fact Constellation?

A fact constellation, also called a galaxy schema, is a sophisticated dimensional model with multiple fact tables that share common dimension tables.

Fact Constellation Example

SALES
FACT
↓ ↓ ↓
Time
Product
Store
INVENTORY
FACT
↓ ↓ ↓
Time
Product
Warehouse

Note: Time and Product dimensions are shared between both fact tables

Key Characteristics:

Fact Constellation - Advanced Concepts

Real-World Example: Retail Analytics

Fact Table 1: SALES_FACT

Measures: sales_amount, quantity_sold, discount

Dimensions: Time, Product, Store, Customer, Promotion

Fact Table 2: INVENTORY_FACT

Measures: stock_level, reorder_quantity, stock_value

Dimensions: Time, Product, Warehouse, Supplier

Fact Table 3: SHIPMENT_FACT

Measures: shipment_cost, delivery_time, items_shipped

Dimensions: Time, Product, Store, Carrier

Shared Dimensions: Time and Product are conformed dimensions used across all three fact tables.