Star Schema & Snowflake Schema

Dimensional Modeling in Data Warehousing

Dr. Mohsin Dar

Assistant Professor

Cloud & Software Operations Cluster | SOCS

University of Petroleum and Energy Studies (UPES)

Database Systems - MTech First Semester

Introduction to Dimensional Modeling

What is Dimensional Modeling?
A design technique used in data warehousing to organize data for efficient querying and analysis.

Key Components:

Primary Schemas: Star Schema and Snowflake Schema

Star Schema

Definition:

A star schema is a database organizational structure where a central fact table is surrounded by denormalized dimension tables, resembling a star shape.

Star Schema Structure

Dimension 1
(Customer)
Dimension 2
(Product)
FACT TABLE
(Sales)
Dimension 3
(Time)
Dimension 4
(Location)

Characteristics:

Snowflake Schema

Definition:

A snowflake schema is an extension of the star schema where dimension tables are normalized into multiple related tables, creating a snowflake-like pattern.

Snowflake Schema Structure

Customer
City
Country
Product
Category
Supplier
FACT TABLE (Sales)

Characteristics:

Comparison: Advantages & Disadvantages

Aspect Star Schema Snowflake Schema
Query Performance ✓ Faster (fewer joins) ✗ Slower (more joins)
Storage Space ✗ More space (redundancy) ✓ Less space (normalized)
Query Complexity ✓ Simple queries ✗ Complex queries
Data Integrity ✗ Lower (redundancy) ✓ Higher (normalized)
Maintenance ✗ More updates needed ✓ Easier maintenance
ETL Complexity ✓ Simpler ETL process ✗ Complex ETL process

Star Schema - Advantages

Best Use Case: When query performance is critical and storage is not a major concern

Star Schema - Disadvantages

Trade-off: Performance vs. Storage and Data Integrity

Snowflake Schema - Advantages

Best Use Case: When data integrity and storage efficiency are priorities

Snowflake Schema - Disadvantages

Trade-off: Storage Efficiency vs. Query Performance and Complexity

Performance Considerations

Star Schema Performance:

Snowflake Schema Performance:

Performance Optimization: Use materialized views, partitioning, and proper indexing strategies

Decision Criteria: Which Schema to Use?

Use Star Schema When:

  • Query performance is the top priority
  • Storage space is not a constraint
  • Dimension tables are relatively small
  • Users need simple, fast reporting
  • OLAP and BI tools are primary consumers
  • Data warehouse is read-intensive

Use Snowflake Schema When:

  • Storage optimization is critical
  • Complex hierarchical dimensions exist
  • Data integrity is paramount
  • Dimension tables are very large
  • Frequent dimension updates occur
  • Normalized data is required for compliance

Example: Retail Sales Star Schema

Business Scenario:

A retail company wants to analyze sales data across products, customers, time, and stores.

Fact Table: SALES_FACT

Primary Keys (Composite): - sale_id (Surrogate Key) Foreign Keys: - product_key → PRODUCT_DIM - customer_key → CUSTOMER_DIM - time_key → TIME_DIM - store_key → STORE_DIM Measures: - quantity_sold (Additive) - sales_amount (Additive) - discount_amount (Additive) - profit_amount (Additive) - unit_price (Non-additive)

Star Schema - Dimension Tables

PRODUCT_DIM
PK: product_key
Attributes: product_id, product_name, category, subcategory, brand, supplier_name, supplier_city, supplier_country, unit_cost
CUSTOMER_DIM
PK: customer_key
Attributes: customer_id, customer_name, age, gender, email, phone, address, city, state, country, postal_code, customer_segment
TIME_DIM
PK: time_key
Attributes: date, day, month, quarter, year, day_of_week, week_of_year, is_weekend, is_holiday, fiscal_period
STORE_DIM
PK: store_key
Attributes: store_id, store_name, store_type, manager_name, address, city, state, country, region, store_size, opening_date

Star Schema - Complete Visual

PRODUCT_DIM
product_key (PK)
product_name
category
brand
supplier_name
CUSTOMER_DIM
customer_key (PK)
customer_name
city
state
country
SALES_FACT
sale_id (PK)
product_key (FK)
customer_key (FK)
time_key (FK)
store_key (FK)
─────────────
quantity_sold
sales_amount
profit_amount
TIME_DIM
time_key (PK)
date
month
quarter
year
STORE_DIM
store_key (PK)
store_name
city
region
store_type
Note: All dimensions connect directly to the fact table with single-level relationships

Example: Retail Sales Snowflake Schema

Same Business Scenario - Normalized Approach

Same retail company with normalized dimension tables to reduce redundancy.

Fact Table: SALES_FACT (Same as Star Schema)

Keys and Measures remain identical Difference: Dimension tables are normalized into multiple related tables

Key Changes:

Snowflake Schema - Normalized Dimensions

PRODUCT Hierarchy:

PRODUCT
product_key (PK)
category_key (FK)
supplier_key (FK)
CATEGORY
category_key (PK)
category_name
subcategory
SUPPLIER
supplier_key (PK)
supplier_name
city_key (FK)

CUSTOMER Hierarchy:

CUSTOMER
customer_key (PK)
customer_name
city_key (FK)
CITY
city_key (PK)
city_name
state_key (FK)
STATE
state_key (PK)
state_name
country_key (FK)
COUNTRY
country_key (PK)
country_name

STORE Hierarchy:

STORE
store_key (PK)
store_name
city_key (FK)
CITY
(Shared with Customer)

Snowflake Schema - Complete Visual

SALES_FACT
sale_id (PK)
product_key (FK)
customer_key (FK)
store_key (FK)
time_key (FK)
quantity_sold
total_amount
PRODUCT
product_key (PK)
product_name
category_key (FK)
supplier_key (FK)
CATEGORY
category_key (PK)
category_name
subcategory
SUPPLIER
supplier_key (PK)
supplier_name
city_key (FK)
CUSTOMER
customer_key (PK)
customer_name
city_key (FK)
CITY
city_key (PK)
city_name
state_key (FK)
STATE
state_key (PK)
state_name
country_key (FK)
COUNTRY
country_key (PK)
country_name
STORE
store_key (PK)
store_name
city_key (FK)
CITY
(Shared with Customer)
TIME
time_key (PK)
date
day_of_week
month
quarter
year
is_holiday

Summary and Key Takeaways

Star Schema

  • Simple and denormalized structure
  • Single fact table connected to dimension tables
  • Faster query performance for analytical queries
  • Higher storage requirements due to data redundancy

Snowflake Schema

  • Normalized dimension tables
  • Reduced data redundancy and storage requirements
  • More complex queries due to multiple joins
  • Better for complex hierarchies and slowly changing dimensions

Choosing the Right Schema

  • Use Star Schema for:
    • Read-heavy analytical workloads
    • Simpler data models
    • When query performance is critical
  • Use Snowflake Schema when:
    • Storage optimization is important
    • Working with complex hierarchies
    • Data integrity is a top priority

Final Thoughts

Both star and snowflake schemas have their place in data warehousing. The choice depends on your specific requirements for query performance, storage efficiency, and data complexity. Understanding these schemas is crucial for designing effective data warehouses that meet your organization's analytical needs.

Slide 18 of 18