NoSQL Databases

Overview, Types, Use Cases & Comparison with RDBMS

Instructor: Mohsin F. Dar

Assistant Professor, Cloud & Software Operations Cluster | SOCS | UPES

M.Tech. - Database Systems | First Semester

What is NoSQL?

Definition

NoSQL (Not Only SQL) refers to non-relational database management systems that provide mechanisms for storage and retrieval of data modeled in means other than tabular relations used in relational databases.

Key Characteristics

1. Schema-less or Flexible Schema: Data can be inserted without a predefined schema
2. Horizontal Scalability: Easily scale out by adding more servers
3. High Performance: Optimized for specific data models and access patterns
4. Distributed Architecture: Designed to run on clusters of commodity hardware
5. BASE Properties: Basically Available, Soft state, Eventually consistent
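
Horizontal scaling depends on spreading keys across servers. The sketch below is a minimal illustration, assuming simple hash-modulo placement; the node names and the `node_for` helper are hypothetical, and production systems typically use consistent hashing or range partitioning instead.

```python
import hashlib

def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node that owns `key` using a stable hash modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical server names
keys = [f"user:{i}" for i in range(1000, 1010)]
placement = {key: node_for(key, nodes) for key in keys}

# Scaling out: add a server and count how many keys change owner.
bigger = nodes + ["node-d"]
moved = [key for key in keys if node_for(key, bigger) != placement[key]]

print(placement)
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

With modulo placement, changing the node count tends to move a large share of keys, which is the main reason distributed stores prefer schemes that relocate only a small fraction of the data.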

Key-Value Stores

Overview

The simplest NoSQL database model. Data is stored as a collection of key-value pairs where a key serves as a unique identifier.

Data Model: Similar to a hash table or dictionary, where each key maps to exactly one value.
                

Examples

Key: "user:1001"
Value: {
    "name": "John Doe",
    "email": "john@example.com",
    "age": 25
}

Key: "session:abc123"
Value: "active_user_token_xyz"
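
The two pairs above can be modeled with a toy in-memory store. This is a sketch only: the `KeyValueStore` class and its method names are hypothetical and do not correspond to the API of Redis, DynamoDB, or any other product.

```python
import json
from typing import Optional

class KeyValueStore:
    """Toy in-memory key-value store: each key maps to exactly one opaque value."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # overwrites any previous value for the key

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it existed."""
        return self._data.pop(key, None) is not None

store = KeyValueStore()
user = {"name": "John Doe", "email": "john@example.com", "age": 25}

# The store does not interpret values, so the client serializes them.
store.put("user:1001", json.dumps(user).encode("utf-8"))
store.put("session:abc123", b"active_user_token_xyz")

raw = store.get("user:1001")
profile = json.loads(raw) if raw is not None else None
print(profile)
```

Because the value is opaque to the store, the only supported lookup is by key. Finding all users aged 25 would require scanning every value on the client side.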

Popular Implementations

Redis: In-memory data store, extremely fast
Amazon DynamoDB: Fully managed, highly scalable
Riak: Distributed, fault-tolerant
Memcached: High-performance distributed caching

Common Use Cases

Session management
Caching frequently accessed data
User preferences and profiles
Shopping carts in e-commerce
Real-time recommendations

Document Stores

Overview

Store data as documents (typically JSON, BSON, or XML). Each document is self-contained and can have a different structure.

Document Example

{
    "_id": "507f1f77bcf86cd799439011",
    "firstName": "Priya",
    "lastName": "Sharma",
    "email": "priya.sharma@example.com",
    "address": {
        "street": "123 MG Road",
        "city": "Bangalore",
        "state": "Karnataka",
        "pincode": "560001"
    },
    "orders": [
        {
            "orderId": "ORD001",
            "date": "2024-01-15",
            "total": 2500
        }
    ],
    "tags": ["premium", "frequent_buyer"]
}
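
A document store can query inside nested fields even when documents differ in shape. The sketch below imitates that with plain Python dictionaries; `get_path`, `find`, and the second customer document are hypothetical illustrations, not the query API of MongoDB or any other database.

```python
from typing import Any

Document = dict[str, Any]

def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as 'address.city'; return None if any part is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection: list[Document], path: str, value: Any) -> list[Document]:
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]

customers: list[Document] = [
    {"_id": "507f1f77bcf86cd799439011", "firstName": "Priya",
     "address": {"city": "Bangalore", "state": "Karnataka"},
     "tags": ["premium", "frequent_buyer"]},
    # Hypothetical second document with a different shape: no address, one extra field.
    {"_id": "c2", "firstName": "Amit", "loyaltyPoints": 120},
]

in_bangalore = find(customers, "address.city", "Bangalore")
print([doc["firstName"] for doc in in_bangalore])
```

Documents that lack the queried field are simply skipped, which is how a flexible schema lets old and new document shapes coexist in one collection.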

Popular Implementations

MongoDB: Most widely used, rich query language
CouchDB: Master-master replication, HTTP API
Amazon DocumentDB: MongoDB compatible, managed
RavenDB: ACID compliant, .NET focused

Common Use Cases

Content management systems
E-commerce product catalogs
User profiles with varying attributes
Real-time analytics
Mobile application backends

Column-Family Stores

Overview

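Store each row under a row key, with its columns grouped into column families (groups of related columns). Rows may be sparse: each row can hold a different set of columns. Optimized for high write throughput and key-based reads over large datasets.

Key Concept: Columns in the same family are stored together, so a read that needs one family does not have to touch the others. These wide-column stores are not the same as column-oriented analytical databases, which store each column's values contiguously for compression and fast scans.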
                

Data Model Structure

Row Key: "user:1001"
Column Family: "personal_info"
    - first_name: "Amit"
    - last_name: "Kumar"
    - email: "amit@example.com"

Column Family: "preferences"
    - language: "Hindi"
    - timezone: "IST"
    - theme: "dark"
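
The structure above is essentially a nested map: row key, then column family, then column. The following sketch models it in memory; the `WideColumnTable` class is hypothetical and says nothing about how Cassandra or HBase lay data out on disk.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column table: row key -> column family -> column -> value."""

    def __init__(self, families: set[str]) -> None:
        self.families = families  # column families are declared up front
        self._rows: dict[str, dict[str, dict[str, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value  # columns are created on write

    def get_family(self, row_key: str, family: str) -> dict[str, str]:
        """Read one column family of a row without touching the others."""
        if row_key not in self._rows:
            return {}
        return dict(self._rows[row_key].get(family, {}))

users = WideColumnTable({"personal_info", "preferences"})
users.put("user:1001", "personal_info", "first_name", "Amit")
users.put("user:1001", "personal_info", "last_name", "Kumar")
users.put("user:1001", "preferences", "theme", "dark")
users.put("user:1002", "preferences", "language", "Hindi")  # sparse row: no personal_info

print(users.get_family("user:1001", "personal_info"))
print(users.get_family("user:1002", "personal_info"))
```

Column families are fixed when the table is created, while individual columns appear on first write, so two rows in the same table can carry different columns.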

Popular Implementations

Apache Cassandra: Highly scalable, peer-to-peer architecture
Apache HBase: Runs on Hadoop, strong consistency
Google Bigtable: Managed, petabyte-scale
ScyllaDB: Cassandra-compatible, written in C++

Common Use Cases

Time-series data (IoT sensors, logs)
Write-heavy applications
Large-scale data warehousing
Event logging and monitoring
Messaging platforms

Graph Databases

Overview

Store data as nodes, edges, and properties. Optimized for relationship-heavy data and traversal queries.

Core Components:

• Nodes: Entities (people, places, things)
• Edges: Relationships between nodes
• Properties: Information about nodes and edges

Example Query (Cypher for Neo4j)

// Find friends of friends who are not already friends
MATCH (person:Person {name: "Rahul"})
      -[:FRIENDS_WITH]->(:Person)
      -[:FRIENDS_WITH]->(fof:Person)
WHERE fof <> person
  AND NOT (person)-[:FRIENDS_WITH]->(fof)
RETURN DISTINCT fof.name AS SuggestedFriend

// Find shortest path
MATCH path = shortestPath(
  (person1:Person {name: "Amit"})-[*]-(person2:Person {name: "Priya"})
)
RETURN path
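
To see what these two traversals compute, the sketch below reimplements them over a small adjacency map. The sample graph and the function names are hypothetical, and a graph database would use indexes and an optimized traversal engine rather than this plain breadth-first search.

```python
from collections import deque
from typing import Optional

# Directed FRIENDS_WITH edges as an adjacency map (hypothetical sample data).
friends: dict[str, set[str]] = {
    "Rahul": {"Amit", "Neha"},
    "Amit": {"Priya", "Rahul"},
    "Neha": {"Priya", "Amit"},
    "Priya": {"Karan"},
    "Karan": set(),
}

def friends_of_friends(graph: dict[str, set[str]], person: str) -> set[str]:
    """People two hops away who are neither the person nor already a friend."""
    direct = graph.get(person, set())
    two_hops = {fof for friend in direct for fof in graph.get(friend, set())}
    return two_hops - direct - {person}

def shortest_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search that ignores edge direction, like `-[*]-` in Cypher."""
    undirected: dict[str, set[str]] = {node: set(nbrs) for node, nbrs in graph.items()}
    for node, nbrs in graph.items():
        for nbr in nbrs:
            undirected.setdefault(nbr, set()).add(node)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nbr in sorted(undirected.get(path[-1], set())):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

print(friends_of_friends(friends, "Rahul"))
print(shortest_path(friends, "Amit", "Priya"))
```

Breadth-first search visits nodes in order of hop count, so the first path that reaches the goal is a shortest one in an unweighted graph.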

Popular Implementations

Neo4j: Most popular, Cypher query language
Amazon Neptune: Fully managed, supports multiple models
ArangoDB: Multi-model (document, graph, key-value)
JanusGraph: Distributed, scalable

Common Use Cases

Social networks and recommendations
Fraud detection
Knowledge graphs
Network and IT operations
Supply chain management

CAP Theorem & NoSQL

CAP Theorem States:

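A distributed database system can provide at most two of the following three guarantees simultaneously. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability: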

Consistency (C)

All nodes see the same data at the same time. Every read receives the most recent write.

Availability (A)

Every request to a non-failing node receives a non-error response, without a guarantee that it contains the most recent data.

Partition Tolerance (P)

The system continues to operate despite network partitions between nodes.

NoSQL Database Classifications

Database	CAP Choice	Notes
MongoDB	CP (Consistency + Partition Tolerance)	Prioritizes consistency over availability
Cassandra	AP (Availability + Partition Tolerance)	Eventually consistent, always available
Redis	CP with AP options	Configurable based on use case
Neo4j	CA (when no partitions)	Strong consistency in cluster
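
The CP and AP trade-off can be made concrete with a toy model of two replicas during a partition. This is a sketch under strong simplifying assumptions: the `Cluster` class, its `mode` flag, and the "replica a wins" reconciliation rule are hypothetical and do not describe the replication protocol of any database in the table.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value = "v0"

class Cluster:
    """Two replicas of one value; `mode` is 'CP' or 'AP' (toy model, not a real protocol)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False

    def write(self, replica: Replica, value: str) -> bool:
        if not self.partitioned:
            self.a.value = self.b.value = value  # replicate to both sides
            return True
        if self.mode == "CP":
            return False  # refuse the write: stay consistent, give up availability
        replica.value = value  # AP: accept locally; replicas may now disagree
        return True

    def heal(self) -> None:
        """End the partition and reconcile with a naive 'replica a wins' rule."""
        self.partitioned = False
        self.b.value = self.a.value

for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.partitioned = True
    accepted = cluster.write(cluster.a, "v1")
    print(mode, "accepted:", accepted, "a:", cluster.a.value, "b:", cluster.b.value)
```

In CP mode the write is rejected and both replicas keep the old value. In AP mode the write succeeds on one side and the replicas disagree until the partition heals, which is the window in which eventual consistency applies.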

NoSQL vs Traditional RDBMS

Aspect	RDBMS	NoSQL
Data Model	Tables with fixed schema, rows and columns	Flexible: documents, key-value, graphs, columns
Schema	Fixed, predefined schema (schema-on-write)	Dynamic, flexible schema (schema-on-read)
Scalability	Vertical scaling (scale-up)	Horizontal scaling (scale-out)
Transactions	ACID compliant (Atomicity, Consistency, Isolation, Durability)	Often BASE (Basically Available, Soft state, Eventually consistent); some systems also offer ACID transactions
Query Language	SQL (Structured Query Language)	Varies by database (MongoDB Query Language, Cypher, etc.)
Relationships	Complex joins, foreign keys, normalized data	Denormalized, embedded documents, or graph edges
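
The Relationships row can be illustrated by storing the same customer and orders both ways. This is a sketch only: SQLite stands in for any RDBMS, a plain dictionary stands in for a document store, and the table names and the second order (`ORD002`) are hypothetical.

```python
import sqlite3

# Relational style: normalized tables, joined at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id TEXT PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Priya Sharma');
    INSERT INTO orders VALUES ('ORD001', 1, 2500), ('ORD002', 1, 1200);
""")
rows = conn.execute(
    """
    SELECT c.name, o.id, o.total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = ? ORDER BY o.id
    """,
    (1,),
).fetchall()

# Document style: the same data denormalized into one self-contained record.
customer_doc = {
    "_id": 1,
    "name": "Priya Sharma",
    "orders": [
        {"orderId": "ORD001", "total": 2500},
        {"orderId": "ORD002", "total": 1200},
    ],
}

sql_total = sum(total for _, _, total in rows)
doc_total = sum(order["total"] for order in customer_doc["orders"])
print(sql_total, doc_total)
```

The relational version keeps each fact in one place and pays for a join on read. The document version reads everything in one lookup but must update the embedded copy whenever an order changes.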

NoSQL vs RDBMS: Detailed Comparison

When to Use RDBMS

Complex queries with multiple joins
Transactions requiring ACID properties
Structured data with clear relationships
Data integrity is critical
Well-defined schema that rarely changes
Financial systems, ERP, CRM

Examples: MySQL, PostgreSQL, Oracle, SQL Server

When to Use NoSQL

Massive volume of data (Big Data)
Rapid development with changing requirements
Distributed systems across multiple data centers
High throughput, low latency requirements
Flexible or unstructured data
Social media, IoT, real-time analytics

Examples: MongoDB, Cassandra, Redis, Neo4j

Important Note: The choice between NoSQL and RDBMS is not binary. Many modern applications use both (polyglot persistence) to leverage the strengths of each for different aspects of the application.
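
One common form of polyglot persistence puts a key-value cache in front of a relational system of record. The sketch below shows that cache-aside pattern under stated assumptions: SQLite stands in for the RDBMS, a dictionary stands in for a store such as Redis, and `get_product`, `update_price`, and the sample product are hypothetical.

```python
import json
import sqlite3
from typing import Optional

# System of record: a relational database (SQLite stands in for any RDBMS).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
db.execute("INSERT INTO products VALUES (1, 'Keyboard', 1999)")
db.commit()

# Cache: a plain dict stands in for a key-value store.
cache: dict[str, str] = {}
stats = {"db_reads": 0}

def get_product(product_id: int) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"product:{product_id}"
    if key in cache:
        return json.loads(cache[key])
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    product = {"id": row[0], "name": row[1], "price": row[2]}
    cache[key] = json.dumps(product)
    return product

def update_price(product_id: int, price: int) -> None:
    """Write to the database first, then invalidate the cached copy."""
    db.execute("UPDATE products SET price = ? WHERE id = ?", (price, product_id))
    db.commit()
    cache.pop(f"product:{product_id}", None)

get_product(1)
get_product(1)  # served from the cache, no database read
update_price(1, 1799)
print(get_product(1), "database reads:", stats["db_reads"])
```

The relational database stays the single source of truth, while the cache absorbs repeated reads. Invalidating on write keeps the cache from serving a stale price after an update.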
                

Real-World Use Cases

🛒 E-Commerce: Amazon

Database: DynamoDB (Key-Value)

Use: Shopping carts, session management, product catalog

Why: High availability, scalability, single-digit millisecond latency

👥 Social Media: Facebook

Database: Cassandra (Column-Family)

Use: Inbox search, user activity logs

Why: Write-heavy workloads, massive scale

🎵 Spotify

Database: Cassandra + PostgreSQL

Use: User playlists, recommendations, metadata

Why: Polyglot persistence for different needs

🌐 LinkedIn

Database: Graph Database

Use: Professional network, connections, recommendations

Why: Complex relationship queries

📺 Netflix

Database: Cassandra + Others

Use: Viewing history, recommendations, user preferences

Why: Global distribution, high availability

🏦 Banking: ICICI, HDFC

Database: RDBMS (Oracle, DB2)

Use: Transactions, account management

Why: ACID compliance critical for financial data

Advantages & Challenges

Advantages of NoSQL

✓ Flexibility: Schema-less design allows rapid iteration
✓ Scalability: Horizontal scaling across commodity hardware
✓ Performance: Optimized for specific data models and access patterns
✓ High Availability: Built-in replication and fault tolerance
✓ Cost-Effective: Uses commodity hardware efficiently

Challenges of NoSQL

! Consistency: Eventual consistency can be complex to handle
! Maturity: Less mature than RDBMS, fewer tools
! Standardization: No standard query language across databases
! Expertise: Requires specialized knowledge
! Complex Queries: Joins and complex transactions are challenging

Key Takeaway: Choose the right tool for the right job. Consider your specific requirements: data model, scalability needs, consistency requirements, team expertise, and budget.
                

Summary & Key Takeaways

NoSQL Database Types

Key-Value: Redis, DynamoDB → Caching, sessions
Document: MongoDB, CouchDB → CMS, catalogs, profiles
Column-Family: Cassandra, HBase → Time-series, logs, IoT
Graph: Neo4j, Neptune → Social networks, fraud detection

                

Decision Framework

Requirement	Choose RDBMS If...	Choose NoSQL If...
Data Structure	Structured, relational	Unstructured, semi-structured, hierarchical
Scalability	Vertical scaling sufficient	Need horizontal scaling
Consistency	Strong consistency required	Eventual consistency acceptable
Schema	Stable, well-defined	Evolving, flexible
Volume	Moderate data volumes	Big Data, high velocity

Remember:

Modern applications often use polyglot persistence – combining multiple database types to leverage the strengths of each. The key is understanding your data, access patterns, and business requirements.

Questions?