NoSQL Databases

Overview, Types, Use Cases & Comparison with RDBMS

Database

Instructor: Mohsin F. Dar

Assistant Professor, Cloud & Software Operations Cluster | SOCS | UPES

M.Tech. - Database Systems | First Semester

What is NoSQL?

Definition

NoSQL ("Not Only SQL") refers to non-relational database management systems that store and retrieve data modeled in ways other than the tabular relations used in relational databases.

Key Characteristics

1. Schema-less or Flexible Schema: Data can be inserted without a predefined schema
2. Horizontal Scalability: Easily scale out by adding more servers
3. High Performance: Optimized for specific data models and access patterns
4. Distributed Architecture: Designed to run on clusters of commodity hardware
5. BASE Properties: Basically Available, Soft state, Eventually consistent
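
Horizontal scalability depends on spreading keys across servers. The sketch below shows one simple way to do that, hashing each key and taking the result modulo the number of nodes. The node names and the `shard_for` helper are hypothetical and only for illustration; production systems typically use consistent hashing instead, because plain modulo sharding moves most keys whenever a node is added or removed.

```python
import hashlib


def shard_for(key: str, nodes: list[str]) -> str:
    """Pick a node for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical server names, not taken from any real cluster.
nodes = ["node-a", "node-b", "node-c"]

# Assign ten example keys to nodes; the same key always lands on the same node.
keys = [f"user:{i}" for i in range(1000, 1010)]
placement = {key: shard_for(key, nodes) for key in keys}

for key, node in placement.items():
    print(key, "->", node)
```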

Key-Value Stores

Overview

The simplest NoSQL database model. Data is stored as a collection of key-value pairs where a key serves as a unique identifier.

Data Model: Similar to a hash table or dictionary where each key maps to exactly one value.

Examples

Key: "user:1001"
Value: { "name": "John Doe", "email": "john@example.com", "age": 25 }

Key: "session:abc123"
Value: "active_user_token_xyz"
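
The key-value model can be sketched with an ordinary Python dictionary. The `KeyValueStore` class below is a toy written for this illustration, not the API of Redis or any other product. Note that the store treats each value as an opaque string: the application, not the database, decides that the user value is JSON.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store: each key maps to exactly one value."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Writing to an existing key overwrites the previous value.
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:1001", json.dumps(
    {"name": "John Doe", "email": "john@example.com", "age": 25}))
store.put("session:abc123", "active_user_token_xyz")

# The store only does lookups by key; decoding the value is up to the caller.
user = json.loads(store.get("user:1001"))
print(user["name"])
```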

Popular Implementations

  • Redis: In-memory data store, extremely fast
  • Amazon DynamoDB: Fully managed, highly scalable
  • Riak: Distributed, fault-tolerant
  • Memcached: High-performance distributed caching

Common Use Cases

  • Session management
  • Caching frequently accessed data
  • User preferences and profiles
  • Shopping carts in e-commerce
  • Real-time recommendations

Document Stores

Overview

Store data as documents (typically JSON, BSON, or XML). Each document is self-contained and can have a different structure.

Document Example

{
  "_id": "507f1f77bcf86cd799439011",
  "firstName": "Priya",
  "lastName": "Sharma",
  "email": "priya.sharma@example.com",
  "address": {
    "street": "123 MG Road",
    "city": "Bangalore",
    "state": "Karnataka",
    "pincode": "560001"
  },
  "orders": [
    { "orderId": "ORD001", "date": "2024-01-15", "total": 2500 }
  ],
  "tags": ["premium", "frequent_buyer"]
}
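
Unlike a key-value store, a document store can query inside the document. The sketch below imitates that idea in plain Python with a dotted-path lookup such as "address.city". The `get_path` and `find` helpers and the second customer document are assumptions made for this example; they are not the query API of MongoDB or any other database.

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'address.city'; None if any part is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: list[dict], path: str, value: Any) -> list[dict]:
    """Return every document whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


customers = [
    {
        "_id": "507f1f77bcf86cd799439011",
        "firstName": "Priya",
        "address": {"city": "Bangalore", "state": "Karnataka"},
        "tags": ["premium", "frequent_buyer"],
    },
    # Hypothetical second document with a different structure (no address).
    {"_id": "example-2", "firstName": "Amit"},
]

in_bangalore = find(customers, "address.city", "Bangalore")
print([doc["firstName"] for doc in in_bangalore])
```

Documents with different structures coexist in the same collection; a missing field simply fails to match instead of causing an error.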

Popular Implementations

  • MongoDB: Most widely used, rich query language
  • CouchDB: Master-master replication, HTTP API
  • Amazon DocumentDB: MongoDB compatible, managed
  • RavenDB: ACID compliant, .NET focused

Common Use Cases

  • Content management systems
  • E-commerce product catalogs
  • User profiles with varying attributes
  • Real-time analytics
  • Mobile application backends

Column-Family Stores

Overview

Store data in rows identified by a row key, with columns grouped into column families (groups of related columns). Optimized for queries over large datasets.

Key Concept: Each column family is stored together on disk, so a query that needs only one family does not have to read the others. Rows are sparse: a row stores only the columns it actually has, and different rows can have different columns.

Data Model Structure

Row Key: "user:1001"
  Column Family: "personal_info"
    - first_name: "Amit"
    - last_name: "Kumar"
    - email: "amit@example.com"
  Column Family: "preferences"
    - language: "Hindi"
    - timezone: "IST"
    - theme: "dark"
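
The structure above maps naturally onto nested dictionaries: row key, then column family, then column. The sketch below is a toy model for illustration only; the `put` and `get_family` helpers and the row "user:1002" are hypothetical and do not correspond to the API of Cassandra or HBase.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table: defaultdict = defaultdict(lambda: defaultdict(dict))


def put(row_key: str, family: str, column: str, value: str) -> None:
    """Write one cell; rows and families are created on first use."""
    table[row_key][family][column] = value


def get_family(row_key: str, family: str) -> dict:
    """Read all columns of one family for a row; empty dict if absent."""
    return dict(table.get(row_key, {}).get(family, {}))


put("user:1001", "personal_info", "first_name", "Amit")
put("user:1001", "personal_info", "last_name", "Kumar")
put("user:1001", "preferences", "language", "Hindi")
put("user:1001", "preferences", "timezone", "IST")
put("user:1001", "preferences", "theme", "dark")

# Hypothetical sparse row: it has no "preferences" family at all.
put("user:1002", "personal_info", "first_name", "Priya")

print(get_family("user:1001", "preferences"))
print(get_family("user:1002", "preferences"))
```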

Popular Implementations

  • Apache Cassandra: Highly scalable, peer-to-peer architecture
  • Apache HBase: Runs on Hadoop, strong consistency
  • Google Bigtable: Managed, petabyte-scale
  • ScyllaDB: Cassandra-compatible, written in C++

Common Use Cases

  • Time-series data (IoT sensors, logs)
  • Write-heavy applications
  • Large-scale data warehousing
  • Event logging and monitoring
  • Messaging platforms

Graph Databases

Overview

Store data as nodes, edges, and properties. Optimized for relationship-heavy data and traversal queries.

Core Components:
Nodes: Entities (people, places, things)
Edges: Relationships between nodes
Properties: Information about nodes and edges

Example Query (Cypher for Neo4j)

// Find friends of friends
MATCH (person:Person {name: "Rahul"})
      -[:FRIENDS_WITH]->(:Person)
      -[:FRIENDS_WITH]->(fof:Person)
WHERE NOT (person)-[:FRIENDS_WITH]->(fof)
RETURN fof.name AS SuggestedFriend

// Find shortest path
MATCH path = shortestPath(
  (person1:Person {name: "Amit"})-[*]-(person2:Person {name: "Priya"})
)
RETURN path
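
To make the traversal logic concrete, the two queries can be approximated in plain Python over an adjacency map. This is a sketch under assumptions: the `friends` data is invented, friendships are treated as symmetric, and the starting person is explicitly excluded from the suggestions. A graph database runs such traversals against indexed storage rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical friendship graph; every friendship is stored in both directions.
friends = {
    "Rahul": {"Amit", "Neha"},
    "Amit": {"Rahul", "Priya"},
    "Neha": {"Rahul", "Priya"},
    "Priya": {"Amit", "Neha"},
}


def friends_of_friends(graph: dict, person: str) -> set:
    """People two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    return candidates - direct - {person}


def shortest_path(graph: dict, start: str, goal: str):
    """Breadth-first search; returns a list of names or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting makes the result deterministic when several paths tie.
        for neighbor in sorted(graph.get(node, set())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(friends_of_friends(friends, "Rahul"))
print(shortest_path(friends, "Rahul", "Priya"))
```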

Popular Implementations

  • Neo4j: Most popular, Cypher query language
  • Amazon Neptune: Fully managed, supports multiple models
  • ArangoDB: Multi-model (document, graph, key-value)
  • JanusGraph: Distributed, scalable

Common Use Cases

  • Social networks and recommendations
  • Fraud detection
  • Knowledge graphs
  • Network and IT operations
  • Supply chain management

CAP Theorem & NoSQL

CAP Theorem States:

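A distributed database system can provide at most TWO of the following three guarantees simultaneously. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.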

Consistency (C)

All nodes see the same data at the same time. Every read receives the most recent write.

Availability (A)

Every request receives a response, without guarantee that it contains the most recent data.

Partition Tolerance (P)

The system continues to operate despite network partitions between nodes.

NoSQL Database Classifications

Database | CAP Choice | Notes
MongoDB | CP (Consistency + Partition Tolerance) | Prioritizes consistency over availability
Cassandra | AP (Availability + Partition Tolerance) | Eventually consistent, always available
Redis | CP with AP options | Configurable based on use case
Neo4j | CA (when no partitions) | Strong consistency in cluster
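
The AP trade-off can be illustrated with a small simulation: two replicas, a write that reaches only one of them during a partition, and a later sync that brings them back in line. The `Replica` class, the `sync` function, and the version numbers are assumptions for this sketch and do not reproduce the replication protocol of any specific database. A CP system would instead reject or delay the read on the out-of-date replica.

```python
class Replica:
    """Toy replica that keeps the highest-versioned value for each key."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}  # key -> (version, value)

    def write(self, key: str, value, version: int) -> None:
        current = self.data.get(key)
        # Only accept the write if it is newer than what we already have.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(source: Replica, target: Replica) -> None:
    """Copy every entry from source to target; older versions are ignored."""
    for key, (version, value) in source.data.items():
        target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart:42", ["book"], version=1)
sync(a, b)

# Network partition: the update reaches replica a only.
a.write("cart:42", ["book", "pen"], version=2)
stale = b.read("cart:42")   # b still answers, but with old data

# Partition heals and the replicas converge.
sync(a, b)
fresh = b.read("cart:42")

print(stale, fresh)
```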

NoSQL vs Traditional RDBMS

Aspect | RDBMS | NoSQL
Data Model | Tables with fixed schema, rows and columns | Flexible: documents, key-value, graphs, columns
Schema | Fixed, predefined schema (schema-on-write) | Dynamic, flexible schema (schema-on-read)
Scalability | Vertical scaling (scale-up) | Horizontal scaling (scale-out)
Transactions | ACID compliant (Atomicity, Consistency, Isolation, Durability) | BASE (Basically Available, Soft state, Eventually consistent)
Query Language | SQL (Structured Query Language) | Varies by database (MongoDB Query Language, Cypher, etc.)
Relationships | Complex joins, foreign keys, normalized data | Denormalized, embedded documents, or graph edges
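
The "Relationships" row can be made concrete by answering the same question, which orders belong to a customer, in both styles. The data and function names below are hypothetical and written only for this comparison. The normalized version matches rows across two collections on a foreign key, as a join would; the denormalized version reads the orders directly from the embedded document.

```python
# Relational style: two "tables" linked by a foreign key (customer_id).
customers = [{"id": 1, "name": "Priya Sharma"}]
orders = [
    {"order_id": "ORD001", "customer_id": 1, "total": 2500},
    {"order_id": "ORD002", "customer_id": 1, "total": 1200},  # hypothetical
]


def orders_for_customer_join(name: str) -> list:
    """Equivalent of a join: match order rows to customer rows on the key."""
    ids = {c["id"] for c in customers if c["name"] == name}
    return [o["order_id"] for o in orders if o["customer_id"] in ids]


# Document style: the orders are embedded inside the customer document.
customer_doc = {
    "name": "Priya Sharma",
    "orders": [
        {"order_id": "ORD001", "total": 2500},
        {"order_id": "ORD002", "total": 1200},
    ],
}


def orders_for_customer_embedded(doc: dict) -> list:
    """No join needed: one document read returns the customer and the orders."""
    return [o["order_id"] for o in doc["orders"]]


print(orders_for_customer_join("Priya Sharma"))
print(orders_for_customer_embedded(customer_doc))
```

The embedded form is faster to read but duplicates data if the same order must appear in several documents, which is the usual cost of denormalization.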

NoSQL vs RDBMS: Detailed Comparison

When to Use RDBMS

  • Complex queries with multiple joins
  • Transactions requiring ACID properties
  • Structured data with clear relationships
  • Data integrity is critical
  • Well-defined schema that rarely changes
  • Financial systems, ERP, CRM
Examples: MySQL, PostgreSQL, Oracle, SQL Server

When to Use NoSQL

  • Massive volume of data (Big Data)
  • Rapid development with changing requirements
  • Distributed systems across multiple data centers
  • High throughput, low latency requirements
  • Flexible or unstructured data
  • Social media, IoT, real-time analytics
Examples: MongoDB, Cassandra, Redis, Neo4j
Important Note: The choice between NoSQL and RDBMS is not binary. Many modern applications use both (polyglot persistence) to leverage the strengths of each for different aspects of the application.

Real-World Use Cases

🛒 E-Commerce: Amazon

Database: DynamoDB (Key-Value)

Use: Shopping carts, session management, product catalog

Why: High availability, scalability, single-digit millisecond latency

👥 Social Media: Facebook

Database: Cassandra (Column-Family)

Use: Inbox search, user activity logs

Why: Write-heavy workloads, massive scale

🎵 Spotify

Database: Cassandra + PostgreSQL

Use: User playlists, recommendations, metadata

Why: Polyglot persistence for different needs

🌐 LinkedIn

Database: Graph Database

Use: Professional network, connections, recommendations

Why: Complex relationship queries

📺 Netflix

Database: Cassandra + Others

Use: Viewing history, recommendations, user preferences

Why: Global distribution, high availability

🏦 Banking: ICICI, HDFC

Database: RDBMS (Oracle, DB2)

Use: Transactions, account management

Why: ACID compliance critical for financial data

Advantages & Challenges

Advantages of NoSQL

Flexibility: Schema-less design allows rapid iteration
Scalability: Horizontal scaling across commodity hardware
Performance: Optimized for specific data models and access patterns
High Availability: Built-in replication and fault tolerance
Cost-Effective: Uses commodity hardware efficiently

Challenges of NoSQL

Consistency: Eventual consistency can be complex to handle
Maturity: Less mature than RDBMS, fewer tools
Standardization: No standard query language across databases
Expertise: Requires specialized knowledge
Complex Queries: Joins and complex transactions are challenging
Key Takeaway: Choose the right tool for the right job. Consider your specific requirements: data model, scalability needs, consistency requirements, team expertise, and budget.

Summary & Key Takeaways

NoSQL Database Types

  • Key-Value: Redis, DynamoDB → Caching, sessions
  • Document: MongoDB, CouchDB → CMS, catalogs, profiles
  • Column-Family: Cassandra, HBase → Time-series, logs, IoT
  • Graph: Neo4j, Neptune → Social networks, fraud detection

Decision Framework

Requirement | Choose RDBMS If... | Choose NoSQL If...
Data Structure | Structured, relational | Unstructured, semi-structured, hierarchical
Scalability | Vertical scaling sufficient | Need horizontal scaling
Consistency | Strong consistency required | Eventual consistency acceptable
Schema | Stable, well-defined | Evolving, flexible
Volume | Moderate data volumes | Big Data, high velocity

Remember:

Modern applications often use polyglot persistence – combining multiple database types to leverage the strengths of each. The key is understanding your data, access patterns, and business requirements.

Questions?