Tree-Based Indexing

B-Tree & B+ Tree

Lecture-20 | Database Systems

Dr. Mohsin Dar

Cloud & Software Operations Cluster | UPES

MTech First Semester

Learning Objectives

                By the end of this lecture, you will be able to:
                Understand why tree structures are preferred for database indexing
Explain the structure and operations of B-trees
Differentiate between B-trees and B+ trees
Analyze why modern DBMS prefer B+ trees for indexing
Perform search, insert, and delete operations on tree-based indexes

            

Why Trees for Indexing?

The Problem with Alternatives

1. Arrays

Search: O(log n) with binary search (sorted array)
Insert/Delete: O(n) - requires shifting elements
Problem: Inefficient for dynamic data with frequent updates

2. Binary Search Trees (BST)

Average case: O(log n) for all operations
Worst case: O(n) - degenerates to linked list
Problem: Not balanced, poor disk I/O performance
Issue: High tree height = more disk accesses

Key Insight: Database indexes are stored on disk. Minimizing disk I/O is critical. We need balanced trees with high fan-out to reduce tree height and disk accesses.

B-Trees: The Solution

What is a B-Tree?

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.

Key Characteristics

✓ Balanced

All leaf nodes are at the same level

✓ High Fan-out

Many children per node (not just 2)

✓ Sorted Order

Keys within nodes are sorted

✓ Disk-Friendly

Optimized for block-based I/O

Order of B-Tree (m): Maximum number of children a node can have

B-Tree Structure

Properties of a B-Tree of order m

Every node has at most m children
Every non-leaf node (except root) has at least ⌈m/2⌉ children
Root has at least 2 children (if not a leaf)
All leaves appear at the same level
A non-leaf node with k children contains k-1 keys
Keys in a node are sorted: K₁ < K₂ < ... < Kₖ₋₁

Example: B-Tree of order 3

50

20 | 30

60 | 70

10 | 15

25

35 | 40

55

65

75 | 80

B-Tree Node Structure

Internal Node Structure

P₀

K₁

P₁

K₂

P₂

...

Kₙ

Pₙ

Kᵢ: Search keys | Pᵢ: Pointers to child nodes or data records

Key Ordering Property

All keys in subtree P₀ < K₁
All keys in subtree P₁ are between K₁ and K₂
All keys in subtree P₂ are between K₂ and K₃
All keys in subtree Pₙ > Kₙ

Data Location: In B-trees, data records (or pointers to data) can be stored in both internal nodes and leaf nodes.

B-Tree Search Operation

Algorithm: Search(key, node)

Start at root node
In current node:
- Perform binary search to find position i where Kᵢ₋₁ < key ≤ Kᵢ
- If key == Kᵢ, return found (data is here)
If not found and not leaf:
- Follow pointer Pᵢ to appropriate child
- Repeat step 2
If leaf reached and not found: Key doesn't exist

Time Complexity

O(log_m n) where m is the order and n is the number of keys

For m = 100 and n = 1,000,000: Only ~3 disk accesses needed!

Search(50) in B-Tree:
Root: [30, 60] → 50 < 60, go to middle child
Child: [40, 50, 55] → Found at index 1!
Result: 2 disk I/O operations
            

B-Tree Insert Operation

Algorithm: Insert(key)

Search for the appropriate leaf node where key should be inserted
Insert key in sorted order in the leaf node
If node overflows (more than m-1 keys):
- Split the node into two nodes
- Move median key to parent
- If parent overflows, split recursively
- If root splits, create new root (tree height increases)

Example: Insert 25 in B-Tree (order 3)

Before:

10 | 20 | 30

↓ Node is full, insert causes overflow

After Split:

20

↙ ↘

10

25 | 30

B-Tree Delete Operation

Algorithm: Delete(key) - Simplified

Locate the key in the tree
If key is in a leaf node:
- Simply remove it
- If underflow (< ⌈m/2⌉-1 keys), borrow from sibling or merge
If key is in internal node:
- Replace with predecessor or successor
- Delete the predecessor/successor from leaf
Handle underflow:
- Borrow: If sibling has extra keys, redistribute
- Merge: Combine with sibling if borrowing not possible

Important: Delete is the most complex operation in B-Trees due to maintaining balance and minimum key requirements. Underflow propagates upward, possibly reducing tree height.

B+ Trees: An Enhanced Variant

What is a B+ Tree?

A B+ Tree is a variation of B-Tree where all data records reside only at the leaf level, and internal nodes contain only keys for navigation.

Key Differences from B-Tree

📍 Data Location

B-Tree: Data in all nodes
B+ Tree: Data only in leaves

🔗 Leaf Structure

B-Tree: Leaves independent
B+ Tree: Leaves form linked list

🔑 Key Duplication

B-Tree: Keys appear once
B+ Tree: Keys duplicated in internal nodes

🎯 Search Path

B-Tree: Can end anywhere
B+ Tree: Always reaches leaf

B+ Tree Structure

Properties

Internal nodes: Store only keys and pointers (no data records)
Leaf nodes: Store keys and data records (or pointers to data)
Leaf nodes are linked: Forms a doubly linked list for sequential access
Key duplication: Keys in internal nodes are duplicated in leaves
All properties of B-Tree apply (balance, order, min/max children)

B+ Tree Structure

50

Internal Node (keys only)

20 | 30

60 | 70

Internal Nodes

10:D1 | 20:D2

30:D3 | 40:D4

50:D5 | 60:D6

70:D7 | 80:D8

Leaf Nodes (with data) ← → ← → ← →

B+ Tree Operations

Search Operation

Start at root, navigate using keys in internal nodes
Always traverse to leaf level (even if key found in internal node)
Search key in leaf node
Return data record if found

Time Complexity: O(log_m n)

Range Query

Major Advantage of B+ Trees!

Query: Find all records where 20 ≤ key ≤ 60

Search for key = 20 (reaches leaf)
Follow linked list of leaves from 20 to 60
Collect all records in range

Efficient Sequential Access: No need to traverse tree multiple times!

B+ Tree Insert & Delete

Insert Operation

Search for appropriate leaf node
Insert key-data pair in sorted order in leaf
If leaf overflows:
- Split leaf into two leaves
- Copy (not move) middle key to parent
- Update leaf linked list pointers
Handle parent overflow recursively

Delete Operation

Locate key in leaf node
Remove key-data pair from leaf
If underflow:
- Borrow from sibling or merge leaves
- Update parent keys accordingly
- Update linked list pointers

Note: Keys in internal nodes may remain even after deletion from leaves (not critical as they guide search).

B-Tree vs B+ Tree Comparison

Feature	B-Tree	B+ Tree
Data Location	All nodes (internal & leaf)	Only leaf nodes
Key Duplication	No duplication	Keys duplicated in internal nodes
Leaf Node Structure	Independent leaves	Linked list of leaves
Search Termination	Can end at any level	Always at leaf level
Range Queries	Requires multiple tree traversals	Efficient via leaf linked list
Sequential Access	Difficult (requires tree traversal)	Easy (follow leaf pointers)
Node Utilization	Lower (stores data)	Higher (only keys in internal)
Tree Height	Slightly shorter	Slightly taller (but negligible)
Deletion Complexity	More complex (data in all nodes)	Simpler (data only in leaves)

Why Modern DBMS Prefer B+ Trees

1️⃣ Better Range Queries

Sequential access through linked leaves is much faster than traversing the tree multiple times

2️⃣ Higher Fan-out

Internal nodes can store more keys (no data), reducing tree height and disk I/O

3️⃣ Simpler Deletion

Data only in leaves makes deletion logic cleaner and more predictable

4️⃣ Consistent Performance

All searches go to leaf level, providing predictable I/O cost

5️⃣ Better Caching

Internal nodes are smaller, more fit in memory/cache

6️⃣ Full Scans

Scanning entire table is just a leaf-level linked list traversal

Real-World Usage

MySQL InnoDB: Uses B+ Trees for indexes
PostgreSQL: Uses B+ Trees for indexes
Oracle: Uses B+ Trees for indexes
SQL Server: Uses B+ Trees for indexes

Summary

                Key Takeaways
                Tree-based indexing is essential for efficient database operations
B-Trees provide balanced, disk-friendly indexing with O(log n) operations
B+ Trees enhance B-Trees by storing data only in leaves
Linked leaf nodes in B+ Trees enable efficient range queries and sequential access
Modern DBMS universally prefer B+ Trees for their superior performance characteristics

            

What's Next?

Hash-based indexing
Bitmap indexes
Composite indexes
Index selection and optimization strategies

Practice: Try implementing B+ Tree insert and delete operations on paper to solidify your understanding!