Query Decomposition, Data Localization,
Distributed Query Processing,
Transaction Management & Concurrency Control
Database Systems | M.Tech First Semester
A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network, appearing as a single database to users.
Data is stored across multiple sites
Sites connected via communication network
Users see it as one logical database
Each site has local processing capability
Query decomposition transforms a high-level query (SQL) into an algebraic query on global relations, independent of data distribution.
Data localization transforms global queries into fragment queries by determining which database fragments are involved in the query.
| Type | Description | Example |
|---|---|---|
| Horizontal | Rows distributed based on conditions | Employees by region |
| Vertical | Columns distributed across sites | Employee personal vs salary data |
| Hybrid | Combination of both | Regional + departmental split |
Employees where
Location = 'Mumbai'
Employees where
Location = 'Delhi'
The query only needs to access Fragment F1 at Site 1, eliminating unnecessary network communication with Site 2.
Transfer one entire relation to the site of the other relation.
Best when: One relation is small, network bandwidth is high
Send only the joining attributes to reduce data transfer.
Advantage: Reduces network traffic significantly
Uses bit vectors (Bloom filters) to filter tuples before transfer.
Advantage: Even less data transfer than semi-join
All or nothing execution across all sites
Database integrity maintained globally
Transactions don't interfere across sites
Committed changes persist despite failures
Ensures atomicity in distributed transactions - either all sites commit or all abort.
Multiple transactions can access data at different sites simultaneously, requiring coordination to maintain consistency.
| Technique | Approach | Advantage |
|---|---|---|
| Locking | Use distributed locks | Prevents conflicts proactively |
| Timestamp | Assign global timestamps | No deadlocks |
| Optimistic | Validate at commit time | High concurrency for read-heavy workloads |
| Snapshot Isolation | Each transaction sees a snapshot | Readers don't block writers |
One site designated as primary for each data item and handles all lock requests for that item.
Advantage: Simple implementation, easy to detect deadlocks
Disadvantage: Primary site becomes bottleneck
Transactions acquire locks in growing phase, release in shrinking phase across all sites.
Each transaction T receives a unique timestamp TS(T) when it starts. System uses timestamps to ensure serializability.
"Minimize data movement, maximize parallel execution"