Unit V: Dimensional Database Design
MTech First Semester
Dr. Mohsin Dar
Assistant Professor
Cloud & Software Operations Cluster
UPES
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Real-world data is often incomplete, noisy, and inconsistent. Preprocessing (cleaning, integrating, and transforming the data) can consume 60-80% of the KDD effort, but it substantially improves the quality of the results.
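Example (illustrative sketch): the snippet below shows one possible cleaning step, assuming a hypothetical list of customer records with `city` and `age` fields. It normalizes inconsistent spellings and fills a missing value with the median. The field names and the choice of median imputation are assumptions for illustration, not a prescribed method.

```python
from statistics import median

def preprocess(records):
    """Fix inconsistency (city spelling) and incompleteness (missing age)."""
    known_ages = [r["age"] for r in records if r["age"] is not None]
    fill_age = median(known_ages)  # the median is less sensitive to extreme values than the mean
    cleaned = []
    for r in records:
        cleaned.append({
            "city": r["city"].strip().title(),  # " delhi" and "DELHI" both become "Delhi"
            "age": r["age"] if r["age"] is not None else fill_age,
        })
    return cleaned

# Hypothetical raw records with stray whitespace, mixed case, and one missing age.
raw = [
    {"city": " delhi", "age": 34},
    {"city": "DELHI", "age": None},
    {"city": "Mumbai ", "age": 29},
    {"city": "mumbai", "age": 41},
]
cleaned = preprocess(raw)
```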
Definition: Classification is the process of finding a model that describes and distinguishes data classes for predicting the class of new objects.
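Example (illustrative sketch): a 1-nearest-neighbor classifier is among the simplest ways to predict the class of a new object from labeled examples. Here the "model" is just the labeled training data. The feature values and the labels `low_risk` / `high_risk` are made up for illustration, and many other classifiers (decision trees, naive Bayes, and so on) would serve equally well.

```python
import math

def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query` (Euclidean distance)."""
    closest = min(train, key=lambda item: math.dist(item[0], query))
    return closest[1]

# Hypothetical labeled training data: (feature vector, class label).
train = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((8.0, 8.0), "high_risk"),
    ((9.0, 7.5), "high_risk"),
]

# Predict the class of a new, unlabeled object.
prediction = nearest_neighbor_predict(train, (2.0, 1.5))
```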
Definition: Clustering groups data objects based on similarity without predefined class labels. Objects in the same cluster are similar; objects in different clusters are dissimilar.
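Example (illustrative sketch): k-means is one common clustering method. The version below assumes 2-D points and takes the initial centroids as an argument so the run is deterministic. The data and the function name `kmeans` are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep the old one if a cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two visually separate groups of unlabeled points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

No class labels are supplied: the grouping emerges only from the distances between points.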
Definition: Association rule mining is the process of finding frequent patterns, associations, or causal structures among sets of items in transaction databases.
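Support: the fraction of transactions in the database that contain the itemset
Support(A→B) = P(A∪B), where A∪B denotes a transaction that contains every item in both A and B
Confidence: the probability that B occurs in a transaction given that A occurs
Confidence(A→B) = P(B|A) = Support(A∪B) / Support(A)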
Rule: {Milk, Bread} → {Butter}
Interpretation: Customers who buy milk and bread also tend to buy butter
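Support: 30% of all transactions contain milk, bread, and butter | Confidence: 75% of the transactions that contain milk and bread also contain butter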
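Example (illustrative sketch): support and confidence for the rule {Milk, Bread} → {Butter} can be computed directly from a list of transactions. The ten transactions below are invented so that the rule comes out at 30% support and 75% confidence; the function names `support` and `confidence` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A ∪ B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Invented data: 4 of 10 baskets hold milk and bread, and 3 of those also hold butter.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk"},
    {"Bread"},
    {"Butter"},
    {"Eggs"},
    {"Milk", "Eggs"},
    {"Bread", "Eggs"},
]

rule_support = support(transactions, {"Milk", "Bread", "Butter"})          # 3 of 10 transactions
rule_confidence = confidence(transactions, {"Milk", "Bread"}, {"Butter"})  # 3 of the 4 milk-and-bread baskets
```

A support of 30% together with a confidence of 75% implies that {Milk, Bread} alone appears in 0.30 / 0.75 = 40% of transactions.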
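Apriori: one of the most influential algorithms for mining frequent itemsets. It uses a level-wise search strategy with candidate generation: frequent k-itemsets are used to generate candidate (k+1)-itemsets, and any candidate with an infrequent subset is pruned.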
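Example (illustrative sketch): the level-wise search with candidate generation can be written compactly as below. This is a minimal, unoptimized version that assumes transactions are Python sets and fit in memory; the function name `apriori`, the data, and the 40% minimum support threshold are chosen only for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def frequent_only(candidates):
        # Count each candidate in one pass and keep those meeting the threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent_only({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent_only(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Invented transactions; Eggs appears only once and falls below the threshold.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
]
frequent = apriori(transactions, min_support=0.4)
```

The prune step relies on the property that every subset of a frequent itemset must also be frequent, so no candidate containing Eggs is ever counted beyond level 1.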
Deep learning: advanced neural networks for complex pattern recognition
AutoML: automated machine learning pipeline optimization
Graph databases: efficient handling of connected data relationships
Edge computing: data processing closer to the source
Blockchain: secure and transparent data transactions
Ethical AI: responsible AI and data governance