Parallel Association Rule Mining Algorithm Based on MapReduce by Using Lift Interestingness Measure for Big Data

Abstract

Background: Big Data mining is an analytic process used to discover hidden knowledge and patterns in massive, complex, and multidimensional datasets. The memory and CPU resources of a single processor are too limited for this task, which makes single-machine algorithms perform poorly. Association rule mining (ARM) is traditionally used to uncover hidden knowledge in data sets, but traditional ARM algorithms cannot handle very large data sets. Scalable, parallel strategies for ARM based on Big Data approaches are therefore needed. One example of such an approach is a parallel association rule mining algorithm based on MapReduce that uses the lift interestingness measure (LIM).

Methods: This thesis proposes two algorithms for data mining and optimization. The first is a parallel association rule mining algorithm based on MapReduce that uses LIM, named MapReduce Lift Association Rule (MRLAR), which provides high scalability through parallel execution. The second reduces dimensionality using multiple data reduction techniques, including principal component analysis (PCA), singular value decomposition (SVD), and semi-discrete decomposition (SDD), applied as pre-processing techniques for data optimization.

Results: MRLAR was found to directly extract the association rules and the type of correlation between the Left Hand Side (LHS) and Right Hand Side (RHS) of each rule using lift, without additional computation on the confidence measure. It also provided the following advantages: high scalability through parallel execution (MapReduce), support for big data, a single scan of the dataset, no need for further post-processing techniques, and fault tolerance. The study also proposed an algorithm for data reduction using PCA, SVD, and SDD. SVD was found to have better accuracy and a shorter execution time than SDD.

Conclusions: MRLAR performed effectively in data mining. The data reduction techniques enhanced data pre-processing through dimensionality reduction.
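
To make the role of lift concrete, the following is a minimal sketch, not the thesis's MRLAR implementation, which this record does not describe in detail. It assumes the standard definition lift(A -> B) = supp(A and B) / (supp(A) * supp(B)), where a value above 1 indicates positive correlation, below 1 negative correlation, and exactly 1 independence. The map and reduce steps are simulated in a single Python process; the function names (map_transaction, reduce_counts, lift_rules) are hypothetical and chosen only for illustration, and only rules between pairs of single items are considered.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction):
    """Map step (illustrative): emit (itemset, 1) for every single item
    and every pair of items found in one transaction."""
    items = sorted(set(transaction))
    for item in items:
        yield (item,), 1
    for pair in combinations(items, 2):
        yield pair, 1


def reduce_counts(mapped_pairs):
    """Reduce step (illustrative): sum the counts emitted for each itemset."""
    counts = Counter()
    for key, value in mapped_pairs:
        counts[key] += value
    return counts


def lift_rules(transactions):
    """Compute lift and correlation type for every co-occurring item pair.

    All counts come from one pass over the transactions. Lift is symmetric,
    so the value for A -> B also holds for B -> A. Pairs that never
    co-occur are not emitted by the map step and so do not appear here.
    """
    n = len(transactions)
    mapped = (kv for t in transactions for kv in map_transaction(t))
    counts = reduce_counts(mapped)

    rules = {}
    for itemset, pair_count in counts.items():
        if len(itemset) != 2:
            continue
        a, b = itemset
        # supp(AB) / (supp(A) * supp(B)) rewritten in terms of raw counts
        lift = (pair_count * n) / (counts[(a,)] * counts[(b,)])
        if lift > 1:
            kind = "positive"
        elif lift < 1:
            kind = "negative"
        else:
            kind = "independent"
        rules[(a, b)] = (lift, kind)
    return rules


if __name__ == "__main__":
    # Made-up toy data, not taken from the thesis.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for rule, (lift, kind) in sorted(lift_rules(baskets).items()):
        print(rule, round(lift, 3), kind)
```

In this sketch the correlation type is read directly from the lift value, with no separate confidence computation. In a real MapReduce deployment the map and reduce functions would run on distributed workers rather than in one process.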

Description

Import 02/11/2016

Subject(s)

Big Data, Data Mining, Association Rule, MapReduce, Lift Interestingness Measure, Data Reduction, SVD, SDD, PCA.

Citation