dc.contributor.advisor | Snášel, Václav | |
dc.contributor.author | Oweis, Nour Easa | |
dc.date.accessioned | 2016-11-01T09:39:12Z | |
dc.date.available | 2016-11-01T09:39:12Z | |
dc.date.issued | 2016 | |
dc.identifier.other | OSD002 | cs |
dc.identifier.uri | http://hdl.handle.net/10084/112232 | |
dc.description | Import 02/11/2016 | cs |
dc.description.abstract | Background: Big Data mining is an analytic process used to discover hidden knowledge and patterns in massive, complex, and multidimensional datasets. The memory and CPU resources of a single processor are too limited for this task, which makes algorithms perform poorly. Association rule mining (ARM) is traditionally used to uncover hidden knowledge in data sets. However, traditional ARM algorithms are unable to handle very large data sets. Therefore, scalable and parallel strategies for ARM based on Big Data approaches are needed. An example of such an approach is a parallel association rule mining algorithm based on MapReduce that uses the lift interestingness measure (LIM).
Methods: This thesis proposes two algorithms for data mining and optimization. The first is a parallel association rule mining algorithm based on MapReduce that uses LIM, called MapReduce Lift Association Rule (MRLAR), which provides high scalability through parallel execution. The second reduces dimensionality by using multiple data reduction techniques, including principal component analysis (PCA), singular value decomposition (SVD), and semi-discrete decomposition (SDD), applied as pre-processing techniques for data optimization.
Results: The MRLAR was found to directly extract the association rules and the type of correlation between the Left Hand Side (LHS) and Right Hand Side (RHS) of each rule by using lift, without the need for additional computation on the confidence measure. It also provided the following advantages: high scalability through parallel execution (MapReduce), support for big data, a single scan of the dataset, no need for post-processing techniques, and fault tolerance. The study also proposed an algorithm for data reduction using PCA, SVD, and SDD. The SVD was found to have better accuracy and a shorter execution time than SDD.
Conclusions: The MRLAR performed effectively in data mining. The data reduction techniques improved data pre-processing by reducing dimensionality. | en |
dc.format | 93 p. : ill. | cs |
dc.format.extent | 1526228 bytes | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Vysoká škola báňská - Technická univerzita Ostrava | cs |
dc.subject | Big Data | en |
dc.subject | Data Mining | en |
dc.subject | Association Rule | en |
dc.subject | MapReduce | en |
dc.subject | Lift Interestingness Measure | en |
dc.subject | Data Reduction | en |
dc.subject | SVD | en |
dc.subject | SDD | en |
dc.subject | PCA | en |
dc.title | Parallel Association Rule Mining Algorithm Based on MapReduce by Using Lift Interestingness Measure for Big Data | en |
dc.title.alternative | Paralelní algoritmy pro dolování pravidel založených na MapReduce a míře významnosti pro Big Data | cs |
dc.type | Doctoral thesis | cs |
dc.identifier.signature | 201600190 | cs |
dc.identifier.location | Central Library / Thesis storage | |
dc.contributor.referee | Abraham, Ajith | cs |
dc.contributor.referee | Ouddane, Nabil | cs |
dc.contributor.referee | Krömer, Pavel | cs |
dc.date.accepted | 2016-06-08 | |
dc.thesis.degree-name | Ph.D. | |
dc.thesis.degree-level | Doctoral study programme | cs |
dc.thesis.degree-grantor | Vysoká škola báňská - Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky | cs |
dc.description.department | 460 - Department of Computer Science | |
dc.thesis.degree-program | Computer Science, Communication Technology and Applied Mathematics | cs |
dc.thesis.degree-branch | Computer Science | cs |
dc.description.result | passed | cs |
dc.identifier.sender | S2724 | cs |
dc.identifier.thesis | OWE001_FEI_P1807_1801V001_2016 | |
dc.rights.access | openAccess | |