Parallel Association Rule Mining Algorithm Based on MapReduce by Using Lift Interestingness Measure for Big Data

Oweis, Nour Easa

dc.contributor.advisor	Snášel, Václav
dc.contributor.author	Oweis, Nour Easa
dc.date.accessioned	2016-11-01T09:39:12Z
dc.date.available	2016-11-01T09:39:12Z
dc.date.issued	2016
dc.identifier.other	OSD002	cs
dc.identifier.uri	http://hdl.handle.net/10084/112232
dc.description	Import 02/11/2016	cs
dc.description.abstract	Background: Big Data mining is an analytic process used to discover hidden knowledge and patterns in massive, complex, and multidimensional datasets. The memory and CPU resources of a single processor are too limited for this task, which makes algorithm performance ineffective. Association rule mining (ARM) is traditionally used to uncover hidden knowledge in data sets, but traditional ARM algorithms are unable to handle very large data sets. Therefore, scalable and parallel strategies for ARM based on Big Data approaches are needed. An example of such an approach is a parallel association rule mining algorithm based on MapReduce that uses the lift interestingness measure (LIM). Methods: This thesis proposes two algorithms for data mining and optimization. The first is a parallel association rule mining algorithm based on MapReduce that uses LIM, called MapReduce Lift Association Rule (MRLAR), which provides high scalability through parallel execution. The second reduces dimensionality by using multiple data reduction techniques, including principal component analysis (PCA), singular value decomposition (SVD), and semi-discrete decomposition (SDD), applied as pre-processing techniques to reduce the data to fewer dimensions for data optimization. Results: MRLAR was found to directly extract the association rules and the type of correlation between the Left Hand Side (LHS) and Right Hand Side (RHS) using lift, without the need for additional computation of the confidence measure. It also provided the following advantages: high scalability through parallel execution (MapReduce), support for big data, a single scan of the dataset, no need for post-processing techniques, and fault tolerance. The study also proposed an algorithm for data reduction using PCA, SVD, and SDD. SVD was found to have better accuracy and a shorter execution time than SDD. Conclusions: MRLAR performed effectively in data mining. The data reduction techniques enhanced the pre-processing of data through dimensionality reduction.	en
dc.format	93 p. : ill.	cs
dc.format.extent	1526228 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Vysoká škola báňská - Technická univerzita Ostrava	cs
dc.subject	Big Data	en
dc.subject	Data Mining	en
dc.subject	Association Rule	en
dc.subject	MapReduce	en
dc.subject	Lift Interesting Measurement	en
dc.subject	Data Reduction	en
dc.subject	SVD	en
dc.subject	SDD	en
dc.subject	PCA	en
dc.title	Parallel Association Rule Mining Algorithm Based on MapReduce by Using Lift Interestingness Measure for Big Data	en
dc.title.alternative	Paralelní algoritmy pro dolování pravidel založených na MapReduce a míře významnosti pro Big Data	cs
dc.type	Doctoral thesis	cs
dc.identifier.signature	201600190	cs
dc.identifier.location	ÚK/Thesis storage
dc.contributor.referee	Abraham, Ajith	cs
dc.contributor.referee	Ouddane, Nabil	cs
dc.contributor.referee	Krömer, Pavel	cs
dc.date.accepted	2016-06-08
dc.thesis.degree-name	Ph.D.
dc.thesis.degree-level	Doctoral study programme	cs
dc.thesis.degree-grantor	Vysoká škola báňská - Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky	cs
dc.description.department	460 - Department of Computer Science
dc.thesis.degree-program	Computer Science, Communication Technology and Applied Mathematics	cs
dc.thesis.degree-branch	Computer Science	cs
dc.description.result	passed	cs
dc.identifier.sender	S2724	cs
dc.identifier.thesis	OWE001_FEI_P1807_1801V001_2016
dc.rights.access	openAccess
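
The lift measure named in the abstract can be illustrated with a small single-process sketch. This is not the thesis's MRLAR algorithm; it only mimics a map step (emit itemset counts per transaction) and a reduce step (sum counts per key), then computes lift(A, B) = support(A and B) / (support(A) * support(B)) for item pairs. The function names `map_phase`, `reduce_phase`, and `lift_for_pairs`, and the toy transactions, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction):
    """Emit (itemset, 1) for every single item and every unordered item pair."""
    items = sorted(set(transaction))
    for item in items:
        yield (item,), 1
    for pair in combinations(items, 2):
        yield pair, 1


def reduce_phase(emitted):
    """Sum the emitted counts per itemset key."""
    counts = Counter()
    for key, value in emitted:
        counts[key] += value
    return counts


def lift_for_pairs(transactions):
    """Return {(a, b): lift} for every item pair that co-occurs at least once."""
    n = len(transactions)
    counts = reduce_phase(kv for t in transactions for kv in map_phase(t))
    lifts = {}
    for key, count in counts.items():
        if len(key) == 2:
            a, b = key
            support_ab = count / n
            support_a = counts[(a,)] / n
            support_b = counts[(b,)] / n
            lifts[key] = support_ab / (support_a * support_b)
    return lifts


# Hypothetical toy data, not taken from the thesis.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
lifts = lift_for_pairs(transactions)
for pair, value in sorted(lifts.items()):
    print(pair, round(value, 3))
```

A lift above 1 indicates that the two sides occur together more often than expected under independence, a lift of 1 indicates independence, and a lift below 1 indicates a negative correlation. In a real MapReduce job the map and reduce steps would run on separate workers over partitions of the data.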

Files in this item

Name: OWE001_FEI_P1807_1801V001_2016.pdf
Size: 1.455Mb
Format: PDF

Name: OWE001_FEI_P1807_1801V001_2016 ...
Size: 2.122Mb
Format: PDF

Name: OWE001_FEI_P1807_1801V001_2016 ...
Size: 87.71Kb
Format: PDF
Description: Opponent's review – Abraham, Ajith

Name: OWE001_FEI_P1807_1801V001_2016 ...
Size: 814.2Kb
Format: PDF
Description: Opponent's review – Krömer, Pavel

Name: OWE001_FEI_P1807_1801V001_2016 ...
Size: 224.3Kb
Format: PDF
Description: Opponent's review – Ouddane, Nabil

This item appears in the following collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI) [13253]
The collection contains theses and dissertations of the Faculty of Electrical Engineering and Computer Science.