Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases

Nguyen, Thanh Long

dc.contributor.advisor	Snášel, Václav
dc.contributor.author	Nguyen, Thanh Long
dc.date.accessioned	2021-11-08T12:13:50Z
dc.date.available	2021-11-08T12:13:50Z
dc.date.issued	2021
dc.identifier.other	OSD002
dc.identifier.uri	http://hdl.handle.net/10084/145381
dc.description.abstract	With the rapid development of information technology and its application in many areas of life and the socio-economy, the volume of information stored in database systems has grown for many years, and this data now accumulates at an explosive rate. This huge amount of data is a valuable resource because information is a key element in many areas. Data mining helps users gain valuable insights from huge databases and data warehouses and has been widely applied in many fields. In data mining, an association rule expresses an association or correlation between data elements in the form "conditional → consequent". Detecting association rules means detecting those relationships within a given data set. Association rules were first introduced in 1993 by Agrawal et al. [1] and have become one of the major topics in data mining research, especially in recent years. Association rule detection has been successfully applied in many socio-economic fields such as trade, health, biology, finance and banking. In association rule mining, frequent pattern mining is a key task: it finds the patterns that occur frequently in databases. In the last two decades, researchers have proposed many techniques and algorithms for extracting frequent patterns, in which the downward closure property plays a fundamental role. The main challenges in pattern mining are the computational cost and the potentially huge number of extracted patterns. In this thesis, we present an overview of the work done on frequent pattern mining, especially colossal pattern mining, and develop methods for mining frequent colossal patterns in high dimensional databases that can tackle emerging data processing workloads while coping with ever larger scales.
First, we develop the CP-tree (colossal pattern tree) for efficiently storing colossal patterns. Next, we propose the CP-Miner algorithm, which uses the CP-tree, early pruning of transactions and dynamic bit vectors to mine frequent colossal patterns. PCP-Miner, an improved version of CP-Miner, is also developed to reduce runtime and memory usage. In PCP-Miner, we develop theorems to prune non-colossal patterns during the mining process. We also develop methods for mining colossal patterns with constraints. Our proposal covers two kinds of constraints: pattern constraints and length constraints.	en
dc.format.extent	2600970 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Vysoká škola báňská – Technická univerzita Ostrava	cs
dc.subject	Frequent Pattern Mining, Colossal Pattern Mining, Constraints, High Dimensional Database.	en
dc.title	Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases	en
dc.title.alternative	Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases	cs
dc.type	Doctoral dissertation	cs
dc.contributor.referee	Šenkeřík, Roman
dc.contributor.referee	Dvorský, Jiří
dc.contributor.referee	Hassanien, Aboul Ella
dc.date.accepted	2021-09-07
dc.thesis.degree-name	Ph.D.
dc.thesis.degree-level	Doctoral study programme	cs
dc.thesis.degree-grantor	Vysoká škola báňská – Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky	cs
dc.description.department	460 - Department of Computer Science	cs
dc.thesis.degree-program	Computer Science, Communication Technology and Applied Mathematics	cs
dc.thesis.degree-branch	Computer Science	cs
dc.description.result	passed	cs
dc.identifier.sender	S2724
dc.identifier.thesis	NGU0032_FEI_P1807_1801V001_2021
dc.rights.access	openAccess

Files in this item

Name: NGU0032_FEI_P1807_1801V001_2021.pdf
Size: 2.480Mb
Format: PDF
Description: Thesis text

Name: NGU0032_FEI_P1807_1801V001_202 ...
Size: 1.870Mb
Format: PDF
Description: Thesis summary

Name: NGU0032_FEI_P1807_1801V001_202 ...
Size: 26.41Kb
Format: PDF
Description: Reviewer's report – Dvorský, Jiří

Name: NGU0032_FEI_P1807_1801V001_202 ...
Size: 694.6Kb
Format: PDF
Description: Reviewer's report – Hassanien, ...

Name: NGU0032_FEI_P1807_1801V001_202 ...
Size: 63.90Kb
Format: PDF
Description: Reviewer's report – Šenkeřík, Roman

This item appears in the following collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI) [13253]
The collection contains theses and dissertations of the Faculty of Electrical Engineering and Computer Science.

Related items

Zásuvný modul do programu SonarQube umožňující detekci návrhových vzorů ve zdrojovém kódu

Rozbor webových stránek z domény ubytování založený na webových vzorech použitelných při vyhledávání na Internetu

Parallel Methods for Mining Frequent Sequential patterns