Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases

dc.contributor.advisor: Snášel, Václav
dc.contributor.author: Nguyen, Thanh Long
dc.contributor.referee: Šenkeřík, Roman
dc.contributor.referee: Dvorský, Jiří
dc.contributor.referee: Hassanien, Aboul Ella
dc.date.accepted: 2021-09-07
dc.date.accessioned: 2021-11-08T12:13:50Z
dc.date.available: 2021-11-08T12:13:50Z
dc.date.issued: 2021
dc.description.abstract [en]: With the rapid development of information technology and its application in many areas of life and the socio-economy, the amount of information stored in database systems has been growing for many years, and this data now accumulates at an explosive rate. This huge amount of data is a valuable resource, because information is a key element in many areas. Data mining helps users gain valuable insights from large databases and data warehouses and has been widely applied in many fields. In data mining, association rules express associations or correlations between data elements in the form "antecedent → consequent", and association rule mining is the task of detecting these relationships within a given dataset. Association rules were first introduced in 1993 by Agrawal et al. [1] and have since become one of the major topics of data mining research, especially in recent years. Association rule mining has been successfully applied in many socio-economic fields such as trade, health care, biology, finance and banking. Frequent pattern mining, which finds the patterns that occur frequently in a database, is a key task within association rule mining. Over the last two decades, researchers have proposed many techniques and algorithms for extracting frequent patterns, in which the downward closure property plays a fundamental role. Two of the main challenges in pattern mining are the computational cost and the potentially huge number of extracted patterns. In this thesis, we present an overview of work on frequent pattern mining, especially colossal pattern mining, and develop methods for mining frequent colossal patterns in high-dimensional databases that can handle emerging data processing workloads at ever larger scales.
First, we develop the CP (colossal pattern)-tree for efficiently storing colossal patterns. Next, we propose the CP-Miner algorithm to mine colossal patterns. CP-Miner combines the CP-tree, early pruning of transactions and dynamic bit vectors to mine frequent colossal patterns. PCP-Miner, an improved version of CP-Miner, is also developed to reduce runtime and memory usage; in PCP-Miner, we develop theorems to prune non-colossal patterns during the mining process. We also develop methods for mining colossal patterns with constraints, covering two cases: pattern constraints and length constraints.
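The abstract names two ingredients that a small example can make concrete: the downward closure property (every subset of a frequent pattern is itself frequent) and support counting with bit vectors. The following Python sketch is a generic, Apriori-style illustration under stated assumptions, not CP-Miner or PCP-Miner: the CP-tree, early transaction pruning and the thesis's pruning theorems are not modelled, the bit vectors are plain fixed Python ints rather than the dynamic bit vectors the abstract mentions, and all function names (`build_bit_vectors`, `support`, `frequent_itemsets`, `filter_by_min_length`) are hypothetical. Note that it enumerates every frequent pattern level by level, which is exactly what becomes infeasible in high-dimensional data and what colossal pattern mining aims to avoid.

```python
# Generic Apriori-style sketch with hypothetical names; NOT the thesis's CP-Miner/PCP-Miner.
from functools import reduce
from itertools import combinations
import operator


def popcount(x: int) -> int:
    """Number of set bits (int.bit_count() would need Python 3.10+)."""
    return bin(x).count("1")


def build_bit_vectors(transactions):
    """Map each item to an int whose bit t is set iff transaction t contains the item."""
    vectors = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support of a non-empty itemset = popcount of the AND of its items' bit vectors."""
    return popcount(reduce(operator.and_, (vectors[item] for item in itemset)))


def frequent_itemsets(transactions, min_sup):
    """Level-wise enumeration of all itemsets whose support is >= min_sup."""
    vectors = build_bit_vectors(transactions)
    level = {frozenset([item]) for item, v in vectors.items() if popcount(v) >= min_sup}
    result = {s: support(s, vectors) for s in level}
    k = 1
    while level:
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Downward closure: a k-itemset can be frequent only if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = set()
        for c in candidates:
            sup = support(c, vectors)
            if sup >= min_sup:
                level.add(c)
                result[c] = sup
    return result


def filter_by_min_length(patterns, min_len):
    """Hypothetical post-filter illustrating a length constraint (keep len >= min_len)."""
    return {p: s for p, s in patterns.items() if len(p) >= min_len}


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    freq = frequent_itemsets(transactions, min_sup=3)
    for pattern, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), sup)
```

On the five toy transactions with `min_sup=3`, the sketch reports a, b and c (support 4) and ab, ac and bc (support 3); abc is generated as a candidate but rejected because it occurs in only two transactions. Applying the length filter afterwards, as done here, still pays the cost of enumerating all short patterns; constraint-aware miners instead push such constraints into the search itself.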
dc.description.department [cs]: 460 - Katedra informatiky (Department of Computer Science)
dc.description.result [cs]: vyhověl (passed)
dc.format.extent: 2600970 bytes
dc.format.mimetype: application/pdf
dc.identifier.other: OSD002
dc.identifier.sender: S2724
dc.identifier.thesis: NGU0032_FEI_P1807_1801V001_2021
dc.identifier.uri: http://hdl.handle.net/10084/145381
dc.language.iso: en
dc.publisher [cs]: Vysoká škola báňská – Technická univerzita Ostrava
dc.rights.access: openAccess
dc.subject [en]: Frequent Pattern Mining, Colossal Pattern Mining, Constraints, High Dimensional Database.
dc.thesis.degree-branch [cs]: Informatika (Computer Science)
dc.thesis.degree-grantor [cs]: Vysoká škola báňská – Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky (Faculty of Electrical Engineering and Computer Science)
dc.thesis.degree-level [cs]: Doktorský studijní program (doctoral study programme)
dc.thesis.degree-name: Ph.D.
dc.thesis.degree-program [cs]: Informatika, komunikační technologie a aplikovaná matematika (Computer Science, Communication Technologies and Applied Mathematics)
dc.title [en]: Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases
dc.title.alternative [cs]: Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases
dc.type [cs]: Disertační práce (doctoral dissertation)

Files

Original bundle

Name: NGU0032_FEI_P1807_1801V001_2021.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format
Description: Thesis text

Name: NGU0032_FEI_P1807_1801V001_2021_autoreferat.pdf
Size: 1.87 MB
Format: Adobe Portable Document Format
Description: Autoreferát (thesis summary)

Name: NGU0032_FEI_P1807_1801V001_2021_posudek_oponent_Dvorsky_Jiri.pdf
Size: 26.41 KB
Format: Adobe Portable Document Format
Description: Referee's review – Dvorský, Jiří

Name: NGU0032_FEI_P1807_1801V001_2021_posudek_oponent_Hassanien_Aboul_Ella.pdf
Size: 694.68 KB
Format: Adobe Portable Document Format
Description: Referee's review – Hassanien, Aboul Ella

Name: NGU0032_FEI_P1807_1801V001_2021_posudek_oponent_Senkerik_Roman.pdf
Size: 63.9 KB
Format: Adobe Portable Document Format
Description: Referee's review – Šenkeřík, Roman