dc.contributor.advisor | Snášel, Václav | |
dc.contributor.author | Nguyen, Thanh Long | |
dc.date.accessioned | 2021-11-08T12:13:50Z | |
dc.date.available | 2021-11-08T12:13:50Z | |
dc.date.issued | 2021 | |
dc.identifier.other | OSD002 | |
dc.identifier.uri | http://hdl.handle.net/10084/145381 | |
dc.description.abstract | With the rapid development of information technology and its application in many areas of life and the economy, the amount of information stored in database systems has been increasing for many years, and this data is accumulating at an explosive rate. This huge amount of data is a valuable resource, because information is a key element in many areas. Data mining helps users gain valuable insights from huge databases and data warehouses.
Data mining has been widely applied in many fields. In data mining, an association rule expresses an association or correlation between data elements in the form "antecedent → consequent". Detecting association rules means detecting such relationships within a given data set. Association rules were first introduced in 1993 by Agrawal et al. [1], and mining them has become one of the major topics of data mining research, especially in recent years. Association rule detection has been successfully applied in many socio-economic fields such as trade, health, biology, finance and banking.
In association rule mining, frequent pattern mining is a key task. It aims to find the patterns that occur frequently in a database. Over the last two decades, researchers have proposed many techniques and algorithms for extracting frequent patterns, in which the downward closure property plays a fundamental role.
One challenge in pattern mining is the computational cost; another is the potentially huge number of extracted patterns. In this thesis, we present an overview of work on frequent pattern mining, especially colossal pattern mining, and develop methods for mining frequent colossal patterns in high-dimensional databases that can tackle emerging data processing workloads while coping with ever larger scales. First, we develop the CP-tree (colossal pattern tree) for efficiently storing colossal patterns. Next, we propose the CP-Miner algorithm, which uses the CP-tree, early pruning of transactions, and dynamic bit vectors to mine frequent colossal patterns. PCP-Miner, an improved version of CP-Miner, is also developed to reduce runtime and memory usage. In PCP-Miner, we develop theorems to prune non-colossal patterns during the mining process. We also develop methods for mining colossal patterns with constraints. Two kinds of constraints are considered: pattern constraints and length constraints. | en |
dc.description.abstract | With the rapid development of information technology and its application in many areas of life and the economy, the amount of information stored in database systems has been increasing for many years, and this data is accumulating at an explosive rate. This huge amount of data is a valuable resource, because information is a key element in many areas. Data mining helps users gain valuable insights from huge databases and data warehouses.
Data mining has been widely applied in many fields. In data mining, an association rule expresses an association or correlation between data elements in the form "antecedent → consequent". Detecting association rules means detecting such relationships within a given data set. Association rules were first introduced in 1993 by Agrawal et al. [1], and mining them has become one of the major topics of data mining research, especially in recent years. Association rule detection has been successfully applied in many socio-economic fields such as trade, health, biology, finance and banking.
In association rule mining, frequent pattern mining is a key task. It aims to find the patterns that occur frequently in a database. Over the last two decades, researchers have proposed many techniques and algorithms for extracting frequent patterns, in which the downward closure property plays a fundamental role.
One challenge in pattern mining is the computational cost; another is the potentially huge number of extracted patterns. In this thesis, we present an overview of work on frequent pattern mining, especially colossal pattern mining, and develop methods for mining frequent colossal patterns in high-dimensional databases that can tackle emerging data processing workloads while coping with ever larger scales. First, we develop the CP-tree (colossal pattern tree) for efficiently storing colossal patterns. Next, we propose the CP-Miner algorithm, which uses the CP-tree, early pruning of transactions, and dynamic bit vectors to mine frequent colossal patterns. PCP-Miner, an improved version of CP-Miner, is also developed to reduce runtime and memory usage. In PCP-Miner, we develop theorems to prune non-colossal patterns during the mining process. We also develop methods for mining colossal patterns with constraints. Two kinds of constraints are considered: pattern constraints and length constraints. | cs |
dc.format.extent | 2600970 bytes | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Vysoká škola báňská – Technická univerzita Ostrava | cs |
dc.subject | Frequent Pattern Mining, Colossal Pattern Mining, Constraints, High Dimensional Database. | en |
dc.subject | Frequent Pattern Mining, Colossal Pattern Mining, Constraints, High Dimensional Database. | cs |
dc.title | Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases | en |
dc.title.alternative | Efficient Algorithms for Mining Colossal Patterns in high Dimensional Databases | cs |
dc.type | Doctoral thesis | cs |
dc.contributor.referee | Šenkeřík, Roman | |
dc.contributor.referee | Dvorský, Jiří | |
dc.contributor.referee | Hassanien, Aboul Ella | |
dc.date.accepted | 2021-09-07 | |
dc.thesis.degree-name | Ph.D. | |
dc.thesis.degree-level | Doctoral study programme | cs |
dc.thesis.degree-grantor | Vysoká škola báňská – Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky | cs |
dc.description.department | 460 - Katedra informatiky | cs |
dc.thesis.degree-program | Informatika, komunikační technologie a aplikovaná matematika | cs |
dc.thesis.degree-branch | Informatika | cs |
dc.description.result | passed | cs |
dc.identifier.sender | S2724 | |
dc.identifier.thesis | NGU0032_FEI_P1807_1801V001_2021 | |
dc.rights.access | openAccess | |