Efficient Methods for Mining Subgraphs in a Single Large Graph

Publisher

Vysoká škola báňská – Technická univerzita Ostrava

Location

ÚK/Sklad diplomových prací

Signature

202300066

Abstract

Large, complex graphs are often used to model the relationships among objects in many fields, such as social networks, maps, computer networks, chemical structures, bioinformatics, computer vision and web analysis. Frequent subgraph mining (FSM) is an important problem that has attracted numerous researchers in recent years; among existing methods, approaches based on the minimum-image (MNI) support measure, such as the GraMi algorithm, are considered state of the art. FSM plays an important role in various tasks, such as data mining, model analysis, and decision support systems. It is defined as finding all subgraphs whose number of occurrences in the dataset is greater than or equal to a given frequency threshold. In recent applications, such as social networks, the underlying graphs are very large, so algorithms for mining frequent subgraphs from a single large graph have been developing rapidly. However, all of them have huge search spaces and therefore still need a lot of time and memory. For frequent subgraph mining, this thesis proposes four techniques: a method to record the support of mined subgraphs; a sorting strategy to reduce the number of generated subgraphs; a parallel processing approach to reduce the mining time; and early pruning of invalid values in the domain to balance the search space. Our experiments on four real datasets (both directed and undirected graphs) showed that the four proposed algorithms achieved better results with respect to the search space, the running time and the memory requirements. In addition, closed frequent subgraph mining was addressed; it has many practical applications and is a foundation for many studies.
We propose a closed frequent subgraph mining algorithm based on GraMi to find all closed frequent subgraphs in a single large graph. Two strategies, early determination of closed frequent subgraphs and early pruning of non-closed subgraphs, are also developed to improve the performance of the proposed algorithm. All our experiments for closed frequent subgraph mining were performed on five real directed and undirected graph datasets, and the results show that both the running time and the memory requirements of our algorithm are better than those of the GraMi-based algorithm.
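
As an illustration of the frequency test described in the abstract, the following is a minimal sketch of MNI support in a single undirected, node-labeled graph. It is not the thesis's algorithm and not GraMi's optimized search: it enumerates every embedding by brute force, which is only feasible for tiny inputs. The function name `mni_support`, its parameters, and the toy graph are assumptions made for this example.

```python
from itertools import permutations


def mni_support(graph_labels, graph_edges, pattern_labels, pattern_edges):
    """Brute-force MNI support of a pattern in one undirected labeled graph.

    graph_labels / pattern_labels: dict mapping node -> label (hypothetical format).
    graph_edges / pattern_edges: iterable of (node, node) pairs.
    """
    # Store undirected edges as frozensets so orientation does not matter.
    edge_set = {frozenset(e) for e in graph_edges}
    pattern_nodes = list(pattern_labels)
    # images[v] collects the distinct graph nodes that pattern node v maps to.
    images = {v: set() for v in pattern_nodes}

    # Try every injective assignment of pattern nodes to graph nodes.
    for candidate in permutations(graph_labels, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        labels_ok = all(
            graph_labels[mapping[v]] == pattern_labels[v] for v in pattern_nodes
        )
        edges_ok = all(
            frozenset((mapping[a], mapping[b])) in edge_set
            for a, b in pattern_edges
        )
        if labels_ok and edges_ok:
            for v in pattern_nodes:
                images[v].add(mapping[v])

    # MNI support: the smallest image set over all pattern nodes.
    return min(len(nodes) for nodes in images.values())


# Toy data graph (assumed for illustration): two A-nodes and two B-nodes.
graph_labels = {1: "A", 2: "B", 3: "B", 4: "A"}
graph_edges = [(1, 2), (1, 3), (4, 3)]

# Pattern: a single edge between an A-node and a B-node.
edge_pattern_support = mni_support(
    graph_labels, graph_edges, {"x": "A", "y": "B"}, [("x", "y")]
)

# Pattern: a path A - B - A.
path_pattern_support = mni_support(
    graph_labels,
    graph_edges,
    {"x": "A", "y": "B", "z": "A"},
    [("x", "y"), ("y", "z")],
)

threshold = 2  # assumed frequency threshold
print(edge_pattern_support >= threshold)  # A-B edge is frequent
print(path_pattern_support >= threshold)  # A-B-A path is not
```

In this toy graph the A-B edge has three embeddings but an MNI support of 2, because each pattern node maps to only two distinct graph nodes. The A-B-A path has an MNI support of 1, since its middle node can only map to node 3, so it falls below the assumed threshold of 2.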

Subject(s)

Data mining, parallel strategy, sorting strategy, early pruning, frequent subgraph mining, closed subgraph mining.
