Density-Based Clustering for Big Data

Abstract

This diploma thesis deals with clustering, with a particular focus on density-based cluster analysis for big data. It begins with the theory behind clustering, mainly density-based cluster analysis and the DBSCAN algorithm. A significant part of the first half of the thesis covers data structures for efficient data storage and querying. In the second part, we propose our own version of DBSCAN that uses a k-d tree as its data structure and parallelizes some of DBSCAN’s steps. We then measure the impact of parallelizing the DBSCAN algorithm and compare brute-force data querying with querying through the k-d tree. In the final part, we propose possible enhancements and functionality for further improvement.

Description

Subject(s)

clustering, DBSCAN, data structure, k-d tree, parallelization, OpenMP

Citation