Processing Data from Wikipedia

Abstract

The goal of this master's thesis is to describe ways of processing data from Wikipedia. The first part covers how to obtain the data, process it, and store it for further analysis. The database is viewed as a network, so the focus is on pages and the links that connect them. The analysis is carried out in Python. The thesis describes how to build a graph and compute its basic properties and metrics. It further documents the procedure for finding communities, including a custom implementation of the Label Propagation algorithm. Results are presented for each step.
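
To illustrate the pipeline summarized above, the following is a minimal sketch of building a graph from page links, computing one basic metric, and running label propagation. It is not the thesis's own implementation: the function names (build_graph, density, label_propagation), the example page titles, the choice of an undirected graph, and the random tie-breaking rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def build_graph(links):
    """Build an undirected adjacency map from (source, target) page links."""
    adj = {}
    for src, dst in links:
        if src == dst:
            continue  # ignore pages that link to themselves
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def density(adj):
    """Share of possible undirected edges that are actually present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def label_propagation(adj, max_rounds=100, seed=0):
    """Asynchronous label propagation (assumed variant, random tie-breaking).

    Every node starts with its own label and repeatedly adopts the label
    that is most frequent among its neighbours, until no label changes.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adj}
    nodes = sorted(adj)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adj[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[nbr] for nbr in adj[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Keep the current label if it is already among the most frequent.
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


# Hypothetical page titles, used only to exercise the functions.
example_links = [
    ("Python", "NetworkX"), ("NetworkX", "Graph"), ("Python", "Graph"),
    ("Prague", "Brno"), ("Brno", "Ostrava"), ("Prague", "Ostrava"),
]
example_graph = build_graph(example_links)
example_labels = label_propagation(example_graph)
print("density:", density(example_graph))
print("communities:", len(set(example_labels.values())))
```

For a graph of Wikipedia's size, a dictionary of sets would likely be too memory-hungry; a compressed sparse row (CSR) layout, which the subject keywords mention, is the more plausible storage choice, and NetworkX offers comparable community routines for smaller graphs.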

Description

Subject(s)

Wikipedia, data analysis, data processing, C#, Python, network, graph, CSR, NetworkX, word cloud

Citation