Text Clustering

Mani, Balakothandaraman

Files

MAN0122_FEI_N2647_2612T025_2020.pdf (2.17 MB)

MAN0122_FEI_N2647_2612T025_2020_priloha.zip (7.84 MB)

MAN0122_FEI_N2647_2612T025_2020_posudek_vedouci_Platos_Jan.pdf (53.91 KB)

MAN0122_FEI_N2647_2612T025_2020_posudek_oponent_Drazdilova_Pavla.pdf (55.23 KB)

Downloads

40

Date issued

2020

Authors

Mani, Balakothandaraman

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Abstract

This thesis analyses the procedures and methods used for clustering text documents and explains the challenges involved in applying document clustering techniques. We perform document clustering on two real-world text datasets, 20 News group and Reuters. The 20 News group dataset is split into two variants: one includes the headers, footers and quotes present in the text documents, and the other contains the documents without these details. We discuss different document clustering methods, their similarities, the challenges in running these clustering algorithms, and the techniques for validating cluster quality, and we compare them in detail. We also discuss dimension reduction techniques and their advantages, again with a detailed comparison. Finally, we discuss and conclude whether these dimension reduction methods produce better results for both of these algorithms.

Subject(s)

Document clustering, text clustering, 20 News group, Reuters, HAC, kmeans, buckshot

Item identifier

http://hdl.handle.net/10084/140520

Collections

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI)
