Vyhledávání nelegálního obsahu na internetu (Searching for Illegal Content on the Internet)

Abstract

Malicious Content Retrieval is aimed at the analysis of web pages containing child pornography, specifically the analysis of their text. In my thesis I examine the differences between pages with child pornography and other web pages, concentrating on characteristic words and their frequencies. For this purpose I designed my own software, which extracts words from pages and orders them by frequency. I then applied clustering methods and the Jaccard similarity coefficient. I started from the hypothesis that pages with child pornography contain a distinctive vocabulary that is not used elsewhere. All the results I obtained confirm this hypothesis. A key foundation of Malicious Content Retrieval is the article by Wai H. Ho and Paul A. Watters, "Statistical and Structural Approaches to Filtering Internet Pornography", IEEE International Conference on Systems, Man and Cybernetics, 2004, which further supports my hypothesis. I believe my thesis can help in the fight against child pornography.
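The two measurable steps named in the abstract, ordering words by frequency and comparing vocabularies with the Jaccard similarity coefficient, can be illustrated with a minimal sketch. The thesis software itself is not available, so the function names (`word_frequencies`, `top_words`, `jaccard`), the tokenization rule, and the choice to compare only the top-n words are assumptions made for illustration, not the author's implementation.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in a text (case-insensitive).

    Assumption: a "word" is a maximal run of Unicode letters, so digits,
    underscores, and punctuation act as separators.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


def top_words(text, n):
    """Return the set of the n most frequent words in a text."""
    return {word for word, _ in word_frequencies(text).most_common(n)}


def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty vocabularies count as identical.
        return 1.0
    return len(a & b) / len(a | b)
```

With this sketch, `jaccard(top_words(page_a, 50), top_words(page_b, 50))` yields a value between 0 (no shared vocabulary) and 1 (identical vocabulary), which could then serve as a similarity measure for clustering pages.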

Description

The full text is not accessible in order to protect sensitive data.

Subject(s)

child pornography, thesis, Malicious Content Retrieval, web pages, text analysis, frequency, illegal content, clustering, Jaccard similarity coefficient

Citation