Web Page Content and Its Efficient Processing

Abstract

This thesis demonstrates how natural language processing methods can be integrated into the browser environment to analyze web content. Text analysis follows a specific sequence of steps grounded in knowledge of the language. Keywords are extracted from publicly available documents, and the most frequent terms are then used for vocabulary learning, expanding the user's vocabulary directly in the browser. Although the extension offers translation into several languages, the text analysis targets English only, and all natural language processing methods are adapted to it. Besides letting users build their own dictionary, the application offers automatic testing. The practical part covers the application itself, an evaluation of its current state, and an overview of possible extensions that would improve the quality of the services offered.

Subject(s)

browser extension, HTML, lemmatization, NLP, search, stemming, stop word, summarization, text analysis, tokenization, translator, vocabulary, web

Citation