Web Page Content and Its Efficient Processing (Obsah webových stránek a jeho efektivní zpracování)
Publisher
Vysoká škola báňská – Technická univerzita Ostrava
Abstract
This thesis demonstrates how natural language processing (NLP) methods can be integrated into the browser environment to analyze web content. Text analysis follows a fixed sequence of steps that rely on knowledge of the language. Keywords are extracted from publicly available documents, and the most frequent terms are then used for vocabulary learning, expanding the user's vocabulary directly in the browser. Although the extension offers translation into several languages, the text analysis targets English only, and all NLP methods are adapted to it. Besides letting users build their own dictionary, the application offers automatic testing. The practical part comprises the application itself, an evaluation of its current state, and an overview of possible extensions that would improve the quality of the services offered.
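The analysis sequence named in the abstract (tokenization, stop-word removal, word normalization, frequency-based keyword selection) can be illustrated with a minimal sketch. This is not the extension's code: the stop-word list, the function names, and the crude plural stripping that stands in for real lemmatization or stemming are all assumptions made for illustration, and Python is used only for brevity.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; the thesis does not list its own.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "with",
})


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes allowed inside a word)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def normalize(token):
    """Strip a trailing plural 's'. A crude stand-in for real lemmatization or stemming."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def top_terms(text, n=5):
    """Return the n most frequent normalized terms, ignoring stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    counts = Counter(normalize(t) for t in tokens)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Pages load fast. The page and the pages are cached."
    print(top_terms(sample, 3))
```

A real pipeline would replace `normalize` with a dictionary-based lemmatizer or an established stemming algorithm, and would use a fuller stop-word list for English.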
Subject(s)
browser extension, HTML, lemmatization, NLP, search, stemming, stop word, summarization, text analysis, tokenization, translator, vocabulary, web