Using AI and Data Mining to Analyze the Impact of ESG on Corporate Financial Results (Využití AI a data miningu k analýze vlivu ESG na hospodářské výsledky podniku)
Publisher
Vysoká škola báňská – Technická univerzita Ostrava
Abstract
The thesis deals with the development of a software solution for the semi-automated extraction of environmental, social, and governance (ESG) activities and key financial indicators from corporate documents. Its aim was to create a tool that leverages modern artificial intelligence and data-mining techniques to analyze machine-readable annual reports and ESG disclosures, extract relevant information on ESG activities in accordance with the European Sustainability Reporting Standards (ESRS), compute financial metrics, and quantify qualitative ESG data.
For this purpose, a corpus of approximately 30 publicly available annual reports of Czech companies from 2019 to 2023 was assembled. Documents were preprocessed automatically (text extraction from PDF/DOCX, OCR where necessary), segmented into “chunks” of up to 50,000 characters, and annotated with instructions containing the complete index of ESRS topics. The software prototype was implemented in Python using the PyPDF2, python-docx, tkinter, and pyperclip libraries, with interaction with the large language model (GPT-o3) carried out by copying prompts to the clipboard.
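The segmentation and prompt-assembly step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the function names `split_into_chunks` and `build_prompt`, the prompt wording, and the three sample ESRS topics are assumptions made for the example. Only the 50,000-character limit and the idea of attaching the ESRS topic index to each chunk come from the abstract.

```python
MAX_CHARS = 50_000  # chunk size limit stated in the abstract


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into consecutive chunks of at most max_chars characters.

    Prefers to end a chunk at a line break; falls back to a hard cut
    when the window contains no usable line break (assumed strategy).
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Last newline inside the window, if any, becomes the cut point.
            cut = text.rfind("\n", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def build_prompt(chunk: str, esrs_index: list[str], part: int, total: int) -> str:
    """Prepend instructions and the ESRS topic index to one chunk.

    The instruction wording here is hypothetical.
    """
    topics = "\n".join(f"- {topic}" for topic in esrs_index)
    return (
        f"Document part {part} of {total}.\n"
        "Map the text below to the ESRS topics listed and report any "
        "financial indicators you find.\n"
        f"ESRS topics:\n{topics}\n\n"
        f"Text:\n{chunk}"
    )


# Illustrative subset only; the thesis used the complete ESRS index.
SAMPLE_ESRS_INDEX = ["E1 Climate change", "S1 Own workforce", "G1 Business conduct"]
```

In the clipboard-based workflow the abstract describes, each prompt built this way would then be handed to the model manually, for example with `pyperclip.copy(prompt)`.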
Testing on selected documents demonstrated that, once provided with the full prompt, the model reliably generates tables mapping individual ESRS topics, calculates overall ESG scores and sub-scores, and extracts financial indicators (revenue, EBITDA, net profit, total assets, ROA, ROE). The automated extraction achieved an average agreement of 84% with manual annotation in thematic categories and 97% accuracy in identifying numerical values.
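For orientation, the two ratio indicators and an agreement figure of the kind reported above can be computed as in the sketch below. The abstract does not say how agreement was measured or which balance-sheet items entered ROE, so the simple match rate and the `equity` input are assumptions, and all function names are hypothetical.

```python
def roa(net_profit: float, total_assets: float) -> float:
    """Return on assets: net profit divided by total assets."""
    return net_profit / total_assets


def roe(net_profit: float, equity: float) -> float:
    """Return on equity: net profit divided by shareholders' equity.

    Equity is not among the indicators listed in the abstract; it is
    assumed here because the standard ROE definition requires it.
    """
    return net_profit / equity


def agreement(automatic: list[str], manual: list[str]) -> float:
    """Share of items where the automatic label equals the manual label.

    A plain match rate, assumed for illustration only.
    """
    if len(automatic) != len(manual):
        raise ValueError("label lists must have the same length")
    if not automatic:
        raise ValueError("label lists must not be empty")
    matches = sum(a == m for a, m in zip(automatic, manual))
    return matches / len(automatic)
```

For example, `agreement(["E1", "S1", "G1", "E1"], ["E1", "S1", "G1", "S1"])` counts three matching labels out of four.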
The results confirm that the developed software solution answers both research questions affirmatively: the primary question, whether an AI-driven tool can automatically analyze and extract key ESG and financial information from corporate documents, and the secondary question, whether qualitative ESG data can be quantified. The thesis also includes recommendations for managerial practice and proposals for further prototype development, including full automation of API calls, expansion of evaluation metrics, and deployment in a multilingual environment.
Subject(s)
ESG, ESRS, Data extraction, Natural Language Processing, Machine Learning, Data mining, Optical Character Recognition, Annual reports, Large Language Models, Python, Automated document analysis, Financial indicators