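Use of Advanced Statistical Methods and Machine Learning Methods in Solving Classification Problems in Medicine (original Czech title: Využití pokročilých statistických metod a metod strojového učení při řešení klasifikačních úloh v medicíně)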

Abstract

Solving classification problems is a significant current challenge across disciplines. In medical research, we are commonly asked to build a classification model that predicts a patient’s death, the success of treatment, or a patient’s clinical status. This dissertation was inspired by a real problem from the practice of the Department of Radiology of the University Hospital Ostrava and the Faculty of Medicine of the University of Ostrava: building a prediction model for a patient’s clinical status three months after an ischemic stroke, based on data available within 24 hours of the patient’s admission. To solve such real-world problems, analysts need high-quality supporting material that provides an overview of the available (or at least frequently used) classification algorithms and also guides them through the entire process of building a classification model. However, none of the material available to us (not even in English, let alone in Czech) met the requirements for completeness of information, and across the literature, science blogs, and discussion forums of the scientific community we encountered differing recommendations on the application of classification methods and inconsistencies in the terminology used. We therefore decided to select some of the most commonly used classification algorithms from the families of traditional statistical methods and machine learning methods and to fill this perceived gap. The thesis first discusses the fundamental principles of solving classification problems, which are worth considering regardless of the classification algorithm used. It then focuses on selected classification methods: logistic regression, neural networks, and random forests. For each method, it describes the mathematical principle, assumptions, interpretation options, and level of transparency. It also provides a recommended procedure for the entire model-building process and notes any specifics relevant to the use of the method. The focus is on a thorough explanation of the methodological procedures. The described procedures are then applied to the real-life problem that inspired the work: predicting a patient’s clinical status three months after an ischemic stroke. The demonstrated procedures, the presentation of the results, and the brief instructions for implementation in the R software can serve as inspiration for readers solving their own classification problems.

Subject(s)

Classification, Prediction, Logistic regression, Neural networks, Random forests, Ischemic stroke

Citation