Analysis of Biological Data

Abstract

The thesis deals with the computer analysis of biological data. It focuses mainly on protein structures and sequences, and on microRNA (miRNA) molecules, which play a crucial role in regulating the translation of messenger RNA (mRNA). Analyzing three-dimensional protein structures is a very important task in molecular biology. Protein structures are nowadays typically determined with state-of-the-art techniques such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography, as reflected in the growing number of entries in the Protein Data Bank (PDB). The Protein Data Bank is a database of 3D structural data of large biological molecules, such as proteins and nucleic acids. It has been shown that structurally similar proteins tend to have similar functions even if their amino acid sequences are not similar to one another. To analyze protein function, it is therefore very important to search this growing database for proteins with similar structures, even when the similarity is only partial.
However, experimental methods such as NMR cannot keep pace with the rate at which new proteins are sequenced, because protein sequencing is much simpler and cheaper than these methods. It is thus important to have methods that can predict protein structure directly from the sequence of amino acid residues. One very important and frequently used modeling approach, template-based modeling, uses a protein of known structure with a related sequence as a template. Such methods, however, require fast and accurate sequence analysis tools. It has been shown that proteins whose sequences are similar in pairwise alignments tend to have similar structural and functional properties, even when the level of sequence similarity is relatively low. In particular, 30% sequence similarity over the aligned regions has been found sufficient to infer similar structural and functional properties of protein molecules.
There is therefore an ongoing effort to develop and refine methods and tools that assess protein similarity at the level of the primary structure, i.e. the amino acid sequence. This thesis has three main parts. The first part presents the theoretical background needed in the rest of the thesis. The second part presents our novel approaches to analyzing protein molecules in terms of 3D structure and sequence similarity. The last part focuses on the analysis of microRNA molecules.
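The criterion of about 30% similarity over aligned regions can be made concrete with a small calculation. The following is a minimal Python sketch, not the thesis's own method: it assumes the pairwise alignment has already been computed and is given as two equal-length strings with `-` marking gaps, and it measures percent identity (identical residues divided by gap-free aligned columns), which is one common way to quantify sequence similarity. The function name `percent_identity` and the 30.0 threshold check are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Hypothetical helper: assumes both strings come from the same alignment,
    so they have equal length and column i of one pairs with column i of the other.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned_cols = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not part of the aligned region
        aligned_cols += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned_cols == 0:
        return 0.0  # no aligned residues at all
    return 100.0 * identical / aligned_cols


if __name__ == "__main__":
    # 9 gap-free columns, 7 of them identical
    pid = percent_identity("MKT-AYIAKQR", "MKSLAYLA-QR")
    print(f"{pid:.1f}% identity")
    print(pid >= 30.0)  # illustrative 30% cutoff
```

The resulting value depends on how the alignment was produced, and the significance of a given percentage also depends on alignment length, so a single fixed cutoff is a simplification.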

Description

Import 02/11/2016

Subject(s)

data analysis, bioinformatics, algorithms, suffix trees, information retrieval, vector space model, similarity, graphs, clustering, protein structure, protein sequence, PDB - Protein Data Bank, SCOP - Structural Classification of Proteins, micro RNA
