Analysis of Biological Data

Novosád, Tomáš

dc.contributor.advisor	Snášel, Václav
dc.contributor.author	Novosád, Tomáš
dc.date.accessioned	2016-11-01T09:39:08Z
dc.date.available	2016-11-01T09:39:08Z
dc.date.issued	2015
dc.identifier.other	OSD002	cs
dc.identifier.uri	http://hdl.handle.net/10084/112224
dc.description	Import 02/11/2016	cs
dc.description.abstract	The thesis deals with the computer analysis of biological data. It focuses mainly on protein structures and sequences, as well as on small micro RNA (miRNA) molecules, which play a crucial role in regulating the translation of messenger RNA (mRNA) molecules. Analyzing three-dimensional protein structures is a very important task in molecular biology. Nowadays, protein structures are often solved using state-of-the-art technologies such as nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography, as seen in the increasing number of Protein Data Bank (PDB) entries. The Protein Data Bank is a database of 3D structural data of large biological molecules, such as proteins and nucleic acids. It has been shown that structurally similar proteins tend to have similar functions even if their amino acid sequences are not similar to one another. Thus, it is very important to find proteins with similar structures (even in part) in the growing database in order to analyze protein functions. However, technologies like NMR cannot keep up with the ever-increasing speed at which new proteins are sequenced, since protein sequencing is much simpler and cheaper than these methods. Thus, it is important to have methods that can predict the protein structure directly from the sequence of amino acid residues. One very important and frequently used modeling method is based on knowledge of a protein sequence with a known structure, which serves as a template. Such methods, however, require fast and accurate sequence analysis tools. It has been shown that proteins with a certain degree of sequence similarity, calculated by pairwise alignment, tend to have similar structural and functional properties even if their sequences of amino acid residues are not very similar. It has been found that 30% sequence similarity over aligned regions is sufficient to find similar functional and structural properties of protein molecules.
Therefore, there is an effort to develop and refine methods and tools that deal with protein sequence similarity at the level of the protein primary structure, i.e. the protein sequence. This thesis has three main parts. The first part presents the theoretical background needed in the following parts. The second part presents our novel approaches to the analysis of protein molecules in terms of 3D structure and sequence similarity. The last part focuses on the analysis of micro RNA molecules.	en
dc.description.abstract	This dissertation deals with the analysis of biological data. It focuses mainly on protein structures and sequences, and also on small micro RNA molecules, which play a key role in regulating the production of RNA molecules and, in turn, of proteins. Analyzing the three-dimensional structure of proteins is a very important task in molecular biology. At present, this problem is addressed with state-of-the-art techniques such as nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography. The result of these techniques is best seen in the steadily growing number of entries in the PDB (Protein Data Bank) database. The Protein Data Bank is a database of three-dimensional structural data describing large biological molecules such as proteins or the nucleic acids DNA and RNA. It has previously been shown that structurally similar proteins also tend to have the same function even when the amino acid sequences they are composed of differ. It is therefore very important to search the ever-expanding protein database for proteins with a similar structure (even if only partially similar) and to study the functions of these molecules. Although technologies such as NMR are very accurate, they can never cover the number of new proteins being sequenced, because sequencing is incomparably faster and also considerably cheaper. It is therefore very important to have procedures and methods that can predict the three-dimensional structure of a protein directly from the sequence of amino acids it consists of. One very important modeling (predictive) method is based on knowledge of protein sequences with an already known three-dimensional structure, which serves as a so-called template. Such methods, however, require accurate and fast tools for protein sequence analysis. It has been shown that proteins with a certain degree of sequence similarity, computed by pairwise comparison, also tend to have similar structure and functional properties even when the sequences are not entirely identical.
It has been found that a mere 30% similarity between regions of the examined sequences is sufficient to find the same functional and structural properties of protein molecules. There is therefore an effort to develop and refine methods for finding similarities at the level of the protein sequence, i.e. the primary structure of the protein. This dissertation consists of three main parts. The first part presents the theoretical background needed to understand the remaining parts. The second part focuses on our new approach to the analysis of protein molecules in terms of tertiary (three-dimensional) structure as well as primary structure. The last part focuses on the analysis of small micro RNA molecules.	cs
dc.format	130, [3] p. : ill.	cs
dc.format.extent	3514790 bytes
dc.format.mimetype	application/force-download
dc.language.iso	en
dc.publisher	Vysoká škola báňská - Technická univerzita Ostrava	cs
dc.subject	data analysis	en
dc.subject	bioinformatics	en
dc.subject	algorithms	en
dc.subject	suffix trees	en
dc.subject	information retrieval	en
dc.subject	vector space model	en
dc.subject	similarity	en
dc.subject	graphs	en
dc.subject	clustering	en
dc.subject	protein structure	en
dc.subject	protein sequence	en
dc.subject	PDB - Protein Data Bank	en
dc.subject	SCOP - Structural Classification of Proteins	en
dc.subject	micro RNA	en
dc.subject	analýza dat	cs
dc.subject	bioinformatika	cs
dc.subject	algoritmy	cs
dc.subject	sufixové stromy	cs
dc.subject	dokumentografické informační systémy	cs
dc.subject	vektorový model	cs
dc.subject	podobnost	cs
dc.subject	grafy	cs
dc.subject	shlukování	cs
dc.subject	struktura proteinů	cs
dc.subject	proteinové sekvence	cs
dc.subject	PDB - Proteinová databanka	cs
dc.subject	SCOP - strukturální klasifikace proteinů	cs
dc.subject	mikro RNA	cs
dc.title	Analysis of Biological Data	en
dc.title.alternative	Analýza rozsáhlých biologických dat	cs
dc.type	Doctoral dissertation	cs
dc.identifier.signature	201600188	cs
dc.identifier.location	ÚK/Thesis storage
dc.contributor.referee	Dvorský, Jiří	cs
dc.contributor.referee	Křupka, Michal	cs
dc.contributor.referee	Šenkeřík, Roman	cs
dc.date.accepted	2016-05-11
dc.thesis.degree-name	Ph.D.
dc.thesis.degree-level	Doctoral study programme	cs
dc.thesis.degree-grantor	Vysoká škola báňská - Technická univerzita Ostrava. Fakulta elektrotechniky a informatiky	cs
dc.description.department	460 - Department of Computer Science
dc.thesis.degree-program	Computer Science, Communication Technology and Applied Mathematics	cs
dc.thesis.degree-branch	Computer Science	cs
dc.description.result	passed	cs
dc.identifier.sender	S2724	cs
dc.identifier.thesis	NOW029_FEI_P1807_1801V001_2015
dc.rights.access	openAccess

Files in this item

Name:: NOW029_FEI_P1807_1801V001_2015.pdf
Size:: 3.351Mb
Format:: PDF

Name:: NOW029_FEI_P1807_1801V001_2015 ...
Size:: 824.5Kb
Format:: PDF

Name:: NOW029_FEI_P1807_1801V001_2015 ...
Size:: 899.4Kb
Format:: PDF
Description:: Reviewer's report – Dvorský, Jiří

Name:: NOW029_FEI_P1807_1801V001_2015 ...
Size:: 1.395Mb
Format:: PDF
Description:: Reviewer's report – Křupka, Michal

Name:: NOW029_FEI_P1807_1801V001_2015 ...
Size:: 873.6Kb
Format:: PDF
Description:: Reviewer's report – Šenkeřík, Roman

This item appears in the following Collection(s)

Vysokoškolské kvalifikační práce Fakulty elektrotechniky a informatiky / Theses and dissertations of Faculty of Electrical Engineering and Computer Science (FEI) [13253]
The collection contains theses and dissertations of the Faculty of Electrical Engineering and Computer Science.
