dc.contributor.author: Novosád, Tomáš
dc.contributor.author: Snášel, Václav
dc.contributor.author: Abraham, Ajith
dc.contributor.author: Yang, Jack Y.
dc.date.accessioned: 2010-12-07T14:10:51Z
dc.date.available: 2010-12-07T14:10:51Z
dc.date.issued: 2010
dc.identifier.citation: IEEE Transactions on Information Technology in Biomedicine. 2010, vol. 14, no. 6, p. 1378-1386. [en]
dc.identifier.issn: 1089-7771
dc.identifier.uri: http://hdl.handle.net/10084/83472
dc.description.abstract: In this paper, we present a novel algorithm for measuring protein similarity based on 3-D structure (protein tertiary structure). The algorithm uses a suffix tree to discover common parts of the main chains of all proteins in the current Research Collaboratory for Structural Bioinformatics Protein Data Bank (PDB). From these common parts, we build a vector model and apply classical information retrieval (IR) algorithms based on the vector model to measure the similarity between proteins (all-to-all protein similarity). To calculate protein similarity, we use the term frequency-inverse document frequency (tf × idf) term weighting scheme and the cosine similarity measure. The goal of this paper is to introduce a new protein similarity metric based on suffix trees and IR methods. The entire current PDB database was used to demonstrate the algorithm's very good time complexity as well as its high precision. We chose the Structural Classification of Proteins (SCOP) database to verify the precision of our algorithm because it is maintained primarily by humans. A further result of this paper is the ability to determine the SCOP categories of proteins not included in the latest version of the SCOP database (v. 1.75) with nearly 100% precision. [en]
dc.language.iso: en [en]
dc.publisher: IEEE Engineering in Medicine and Biology Society [en]
dc.relation.ispartofseries: IEEE Transactions on Information Technology in Biomedicine [en]
dc.relation.uri: https://doi.org/10.1109/TITB.2010.2079939 [en]
dc.subject: bioinformatics [en]
dc.subject: information retrieval [en]
dc.subject: pattern classification [en]
dc.subject: proteins [en]
dc.subject: proteomics [en]
dc.subject: tree data structures [en]
dc.title: Searching protein 3-D structures for optimal structure alignment using intelligent algorithms and data structures [en]
dc.type: article [en]
dc.identifier.location: Not in the ÚK library holdings [en]
dc.identifier.doi: 10.1109/TITB.2010.2079939
dc.identifier.wos: 000283982200008
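The weighting and comparison step described in the abstract (tf × idf term weights over a vector model, compared with the cosine measure, all to all) can be illustrated with a minimal sketch. It assumes each protein has already been reduced to a list of "terms", i.e. labels for the shared main-chain fragments that the suffix tree would discover; that suffix-tree step is not shown, and the fragment labels used below are hypothetical placeholders. The exact tf and idf variants used by the authors are not stated in this record, so raw term counts and idf = log(N / df) are assumptions here, not the paper's implementation.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse tf x idf vector (dict term -> weight) per document.

    Each document is a list of terms; here a term stands for a hypothetical
    label of a main-chain fragment shared between proteins.
    Assumed weighting: raw term count times idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # document frequency: count each term once per doc
    vectors = []
    for terms in docs:
        tf = Counter(terms)
        vectors.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def all_to_all(docs):
    """All-to-all similarity matrix: entry [i][j] compares document i with j."""
    vecs = tf_idf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Three hypothetical proteins described by fragment labels.
    proteins = [["f1", "f2", "f1"], ["f1", "f2"], ["f3"]]
    for row in all_to_all(proteins):
        print([round(x, 3) for x in row])
```

In this sketch the first two proteins share fragments and score close to 1, while the third shares none and scores 0 against both. Note that a fragment occurring in every protein receives idf 0 and therefore contributes nothing to any score, which is the intended effect of idf weighting.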

