Abstract | In molecular biology, current research suggests that the function of a protein may be inferred from its structure. Two proteins with similar shape and similar local parts (or active sites) are often closely related. This observation is important when determining the adverse effects of new medicines, identifying new protein architectures, predicting protein interactions such as the docking problem (where the so-called receptor connects to the ligand) and explaining unexpected evolutions. Because of the vast number of newly discovered protein structures, there is an urgent need for multimedia data mining systems that can efficiently find similar protein structures, based on both shape and physical properties. In this paper, we describe the Content-based Analysis of Protein Structure for Retrieval and Indexing (CAPRI) data mining system, which is used to explore very large multimedia databases containing numerous protein structure families. CAPRI finds similar proteins based on their structure by utilizing, firstly, the 2D colours, textures and composition and, secondly, the 3D structure of the proteins. Our results against more than 26,000 protein structures contained in the Protein Data Bank show that our system is able to accurately and efficiently locate related protein structures. With CAPRI, domain experts can find these similar protein structures by supplying a “query by prototype” example. In this way, they are aided in the task of labelling new structures effectively, finding the families of existing proteins, identifying mutations and explaining unexpected evolutions. |
---|
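
To make the “query by prototype” idea concrete, the following is a minimal sketch of content-based retrieval over colour features. It is an illustration only, not CAPRI's implementation: the abstract does not specify which features or distance measure the system uses, so the joint RGB histogram, the Euclidean distance, and all function and structure names here (`colour_histogram`, `query_by_prototype`, `structure_A`, `structure_B`) are assumptions made for the example.

```python
from math import sqrt


def colour_histogram(pixels, bins=4):
    """Normalised joint RGB histogram of (r, g, b) tuples with values in 0..255.

    Assumption: a simple colour histogram stands in for the 2D colour
    features mentioned in the abstract.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index, then combine into one flat index.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_prototype(prototype, database, k=3):
    """Return the k (name, distance) pairs closest to the prototype features."""
    ranked = sorted(
        ((name, euclidean(prototype, features)) for name, features in database.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Toy data: hypothetical pixel lists standing in for rendered protein images.
red = [(250, 10, 10)] * 8
mostly_red = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 2
blue = [(10, 10, 250)] * 8

database = {
    "structure_A": colour_histogram(mostly_red),  # hypothetical entry
    "structure_B": colour_histogram(blue),        # hypothetical entry
}

results = query_by_prototype(colour_histogram(red), database, k=2)
for name, distance in results:
    print(name, round(distance, 4))
```

In this toy setting the prototype is ranked closest to `structure_A`, whose colour distribution it largely shares. A full system would combine several feature types (for example texture and 3D shape descriptors) and use an index rather than a linear scan over the database.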